Let’s tune the physical model in Amazon Redshift using best practices, with the objective of meeting a performance SLA of queries that run in seconds.
The goal is to take advantage of Amazon Redshift’s massively parallel processing (MPP), so that queries execute across every compute node, while minimizing I/O.
Let’s review the Amazon Redshift concepts that help achieve these goals.
If this were a production cluster, you’d get daily suggestions from the Amazon Redshift Advisor on these topics for free.
Concept | Goal | Benefits |
---|---|---|
Compression/Encoding | • Allow more data to be stored within an Amazon Redshift cluster • Improve query performance by decreasing I/O | Allows two to four times more data to be stored within the cluster |
Zone Maps • Automatically built in-memory block metadata • Contain per-block min and max values • All blocks automatically have zone maps | • Eliminate unnecessary I/O • Effectively prune blocks that cannot contain data for a given query | Automatically improves filter performance |
Sort Keys | • Make queries run faster by increasing the effectiveness of zone maps and reducing I/O | Improves filter performance |
Distribution Keys | • Distribute data evenly for parallel processing across compute nodes • Minimize data movement during query processing | Improves join performance |
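
To see why sort keys make zone maps more effective, here is a minimal Python sketch of block pruning. It is a toy model, not how Amazon Redshift is implemented: the block size of 1,000 rows, the random data, and the helper names `build_zone_map` and `blocks_to_scan` are all assumptions made for illustration. It builds per-block min and max values for one column and counts how many blocks a range filter would still have to read, first with the data in arrival order and then sorted.

```python
import random

# Assumption: 1,000 rows per block, chosen only to keep the demo small.
BLOCK_SIZE = 1000


def build_zone_map(values, block_size=BLOCK_SIZE):
    """Return one (min, max) pair per fixed-size block of values."""
    zone_map = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        zone_map.append((min(block), max(block)))
    return zone_map


def blocks_to_scan(zone_map, low, high):
    """Count blocks whose [min, max] range overlaps the filter [low, high].

    Blocks that do not overlap cannot contain matching rows and are pruned.
    """
    return sum(1 for bmin, bmax in zone_map if bmax >= low and bmin <= high)


# Hypothetical column: 100,000 random integer keys in arrival order.
random.seed(42)
unsorted_col = [random.randint(1, 100_000) for _ in range(100_000)]
# The same column as it would be laid out if it were the sort key.
sorted_col = sorted(unsorted_col)

# Equivalent of: WHERE col BETWEEN 5000 AND 6000
unsorted_scan = blocks_to_scan(build_zone_map(unsorted_col), 5_000, 6_000)
sorted_scan = blocks_to_scan(build_zone_map(sorted_col), 5_000, 6_000)

total_blocks = len(sorted_col) // BLOCK_SIZE
print(f"unsorted: scan {unsorted_scan} of {total_blocks} blocks")
print(f"sorted:   scan {sorted_scan} of {total_blocks} blocks")
```

With unsorted data, nearly every block's min and max straddle the filter range, so the zone map prunes almost nothing. Once the column is sorted, matching rows sit in a few adjacent blocks and the rest can be skipped without being read.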