1. Business Requirements

Your marketing team wants to use product review data to learn which products in the “Home and Grocery” category customers like, broken down by state. This insight will help the business plan new product offerings. The business users want to generate these reports more frequently, with a performance SLA of completing each report in seconds. They also want to write the analyzed data back to the data lake on Amazon S3 so that various analytical applications can use it.

To meet the business needs, the data engineering team has come up with the following data model.

For this lab we will leverage the following datasets.

| Dataset | Details | Location on Amazon S3 |
|---|---|---|
| product_reviews | Amazon Customer Review | https://console.aws.amazon.com/s3/buckets/amazon-reviews-pds/parquet/?region=us-east-1 |
| date_dim | TPC-DS 3TB | https://console.aws.amazon.com/s3/buckets/redshift-downloads/TPC-DS/3TB/date_dim/?region=us-east-1&tab=overview |
| customer | TPC-DS 3TB | https://console.aws.amazon.com/s3/buckets/redshift-downloads/TPC-DS/3TB/customer/?region=us-east-1&tab=overview |
| customer_address | TPC-DS 3TB | https://console.aws.amazon.com/s3/buckets/redshift-downloads/TPC-DS/3TB/customer_address/?region=us-east-1&tab=overview |
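
The locations listed for these datasets are S3 console links, while SQL statements that read from S3 usually expect an `s3://bucket/prefix/` URI. The following is a minimal sketch of that conversion, assuming the console URL has the form `/s3/buckets/<bucket>/<prefix>/`. The helper name `console_url_to_s3_uri` is hypothetical and is not part of any AWS SDK.

```python
from urllib.parse import urlparse


def console_url_to_s3_uri(console_url: str) -> str:
    """Convert an S3 console 'buckets' URL into an s3:// URI.

    Hypothetical helper; assumes the path looks like /s3/buckets/<bucket>/<prefix>/.
    """
    path = urlparse(console_url).path  # drops the ?region=... query string
    marker = "/s3/buckets/"
    if not path.startswith(marker):
        raise ValueError(f"not an S3 console bucket URL: {console_url}")
    return "s3://" + path[len(marker):]


# Console links copied from the dataset list in this lab.
locations = {
    "product_reviews": "https://console.aws.amazon.com/s3/buckets/amazon-reviews-pds/parquet/?region=us-east-1",
    "date_dim": "https://console.aws.amazon.com/s3/buckets/redshift-downloads/TPC-DS/3TB/date_dim/?region=us-east-1&tab=overview",
    "customer": "https://console.aws.amazon.com/s3/buckets/redshift-downloads/TPC-DS/3TB/customer/?region=us-east-1&tab=overview",
    "customer_address": "https://console.aws.amazon.com/s3/buckets/redshift-downloads/TPC-DS/3TB/customer_address/?region=us-east-1&tab=overview",
}

s3_uris = {name: console_url_to_s3_uri(url) for name, url in locations.items()}
for name, uri in s3_uris.items():
    print(f"{name}: {uri}")
```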

The customer and customer_address tables are already created and loaded with data.
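
To make the target report concrete, here is a rough sketch of the kind of query the business requirement implies, run against an in-memory SQLite database so that it is self-contained. It is not the lab's actual Redshift SQL. Several things are assumptions: that reviews join to customers on `customer_id = c_customer_sk`, that "Home and Grocery" maps to the two `product_category` values `Home` and `Grocery`, that "liked" is measured by average `star_rating`, and that only the handful of columns shown are needed. All rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-ins for the lab tables; only the columns used below.
conn.executescript("""
CREATE TABLE product_reviews (
    customer_id INTEGER, product_title TEXT,
    product_category TEXT, star_rating INTEGER);
CREATE TABLE customer (c_customer_sk INTEGER, c_current_addr_sk INTEGER);
CREATE TABLE customer_address (ca_address_sk INTEGER, ca_state TEXT);
""")

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO product_reviews VALUES (?, ?, ?, ?)",
    [
        (1, "Blender", "Home", 5),
        (2, "Blender", "Home", 4),
        (3, "Coffee", "Grocery", 3),
        (1, "Novel", "Books", 5),  # filtered out by the category predicate
    ],
)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])
conn.executemany("INSERT INTO customer_address VALUES (?, ?)", [(10, "WA"), (20, "TX")])

# Average rating and review count per state and product.
report = conn.execute("""
SELECT ca.ca_state,
       pr.product_title,
       AVG(pr.star_rating) AS avg_rating,
       COUNT(*)            AS review_count
FROM product_reviews pr
JOIN customer c          ON pr.customer_id = c.c_customer_sk
JOIN customer_address ca ON c.c_current_addr_sk = ca.ca_address_sk
WHERE pr.product_category IN ('Home', 'Grocery')
GROUP BY ca.ca_state, pr.product_title
ORDER BY ca.ca_state, avg_rating DESC
""").fetchall()

for row in report:
    print(row)
```

The same join shape (reviews to customer to customer_address, grouped by state and product) is what the data model needs to support. The choice of metric and the category filter would be adjusted to the actual business definition.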