8. Analyze product review data

Analytical Queries

AnalysisDo Review
Let’s find the products that had the most number of ratings by year Run the following SQL in Query Editor.

select product_category,
      dd.d_year,
      count(PR.star_rating) AS cnt                                                           
from  public.product_reviews pr
join  public.date_dim dd on pr.review_date = dd.d_date
group by 1,2
Order by 3 desc, 1,2
limit 10;

When the query completes you will see the following in the query results
Let’s join the customer data with products review data to analyze the ratings by customer state Run the following SQL in Query Editor.

SELECT PR.product_category,
      A.ca_state AS customer_state,
      count(PR.star_rating) AS cnt
FROM public.product_reviews PR,
     public.customer C,
     public.customer_address A
WHERE PR.customer_id = C.c_customer_sk
AND C.c_current_addr_sk = A.ca_address_sk
AND PR.marketplace = 'US'
GROUP BY 1, 2
Limit 10;

Let’s review if there are any performance alerts for the above query. Run the following Run the following SQL in Query Editor.

select 
 trim(s.perm_table_name) as table , 
 (sum(abs(datediff(seconds, coalesce(b.starttime,d.starttime,s.starttime), 
   case when 
    coalesce(b.endtime,d.endtime,s.endtime) > coalesce(b.starttime,d.starttime,s.starttime) 
      THEN coalesce(b.endtime,d.endtime,s.endtime) 
      ELSE coalesce(b.starttime,d.starttime,s.starttime) END )))/60)::numeric(24,0) as minutes,   
 sum(coalesce(b.rows,d.rows,s.rows)) as rows, 
 trim(split_part(l.event,':',1)) as event,  
 substring(trim(l.solution),1,60) as solution , 
 max(l.query) as sample_query, 
 count(distinct l.query)
from stl_alert_event_log as l
left join stl_scan as s on s.query = l.query and s.slice = l.slice and s.segment = l.segment
left join stl_dist as d on d.query = l.query and d.slice = l.slice and d.segment = l.segment
left join stl_bcast as b on b.query = l.query and b.slice = l.slice and b.segment = l.segment
where l.query in
(SELECT max(query)
 FROM STL_QUERY
 WHERE TRIM(querytxt) like '%WHERE PR.customer_id = C.c_customer_sk%'  )                                                                                                                     
group by 1,4,5 
order by 3 desc,6 desc;

You will see the following output.
The recommended solution is to review the choice of distribution key to collocate the join.
The product_reviews table is joining the customer table on “customer_id” column. Let’s change the distribution style of the product_reviews table Run the following SQL in Query Editor.

alter table public.product_reviews alter distkey customer_id;

Review the tables again Run the following SQL in Query Editor.

select "table",  diststyle, skew_rows, tbl_rows,stats_off
from svv_table_info
where "table" in ('product_reviews', 'customer','customer_address');

The results pane will show the following output. The distribution key of products_reviews table has been changed.

Let’s execute the same query again and notice how long this query takes to run. Run the following SQL in Query Editor.

SELECT PR.product_category,
      A.ca_state AS customer_state,
      count(PR.star_rating) AS cnt
FROM public.product_reviews PR,
     public.customer C,
     public.customer_address A
WHERE PR.customer_id = C.c_customer_sk
AND C.c_current_addr_sk = A.ca_address_sk
AND PR.marketplace = 'US'
GROUP BY 1, 2
Limit 10;

Let’s review the performance alerts for this query Run the following SQL in Query Editor.

select 
 trim(s.perm_table_name) as table , 
 (sum(abs(datediff(seconds, coalesce(b.starttime,d.starttime,s.starttime), 
   case when 
    coalesce(b.endtime,d.endtime,s.endtime) > coalesce(b.starttime,d.starttime,s.starttime) 
      THEN coalesce(b.endtime,d.endtime,s.endtime) 
      ELSE coalesce(b.starttime,d.starttime,s.starttime) END )))/60)::numeric(24,0) as minutes,   
 sum(coalesce(b.rows,d.rows,s.rows)) as rows, 
 trim(split_part(l.event,':',1)) as event,  
 substring(trim(l.solution),1,60) as solution , 
 max(l.query) as sample_query, 
 count(distinct l.query)
from stl_alert_event_log as l
left join stl_scan as s on s.query = l.query and s.slice = l.slice and s.segment = l.segment
left join stl_dist as d on d.query = l.query and d.slice = l.slice and d.segment = l.segment
left join stl_bcast as b on b.query = l.query and b.slice = l.slice and b.segment = l.segment
where l.query in
(SELECT max(query)
 FROM STL_QUERY
 WHERE TRIM(querytxt) like '%WHERE PR.customer_id = C.c_customer_sk%'  )                                                                                                                     
group by 1,4,5 
order by 3 desc,6 desc;

You will see the following output. No alerts!