Analysis | Do | Review |
Let’s find the products that had the most number of ratings by year
|
Run the following SQL in Query Editor.
select product_category,
dd.d_year,
count(PR.star_rating) AS cnt
from public.product_reviews pr
join public.date_dim dd on pr.review_date = dd.d_date
group by 1,2
Order by 3 desc, 1,2
limit 10;
|
When the query completes you will see the following in the query results
|
Let’s join the customer data with products review data to analyze the ratings by customer state
|
Run the following SQL in Query Editor.
SELECT PR.product_category,
A.ca_state AS customer_state,
count(PR.star_rating) AS cnt
FROM public.product_reviews PR,
public.customer C,
public.customer_address A
WHERE PR.customer_id = C.c_customer_sk
AND C.c_current_addr_sk = A.ca_address_sk
AND PR.marketplace = 'US'
GROUP BY 1, 2
Limit 10;
|
|
Let’s review if there are any performance alerts for the above query. Run the following
|
Run the following SQL in Query Editor.
select
trim(s.perm_table_name) as table ,
(sum(abs(datediff(seconds, coalesce(b.starttime,d.starttime,s.starttime),
case when
coalesce(b.endtime,d.endtime,s.endtime) > coalesce(b.starttime,d.starttime,s.starttime)
THEN coalesce(b.endtime,d.endtime,s.endtime)
ELSE coalesce(b.starttime,d.starttime,s.starttime) END )))/60)::numeric(24,0) as minutes,
sum(coalesce(b.rows,d.rows,s.rows)) as rows,
trim(split_part(l.event,':',1)) as event,
substring(trim(l.solution),1,60) as solution ,
max(l.query) as sample_query,
count(distinct l.query)
from stl_alert_event_log as l
left join stl_scan as s on s.query = l.query and s.slice = l.slice and s.segment = l.segment
left join stl_dist as d on d.query = l.query and d.slice = l.slice and d.segment = l.segment
left join stl_bcast as b on b.query = l.query and b.slice = l.slice and b.segment = l.segment
where l.query in
(SELECT max(query)
FROM STL_QUERY
WHERE TRIM(querytxt) like '%WHERE PR.customer_id = C.c_customer_sk%' )
group by 1,4,5
order by 3 desc,6 desc;
|
You will see the following output.
The recommended solution is to review the choice of distribution key to collocate the join.
|
The product_reviews table is joining the customer table on “customer_id” column. Let’s change the distribution style of the product_reviews table
|
Run the following SQL in Query Editor.
alter table public.product_reviews alter distkey customer_id;
|
|
Review the tables again
|
Run the following SQL in Query Editor.
select "table", diststyle, skew_rows, tbl_rows,stats_off
from svv_table_info
where "table" in ('product_reviews', 'customer','customer_address');
|
The results pane will show the following output. The distribution key of products_reviews table has been changed.
|
Let’s execute the same query again and notice how long this query takes to run.
|
Run the following SQL in Query Editor.
SELECT PR.product_category,
A.ca_state AS customer_state,
count(PR.star_rating) AS cnt
FROM public.product_reviews PR,
public.customer C,
public.customer_address A
WHERE PR.customer_id = C.c_customer_sk
AND C.c_current_addr_sk = A.ca_address_sk
AND PR.marketplace = 'US'
GROUP BY 1, 2
Limit 10;
|
|
Let’s review the performance alerts for this query
|
Run the following SQL in Query Editor.
select
trim(s.perm_table_name) as table ,
(sum(abs(datediff(seconds, coalesce(b.starttime,d.starttime,s.starttime),
case when
coalesce(b.endtime,d.endtime,s.endtime) > coalesce(b.starttime,d.starttime,s.starttime)
THEN coalesce(b.endtime,d.endtime,s.endtime)
ELSE coalesce(b.starttime,d.starttime,s.starttime) END )))/60)::numeric(24,0) as minutes,
sum(coalesce(b.rows,d.rows,s.rows)) as rows,
trim(split_part(l.event,':',1)) as event,
substring(trim(l.solution),1,60) as solution ,
max(l.query) as sample_query,
count(distinct l.query)
from stl_alert_event_log as l
left join stl_scan as s on s.query = l.query and s.slice = l.slice and s.segment = l.segment
left join stl_dist as d on d.query = l.query and d.slice = l.slice and d.segment = l.segment
left join stl_bcast as b on b.query = l.query and b.slice = l.slice and b.segment = l.segment
where l.query in
(SELECT max(query)
FROM STL_QUERY
WHERE TRIM(querytxt) like '%WHERE PR.customer_id = C.c_customer_sk%' )
group by 1,4,5
order by 3 desc,6 desc;
|
You will see the following output. No alerts!
|