SQL Interview Prep Notes

Tools

Tool #1: Sum (minus / multiply / divide) 2 column for each row

SELECT column1, column2, (column1 + column2) AS sum_of_columns
FROM your_table;

Example:

product_id price quantity
1001 50 2
3001 20 4
SELECT product_id, quantity, price, (quantity * price) AS total
FROM product_table;
product_id price quantity total
1001 50 2 100
3001 20 4 80

Tool #2: Sum all rows of a column

SELECT SUM(column_name) AS total_sum
FROM table_name;

Example:

SELECT SUM(quantity) AS total_quantity
FROM product_table;
total_quantity
6

Tool #3: Common Table Expression (CTE)

WITH cte_1 AS (
...
),

cte_2 AS (
...
),

SELECT ...
FROM cte_name

Data Lemur: Sending vs. Opening Snaps

Tools used: #1, #2, #3

WITH cte AS (
SELECT * ,
(CASE WHEN activity_type = 'open' THEN time_spent ELSE 0 END) AS t1,
(CASE WHEN activity_type = 'send' THEN time_spent ELSE 0 END) AS t2
FROM activities a1
INNER JOIN age_breakdown a2
ON a1.user_id = a2.user_id
WHERE a1.activity_type IN ('send', 'open')
)

SELECT age_bucket, round(SUM(t1)*100.0/SUM(t1 + t2), 2) AS open_perc,
round(SUM(t2)*100.0/SUM(t1 + t2), 2) AS send_perc
FROM cte 
GROUP BY age_bucket

Tool #4: Windows Function

window_function() OVER (
  PARTITION BY partition_column
  ORDER BY order_column
  ROWS BETWEEN n PRECEDING AND m FOLLOWING
)

i) Aggregate Window Functions: SUM, COUNT, AVERAGE, MAX, MIN

ii) Ranking Window Functions:

RANK

A 5000 1
B 3000 2
C 3000 2
D 2000 4

DENSE_RANK

A 5000 1
B 3000 2
C 3000 2
D 2000 3

ROW_NUMBER

A 5000 1
B 3000 2
C 3000 3
D 2000 4

iii) Lead and Lag Functions: LEAD, LAG

Data Lemur: Tweets' Rolling Averages

SELECT    
  user_id,    
  tweet_date,   
  ROUND(AVG(tweet_count) OVER (
    PARTITION BY user_id     
    ORDER BY tweet_date     
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
  ,2) AS rolling_avg_3d
FROM tweets;

Tool #5: Get sum/count with subquery and use in WHERE

Supercloud Customer

WITH cte AS (
SELECT customer_id, COUNT(DISTINCT product_category) AS category_count
FROM customer_contracts JOIN products
ON customer_contracts.product_id = products.product_id
GROUP BY customer_id
)

SELECT customer_id FROM cte
WHERE category_count = (SELECT COUNT(DISTINCT product_category) FROM products)

Concepts

Concept #1: Order of Execution

i) FROM
ii) WHERE
iii) GROUP BY
iv) HAVING
v) SELECT / PARTITION BY
vi) ORDER BY

User's Third Transaction

Wrong:

SELECT user_id, spend, transaction_date, ROW_NUMBER() OVER(PARTITION BY(user_id) ORDER BY transaction_date) AS counter
FROM transactions
WHERE counter = 3

We will get the error "column counter does not exist". This is because WHERE is executed first before PARTITION BY.

Correct:

WITH cte AS(
SELECT user_id, spend, transaction_date, 
ROW_NUMBER() OVER(PARTITION BY(user_id) ORDER BY transaction_date) AS counter
FROM transactions)

SELECT user_id, spend, transaction_date
FROM cte 
WHERE counter = 3;