SQL Interview Prep Notes
Tools
Tool #1: Sum (minus / multiply / divide) 2 column for each row
SELECT column1, column2, (column1 + column2) AS sum_of_columns
FROM your_table;
Example:
product_id | price | quantity |
---|---|---|
1001 | 50 | 2 |
3001 | 20 | 4 |
SELECT product_id, quantity, price, (quantity * price) AS total
FROM product_table;
product_id | price | quantity | total |
---|---|---|---|
1001 | 50 | 2 | 100 |
3001 | 20 | 4 | 80 |
Tool #2: Sum all rows of a column
SELECT SUM(column_name) AS total_sum
FROM table_name;
Example:
SELECT SUM(quantity) AS total_quantity
FROM product_table;
total_quantity |
---|
6 |
Tool #3: Common Table Expression (CTE)
WITH cte_1 AS (
...
),
cte_2 AS (
...
),
SELECT ...
FROM cte_name
Data Lemur: Sending vs. Opening Snaps
Tools used: #1, #2, #3
WITH cte AS (
SELECT * ,
(CASE WHEN activity_type = 'open' THEN time_spent ELSE 0 END) AS t1,
(CASE WHEN activity_type = 'send' THEN time_spent ELSE 0 END) AS t2
FROM activities a1
INNER JOIN age_breakdown a2
ON a1.user_id = a2.user_id
WHERE a1.activity_type IN ('send', 'open')
)
SELECT age_bucket, round(SUM(t1)*100.0/SUM(t1 + t2), 2) AS open_perc,
round(SUM(t2)*100.0/SUM(t1 + t2), 2) AS send_perc
FROM cte
GROUP BY age_bucket
Tool #4: Windows Function
window_function() OVER (
PARTITION BY partition_column
ORDER BY order_column
ROWS BETWEEN n PRECEDING AND m FOLLOWING
)
i) Aggregate Window Functions: SUM, COUNT, AVERAGE, MAX, MIN
ii) Ranking Window Functions:
RANK
A | 5000 | 1 |
B | 3000 | 2 |
C | 3000 | 2 |
D | 2000 | 4 |
DENSE_RANK
A | 5000 | 1 |
B | 3000 | 2 |
C | 3000 | 2 |
D | 2000 | 3 |
ROW_NUMBER
A | 5000 | 1 |
B | 3000 | 2 |
C | 3000 | 3 |
D | 2000 | 4 |
iii) Lead and Lag Functions: LEAD, LAG
Data Lemur: Tweets' Rolling Averages
SELECT
user_id,
tweet_date,
ROUND(AVG(tweet_count) OVER (
PARTITION BY user_id
ORDER BY tweet_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
,2) AS rolling_avg_3d
FROM tweets;
Tool #5: Get sum/count with subquery and use in WHERE
WITH cte AS (
SELECT customer_id, COUNT(DISTINCT product_category) AS category_count
FROM customer_contracts JOIN products
ON customer_contracts.product_id = products.product_id
GROUP BY customer_id
)
SELECT customer_id FROM cte
WHERE category_count = (SELECT COUNT(DISTINCT product_category) FROM products)
Concepts
Concept #1: Order of Execution
i) FROM
ii) WHERE
iii) GROUP BY
iv) HAVING
v) SELECT / PARTITION BY
vi) ORDER BY
Wrong:
SELECT user_id, spend, transaction_date, ROW_NUMBER() OVER(PARTITION BY(user_id) ORDER BY transaction_date) AS counter
FROM transactions
WHERE counter = 3
We will get the error "column counter does not exist". This is because WHERE is executed first before PARTITION BY.
Correct:
WITH cte AS(
SELECT user_id, spend, transaction_date,
ROW_NUMBER() OVER(PARTITION BY(user_id) ORDER BY transaction_date) AS counter
FROM transactions)
SELECT user_id, spend, transaction_date
FROM cte
WHERE counter = 3;