Close Menu

    Subscribe to Updates

    Get the latest news from tastytech.

    What's Hot

    Captains Cup’, ‘Jenny LeClue – Detectivu’ and More – TouchArcade

    May 21, 2026

    Does Queen Maeve Come Back for ‘The Boys’ Finale?

    May 21, 2026

    2026 Mazda BT-50 SP review

    May 21, 2026
    Facebook X (Twitter) Instagram
    Facebook X (Twitter) Instagram
    tastytech.intastytech.in
    Subscribe
    • AI News & Trends
    • Tech News
    • AI Tools
    • Business & Startups
    • Guides & Tutorials
    • Tech Reviews
    • Automobiles
    • Gaming
    • movies
    tastytech.intastytech.in
    Home»Business & Startups»SQL Window Functions Beyond Basics: Solving Real Business Problems
    SQL Window Functions Beyond Basics: Solving Real Business Problems
    Business & Startups

    SQL Window Functions Beyond Basics: Solving Real Business Problems

    gvfx00@gmail.comBy gvfx00@gmail.comMay 21, 2026No Comments10 Mins Read
    Share
    Facebook Twitter LinkedIn Pinterest Email


    SQL Window Functions
     

    Table of Contents

    Toggle
    • # Introduction
    • # Running Totals
        • // Example: Calculating Revenue Over Time
    • # Gaps and Islands (Sessionization)
        • // Example: Finding User Streaks
    • # Cohort Analysis
        • // Example: Counting First-Time Orders
    • # Percentile and Ranking Analysis
        • // Example: Identifying Top Percentile Fraud
    • # Conclusion
      • Related posts:
    • Gemini 2.5 Computer Use: Google's FREE Browser Use AI Agent! 
    • A 6-Month Guide to Mastering AI Agents
    • Top 10+ Free Machine Learning And Artificial Intelligence Courses In 2024

    # Introduction

     
    Most of you use SQL window functions, but you’re only scratching the surface — a ROW_NUMBER() here, a SUM() OVER() there. The window functions’ real potential is revealed when you apply them to harder problems. I will walk you through four patterns that show window functions at their most useful.

     
    SQL Window Functions
     

    The examples are all real interview questions you can practice on StrataScratch.

     

    # Running Totals

     
    Calculating running totals is one of the most common business uses of window functions. The finance people absolutely love it! It is used to track cumulative monthly revenue, which then easily moves into calculating where you’re at compared to the annual revenue target.

     
    SQL Window Functions
     

    What makes this a window function problem is that, typically, you should include both the per-period value and the accumulating total in the same output. You can’t use GROUP BY with SUM(), because that collapses individual rows. So, the obvious solution is using a window function, i.e., SUM() OVER().

     

    // Example: Calculating Revenue Over Time

    This Amazon question originally asks you to calculate the 3-month rolling average. However, we’ll disregard that and calculate the cumulative revenue for each month.

    Data: Here’s the amazon_purchases table preview.

     

    user_id created_at purchase_amt
    10 2020-01-01 3742
    11 2020-01-04 1290
    12 2020-01-07 4249
    … … …
    109 2020-10-24 1749

     

    Code: The inner query turns dates into YYYY-MM format using TO_CHAR() and aggregates monthly revenue, filtering out returns with WHERE purchase_amt > 0.

    The outer query applies the window function over those monthly totals we calculated. I don’t specify an explicit frame clause (intentionally) in OVER(), so the window function defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. That means the window is all rows preceding the current row, i.e., the month. In other words, the cumulative sum is: all previous months + the current month. Not surprisingly, that is a textbook definition of a cumulative sum.

    SELECT t.month,
           t.monthly_revenue,
           SUM(t.monthly_revenue) OVER(ORDER BY t.month) AS cumulative_revenue
    FROM (
        SELECT TO_CHAR(created_at::DATE, 'YYYY-MM') AS month,
            SUM(purchase_amt) AS monthly_revenue
        FROM amazon_purchases
        WHERE purchase_amt > 0
        GROUP BY TO_CHAR(created_at::date, 'YYYY-MM')
        ORDER BY TO_CHAR(created_at::date, 'YYYY-MM')
    ) t
    ORDER BY t.month ASC;

     

    Output:

     

    month monthly_revenue cumulative_revenue
    2020-01 26292 26292
    2020-02 20695 46987
    2020-03 29620 76607
    … … …
    2020-10 15310 239869

     

    # Gaps and Islands (Sessionization)

     
    This pattern, too, involves sequential data, just like running totals, but it employs different window functions.

    An island is a run of rows with the same condition, e.g., consecutive daily logins. A gap is the space between islands.

    One of the most common real-world applications of this pattern is sessionization — grouping a raw event stream into sessions. A session is typically defined as a sequence of events from the same user where no gap between consecutive events exceeds some timeout (30 minutes is the web analytics standard).

    Sessionization is commonly applied in product and data engineering. It is used anywhere you need to group raw event streams into meaningful units of activity.

     
    SQL Window Functions
     

    The classic detection in SQL consists of two steps:

    • LAG() or LEAD() — to compare each row to the one before or after it, and flag where a new streak starts.
    • SUM(flag) OVER (PARTITION BY user ORDER BY date) — to accumulate flags into a streak ID, as it stays flat inside a streak and increments at every boundary.

     

    // Example: Finding User Streaks

    The question from LinkedIn and Meta interviews asks you to find the top three users with the longest platform visit streak until August 10, 2022. You should output all users with the top three longest streaks, if there is more than one user per streak length.

    Data: The table is user_streaks.

     

    user_id date_visited
    u001 2022-08-01
    u001 2022-08-01
    u004 2022-08-01
    … …
    u005 2022-08-11

     

    Code: The query is long, but it’s neatly structured into CTEs, so it’s easy to follow.

    1. unique_visits: Removes duplicate visit records and caps the data at August 10, 2022.
    2. streak_flags: Uses LAG() to get the previous visit date per user and flags the row as 0 (a streak continuation if the gap is 1 day) or 1 (a new streak start for any other gap).
    3. streak_ids: Converts flags into streak group IDs using a cumulative SUM().
    4. streak_lengths: Counts days per streak.
    5. longest_per_user: Keeps only each user’s longest streak.
    6. ranked_lengths: Ranks distinct streak lengths.
    7. top_lengths: Finds the top 3 streak-length values.

    The final SELECT ties everything together: it shows all users with the top three streaks and their respective streak lengths in days.

    WITH unique_visits AS (
        SELECT DISTINCT user_id, date_visited
        FROM   user_streaks
        WHERE  date_visited <= DATE '2022-08-10'),
    streak_flags AS (
        SELECT *,
               CASE
                   WHEN date_visited
                         - LAG(date_visited) OVER (PARTITION BY user_id ORDER BY date_visited) = 1
                   THEN 0
                   ELSE 1
               END AS new_streak
        FROM   unique_visits),
    streak_ids AS (
        SELECT *,
               SUM(new_streak) OVER (PARTITION BY user_id ORDER BY date_visited) AS streak_id
        FROM   streak_flags),
    streak_lengths AS (
        SELECT user_id,
               streak_id,
               COUNT(*) AS streak_length
        FROM   streak_ids
        GROUP  BY user_id, streak_id),
    longest_per_user AS (
        SELECT user_id,
               MAX(streak_length) AS streak_length
        FROM   streak_lengths
        GROUP  BY user_id),
    ranked_lengths AS (
        SELECT DISTINCT
               streak_length,
               DENSE_RANK() OVER (ORDER BY streak_length DESC) AS len_rank
        FROM   longest_per_user),
    top_lengths AS (
        SELECT streak_length
        FROM   ranked_lengths
        WHERE  len_rank <= 3)
    SELECT u.user_id,
           u.streak_length
    FROM   longest_per_user u
    JOIN   top_lengths       t USING (streak_length)
    ORDER  BY u.streak_length DESC, u.user_id;

     

    Output:

     

    user_id streak_length
    u004 10
    u005 10
    u003 5
    u001 4
    u006 4

     

    # Cohort Analysis

     
    A cohort is a group of users who share a starting event, for example, a first purchase, first login, or first subscription date. Analyzing cohorts is the foundation of retention reporting, as it answers the question of how many users came back after the starting event.

     
    SQL Window Functions
     

    The key thing in cohort analysis is finding the cohort anchor in the user’s activity history so that you can measure all subsequent activity against it.

    Doing that in SQL boils down to three main window function approaches:

    1. MIN(event_time) OVER (PARTITION BY user_id) — the most common pattern when the anchor is a date.
    2. FIRST_VALUE(attribute) OVER (PARTITION BY user_id ORDER BY event_time) — used when you need the anchor value itself, e.g., the first merchant or first product category.
    3. ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time) = 1 — used when you want to isolate the first event as a separate row and join it back to the full history rather than broadcasting it across all rows.

     

    // Example: Counting First-Time Orders

    Here’s a DoorDash question. It requires you to calculate the number of orders and first-time orders (from a customer’s perspective) each merchant has had. You should also exclude merchants that have not received any orders.

    Data: The first table is named order_details.

     

    id customer_id merchant_id order_timestamp n_items total_amount_earned
    8 1049 6 2022-01-14 01:00:28 5 16.3
    7 1049 5 2022-01-14 11:50:29 4 2.16
    22 1049 1 2022-01-14 22:46:54 8 2.63
    … … … … … …
    39 1060 1 2022-01-16 22:27:30 11 15.41

     

    The second table is merchant_details.

     

    id name category zipcode
    1 Treehouse Pizza american 92507
    2 Thai Lion asian 90017
    3 Meal Raven fast food 95204
    … … … …
    7 Taste Of Gyros mediterranean 94789

     

    Code: The first CTE is where the cohort logic happens. I use the FIRST_VALUE() window function to attach the merchant from each customer’s earliest order to every row in their order history. The result is a table where every order carries the label of which merchant that customer started with.

    In the second CTE, I join the labels back to the full order history using a LEFT JOIN to ensure that merchants who received orders but were never anyone’s first merchant still appear in the result. We use COUNT() and DISTINCT to count only the customers for whom that merchant was their first — that’s your cohort size. With another COUNT(), you get the total number of orders. DISTINCT is required here, too, because the LEFT JOIN with first_order can produce duplicate order rows — since first_order retains one row per order (not per customer), a single order in order_details can match multiple rows in first_order for the same customer, inflating the count without it.

    In the final SELECT, we join the number_of_customers CTE with merchant_details to bring in the merchant names.

    WITH first_order AS  (
    SELECT customer_id,
           FIRST_VALUE(merchant_id) OVER(PARTITION BY customer_id ORDER BY order_timestamp) AS first_merchant
    FROM order_details),
    number_of_customers AS  (
    SELECT merchant_id,
           COUNT(DISTINCT f.customer_id) AS first_time_orders,
           COUNT(DISTINCT id) AS total_number_of_orders
    FROM order_details d
    LEFT JOIN first_order f ON d.merchant_id = f.first_merchant
    GROUP BY 1)
    SELECT name,
           total_number_of_orders,
           first_time_orders
    FROM number_of_customers
    JOIN merchant_details ON number_of_customers.merchant_id = merchant_details.id;

     

    Output:

     

    name total_number_of_orders first_time_orders
    Treehouse Pizza 8 1
    Thai Lion 14 7
    Meal Raven 12 0
    Burger A1 4 0
    Sushi Bay 7 3
    Tacos You 7 1

     

    # Percentile and Ranking Analysis

     
    Aggregate functions tell you the average. Window-based ranking functions tell you the distribution, and distributions are where the interesting business questions live. Is your 90th percentile order value unusually high, suggesting a few large buyers are skewing revenue? Are the bottom 25% of sales reps clustered close to the median or far below?

    NTILE(n) divides rows into n roughly equal buckets. PERCENT_RANK() expresses each row’s rank as a value between 0 and 1. CUME_DIST() tells you what fraction of rows have a value less than or equal to the current row. And PERCENTILE_CONT() computes the actual value at a given percentile threshold — useful when you want to filter based on a dynamic cutoff rather than rank within a result set.

     
    SQL Window Functions
     

    // Example: Identifying Top Percentile Fraud

    Here’s one by Google and Netflix. They want you to identify the most suspicious claims in each state. The assumption is that the top 5% of claims in each state are potentially fraudulent.

    Data: The table is named fraud_score.

     

    policy_num state claim_cost fraud_score
    ABCD1001 CA 4113 0.61
    ABCD1002 CA 3946 0.16
    ABCD1003 CA 4335 0.01
    … … … …
    ABCD1400 TX 3922 0.59

     

    Code: In the code, PERCENTILE_CONT(0.95) computes the interpolated value at the 95th percentile of fraud scores within each state.

    In the following SELECT statement, the CTE is joined with the original table so every claim can be compared against the threshold for its own state. Claims at or above that value make the cut.

    WITH state_percentiles AS (
        SELECT state,
               PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY fraud_score) AS p95
        FROM fraud_score
        GROUP BY state)
    SELECT f.policy_num,
           f.state,
           f.claim_cost,
           f.fraud_score
    FROM fraud_score f
    JOIN state_percentiles sp
    ON f.state = sp.state
    WHERE f.fraud_score >= sp.p95;

     

    Output:

     

    policy_num state claim_cost fraud_score
    ABCD1016 CA 1639 0.96
    ABCD1021 CA 4898 0.95
    ABCD1027 CA 2663 0.99
    … … … …
    ABCD1398 TX 3191 0.98

     

    # Conclusion

     
    These four patterns share a common philosophy: do the work in the database, in a single pass where possible, using the full expressive power of the SQL window specification.

    What makes window functions genuinely powerful isn’t any single function in isolation. It’s the composability: you can chain CTEs, apply multiple window functions in the same SELECT, and build complex analytical logic that reads nearly like a description of the business problem itself.
     
     

    Nate Rosidi is a data scientist and in product strategy. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.



    Related posts:

    Are AI Agents Your Next Security Nightmare?

    Top 46 AI Tools in 2026 You Must Use

    Integrating Rust and Python for Data Science

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Previous ArticleGoogle publishes exploit code threatening millions of Chromium users
    Next Article ‘King’ Emery hailed as Aston Villa beat Freiburg 3-0 in Europa League | Football News
    gvfx00@gmail.com
    • Website

    Related Posts

    Business & Startups

    Best Small Language Models on Hugging Face Right Now!

    May 21, 2026
    Business & Startups

    Top 9 AI Events and Conferences in 2026 that you Must Attend

    May 21, 2026
    Business & Startups

    Anonymizing Production Data for Data Science with Mimesis

    May 20, 2026
    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    Black Swans in Artificial Intelligence — Dan Rose AI

    October 2, 2025162 Views

    Every Clue That Tony Stark Was Always Doctor Doom

    October 20, 202599 Views

    We let ChatGPT judge impossible superhero debates — here’s how it ruled

    December 31, 202582 Views
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram

    Subscribe to Updates

    Get the latest tech news from tastytech.

    About Us
    About Us

    TastyTech.in brings you the latest AI, tech news, cybersecurity tips, and gadget insights all in one place. Stay informed, stay secure, and stay ahead with us!

    Most Popular

    Black Swans in Artificial Intelligence — Dan Rose AI

    October 2, 2025162 Views

    Every Clue That Tony Stark Was Always Doctor Doom

    October 20, 202599 Views

    We let ChatGPT judge impossible superhero debates — here’s how it ruled

    December 31, 202582 Views

    Subscribe to Updates

    Get the latest news from tastytech.

    Facebook X (Twitter) Instagram Pinterest
    • Homepage
    • About Us
    • Contact Us
    • Privacy Policy
    © 2026 TastyTech. Designed by TastyTech.

    Type above and press Enter to search. Press Esc to cancel.

    Ad Blocker Enabled!
    Ad Blocker Enabled!
    Our website is made possible by displaying online advertisements to our visitors. Please support us by disabling your Ad Blocker.