In this article I will show you how to build daily user growth report with cumulative number (running total) using mostly SQL, and filter capabilities of Holistics platform.
Requirements: We want to build a report that:
- Show daily users and a cumulative user growth
- Ability to select a date range
- Ability to filter by some user’s conditions (gender, user type, etc)
First, The Dataset & A Simple Daily Count
Assume we work with this simple users table in PosgresSQL (fields like username, email are omitted):
marketplace.users
- id (pk)
- gender (varchar)
- sign_up_date (date)
Let’s start by building a simple daily count report with visualization like below. Report viewers have ability to select a date period to see only data for that range.
Here we use Holistics’ date filters feature to add 2 date pickers (start date and end date), the user’s inputs are
then turned into variables {{start_date}}
and {{end_date}}
and inject into the query.
SELECT
sign_up_date as "Date",
count(1) as "Daily Users"
FROM marketplace.users U
WHERE sign_up_date >= {{start_date}}
AND sign_up_date <= {{end_date}}
GROUP BY 1
Add Cumulative Column
That was simple, now, let’s add a new cumulative column tell people how many total users do we have up to a particular day.
We do this by using PostgreSQL’s famous Window Functions capability:
SELECT
sign_up_date as "Date",
COUNT(1) as "Daily Users",
SUM(COUNT(1) OVER (ORDER BY sign_up_date)) AS "Cumulative Users" -- accumulating signups
FROM marketplace.users U
WHERE sign_up_date >= {{start_date}}
AND sign_up_date <= {{end_date}}
GROUP BY 1
Charting this out with dual-axis and dual chart type, we will get a nice chart with both daily number and cumulative number.
Caveat 1: Handle Days With No Data
There’s a problem with the above query: days with no signups are not shown in the result table. This is considered a bug in our query, and needs to be addressed. We do want that particular day to report 0 sign ups, instead of missing out the value completely. Look at the image from step 1, day March 18 is missing from the resultset table.
We fix this by generate a list of all dates between [start_date, end_date]
(using PostgreSQL’s generate_series
function) and use that list as a base to combine with our daily user count data.
Now our SQL query starts to get complicated:
-- date ranges for [start_date, end_date]
WITH date_ranges AS (
SELECT date_d::date as date_d
from GENERATE_SERIES({{start_date}}::DATE, {{end_date}}::DATE, '1 day'::INTERVAL) date_d
),
user_counts AS (
SELECT
sign_up_date as date_d,
count(1) as cnt
FROM marketplace.users U
WHERE sign_up_date >= {{start_date}}
AND sign_up_date <= {{end_date}}
GROUP BY 1
)
SELECT
D.date_d AS "Date",
COALESCE(UC.cnt, 0) as "Daily Users",
SUM(COALESCE(UC.cnt, 0)) OVER (ORDER BY D.date_d)) as "Cumulative Users"
FROM date_ranges D
LEFT JOIN user_counts UC ON UC.date_d = D.date_d
A few notes here:
- We define a
user_counts
CTE to return daily user counts - We define a
date_ranges
CTE to return all dates within a date range. - We move the accumulating sum outside to the main SELECT
Caveat 2: Calculating Prior Total
Another thing looked wrong in the above image: the accumulating sums only sum for the selected date period, instead of taking into account the total user signups from the beginning of time.
To fix this, we need to perform a one-time calculation of the total users number prior to the start_date specified. We do so with a CTE, and then add that to the running total number, like so:
-- CTE to calculate prior total
WITH sums AS (
SELECT COUNT(1) as total_prior
FROM marketplace.users U
WHERE sign_up_date >= {{start_date}}
),
date_ranges AS (
SELECT date_d::date as date_d
from generate_series({{start_date}}::DATE, {{end_date}}::DATE, '1 day'::INTERVAL) date_d
),
user_counts AS (
SELECT
sign_up_date as date_d,
count(1) as cnt
FROM marketplace.users U
WHERE sign_up_date >= {{start_date}}
AND sign_up_date <= {{end_date}}
GROUP BY 1
)
SELECT
D.date_d AS "Date",
COALESCE(UC.cnt, 0) as "Daily Users",
sums.total_prior + (SUM(COALESCE(UC.cnt, 0)) OVER (ORDER BY D.date_d)) as "Cumulative Users"
FROM date_ranges D
LEFT JOIN user_counts UC ON UC.date_d = D.date_d
CROSS JOIN sums
Adding a few filters like Age, Gender, etc and we get our final result, just like the first GIF image.
Summary
As you saw, in this post we shown you how to do a simple daily count + cumulative report using the power and flexibility of SQL. We address the few points:
- Calculate running total with window functions
- Avoid empty-row days with generated date ranges
- Calculate sums prior to selected date range, and add that to the running total You can play around with the real report here.
Try On Your Own Data
If you want to build similar report for your own data, you can do so with Holistics with 3 simple steps:
- Sign Up Free Trial
- Connect To Your SQL Database (instruction within app)
- Build The Report With Above Query