Cohort Retention Analysis is a powerful thing that most business owners need to look at.
Retention: If first-time user A goes to the store on Week 1, and returns to the store the next week, he is a returned user. If user B also goes to the store on Week 1 and does not return the next week, he’s a bounced user, that basically means you lose him as a user (despite all the marketing money you spent to lure him to your store).
Cohort Analysis is a technique to see how variables change in different groups given different starting conditions (a cohort).
Defining Requirements
In this post, we’ll show you how to build the below Cohort Retention:
The above chart tells you a few things:
- We cohort users by their signup month. This is the
users
table below. - For each cohort, we show the population (how many people) in that cohort in the second column
- An activity is an action user did to the site (this could be simply view the page, or as specific as making a purchase).
We denote this as table
activities
below.
We need the following tables and fields:
users:
- id
- timestamp
activities:
- user_id
- timestamp
Bucketing Users Into Cohort
First we bucket them into different cohort by their sign up month, and store into cohort_items
-- (user_id, cohort_month), each
with cohort_items as (
select
date_trunc('month', U.timestamp)::date as cohort_month,
id as user_id
from users U
order by 1, 2
)
After that, we build user_activities
which
-- (user_id, month_number): user X has activity in month number X
WITH user_activities as (
select
A.user_id,
MONTH_DIFF(
date_trunc('month', A.timestamp)::date,
C.cohort_month
) as month_number
from public.activities A
left join cohort_items C ON A.user_id = C.user_id
group by 1, 2
)
The above would return all the pairs of (user_id, month_number)
that indicates if a user is active in that month after
their original signup date.
| user | month_number |
| Alex | 0 |
| Alex | 1 |
| Bob | 0 |
| Bob | 2 |
The MONTH_DIFF
is a user-defined function that takes in 2 dates, and return the number of months between them.
Cohort Size: is simply how many users are in each group
-- (cohort_month, size)
with cohort_size as (
select cohort_month, count(1) as num_users
from cohort_items
group by 1
order by 1
),
And finally, putting them together with the below:
-- (cohort_month, month_number, cnt)
with retention_table as (
select
C.cohort_month,
A.month_number,
count(1) as num_users
from user_activities A
left join cohort_items C ON A.user_id = C.user_id
group by 1, 2
)
-- our final value: (cohort_month, size, month_number, percentage)
select
B.cohort_month,
S.num_users as total_users,
B.month_number,
B.num_users::float * 100 / S.num_users as percentage
from retention_table B
left join cohort_size S ON B.cohort_month = S.cohort_month
order by 1, 3
where B.cohort_month IS NOT NULL
And you would get:
With Holistics, we can automatically turn them into beautiful cohort table like so:
The Full SQL
Here’s the full SQL for your reference. Note: you just need to change the table names and column names inside the first 2 WITH clauses, and the rest should work as per normal. View the full SQL