How to create cohort analysis in Google Sheets? Part 1
Cohort analysis is a type of analysis when we split the users by groups and observe the behavior of these groups based on certain characteristics.
In this article we will start doing all the settings and calculations for cohort analysis in Google Sheets.
I’ve recorded a video tutorial, check it out:
Let’s take a look at the final table in Looker Studio.
As we can see, here we’ve split the users based on a first purchase month. Subsequently, we observe acquisition and retention.
Analyzing the pivot table, we can say that in January 2022 we had 257 new customers. The same month we got 21k $ from these customers 21k$ and each customer has brought us 220$ during all the time. Also this table helps us to see how many months customers or a group of customers were staying with us.
This type of analysis is pretty useful for the fields of games, e-commerce and SAAS products.
But how can we set everything up and create such a cohort analysis?
Input data
In Google Sheets that I’ve connected to this report the first page is an informational page for my clients and people who’ve bought the template. It contains info about how to multiply the dashboard.
If you start to create this dashboard from scratch, you should start from the Input tab. What does it need to have?
- Revenue data. Basically, you need to pull this data from your ecommerce store or ads platform. The table with revenue data needs to have Date, Order ID, User ID and Revenue columns.
- Spend by month data. In my report I need data about spends to calculate customer acquisition cost. Though, it is not necessary for cohort dashboards.
Let’s move to the next tab – Cohort work LTV. Actually, you can do the calculations on the same tab, but I found it more understandable and easier to do it on two different ones. I want to show you some formulas that I’ve used in order to write some custom formulas.
Formulas and calculations
Hence, you need to copy the Revenue and Spend tables to this tab. To do it, I used formulas that input the data we need from another tab. It looks like this. ={‘0_input’!A3:D}
Next column I need to create is going to be Month-Year, but in your case it can be anything else, like weeks or time spent in the game/app.To do it, I need to extract months from date. How can I do it?
Date formula
The whole formula looks like this:
=ARRAYFORMULA(if(A2:A=””,””, date(year(A2:A),month(A2:A),1)))
It has a few components, so let’s start with date(year(A2:A),month(A2:A),1). Generally, this formula can be explained in this way: we extract the year and the month from the cells in the range A2:A, but as this formula also gives us a chance to show a day, so let it be the first day of the month. If you want to know more ways to extract week from date, check out my article!
As I don’t know how many rows I will have, because it is cohort analysis and I may update the data weekly using API, it’s a great idea to use ARRAYFORMULA.We can just write the formula like this and it will already work:
=ARRAYFORMULA(date(year(A2:A),month(A2:A),1)
But if there is no data in the end of the range, there still will be dates but really strange ones. So to avoid wrong data in the table, let’s add if(A2:A=””,””, which basically means that if there is an empty cell in the range A2:A, leave the cell in the column empty. By the way, I have an article about issue fixing. Check it out.
MINIFS and MAXIFS
To split our users by cohorts we need to find the first sale date. To do this I use the formula MINIFS($A:$A, $C:$C,$C2). This formula helps us find the minimum from column A, if column C is equal to C2. In other words, we find the minimum date for each user. And let’s add if(A2:A=””,”” at the beginning of the formula so that it looks this way: if(A2:A=””,””MINIFS($A:$A, $C:$C,$C2)) .
To find the last sale date we use the same logic and almost the same formula, but in this case we are not looking for minimums from column A, we are looking for the maximum. So the final formula looks like this:if(A2:A=””,””,MAXIFS($A:$A, $C:$C,$C2))
Also we need a formula to find the month of the last sale. The formula looks like this:
=ARRAYFORMULA(if(A2:A=””,””, date(year(H2:H),month(H2:H),1)))
DATEDIF
The next thing we need to calculate is the current sale month, in other words it is the difference between Month-Year and First Sale month. To do it I use the formula DATEDIF.
=ARRAYFORMULA(if(A2:A=””,””, DATEDIF(G2:G,E2:E,”M”)))
Surely everything is clear with the first part of the formula, so let’s see what is going on with DATEDIF. We’ve input the start date, the end date and the unit that we need to extract, in our case it’s a month, so we write M.
Using the same formula we count the Max Month, but here we find the difference between the first sale month and last sale month. The final formula is the following:
=ARRAYFORMULA(if(A2:A=””,””, DATEDIF(G2:G,I2:I,”M”)))
COUNTIF
To count the number of purchases we need to use the formula COUNTIF. Using this formula we can determine how many times a certain Order ID appears in the column.
The final formula looks like this:
=ARRAYFORMULA( if(A2:A=””,””,COUNTIF($C:$C,C2:C)))
Using this formula we can see how many times the cell C2 appears in the date range.
Monthly revenue
After all the formulas there are columns with months. We can see what month and how much did the customer bring us.
So, in each of the month number columns we write the following formula:
=if($A2=””,””, if(N$1=$J2,$D2,””))
It means that if the number of this column (N1) is the same as the current sale month, then use Revenue. If it’s not, leave it blank.
Consequently, the formulas for the following column will be changed in accordance to the number of the column. Like if the number of this column (O1) is the same as the current sale month, then use Revenue etc.
In the next article we will observe the rest of the tabs, including pivot tables etc.
If you don’t want to spend time creating everything manually, you can buy a Cohort Analysis report template in my template store!
Hope you enjoyed the article! Subscribe to the newsletter not to miss the second part!