Thursday, May 03, 2012

Know your user cohorts

One of the most important tools to better understand the usage of a web application – or a service, a game or a mobile app, it doesn't matter – is a cohort analysis. In fact, it's almost impossible to get a really good understanding of a service's usage without looking at activity and retention numbers on a cohort-by-cohort basis.

And yet, most startups that we talk to haven't looked into cohort analyses yet. Often the reason is a lack of resources. If you're a young, bootstrapped startup and have to decide whether to spend your developers' scarce time on improving your product or on getting better statistics, most founders will choose the product. That's understandable. Nonetheless, I would argue for a high standard of metrics early on, since the insights you'll get from understanding your metrics will often be highly actionable. And of course it will make your conversations with investors who want to understand your numbers much easier. At a minimum, I think you should make sure from the beginning that you collect the data that will allow you to do more sophisticated analyses later.

Back to the original point, why is a cohort analysis so crucial? Let's take a look at the following chart of an imaginary startup:

Looks like the company is growing nicely, hm? No exponential growth, but constant, linear growth. Now take a look at this chart:

It looks like the number of active users is growing even more steeply. Great!

But now let's take a look at the underlying cohort numbers in this Google Sheet.

The number of new signups is in cells D5 to D14, and the cumulative number of signups is in cells E5 to E14 (I used that one to make the chart look better :-) ). The number of active users, which the second chart shows, is in cells H15 to Q15.
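Here is a minimal sketch of how column E relates to column D: the cumulative count is simply a running total of the monthly new signups. The numbers below are hypothetical, not the ones from the sheet.

```python
from itertools import accumulate

# Hypothetical new signups per month (the sheet's column D), oldest cohort first.
new_signups = [100, 120, 140, 160, 180]

# Column E: running total of all signups up to and including each month.
# As long as monthly signups are non-negative, a running total never goes down,
# which is why this curve looks so flattering.
cumulative_signups = list(accumulate(new_signups))

print(cumulative_signups)  # [100, 220, 360, 520, 700]
```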

In case you're not familiar with cohort analyses, here's a quick introduction:
  • Each row represents a signup cohort.
  • In the "right-aligned" cohort analysis at the top, you can see the number of active users of each signup cohort for every calendar month. For example, I5 is the number of users who signed up in January 2011 and were active in February 2011, and I6 is the number of users who signed up in February 2011 and were active in February 2011. Accordingly, if you go down to the "Total" numbers in row 15, you'll see the total number of active users for each calendar month. These are the numbers that form the activity chart above.
  • In the "left-aligned" cohort analysis at the bottom, you can see the number of active users of each signup cohort for every user lifetime month. Example: I20 shows the number of users who signed up in February 2011 and were active in March 2011 (= user lifetime month #2 of the February 2011 cohort).
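The two layouts can be sketched in Python. This sketch assumes raw activity records of the form (user_id, signup_month, active_month), with months as integer indexes (0 = January 2011). The record format, the data, and the function names are hypothetical; the post doesn't say how its sheet was populated.

```python
from collections import defaultdict

# Hypothetical activity records: (user_id, signup_month, active_month).
# Months are integer indexes: 0 = January 2011, 1 = February 2011, ...
activity = [
    ("a", 0, 0), ("a", 0, 1), ("a", 0, 2),
    ("b", 0, 0),
    ("c", 1, 1), ("c", 1, 2),
    ("d", 1, 1),
]

def right_aligned(rows):
    """cohort -> {calendar month -> number of distinct active users}."""
    users = defaultdict(lambda: defaultdict(set))
    for user, signup, active in rows:
        users[signup][active].add(user)  # a set counts each user once per month
    return {s: {m: len(u) for m, u in months.items()} for s, months in users.items()}

def left_aligned(right):
    """Re-index each cohort by user lifetime month (signup month = month #1)."""
    return {s: {m - s + 1: n for m, n in months.items()} for s, months in right.items()}

def monthly_totals(right):
    """The "Total" row: active users per calendar month across all cohorts."""
    totals = defaultdict(int)
    for months in right.values():
        for month, n in months.items():
            totals[month] += n
    return dict(totals)

right = right_aligned(activity)
print(right)                  # {0: {0: 2, 1: 1, 2: 1}, 1: {1: 2, 2: 1}}
print(left_aligned(right))    # {0: {1: 2, 2: 1, 3: 1}, 1: {1: 2, 2: 1}}
print(monthly_totals(right))  # {0: 2, 1: 3, 2: 2}
```

Summing across cohorts works for the totals because every user belongs to exactly one signup cohort.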
Rows 29 and 30 calculate the monthly drop-off rate and the percentage of users who are still active n months after signing up. Here's where it gets really interesting. Our imaginary startup has a monthly drop-off rate of 50%, which means that after 6 months only 4% of the users are still active! That's not easy to see if you're just looking at the charts above, is it?
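Here is a minimal sketch of those two calculations applied to a single cohort's left-aligned row. It assumes that the drop-off rate is the month-over-month decline in a cohort's active users and that retention is measured against lifetime month #1 (the signup month, in which every new user counts as active). The sheet's exact formulas may differ, and the numbers are made up.

```python
def drop_off_rates(row):
    """Month-over-month decline: 1 - (active this month / active last month)."""
    return [1 - cur / prev for prev, cur in zip(row, row[1:])]

def retention(row):
    """Share of the cohort still active in each lifetime month (month #1 = 100%)."""
    return [n / row[0] for n in row]

# Hypothetical cohort: active users in lifetime months #1 to #6.
cohort = [1600, 800, 400, 200, 100, 50]

print(drop_off_rates(cohort))  # [0.5, 0.5, 0.5, 0.5, 0.5]
print(retention(cohort))       # [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
```

Because the drop-off compounds, retention falls geometrically even though the monthly rate stays flat.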

Note: In the example that I'm using, a user who registers in month x qualifies as an active user in that month. The assumption is that he logs in at least once after registration and that this log-in makes him count as an active user. That effect completely distorts the real activity numbers: if you're signing up a growing number of users, your activity numbers can basically only go up, regardless of any real usage activity. So if you're talking about "active users", it's best to leave out the users who signed up in the timeframe that you're talking about. That is, if you're talking about the number of active users from last week, include only the users who had signed up by the end of the week before.
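One way to apply this rule is sketched below. It assumes hypothetical activity records of the form (user_id, signup_period, active_period), where periods are integer indexes for whatever unit you report on (weeks or months). The function name and the data are illustrative only.

```python
# Hypothetical activity records: (user_id, signup_period, active_period).
# Periods are integer indexes (weeks or months, whichever you report on).
activity = [
    ("a", 0, 0), ("a", 0, 1), ("a", 0, 2),
    ("b", 0, 0),
    ("c", 1, 1), ("c", 1, 2),
    ("d", 1, 1),
]

def active_users(rows, period, exclude_new_signups=True):
    """Count distinct users active in `period`.

    With exclude_new_signups=True, only users who signed up in an earlier
    period are counted, so fresh registrations can't inflate the number.
    """
    return len({
        user
        for user, signup, active in rows
        if active == period and (signup < period or not exclude_new_signups)
    })

print(active_users(activity, 1, exclude_new_signups=False))  # 3 (a, c and d)
print(active_users(activity, 1))                             # 1 (only a)
```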

By the way, while I've used "activity" in this example you can of course use cohort analyses to track other aspects, too. As a SaaS company, for example, you should have a cohort analysis for retention/churn. As an online shop, you should have a cohort analysis for repeat purchases.
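For the online shop case, here is a minimal sketch of a per-cohort repeat purchase rate. It assumes that a customer's cohort is the month of their first order and that an order log of (customer_id, first_order_month, order_month) is available. The names and data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical order log: (customer_id, first_order_month, order_month).
orders = [
    ("x", 0, 0), ("x", 0, 3),
    ("y", 0, 0),
    ("z", 1, 1), ("z", 1, 2), ("z", 1, 4),
]

def repeat_purchase_rate(rows):
    """cohort -> share of its customers who ordered at least twice."""
    orders_per_customer = defaultdict(Counter)
    for customer, cohort, _order_month in rows:  # order month unused for this overall rate
        orders_per_customer[cohort][customer] += 1
    return {
        cohort: sum(1 for n in counts.values() if n >= 2) / len(counts)
        for cohort, counts in orders_per_customer.items()
    }

print(repeat_purchase_rate(orders))  # {0: 0.5, 1: 1.0}
```

Breaking the rate down further by lifetime month (order_month minus first_order_month) would give the full left-aligned table for repeat purchases.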


Joseph Fung said...

Completely agree with this. Our most useful insights come from looking at signup and specific feature usage in cohorts.

A small nuance for B2B SaaS startups, though, is that you may need to look at *customer* cohorts (when the company buys) and *user* cohorts within businesses: the same kind of differences show up at the customer level as at the user level within customers (the latter often being an early indicator of issues in the former).

Christoph Janz said...

Thank you for your comment, Joseph. Completely agree that you should look at customer cohorts as a B2B SaaS startup.

And a very interesting point regarding user cohorts within businesses!

Paul Grau said...

Funny coincidence! I was just in the process of calculating a cohort analysis when you tweeted the link to this article – though I didn't know the name of it ;-) Thanks, really helpful!

Unknown said...

Helpful as always!

Lisa LaMagna said...

Excellent post, thank you, especially for the illustration with the spreadsheet. A customer is not a customer is not a customer; this helps make the case for gathering data early on, as you suggest. Thanks.

Anonymous said...

Great post. In addition to using cohorts for signups, have you also looked at using cohorts for tracking retention/churn? I have blogged about it here: and recommended that you look at different types of cohorts when measuring churn as well, such as:

Traditional way: create cohorts based on the week or month in which they signed up for the service. This will allow you to analyze the effect of changes you made to your product or service over time.

Time-based cohorts: create cohorts based on the “time to cancel” (or the “time to convert”, for that matter). This will allow you to focus on long-time users of your product and sift out those who signed up in error.

The customer engagement way: create cohorts based on the “engagement level” with the product or service. Compare churn from frequent users with churn from casual users, or look at churn from users who use a particular feature set versus churn from those who don't.