Thursday, May 03, 2012

Know your user cohorts

One of the most important tools to better understand the usage of a web application – or a service, a game or a mobile app, it doesn't matter – is a cohort analysis. In fact, it's almost impossible to get a really good understanding of a service's usage without looking at activity and retention numbers on a cohort-by-cohort basis.

And yet, most startups that we're talking to haven't looked into cohort analyses yet. Often the reason is lack of resources. If you're a young, bootstrapped startup and have to decide whether to spend your developers' scarce time on improving your product or on getting better statistics, you'll most likely decide for the product. That's understandable. Nonetheless, I would like to argue for a high quality standard of metrics early on, since the insights that you'll get by understanding your metrics will often be highly actionable. And of course it will make your conversations with investors who want to understand your numbers much easier. At the minimum, I think you should try to make sure from the beginning that you collect the data that will allow you to do more sophisticated analyses later.

Back to the original point: why is a cohort analysis so crucial? Let's take a look at the following chart of an imaginary startup:

Looks like the company is growing nicely, hm? No exponential growth, but constant, linear growth. Now take a look at this chart:

It looks like the number of active users is growing even more steeply. Great!

But now let's take a look at the underlying cohort numbers in this Google Sheet.

The number of new signups is contained in cells D5 to D14, and the cumulative number of signups is in cells E5 to E14 (I used that one to make the chart look better :-) ). The number of active users, which the second chart shows, is contained in cells H15 to Q15.

In case you're not familiar with cohort analyses, here's a quick introduction:
  • Each row represents a signup cohort.
  • In the "right-aligned" cohort analysis at the top you can see the number of active users of each signup cohort for every calendar month. So, for example, I5 is the number of users who signed up in January 2011 and were active in February 2011, and I6 is the number of users who signed up in February 2011 and were active in February 2011. Accordingly, if you go down to the "Total" numbers in row 15 you'll see the total number of active users for each calendar month. These are the numbers which form the activity chart above.
  • In the "left-aligned" cohort analysis at the bottom you can see the number of active users of each signup cohort for every user lifetime month. Example: I20 shows the number of users who signed up in February 2011 and were active in March 2011 (=user lifetime month #2 of the February 2011 cohort).
Rows 29 and 30 calculate the monthly drop-off rate and the percentage of users who are still active n months after signing up. Here's where it gets really interesting. Our imaginary startup has a monthly drop-off rate of 50%, which means that after 6 months only 4% of the users are still active! That's not easy to see if you're just looking at the charts above, is it?
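
If you'd rather see the mechanics outside of a spreadsheet, here is a minimal Python sketch of the same idea. The numbers are made up for illustration (they are not the ones from the sheet), and the function names are my own; the sketch assumes that a cohort's signup month counts as its first active month and that every cohort loses exactly half of its active users each month.

```python
# Hypothetical right-aligned cohort table.
# Rows = signup cohorts, columns = calendar months.
# None marks months before the cohort existed.
right_aligned = [
    [1000, 500, 250, 125],     # cohort 1
    [None, 1200, 600, 300],    # cohort 2
    [None, None, 1400, 700],   # cohort 3
    [None, None, None, 1600],  # cohort 4
]


def totals_per_calendar_month(table):
    """Sum each column: the "Total" row that feeds the activity chart."""
    months = len(table[0])
    return [
        sum(row[m] for row in table if row[m] is not None)
        for m in range(months)
    ]


def left_align(table):
    """Shift every cohort so that index 0 is its own signup month."""
    return [[value for value in row if value is not None] for row in table]


def retention_by_lifetime_month(table):
    """Share of signups still active in each user lifetime month.

    Assumes the first value of a cohort (its signup month) equals
    the cohort's number of signups.
    """
    left = left_align(table)
    longest = max(len(row) for row in left)
    retention = []
    for m in range(longest):
        # Only cohorts old enough to have reached lifetime month m.
        rows = [row for row in left if len(row) > m]
        active = sum(row[m] for row in rows)
        signups = sum(row[0] for row in rows)
        retention.append(active / signups)
    return retention


totals = totals_per_calendar_month(right_aligned)
retention = retention_by_lifetime_month(right_aligned)
# Month-over-month drop-off, derived from the retention curve.
drop_off = [1 - later / earlier for earlier, later in zip(retention, retention[1:])]

print(totals)     # [1000, 1700, 2250, 2725]
print(retention)  # [1.0, 0.5, 0.25, 0.125]
print(drop_off)   # [0.5, 0.5, 0.5]
```

The totals go up every month while each individual cohort halves every month, which is exactly the effect that the charts hide. With a flat 50% drop-off the retained share halves with every lifetime month, so it is down to low single digits after five or six months.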

Note: In the example that I'm using, a user who registers in month x qualifies as an active user in that month. The assumption is that he logs in at least once after registration and that that log-in makes him count as an active user. That effect distorts the real activity numbers: if you're signing up a growing number of users, your activity numbers can basically only go up, regardless of any real usage activity. So if you're talking about "active users", it's best to leave out the users who signed up in the timeframe that you're talking about. That is, if you're talking about the number of active users from last week, include only the users who signed up before that week.
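
As a rough sketch of that rule in Python: the user records, the field layout and the `count_active_users` function below are all hypothetical, and a real implementation would query your own event or login data instead.

```python
from datetime import date

# Hypothetical records: (user_id, signup_date, dates with activity).
users = [
    ("u1", date(2012, 1, 10), [date(2012, 1, 10), date(2012, 2, 3)]),
    ("u2", date(2012, 2, 5), [date(2012, 2, 5)]),    # signed up in the period
    ("u3", date(2012, 1, 20), [date(2012, 1, 20)]),  # never came back
]


def count_active_users(users, period_start, period_end, exclude_new_signups=True):
    """Count users with at least one activity inside the period.

    With exclude_new_signups=True, users who signed up on or after
    period_start are skipped, so a registration login does not count
    as real usage.
    """
    count = 0
    for _user_id, signup_date, activity_dates in users:
        if exclude_new_signups and signup_date >= period_start:
            continue
        if any(period_start <= day <= period_end for day in activity_dates):
            count += 1
    return count


february = (date(2012, 2, 1), date(2012, 2, 29))
naive = count_active_users(users, *february, exclude_new_signups=False)
adjusted = count_active_users(users, *february)

print(naive)     # 2 (u1 and the brand-new signup u2)
print(adjusted)  # 1 (only u1, who signed up earlier and came back)
```

The gap between the two numbers is the part of your "activity" that is really just signups.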

By the way, while I've used "activity" in this example you can of course use cohort analyses to track other aspects, too. As a SaaS company, for example, you should have a cohort analysis for retention/churn. As an online shop, you should have a cohort analysis for repeat purchases.