An introductory guide to identifying users in PostHog
Nov 21, 2022
To understand your product’s usage, you must know who did what. Many of the most valuable insights require an accurate understanding of the user using your product. To make sure user data and events are as accurate as possible, it is critical to identify users properly.
PostHog relies on your implementation of identification to connect event data to specific users. We require events to have a related user ID (even if it is anonymous). It is users who create events, after all.
This tutorial goes over the different ways to identify users and recommendations on how to do it better.
Automatic anonymous IDs
If you haven’t set up any identification and are using the PostHog snippet or posthog-js library, events are captured with an anonymous user ID. We create the anonymous ID using the user’s device ID, and it is automatically tied to all the events a user sends in that session (and future sessions if the cookie is set). In the Live events tab of PostHog, anonymous ID user events look like this:
Anonymous IDs are a basic way to understand the user behind events. They lack precision because there is no way of ensuring consistency between sessions (this depends on cookies, which users often block), and they lack the depth of person properties or groups. Capturing identified events creates more accurate user data.
JavaScript identify
If you installed posthog-js, you can use it to better identify users. Calling posthog.identify() with a distinct user ID connects all events from that user with that ID. You can choose what distinct user ID you want to identify users with. Email is the most popular (it’s what we use), but it could be anything from a username to a random string you generate.
This JavaScript identify call links the anonymous user ID PostHog generates with the new ID you choose. This means all of the events a user generates when they weren’t identified connect to their new ID (if they happen within the “buffer,” which we explain in our Identify docs).
For example, if a user browses your marketing site, then signs up and you call posthog.identify() on signup, their events from browsing the marketing site can connect with the new user ID.
Be sure to call posthog.reset() on logout (or when users change) to ensure captured events disconnect from the old user and can connect to the new (correct) one.
You can also use posthog.identify() to add properties to the user, such as their signup source, plan, or website link. You can see more details on the JavaScript posthog.identify() call in the docs.
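As a rough sketch of how the identify and reset calls might be wired into an app, the handlers below take an initialized posthog-js instance as a parameter. The handler names, the shape of the `user` object, and the property names are hypothetical and only illustrate the pattern.

```javascript
// Sketch only: `posthog` is assumed to be an initialized posthog-js instance.
// `user` is a hypothetical object from your own auth system.

function handleLogin(posthog, user) {
  // Link the anonymous ID to the distinct ID you chose (email here)
  // and set person properties in the same call.
  posthog.identify(user.email, {
    plan: user.plan,
    signup_source: user.signupSource,
  });
}

function handleLogout(posthog) {
  // Disconnect future events from the user who just logged out.
  posthog.reset();
}
```

Passing the instance in as an argument is just a convenience for this sketch; in a real app you would likely import posthog-js once and call it directly.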
Identifying and setting user IDs for every other library
Every library that isn’t JavaScript (such as Python, Go, PHP, and more) requires you to pass the distinct user ID on every posthog.capture() call.
Ideally, the user is authorized to make requests to the backend, and you can use a unique ID used in the authorization. As with the JavaScript library, email is the preferred choice for many, but a username or another type of ID also works.
If you don’t have a unique ID like an email, you can always generate a UUID or use some other piece of information (like a device or request ID). Ideally, try to find a way to connect these IDs across sessions. Some ideas:
- Linking whatever ID you choose to an API Key or authorization method if they are using an API
- Storing the ID on the frontend and passing the values to the backend when they make a request
- Using an ID based on the resources they are accessing on the backend
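One way to think about these options is as a fallback chain that picks the most stable ID available for each request. The sketch below is written for Node.js; the shape of `req` (`user`, `apiKeyOwner`, `headers`) and the `x-distinct-id` header name are assumptions made for illustration, not part of any PostHog library.

```javascript
const { randomUUID } = require("node:crypto");

// Sketch: pick the most stable distinct ID available for a backend request.
// The fields read from `req` and the header name are hypothetical.
function resolveDistinctId(req) {
  // 1. Authenticated user: the same ID across every session.
  if (req.user && req.user.email) return req.user.email;
  // 2. API key: use the ID linked to whoever owns the key.
  if (req.apiKeyOwner) return req.apiKeyOwner;
  // 3. ID stored on the frontend and passed along with the request.
  const fromFrontend = req.headers && req.headers["x-distinct-id"];
  if (fromFrontend) return fromFrontend;
  // 4. Last resort: a fresh UUID. It is only consistent across requests
  //    if the caller stores it and sends it back next time.
  return randomUUID();
}
```

Whatever this returns would then be passed as the distinct ID on each capture call in the library you use.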
Note: posthog.identify() works differently in non-JavaScript libraries. It only updates the user’s properties, and won’t connect the user you identify with future events captured. Check out your library’s docs for more details (here’s Python for example).
The importance of setting accurate distinct user IDs
The goal of setting distinct user IDs is accurately representing unique users and their behavior. Having multiple IDs for the same user will cause insights such as unique users, active users (daily, weekly), funnels, and more to be inaccurate.
Here’s a ranking of user identification options (you want to aim for #1):
1. Same ID across every session, such as a well-configured JavaScript identify or another library’s event capture call
2. Same ID across many sessions, such as automatic anonymous IDs (ideally)
3. Same ID for single sessions, such as automatic anonymous IDs (with cookies blocked)
4. New ID for every request, such as poorly configured event capture calls
You want to work up this list because it creates more accurate user stats. For example, even if you want your users to be anonymous, better identifying them across a single session (rather than every request) provides more accurate stats. This creates better insights to build a better product, which we want to enable you to do.
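To make the effect on user stats concrete, here is a small sketch that counts unique users the way a simple "unique users" metric might: by counting distinct IDs. The event data and the `countUniqueUsers` helper are made up for illustration and are not how PostHog computes insights internally.

```javascript
// Sketch: count unique users by counting distinct IDs.
function countUniqueUsers(events) {
  return new Set(events.map((e) => e.distinctId)).size;
}

// One real person making three requests, with the same ID every time.
const sameIdEveryRequest = [
  { distinctId: "user@example.com", event: "api_call" },
  { distinctId: "user@example.com", event: "api_call" },
  { distinctId: "user@example.com", event: "api_call" },
];

// The same person, but a new ID is generated for every request.
const newIdEveryRequest = [
  { distinctId: "req-1", event: "api_call" },
  { distinctId: "req-2", event: "api_call" },
  { distinctId: "req-3", event: "api_call" },
];

console.log(countUniqueUsers(sameIdEveryRequest)); // 1
console.log(countUniqueUsers(newIdEveryRequest)); // 3
```

The second count is three times too high even though only one person was involved, which is the kind of distortion that carries over into active user counts and funnels.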
Further reading
- See our docs on identifying users for more details
- Understanding group analytics: frontend vs backend implementations
Comments
Hello folks!
A part of our app has anonymous features. We're trying to figure out how to link the events captured in our frontend with those captured in anonymous requests sent to our backend.
The most obvious idea that we have in mind would be to send the anonymous ID automatically generated by PostHog on the frontend to our backend with every request. We can't seem to find how to get that from the posthog-js SDK though. Are we missing something?
Hi Louis,
You can call posthog.get_distinct_id() on the frontend to get the ID to send to the backend.
Hello,
We are experiencing an issue with the management of anonymous IDs by the PostHog analytics library. Our setup involves storing IDs in cookies (persistence: "cookie"). Here's the behavior we're observing:
- Initial Landing: When anonymous users first visit our login page, the anonymous ID is equal to the $device_id value in the cookies. However, this $device_id is different from the distinct_id value also stored in the cookies.
- Subsequent Events: After the initial pageview, when events are sent, the anonymous ID aligns with the distinct_id.
- User Registration: When a user eventually signs up, the first event where the $device_id was used is not reconciled with the new registered user's events. As a result, the initial pageview event remains disconnected from the user’s subsequent activity, despite their registration.
We need to ensure that the first pageview is correctly associated with the user once they register. Does anyone have suggestions on how to address this inconsistency between $device_id and distinct_id?
Additionally, the distinct_id value is regenerated with every page refresh, while the $device_id remains constant.
In my view, the most logical approach would be to track all anonymous events using $device_id. Then, as soon as we can call identify() (which happens right after email confirmation during the sign-up process), all events should be retroactively reconciled to this ID.
Yet, somehow, after the first pageview, distinct_id is used to track the anonymous person instead of $device_id.
- Ian, a year ago
Hi Edoardo, are you intentionally regenerating the distinct_id? It should be the same on page refreshes by default. Two potential fixes for you:
- Use persistence: localstorage+cookie. This saves room in the cookie, preventing some issues there.
- Use a reverse proxy to ensure PostHog isn't being blocked.
Is device_id unique for one device? Or is it a unique combination of device and browser?
I visited the website as a guest from two different browsers on one device and got a different $device_id property (and distinct_id too) in each, so I assume that $device_id isn't an identifier of the device used. Am I right?
Identify doesn't associate previous events with the new ID
Hi folks!
I’m using posthog-js autocapture with Next.js. When I call .identify after user sign-in, the previous events associated with the anonymous ID are not linked with the ID of the existing user that I pass to .identify.
Events after identify are ok, but every time I signout (using .reset) there is a new anonymous user that never gets associated with the signed in user.
Do you have any ideas what could be the problem?
- Paul (he/him), 2 years ago
Hey Tomas,
Coincidentally I've been exploring this lately. You can see more here https://github.com/PostHog/posthog-js/issues/512
The short answer is if you call reset(true) then those anonymous events will be linked to the user when you subsequently call identify.