Directory-Derived Data vs. Event-Derived Data

When working with Customer Insights, it's important to understand that, for reasons both logistical and historical, Customer Insights actually uses two different data sources. In its initial release, Customer Insights derived data directly from Identity Cloud events (registrations, sign-ins, deactivations, etc.). This approach resulted in data that was very accurate, but not perfectly accurate. For example, if you counted the number of user profiles in Customer Insights and then compared that value with the number of user profiles reported in, say, the Capture Dashboard, the two values were close, but often were not identical:

Needless to say, 1,001,007 is not the same as 1,001,877.
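
To put that gap in perspective, here is a minimal Python sketch that computes the absolute and relative difference between the two counts quoted above. The function name `count_discrepancy` is purely illustrative (it is not part of Customer Insights), and the sketch makes no assumption about which count came from which tool.

```python
def count_discrepancy(count_a: int, count_b: int) -> tuple[int, float]:
    """Return the absolute difference and the difference relative to the larger count."""
    absolute = abs(count_a - count_b)
    relative = absolute / max(count_a, count_b)
    return absolute, relative


# The two profile counts quoted in the text above.
absolute, relative = count_discrepancy(1_001_007, 1_001_877)
print(f"Off by {absolute} profiles ({relative:.4%} of the larger count)")
```

In other words, the two counts differ by 870 profiles, which is less than a tenth of one percent: close, but not identical.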

Because of that, backend changes were engineered to make sure that Customer Insights no longer derives demographic information by parsing event data (well, OK: more on that in a minute). Instead, the application directly queries your user profile store (or at least a mirrored replica of that store) any time you ask it to retrieve user information. This results in improved accuracy: in fact, the number of user profiles reported in Customer Insights now exactly matches the number of user profiles reported elsewhere (which is not too surprising, seeing as how the numbers are being pulled from the same datastore):

As noted, the improved accuracy is due to the way Customer Insights retrieves demographics data: data is now queried directly from the Identity Cloud directory and is not derived from event data. Or, to be a little more accurate ourselves, some demographic information is still derived from event data. And yes, a little explanation is in order here.

To begin with, and despite what we might have implied, event-derived data is still available in Customer Insights, and is still updated in the same way. Furthermore, that data remains extremely accurate, albeit not perfectly accurate. But it’s still available for you to use.

That raises a good question: if there’s a newer, better way to retrieve demographic data, then why keep the old method? As it turns out, there are several reasons for that. For one thing, there’s the all-important issue of backward compatibility: if event-derived data were discontinued, many previously created Looks and Dashboards would no longer work. To keep those reports functioning (and to keep them useful), that data needs to keep coming in.

There are also issues involving data collection history. To be honest, the new directory-derived data has very little history: after all, directory-derived data wasn’t introduced until December 2018. By comparison, event-derived data dates back several years before that. Tossing out event-derived data would mean tossing out a large amount of historical data, making it difficult to chart trends, make projections, and do all those other things that rely on having a dataset that’s been around for a while.

Just keep in mind that, while not perfect, event-derived data is close to perfect. True, you can’t use the event-derived data to say something like this: “On March 3, 2018, we had exactly 987,353 registered users.” However, you can use the event-derived data to chart your growth rate in registrations over the past year, or to see whether sign-in spikes coincided with your marketing campaigns. Event-derived data has its purposes, and will continue to have those purposes for quite some time to come.
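
As a rough illustration of the kind of trend calculation described above, the following Python sketch computes month-over-month growth from a series of registration counts. The function name `month_over_month_growth` and the sample figures are hypothetical, invented purely for illustration: they are not real data, and this is not how Customer Insights itself computes anything.

```python
def month_over_month_growth(monthly_counts: list[int]) -> list[float]:
    """Return the fractional change between each pair of consecutive months.

    Assumes every count except possibly the last is non-zero, since each
    earlier count is used as a divisor.
    """
    return [
        (current - previous) / previous
        for previous, current in zip(monthly_counts, monthly_counts[1:])
    ]


# Hypothetical monthly registration counts, invented for illustration only.
registrations = [1000, 1100, 1210]
growth = month_over_month_growth(registrations)
for rate in growth:
    print(f"{rate:+.1%}")
```

Because a growth rate depends on the change between counts, not on any single exact value, a small error in the underlying totals typically has little effect on the trend.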

The move to a new method for querying data has also led to a couple of important infrastructure changes. For one thing, you might note that some of the default Dashboards (such as Demographic Trends and Last Login, Creation, and Deactivation Trends) feature the word Trends in their name. There’s a reason for that: because these Dashboards use event-derived data, the data is not necessarily 100% accurate. That also means that this data, and these Dashboards, are best used for tracking trends. In fact, each of these Dashboards includes a disclaimer to that effect: