Flipping the Script on Identity Resolution

Marketers encounter many versions of their customers and prospects. In the course of the day, they might meet the anonymous website visitor, the frustrated phone caller and the indecisive store visitor.

But are these different selves actually the same person?

And how can you assess the accuracy of an identity that combines them?

The answers to these questions determine whether you make a sale to the right person and whether your advertising dollars are used effectively or wasted.

Many marketers wrestle with two approaches to identity resolution: deterministic and probabilistic. More often than not, both get mischaracterized.

Deterministic matches are paired through a persistent identifier like an email address. If a user employs an email address to log on to a web account, then the cookie in that user’s browser can be connected to that email address. And that cookie, as well as that device, can then be connected to other data points.
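As a rough illustration of that chain, here is a minimal Python sketch. The event records, the field order and the `deterministic_links` helper are all hypothetical, invented only to show the idea of hanging cookies and devices off a login email; it is not a description of any particular vendor's system.

```python
from collections import defaultdict

# Hypothetical login events: (email, cookie_id, device_id).
LOGIN_EVENTS = [
    ("ana@example.com", "cookie-1", "laptop-A"),
    ("ana@example.com", "cookie-2", "phone-B"),
    ("ben@example.com", "cookie-3", "tablet-C"),
]


def deterministic_links(events):
    """Group every cookie and device under the email used to log in."""
    profiles = defaultdict(lambda: {"cookies": set(), "devices": set()})
    for email, cookie_id, device_id in events:
        profiles[email]["cookies"].add(cookie_id)
        profiles[email]["devices"].add(device_id)
    return dict(profiles)


profiles = deterministic_links(LOGIN_EVENTS)
```

Note that the sketch trusts every login equally: any device that ever sees the email is attached to the profile, with no check on whether the pairing is plausible.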

In probabilistic matches, inferences drive the connections. Machine learning allows us to identify patterns across a web of customer signals that suggest linkages between an individual and, say, a smartphone and laptop used together across multiple Wi-Fi hotspots. Depending on the strength of those signals, we could infer that those devices belong to a single person.
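To make the inference concrete, the sketch below scores pairs of devices by how much their Wi-Fi sightings overlap. It swaps in a simple Jaccard overlap for the machine learning described above, purely as a stand-in; the sightings data, the `link_score` function and the 0.6 threshold are assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical data: the Wi-Fi networks on which each device has been seen.
SIGHTINGS = {
    "phone-B": {"home-wifi", "office-wifi", "cafe-wifi"},
    "laptop-A": {"home-wifi", "office-wifi", "cafe-wifi"},
    "tablet-C": {"cafe-wifi"},
}


def link_score(networks_a, networks_b):
    """Jaccard overlap of two network sets: shared networks / all networks."""
    union = networks_a | networks_b
    if not union:
        return 0.0
    return len(networks_a & networks_b) / len(union)


def probabilistic_links(sightings, threshold=0.6):
    """Return device pairs whose overlap score meets the (assumed) threshold."""
    links = {}
    for a, b in combinations(sorted(sightings), 2):
        score = link_score(sightings[a], sightings[b])
        if score >= threshold:
            links[(a, b)] = score
    return links


links = probabilistic_links(SIGHTINGS)
```

With this sample data, only the phone and the laptop are linked, because they appear together on all three networks, while the tablet shares just one network with each.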

What’s the identity gold standard?

Many assume that deterministic matches are definitive, a gold standard built on explicit connections, while probabilistic matches are dependent on implicit connections and therefore more like guesswork.

Not exactly. An explicit data match doesn’t equal correct customer data.

For instance, you might use your email address to log on to your account on your friend’s tablet. In the world of deterministic matching, that would indicate that the tablet, and whatever it does from that point onward, belongs to you.

There are dozens of examples like this, and they all make the same point: It’s a bad idea to use unconfirmed deterministic matches as your gold-standard building block. If you do, you’ll have a weak foundation and, in our experience, an error rate as high as 50 percent.

Additionally, deterministic matches don’t fully account for patterns of behavior, as probabilistic matches do. But probabilistic matches don’t account for offline behavior because they need a deterministic identifier (e.g., an email address) to do so.

Probabilistic matches aren’t necessarily guesses, either. A pattern that consistently shows a smartphone and a laptop using the same network may constitute definitive evidence that they belong to the same person or family.

Because of these pluses and minuses, even a hybrid approach can compound the errors of each and lose sight of the reliability of the original raw data.

Instead, the strengths of both methods can be used to validate each other. For instance, a probabilistic pattern of behavior can test whether the deterministic identity pairs are accurate. And the probabilistic pattern can be tied back to an explicit, deterministic identifier. Together, they can build an entirely new graph.
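One way to picture that cross-check is the sketch below, which sorts deterministic email-to-device pairs into confirmed and suspect based on a behavioral score. The scores, the 0.5 threshold and the `validate_pairs` helper are hypothetical; in practice the score would come from whatever probabilistic model is in use.

```python
# Hypothetical deterministic pairs: an email was used to log in on a device.
DETERMINISTIC_PAIRS = [
    ("ana@example.com", "laptop-A"),
    ("ana@example.com", "tablet-C"),  # a single login on a friend's tablet
]

# Hypothetical behavioral scores (0 to 1): how well each device's observed
# behavior fits the rest of that person's profile.
BEHAVIOR_SCORES = {
    ("ana@example.com", "laptop-A"): 0.9,
    ("ana@example.com", "tablet-C"): 0.1,
}


def validate_pairs(pairs, behavior_scores, threshold=0.5):
    """Split deterministic pairs into those behavior supports and those it doesn't."""
    confirmed, suspect = [], []
    for pair in pairs:
        # A pair with no behavioral evidence at all is treated as unsupported.
        score = behavior_scores.get(pair, 0.0)
        if score >= threshold:
            confirmed.append(pair)
        else:
            suspect.append(pair)
    return confirmed, suspect


confirmed, suspect = validate_pairs(DETERMINISTIC_PAIRS, BEHAVIOR_SCORES)
```

Here the laptop pairing is confirmed, while the friend’s tablet is flagged as suspect rather than silently merged into the profile.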

It’s like making a chocolate raspberry cake. You wouldn’t just take a chocolate cake and a raspberry cake and smash them together. You’d create a new cake from scratch that blends both flavors.

Constellations of data that bring customer identity into focus

When they are combined and applied in a holistic manner, explicit and implicit matches can be powerful tools for full identity resolution.

In fact, the strength of the linkages between data can be more important than, say, the fact that a user logged in to their account on a particular phone using an email address. That’s a factor, but it’s one of many, and it can be tested against the likelihood of the connections between other data points.

Rather than relying on a hub-and-spoke model, with identified pairs as the hub and inferred matches as the spokes, strong identity is best built as constellations of data that are based on the strength of the linkages.

The linkage between a smartphone, a tablet and an IP address might be very strong, for instance, while the linkage between an email address and a device could be weaker. Accuracy at scale can be maintained if the strength of the connections is consistently tested across all of the identity data.
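The constellation idea can be sketched as a weighted graph in which every linkage, explicit or inferred, carries a strength, and identities are the clusters that remain once weak links are set aside. Everything in the example is an assumption made for illustration: the node names, the strengths, the 0.5 cutoff and the use of simple connected components as the clustering step.

```python
from collections import defaultdict

# Hypothetical linkages: (node, node, strength between 0 and 1).
EDGES = [
    ("phone-B", "tablet-D", 0.9),
    ("tablet-D", "ip-home", 0.85),
    ("phone-B", "ip-home", 0.8),
    ("ana@example.com", "phone-B", 0.7),
    ("ana@example.com", "tablet-C", 0.2),  # one login on a friend's tablet
]


def constellations(edges, min_strength=0.5):
    """Cluster nodes that stay connected once weak linkages are ignored."""
    graph = defaultdict(set)
    nodes = set()
    for a, b, strength in edges:
        nodes.update((a, b))
        if strength >= min_strength:
            graph[a].add(b)
            graph[b].add(a)

    seen, clusters = set(), []
    for node in sorted(nodes):
        if node in seen:
            continue
        # Depth-first walk over the strong linkages reachable from this node.
        stack, cluster = [node], set()
        while stack:
            current = stack.pop()
            if current in cluster:
                continue
            cluster.add(current)
            stack.extend(graph[current] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


clusters = constellations(EDGES)
```

In this sample, the email address, phone, second tablet and IP address form one constellation, while the friend’s tablet is left on its own because its only linkage is weak. The email login is treated as just another edge with a strength, not as the hub of the graph.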

As in real life, a person’s full identity is not built on email addresses tied to devices. For complete identity resolution, it all comes down to the value of the relationships.