What is an identity graph? A marketer's guide to identity resolution

Marketers evaluating tools that rely on identity graphs should ask specific questions about match rates, data sourcing, and how the tool handles signal loss.

A hotel loyalty program is a surprisingly good metaphor for a problem marketers deal with every day. When a loyal guest books through the app, calls the front desk, or walks in and hands over their card, the hotel knows it's the same customer and treats them accordingly. That recognition doesn't happen by magic. It happens because all of those interactions are linked to a single profile.

Your customers do the exact same thing across devices, channels, and touchpoints. They see your ad on connected TV, browse your site on a laptop, and convert three days later on their phone. Whether your measurement tools recognize that as one customer journey—or three separate, unconnected events—depends on whether you have identity resolution in place.

Key takeaways

  • An identity graph is a database structure that links fragmented customer data points—across devices, channels, and contexts—into a single, unified customer profile.
  • Identity graphs are built using two methods: deterministic matching (verified connections like logged-in emails) and probabilistic matching (inferred connections based on shared signals like IP address or behavioral data).
  • Match rates for probabilistic matching are not static; they degrade as third-party signal sources like third-party cookies and mobile ad IDs become less accessible.
  • Many measurement tools, including multi-touch attribution, depend on identity graph completeness. As that completeness erodes in the cookieless future, so does attribution quality.
  • An identity spine is the persistent anchor identifier at the center of a graph, typically a hashed email or loyalty ID. It is not the same thing as the graph itself.
  • Marketers evaluating tools that rely on identity resolution should ask specific questions about match rates, first-party data sourcing, and how the tool handles signal loss.
  • Marketing mix modeling (MMM) works at the aggregate level and doesn't require individual identity resolution, making it less vulnerable to the data gaps affecting user-level tracking.

What is an identity graph?

An identity graph (sometimes called an ID graph) is a database structure that links fragmented online and offline customer data points into a single, unified customer view. It's the infrastructure that makes identity resolution possible. Identity resolution is the process by which businesses recognize the same person across different devices, channels, and marketing touchpoints.

Many organizations rely on graph databases to power this structure. At its core, the architecture has two components:

  • Nodes: The individual identifiers belonging to a person: email addresses, phone numbers, device IDs, cookie IDs, mobile ad IDs (MAIDs), loyalty account details, and more.
  • Edges: The verified or inferred connections between those identifiers, establishing that multiple data points belong to the same person.
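To make the node-and-edge idea concrete, here is a minimal Python sketch. The names used (Node, Edge, IdentityGraph, profile) and the sample identifiers are illustrative assumptions, not any vendor's schema, and a production system would more likely sit on a graph database than on an in-memory dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str   # e.g. "email_hash", "device_id", "cookie_id", "maid"
    value: str


@dataclass(frozen=True)
class Edge:
    a: Node
    b: Node
    method: str        # "deterministic" (verified) or "probabilistic" (inferred)
    confidence: float  # 1.0 for verified links, lower for inferred ones


class IdentityGraph:
    def __init__(self):
        # Maps each node to the edges that touch it.
        self.adjacency = defaultdict(list)

    def add_edge(self, edge):
        self.adjacency[edge.a].append(edge)
        self.adjacency[edge.b].append(edge)

    def profile(self, start):
        """Return every identifier reachable from `start` through any edge."""
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for edge in self.adjacency[node]:
                other = edge.b if edge.a == node else edge.a
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        return seen


# Hypothetical identifiers for one customer.
email = Node("email_hash", "a1b2")
phone = Node("device_id", "phone-1")
laptop_cookie = Node("cookie_id", "ck-9")

graph = IdentityGraph()
graph.add_edge(Edge(email, phone, "deterministic", 1.0))
graph.add_edge(Edge(phone, laptop_cookie, "probabilistic", 0.7))
```

Calling graph.profile(laptop_cookie) walks the edges and returns all three identifiers, which is the "single profile" the graph exists to produce.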

Think of the identity graph as a living web of customer data. The more complete and accurate the connections, the more reliably a system can recognize one customer across many surfaces.

How identity graphs connect identifiers

Building an identity graph means constantly deciding how confident you are that two identifiers belong to the same person. There are two approaches to making those connections:

Deterministic matching uses explicit, verified links. When a customer logs into your site with the same email address tied to their loyalty account, that's a definitive connection. Deterministic identifiers include hashed emails, login credentials, and contact details tied to a known profile. These links are high-confidence by design.
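A rough sketch of what a deterministic link on a hashed email could look like in Python. The normalization rule (trim and lowercase), the choice of SHA-256, and the event fields are assumptions for illustration; real platforms each define their own normalization and hashing requirements.

```python
import hashlib


def hash_email(raw_email: str) -> str:
    """Trim and lowercase an email, then return its SHA-256 hex digest."""
    normalized = raw_email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deterministic_links(events):
    """Group identifiers that were observed alongside the same hashed email."""
    profiles = {}
    for event in events:
        key = hash_email(event["email"])
        profiles.setdefault(key, set()).add(event["identifier"])
    return profiles


# Hypothetical login and loyalty events.
events = [
    {"email": "Pat@Example.com ", "identifier": "loyalty:88431"},
    {"email": "pat@example.com", "identifier": "device:phone-1"},
    {"email": "sam@example.com", "identifier": "device:laptop-7"},
]
profiles = deterministic_links(events)
```

Because both of Pat's events normalize to the same hash, the loyalty ID and the phone end up in one profile, while Sam's laptop stays separate.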

Probabilistic matching fills the gaps where deterministic signals don't exist. When someone browses your site anonymously from a home IP address on both a laptop and a tablet, shared signals—same IP, similar behavioral data, overlapping device IDs—can suggest those are the same person. Machine learning evaluates the statistical likelihood and assigns confidence scores accordingly.
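The scoring step can be sketched with a simple weighted heuristic. To be clear, this is a stand-in: the paragraph above describes machine learning, while the fixed weights, the session fields, and the sample sessions below are all assumptions chosen to keep the example short.

```python
def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hand-picked weights standing in for a trained model (an assumption).
WEIGHTS = {"same_ip": 0.5, "pages": 0.3, "hours": 0.2}


def match_confidence(session_a, session_b):
    """Score how likely two anonymous sessions belong to the same person."""
    score = 0.0
    if session_a["ip"] == session_b["ip"]:
        score += WEIGHTS["same_ip"]
    score += WEIGHTS["pages"] * jaccard(session_a["pages"], session_b["pages"])
    score += WEIGHTS["hours"] * jaccard(session_a["hours"], session_b["hours"])
    return score


# Hypothetical sessions; the IPs come from documentation-only ranges.
laptop = {"ip": "203.0.113.7", "pages": {"/shoes", "/cart", "/sale"}, "hours": {20, 21}}
tablet = {"ip": "203.0.113.7", "pages": {"/shoes", "/sale"}, "hours": {21, 22}}
stranger = {"ip": "198.51.100.4", "pages": {"/blog"}, "hours": {9}}
```

Here the laptop and tablet share an IP and most of their browsing, so they score far higher than the laptop and the unrelated session. Where to draw the line between "match" and "no match" is a judgment call that trades coverage against false links.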

Most identity graphs use both methods. Deterministic matching anchors the graph while the probabilistic approach extends it across multiple browsers, multiple devices, and the broader open web.
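One way to picture the two methods working together is a union-find pass over candidate links: deterministic links always merge, and probabilistic links merge only above a confidence threshold. This is a sketch under assumptions. The 0.8 threshold, the tuple layout, and the identifiers are hypothetical, and real systems may resolve clusters differently.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps lookups fast on long chains.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def resolve(links, min_confidence=0.8):
    """Merge identifiers into profiles; weak probabilistic links are ignored."""
    uf = UnionFind()
    for a, b, method, confidence in links:
        uf.find(a)  # register both identifiers even if the link is rejected
        uf.find(b)
        if method == "deterministic" or confidence >= min_confidence:
            uf.union(a, b)
    clusters = {}
    for identifier in list(uf.parent):
        clusters.setdefault(uf.find(identifier), set()).add(identifier)
    return sorted(clusters.values(), key=sorted)


# Hypothetical candidate links: (identifier, identifier, method, confidence).
links = [
    ("email:h1", "device:phone", "deterministic", 1.0),
    ("device:phone", "cookie:laptop", "probabilistic", 0.9),
    ("cookie:laptop", "cookie:tablet", "probabilistic", 0.4),
]
profiles = resolve(links)
```

With the default threshold, the tablet cookie stays outside the profile because its only link scored 0.4. Lowering the threshold pulls it in, which extends reach at the cost of confidence.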

The match rate reality check

Probabilistic match rates are not fixed. They vary by audience, data quality, and available signal sources. A vendor quoting an 80% match rate may be accurate for one segment and optimistic for another, and those rates change as signals become less available. This is a core consideration when evaluating any tool that relies on identity resolution, and it directly affects cross-device attribution accuracy.
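A small sketch of why a single blended match rate can mislead. The segment names and records are made up for illustration; the point is only the arithmetic of per-segment versus overall rates.

```python
from collections import defaultdict


def match_rate_by_segment(records):
    """Return the share of matched records within each segment."""
    totals = defaultdict(int)
    matched = defaultdict(int)
    for record in records:
        totals[record["segment"]] += 1
        if record["matched"]:
            matched[record["segment"]] += 1
    return {segment: matched[segment] / totals[segment] for segment in totals}


# Hypothetical records: logged-in app users match far more often than anonymous web visitors.
records = (
    [{"segment": "app_logged_in", "matched": True}] * 4
    + [{"segment": "app_logged_in", "matched": False}]
    + [{"segment": "anonymous_web", "matched": True}]
    + [{"segment": "anonymous_web", "matched": False}] * 4
)
rates = match_rate_by_segment(records)
overall = sum(r["matched"] for r in records) / len(records)
```

In this toy data the overall rate is 50%, but that average hides an 80% rate for logged-in app users and a 20% rate for anonymous web traffic. Asking a vendor for the breakdown that matches your own audience mix is the practical takeaway.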

What data flows into an identity graph?

Identity graphs ingest customer data across the full customer journey. For marketers, the relevant data types break down like this:

Data type | Examples | Where it comes from
Offline identifiers | Full names, phone numbers, mailing addresses, loyalty IDs | CRM, in-store, call centers, support agents
Digital identifiers | Hashed emails, device IDs, MAIDs, IP address, cookies | Website sessions, mobile apps, ad platforms
Behavioral data | Purchase history, browsing patterns, customer actions | Site analytics, platform events

The practical challenge is that these sources don't naturally speak to each other. An identity graph is the infrastructure that links them, but only as completely as the underlying data allows.

The signal erosion problem

Several of these customer data sources are under active pressure. Third-party cookies are already restricted by default in browsers such as Safari and Firefox, so the cookieless future is already here for a meaningful portion of web traffic. Privacy regulations like CCPA have expanded consumer rights over how personally identifiable information is collected and used, and privacy controls are increasingly built into browsers and devices by default. Mobile ad IDs have become harder to access since Apple's App Tracking Transparency framework made access to the iOS advertising identifier opt-in for each app, which limits cross-app tracking. Pixel blocking is common.

For identity graphs that rely heavily on third-party signals, a degraded IP address pool, and probabilistic inference, this is an ongoing problem. The graph doesn't disappear, but its coverage shrinks, and cross-device attribution accuracy shrinks with it. Any measurement system sitting on top of the identity graph inherits those gaps.
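The effect of signal loss on coverage can be sketched in a few lines of Python. The session IDs, signal labels, and the idea that each match depends on exactly one signal are simplifying assumptions for illustration.

```python
def coverage(session_signals, lost_signals=frozenset()):
    """Share of sessions still linked to a known profile.

    `session_signals` maps a session ID to the signal its match depends on,
    or None when the session was never matched.
    """
    if not session_signals:
        return 0.0
    linked = sum(
        1
        for signal in session_signals.values()
        if signal is not None and signal not in lost_signals
    )
    return linked / len(session_signals)


# Hypothetical sessions and the signal each match relies on.
session_signals = {
    "s1": "login",
    "s2": "third_party_cookie",
    "s3": "maid",
    "s4": "login",
    "s5": None,
}
before = coverage(session_signals)
after = coverage(session_signals, lost_signals={"third_party_cookie", "maid"})
```

In this toy data, losing cookie and MAID signals halves coverage from 80% to 40%, while the login-based matches are untouched. The same check against your own data shows how exposed a graph is to third-party signal loss.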

Understanding this is one of the more actionable takeaways a marketer can bring into a tool evaluation conversation.

Identity spine vs. identity graph

This distinction comes up often, so it's worth making as clear as possible:

An identity spine is the persistent anchor identifier at the center of a customer's profile, usually a hashed email, a loyalty program ID, or another first-party identifier your brand controls directly. It's the source-of-truth anchor that holds the graph together over time.

The identity graph is the full network built around that spine: all the connected devices, sessions, behavioral signals, and probabilistic links that extend outward from the anchor. Many identity graphs are stored in graph databases, which are built to handle relationship-heavy data structures like this one.

The spine is what makes the identity graph stable over time. Without a reliable anchor, the graph becomes a loose collection of inferred connections that shift as signals change. First-party graphs—those built on first-party data that customers have shared directly with your brand—are generally more durable than graphs dependent on third-party identifiers. First-party data like a loyalty email or a purchase record doesn't deprecate the way a third-party cookie does.
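As a small illustration of the spine idea, the sketch below picks an anchor from a profile's identifiers by preferring first-party ones. The priority order, the identifier kinds, and the sample values are assumptions, not a standard.

```python
# Assumed preference order: durable first-party identifiers only.
SPINE_PRIORITY = ("email_hash", "loyalty_id")


def choose_spine(identifiers):
    """Return the preferred (kind, value) anchor, or None if no first-party ID exists."""
    for kind in SPINE_PRIORITY:
        candidates = sorted(value for k, value in identifiers if k == kind)
        if candidates:
            return (kind, candidates[0])
    return None


# Hypothetical profiles expressed as (kind, value) pairs.
known_customer = [
    ("cookie_id", "ck-9"),
    ("loyalty_id", "88431"),
    ("email_hash", "a1b2"),
]
anonymous_visitor = [("cookie_id", "ck-4"), ("maid", "m-77")]
```

The known customer anchors on the hashed email, while the anonymous visitor has no spine at all, which is exactly the "loose collection of inferred connections" case described above.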

When vendors claim their approach delivers higher accuracy than competitors, the right question is: what's anchoring your spine?

How identity graphs connect to marketing attribution

Cross-device attribution—tracking customer journeys across devices—depends on the ability to link multiple identifiers belonging to the same customer. That capability rests on identity resolution. If the underlying identity graph has gaps—customers who can't be matched across devices, IP addresses that can't be reliably connected, anonymous sessions that can't be tied back to known customer profiles—cross-device attribution inherits those gaps and attribution credit gets misassigned. Customer data that should inform budget decisions ends up incomplete.

In other words, the quality of user-level marketing attribution is tied to how well your identity graph works. As identity resolution gets harder, user-level attribution gets less reliable, not because the methodology is wrong, but because the data underneath it is fragmented.
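A short sketch of how identity gaps fragment a journey. The touchpoints, identifiers, and the fallback rule (an unmatched identifier becomes its own "customer") are assumptions made for illustration.

```python
def build_journeys(touchpoints, identity_map):
    """Group touchpoints by resolved customer; unmatched identifiers stand alone."""
    journeys = {}
    for identifier, event in touchpoints:
        owner = identity_map.get(identifier, identifier)
        journeys.setdefault(owner, []).append(event)
    return journeys


# Hypothetical touchpoints from one real person on three devices.
touchpoints = [
    ("tv-device-1", "connected_tv_ad"),
    ("cookie-laptop", "site_visit"),
    ("phone-app", "purchase"),
]

# A complete graph resolves all three identifiers to one customer.
complete_map = {
    "tv-device-1": "cust-1",
    "cookie-laptop": "cust-1",
    "phone-app": "cust-1",
}
# A gappy graph only knows about the logged-in phone.
gappy_map = {"phone-app": "cust-1"}

full_view = build_journeys(touchpoints, complete_map)
fragmented_view = build_journeys(touchpoints, gappy_map)
```

With the complete map, the purchase sits at the end of a three-step journey that includes the connected TV ad. With the gappy map, the same purchase appears as a one-step journey, so any user-level attribution built on it has no way to credit the earlier touchpoints.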

The question for any brand evaluating measurement tools isn't just "what does this tool measure?" It's "what does this tool depend on to measure it?"

What to ask when evaluating tools that use identity graphs

If you're assessing any measurement tool that relies on identity resolution, these questions will surface what you need to know:

  • What is the match rate, and how is it calculated? Ask whether it's an average across all data or specific to your customer base and industry.
  • What proportion of matches are deterministic vs. probabilistic? A high match rate built primarily on probabilistic matching is more fragile than one anchored in first-party, deterministic data.
  • How does the tool handle signal loss? As third-party cookies and MAIDs become less accessible, what happens to coverage?
  • Where does the identity data come from? First-party data your brand collects directly—purchase records, loyalty sign-ups, mobile app logins—is more durable than data sourced from third-party brokers or inferred from shared IP addresses.
  • How often is the identity graph refreshed? Customer identifiers go stale. An identity graph that isn't actively maintained will drift from reality, and so will the attribution that depends on it.
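Two of these questions, the deterministic share and graph freshness, reduce to simple ratios if you can get edge-level data. The sketch below assumes a hypothetical export where each link records its method and the date it was last observed. The 90-day staleness cutoff is an arbitrary example, not an industry rule.

```python
from datetime import date, timedelta


def graph_health(edges, today, max_age_days=90):
    """Summarize how much of a graph is verified and how much has gone stale."""
    total = len(edges)
    if total == 0:
        return {"deterministic_share": 0.0, "stale_share": 0.0}
    deterministic = sum(1 for e in edges if e["method"] == "deterministic")
    stale = sum(1 for e in edges if (today - e["last_seen"]).days > max_age_days)
    return {
        "deterministic_share": deterministic / total,
        "stale_share": stale / total,
    }


# Hypothetical edge export; the reference date is arbitrary.
today = date(2025, 1, 1)
edges = [
    {"method": "deterministic", "last_seen": today - timedelta(days=10)},
    {"method": "probabilistic", "last_seen": today - timedelta(days=200)},
    {"method": "probabilistic", "last_seen": today - timedelta(days=30)},
    {"method": "probabilistic", "last_seen": today - timedelta(days=120)},
]
health = graph_health(edges, today)
```

In this example only a quarter of links are deterministic and half have not been seen in over 90 days, which is the kind of profile that would make a headline match rate look better than it is.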

Vendors with honest answers to these questions will be direct about trade-offs. The ones who aren't are telling you something important.

Where Prescient comes in

Prescient's marketing mix modeling uses statistical models that work at the aggregate level, analyzing relationships between your marketing spend, impressions, and revenue across channels and campaigns. Because the model doesn't require following individual users across devices, it doesn't depend on identity graph completeness. Signal loss from cookie deprecation, MAID restrictions, or pixel blocking doesn't degrade Prescient's attribution the way it affects user-level measurement tools.
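To show what "aggregate level" means in practice, here is a deliberately tiny Python sketch: a one-channel least-squares fit of weekly revenue on weekly spend. This is not Prescient's model, and real marketing mix models are far more involved. The function name and the numbers are made up; the point is only that the inputs are channel-level totals with no user identifiers anywhere.

```python
def fit_line(spend, revenue):
    """Ordinary least squares for revenue = intercept + slope * spend."""
    n = len(spend)
    mean_x = sum(spend) / n
    mean_y = sum(revenue) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(spend, revenue))
    variance = sum((x - mean_x) ** 2 for x in spend)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up weekly totals for a single channel (thousands of dollars).
weekly_spend = [10.0, 20.0, 30.0, 40.0]
weekly_revenue = [120.0, 140.0, 160.0, 180.0]

slope, intercept = fit_line(weekly_spend, weekly_revenue)
```

In this toy data the fit estimates two dollars of revenue per extra dollar of spend on top of a baseline of 100. Nothing in the calculation needs to know which person saw which ad, so there is no identity graph for signal loss to erode.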

That also means Prescient can measure campaign-level performance across your full marketing mix—including marketing halo effects that land in branded search, organic traffic, direct traffic, and retail channels—without needing to resolve which individual customer triggered which event. Book a demo to see the platform in action and how it can uncover new insights for your brand.

FAQs

What is an identity graph in marketing?

An identity graph in marketing is a database structure that links multiple customer identifiers—across devices, channels, and data sources—into a single, unified customer profile. It's the foundation for identity resolution: the process of recognizing the same person across different touchpoints. Marketers use identity graphs to power cross-device attribution, frequency capping, and consistent customer experiences across channels.

What is the difference between a knowledge graph and an identity graph?

A knowledge graph maps relationships between concepts, entities, and information; it's used to represent how ideas relate to each other, and it's common in search engines and AI systems. An identity graph maps relationships between data points belonging to the same individual customer. Both use nodes and edges structurally, but they serve entirely different purposes: one organizes information, the other resolves customer identity.

How do you graph an identity?

Building an identity graph starts with collecting identifiers across every touchpoint where individual customers interact with your brand: login events, purchases, site sessions, CRM records, mobile app activity, and more. Those identifiers are then linked through deterministic matching (verified, explicit connections) and probabilistic matching (statistically inferred connections based on shared signals). The result is a network of linked identifiers anchored by a durable first-party identifier like a hashed email. Maintaining the graph is an ongoing process, because identifiers change, signals deprecate, and customer behavior evolves.

What is the difference between an identity spine and an identity graph?

The identity spine is the persistent anchor identifier—usually a hashed email or loyalty ID—that holds a customer's profile together over time. The identity graph is the full network of connected identifiers built around that spine, including devices, sessions, and probabilistic links. The spine provides stability; the graph provides reach. A graph without a reliable spine is more vulnerable to signal loss and less accurate for cross-device attribution.
