Traditional web analytics tells you a lot about what happens on your website — page views, session duration, bounce rate, traffic sources. What it can't tell you is who was there. You get aggregate data about behavior, but the individuals behind that behavior stay completely anonymous.
For B2B companies, that gap is a serious problem. When a VP of Engineering spends 12 minutes on your pricing page and then leaves, Google Analytics shows you a session. What it doesn't show you is that it was Sarah Chen from Acme Corp evaluating your product for a $120K deal.
This is the gap that visitor identification tools fill. But the mechanics behind how they actually work aren't well understood — and that matters, because the technique determines what data you get, how accurate it is, and which visitors you can actually identify. Here's a full breakdown.
Why Traditional Analytics Can't Identify Visitors
Google Analytics, Mixpanel, and similar tools are designed to analyze behavior, not to reveal who an unknown visitor is. Visitors are tracked under anonymous or pseudonymous IDs, partly to comply with privacy regulations and browser restrictions. Your analytics dashboard shows you trends, not people.
Even if you wanted to identify individual visitors through standard analytics, modern privacy changes make it increasingly difficult. Safari and Firefox block or restrict third-party cookies by default and limit other forms of cross-site tracking. VPNs mask IP addresses. Ad blockers strip tracking pixels.
The result: by commonly cited industry estimates, around 98% of B2B website visitors leave without identifying themselves, including many with genuine buying intent, and traditional analytics gives you no way to find out who they were.
Visitor identification tools take a fundamentally different approach. Instead of passively logging page events, they actively attempt to resolve the device and session to a known entity — either a company or a specific person.
Layer 1: IP Resolution
The first and most widely used technique is IP-to-company mapping. Every device connecting to the internet does so through an IP address. Many of those IP addresses are registered to companies — a block of IPs owned by Salesforce, for example, identifies traffic coming from Salesforce's corporate network.
Visitor identification tools maintain large databases that map IP address ranges to companies, and these databases are updated continuously as IP registrations change. When a visitor hits your site, the tool looks up their IP, checks it against the database, and — if there's a match — tells you which company the visitor is from.
This is straightforward and highly reliable for enterprise visitors on corporate networks. It breaks down in three common scenarios:
- Remote workers on home ISPs — their IP resolves to their internet provider (Comcast, AT&T), not their employer
- VPN users — IP resolves to the VPN provider, not the company
- Small companies — many small businesses don't have dedicated IP blocks; their IPs are shared residential or small-business ranges
As remote work has become the default for many knowledge workers, this limitation has grown significantly. A tool that matched 70% of your B2B visitors to a company in 2019 may now match only 40–50% through IP alone, because a large portion of your visitors are no longer on their company's network when they research you.
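The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular vendor implements it: the `IP_RANGES` table, the owner names, and the `kind` label are all hypothetical, and the ranges are reserved documentation blocks rather than real company allocations.

```python
import ipaddress
from typing import Optional

# Hypothetical sample data. Real tools maintain far larger range tables.
# Each entry: (network, registered owner, kind of owner).
IP_RANGES = [
    (ipaddress.ip_network("203.0.113.0/24"), "Acme Corp", "company"),
    (ipaddress.ip_network("198.51.100.0/24"), "Example Broadband", "isp"),
]


def resolve_company(ip: str) -> Optional[str]:
    """Return the company that owns the IP, or None if it cannot be attributed."""
    address = ipaddress.ip_address(ip)
    for network, owner, kind in IP_RANGES:
        if address in network:
            # An ISP or VPN match says nothing about the visitor's employer.
            return owner if kind == "company" else None
    return None


print(resolve_company("203.0.113.42"))  # visitor on the office network
print(resolve_company("198.51.100.7"))  # same person working from home
```

The second call returns `None` even though the range is known, which is the remote-worker failure mode in miniature: the IP resolves, but only to an internet provider.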
Layer 2: Reverse DNS and WHOIS Lookups
Reverse DNS (rDNS) is a complementary technique. When a device connects to the internet, its IP can sometimes be resolved back to a hostname — and that hostname often contains organizational identifiers. An IP resolving to vpn.acmecorp.com tells you more than the raw IP does.
WHOIS lookups augment this: IP address blocks are registered with regional internet registries (ARIN, RIPE, APNIC), and those registrations include the organization name. Combining rDNS with WHOIS gives better signal than IP lookup alone, but it's still limited to company-level resolution — it tells you the organization, not the person.
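As a rough sketch of the rDNS half of this layer, the snippet below uses Python's standard `socket.gethostbyaddr` for the reverse lookup and a deliberately naive heuristic to pull an organizational domain out of the hostname. The `org_domain` helper is an assumption made for illustration; it keeps only the last two labels, so suffixes like `.co.uk` would need a public-suffix list. WHOIS is left out because it requires a registry query.

```python
import socket
from typing import Optional


def reverse_dns(ip: str) -> Optional[str]:
    """Look up the PTR hostname for an IP; return None when no record exists."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None


def org_domain(hostname: str) -> Optional[str]:
    """Naive heuristic: keep the last two labels of the hostname.

    Assumption: the registrable domain has exactly two labels.
    """
    labels = hostname.rstrip(".").lower().split(".")
    if len(labels) < 2:
        return None
    return ".".join(labels[-2:])


print(org_domain("vpn.acmecorp.com"))  # acmecorp.com
```

In practice `reverse_dns(ip)` would feed `org_domain`, and many IPs will simply have no PTR record or a generic ISP-assigned one.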
Layer 3: Device Fingerprinting
Device fingerprinting is a technique for identifying a specific device — and by extension, the person who uses it — without relying on cookies or IP addresses. It works by collecting a combination of browser and device attributes that, taken together, form a statistically unique signature.
Attributes used in device fingerprinting typically include:
- Browser type, version, and installed plugins
- Operating system and version
- Screen resolution and color depth
- Timezone and language settings
- Installed fonts (rendered via Canvas API)
- Graphics rendering characteristics (WebGL fingerprint)
- Audio processing signature (AudioContext API)
- Hardware concurrency (number of CPU cores)
No single attribute uniquely identifies a device. But the combination of 15–20 attributes creates a fingerprint that is unique for the vast majority of devices. Research suggests that browser fingerprints are unique for over 80% of web users.
Device fingerprinting doesn't tell you who a person is by itself — it just consistently identifies the same device across sessions. The resolution to a person happens in the next layer.
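To make the idea of a combined signature concrete, here is a minimal sketch that hashes a handful of attributes into a single identifier. The attribute names and values are hypothetical stand-ins for what a browser script might collect, and hashing canonical JSON with SHA-256 is just one simple way to combine them.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser/device attributes into one stable identifier."""
    # Sorting keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, as a browser script might report them.
visit_one = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0",
    "screen": "2560x1440x24",
    "timezone": "America/Chicago",
    "language": "en-US",
    "hardware_concurrency": 8,
}
visit_two = dict(reversed(list(visit_one.items())))  # same values, new order
other_device = {**visit_one, "screen": "1920x1080x24"}

print(fingerprint(visit_one) == fingerprint(visit_two))     # True
print(fingerprint(visit_one) == fingerprint(other_device))  # False
```

An exact hash like this changes whenever any single attribute changes, for example after a browser update, so a production system would likely need some tolerance for drift rather than strict equality.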
Layer 4: Identity Graphs
An identity graph is a database that links device fingerprints, cookie IDs, email hashes, and public profile data into unified individual profiles. This is where company-level identification becomes person-level identification.
Identity graphs are built through a combination of:
- Deterministic matching — linking a device to a person when they log in to a service, complete a form, or click an email link from a known address. This creates a confirmed, high-confidence identity link.
- Probabilistic matching — inferring that two identifiers belong to the same person based on behavioral signals (same device, same location, similar browsing patterns). Lower confidence but much larger scale.
- Public data enrichment — correlating fingerprints and email hashes with public professional profile data (LinkedIn profiles, company directories) to surface names, job titles, and contact information.
When a visitor lands on your site and their device fingerprint matches a record in the identity graph, the tool can resolve that device to a named individual — full name, job title, business email, LinkedIn URL — without the visitor ever filling out a form.
The quality of the identity graph determines the match rate. A larger, higher-quality graph means more visitors get resolved. This is why match rates vary significantly between tools — from 20% on the low end to 65%+ for the best person-level identification systems.
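One way to picture an identity graph is as clusters of identifiers that have been linked to the same person, with profile data attached to each cluster. The toy class below uses a union-find structure for the deterministic links; the class name, identifier prefixes, and sample profile are all illustrative assumptions, and probabilistic matching is not modeled.

```python
from typing import Dict, Optional


class IdentityGraph:
    """Toy identity graph: union-find over identifiers, plus profile data."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}
        self._profiles: Dict[str, dict] = {}

    def _find(self, identifier: str) -> str:
        self._parent.setdefault(identifier, identifier)
        while self._parent[identifier] != identifier:
            # Path halving keeps lookups fast as clusters grow.
            self._parent[identifier] = self._parent[self._parent[identifier]]
            identifier = self._parent[identifier]
        return identifier

    def link(self, a: str, b: str) -> None:
        """Record a deterministic match: both identifiers belong to one person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a
            # Carry any profile stored on the absorbed root over to the new root.
            merged = {**self._profiles.pop(root_b, {}), **self._profiles.get(root_a, {})}
            if merged:
                self._profiles[root_a] = merged

    def enrich(self, identifier: str, profile: dict) -> None:
        """Attach profile fields (name, title, ...) to an identifier's cluster."""
        root = self._find(identifier)
        self._profiles.setdefault(root, {}).update(profile)

    def resolve(self, identifier: str) -> Optional[dict]:
        """Return the profile for a known identifier, or None if unresolved."""
        if identifier not in self._parent:
            return None
        return self._profiles.get(self._find(identifier))


graph = IdentityGraph()
graph.link("fp:9f2c", "email_sha256:ab12")  # e.g. a click on an email link
graph.enrich("email_sha256:ab12", {"name": "Sarah Chen", "company": "Acme Corp"})
print(graph.resolve("fp:9f2c"))
```

Here the fingerprint alone resolves to a named profile because an earlier deterministic event tied it to an email hash, which is the jump from device-level to person-level identification.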
Layer 5: First-Party Data Enrichment
For visitors who have previously interacted with your site — submitted a form, clicked a marketing email, started a trial — you already have identity data. First-party data enrichment connects these known contacts to current session activity.
A visitor who submitted a demo request six months ago and is now back browsing your pricing page can be identified immediately through a first-party cookie or hashed email match. The tool surfaces both their identity and the new session context — what they visited, how long they stayed, which pages they hit.
This is the highest-confidence identification method because it's based on data the person explicitly provided. The challenge is coverage: it only works for visitors already in your database, which is a small fraction of total traffic.
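The hashed email match can be sketched as follows. The `CRM` dictionary, the sample contact, and the function names are hypothetical; the one detail worth noting is that emails are normalized before hashing so that the same address always produces the same key.

```python
import hashlib
from typing import Optional


def hash_email(email: str) -> str:
    """Normalize, then hash, so casing and stray whitespace do not break a match."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical CRM export keyed by hashed email.
CRM = {
    hash_email("sarah.chen@acme.example"): {
        "name": "Sarah Chen",
        "stage": "demo requested",
    },
}


def match_session(email_hash: str, pages: list) -> Optional[dict]:
    """Attach the current session's pages to a known contact, if one matches."""
    contact = CRM.get(email_hash)
    if contact is None:
        return None
    return {**contact, "session_pages": pages}


returning = match_session(hash_email(" Sarah.Chen@ACME.example "), ["/pricing"])
print(returning)
```

A visitor whose hash is not in the table returns `None`, which reflects the coverage limit: this layer only ever sees people already in your database.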
What Data You Actually Get
The data you get depends on which layer resolves successfully. The match rates below overlap, because a single visitor can resolve through more than one method, so the rows do not sum to 100%:
| Resolution Method | What You Get | Typical Match Rate |
|---|---|---|
| IP resolution only | Company name, industry, location, company size | 40–60% of visitors |
| IP + rDNS + WHOIS | Company name + organizational subdomain | 45–65% of visitors |
| Identity graph match | Full name, job title, business email, LinkedIn profile | 50–65% of visitors (US), lower internationally |
| First-party data match | Full CRM profile + current session behavior | 5–15% of visitors (existing contacts only) |
| No match | Anonymous session data only | 35–50% of visitors |
Modern tools layer these techniques — attempting higher-confidence resolution first (first-party data, identity graph) and falling back to lower-confidence methods (IP resolution) when the higher layers fail. The goal is to resolve as many visitors as possible to the highest level of specificity achievable.
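The fallback order described above amounts to a simple chain: try each resolver from highest to lowest confidence and stop at the first match. The sketch below assumes hypothetical stand-in resolvers with hard-coded answers; in a real system each would wrap one of the lookups from the earlier layers.

```python
from typing import Callable, List, Optional, Tuple

Resolver = Callable[[dict], Optional[dict]]


def resolve_visitor(session: dict, layers: List[Tuple[str, Resolver]]) -> dict:
    """Try each layer in order and return the first successful match."""
    for name, resolver in layers:
        result = resolver(session)
        if result is not None:
            return {"method": name, **result}
    return {"method": "none"}


# Hypothetical stand-in resolvers with hard-coded answers.
def first_party(session: dict) -> Optional[dict]:
    return {"name": "Sarah Chen"} if session.get("cookie") == "known" else None


def identity_graph(session: dict) -> Optional[dict]:
    return {"name": "Sarah Chen"} if session.get("fingerprint") == "fp:9f2c" else None


def ip_lookup(session: dict) -> Optional[dict]:
    return {"company": "Acme Corp"} if session.get("ip") == "203.0.113.42" else None


# Highest-confidence layers come first.
LAYERS = [
    ("first_party", first_party),
    ("identity_graph", identity_graph),
    ("ip", ip_lookup),
]

print(resolve_visitor({"ip": "203.0.113.42"}, LAYERS))
```

A session with only an IP falls through to the company-level result, while a session that also carries a known fingerprint stops earlier and returns a named person.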
How Technique Choice Affects What You Can Do With the Data
The identification layer determines the actionability of the output.
Company-level resolution gives you account names. You can use those to prioritize your ABM target list, trigger account-based advertising, or alert SDRs that a target account showed intent. But you still need to research who the right contact is and reach out cold — you know a company visited, not a person.
Person-level resolution gives you a named individual. Outreach can go directly to the person who evaluated your product. The message can be specific — you know their title, their company, the pages they visited. The entire step of "figure out who to contact" is eliminated. That's the operational difference that makes person-level data disproportionately more valuable for outbound sales.
There's also a timing dynamic. Intent is time-sensitive. A VP who was on your pricing page this morning is in a different mental state than one who visited three weeks ago. Person-level resolution collapses the time from intent signal to first touch because there's no research step in between. Tools that can surface a name, title, and email within minutes of a visit enable outreach that is both faster and more relevant.
What This Looks Like in Webhawk
WebhawkOS applies all five layers. When a visitor lands on a customer's site, the system runs IP resolution, device fingerprinting, and identity graph matching simultaneously. If the visitor is in the identity graph, they're resolved to a named person with full contact data, typically within seconds of the visit.
For visitors who can't be resolved to individuals, the system falls back to company-level context so no intent signal is lost. Everything feeds into the customer's CRM or outreach tooling in real time.
The result is a feed of identified visitors — mostly named people with job titles and emails, some company-only records for the visitors that couldn't be resolved further — with full page-level session data attached. No prospecting. No manual research. The visitors identified themselves by showing up.
If you want to see what this looks like on your own site, the demo is the fastest way to understand the coverage. Run any URL and see a sample of who's actually visiting — not aggregate sessions, but specific people.