Turning anonymous web traffic into warm outbound

Recovering the identifiable slice of anonymous web traffic, and routing it to a relevant human touch with no manual step in between.

2026-06-18 · 6 min read

Many faint dotted paths converging into a single bold line that ends at one solid node.

Most B2B buyers research a product without ever filling in a form. They read the docs, compare you to an incumbent, look at pricing, and leave no name behind. This is the system that closes that gap: from an anonymous click to a personalized first message, with no manual step until a prospect replies.

Identity: stack providers, then dedupe

No single vendor identifies everyone. Person-level reveal realistically resolves 10-30% of US B2B traffic, and each provider covers a different slice. Several run in parallel (Vector, RB2B, and others), with their output merged and overlapping identities deduped on a stable key (durable person id, falling back to business email, then LinkedIn). The union beats any single source, and the dedupe stops the same visitor from firing twice. Account-level (reverse-IP) identification resolves far more traffic, but only the company, not the person. The two layers answer different questions, so both run.
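The merge-and-dedupe step can be sketched as follows. This is a minimal illustration, not the production logic: the field names (`person_id`, `business_email`, `linkedin_url`) and the record shape are assumptions, since the article names only the priority order of the key.

```python
from typing import Optional

# Priority order from the text: person id, then business email, then LinkedIn.
# The field names themselves are hypothetical.
KEY_FIELDS = ("person_id", "business_email", "linkedin_url")


def stable_key(identity: dict) -> Optional[str]:
    """Return the highest-priority identifier present, or None if there is none."""
    for field in KEY_FIELDS:
        value = str(identity.get(field) or "").strip().lower()
        if value:
            return f"{field}:{value}"
    return None


def merge_identities(records: list[dict]) -> list[dict]:
    """Merge provider records that share a stable key; first non-empty value wins."""
    merged: dict[str, dict] = {}
    for record in records:
        key = stable_key(record)
        if key is None:
            continue  # nothing to dedupe on, so the record is skipped
        current = merged.setdefault(key, {})
        for field, value in record.items():
            if value and not current.get(field):
                current[field] = value
    return list(merged.values())
```

One limitation of this sketch: a record that carries only an email will not merge with a record for the same person that carries a person id, because each is keyed on its best available field. A fuller version would link keys across records before merging.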

Europe is a separate regime. Person-level reveal is hard to defend under GDPR, so EU traffic is handled at the company level (reverse-IP, account resolution) and fed into an account-based play rather than individual outreach.

The pipeline

One webhook fires per identified visit. Everything downstream is a single orchestration flow, so the path from click to outreach is one auditable run.

Ingestion and dedupe

Concurrency serialized to 1. A visitor with two tabs or a reload fires two webhooks in the same second. Run them in parallel and both read empty state, both alert. A queue with a single slot forces them through one at a time.
Daily dedupe, record-then-skip. A per-visitor key with a one-day TTL suppresses a repeat alert the same day. The raw event is always written first, so same-day revisits still feed the history. The gate suppresses the message, never the data.
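Both rules can be sketched in a few lines. This is an in-memory illustration under stated assumptions: a `threading.Lock` stands in for the single-slot queue, a dictionary stands in for the TTL key store, and the `Ingestor` class and its method names are hypothetical.

```python
import threading
import time


class Ingestor:
    """Serializes webhooks and applies a record-then-skip daily dedupe."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60, clock=time.time):
        self._lock = threading.Lock()  # single slot: one webhook at a time
        self._alerted_until: dict[str, float] = {}  # visitor key -> expiry time
        self._ttl = ttl_seconds
        self._clock = clock
        self.events: list[dict] = []  # stand-in for the event store

    def handle(self, visitor_key: str, event: dict) -> bool:
        """Store the event, then return True only if an alert should fire."""
        with self._lock:
            # Record first, always: the gate suppresses the message, never the data.
            self.events.append(event)
            now = self._clock()
            if self._alerted_until.get(visitor_key, 0.0) > now:
                return False  # already alerted within the TTL window
            self._alerted_until[visitor_key] = now + self._ttl
            return True
```

Because the read and the write of the dedupe key happen inside the same lock, two webhooks arriving in the same second cannot both see empty state.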

Event store and recurrence

Every visit is one row in an event store, which is what makes it possible to count visits and rebuild journeys. Recurrence is computed by sessionizing on the real visit timestamps, not the execution time, because reveal lags the visit by minutes. A new session starts when the gap between two events exceeds 30 minutes. The 30-minute gap is the conventional analytics session boundary, the same one tools like Google Analytics use, so the visit counts stay comparable to standard web sessionization. That produces an honest "first visit" versus "3 visits since 3 Jun", the single most useful line for a rep, because it separates a one-off from a warming prospect.

The same store rolls up at the account level too: one person on a pricing page is curiosity, but three people from the same company on pricing within a week is a buying committee forming, a far stronger signal than any single visit and a cue to multi-thread the outreach rather than chase one name.

Figure 1. Sessionization: events more than 30 minutes apart count as separate visits, turning a stream of page hits into an honest "3 visits".
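The sessionization rule is small enough to sketch directly. The function names and the exact wording logic of the recurrence line are assumptions for illustration; only the 30-minute gap and the two output phrasings come from the text.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def count_sessions(timestamps: list[datetime]) -> int:
    """Count visits: a new session starts when the gap exceeds 30 minutes."""
    sessions = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > SESSION_GAP:
            sessions += 1
        previous = ts
    return sessions


def recurrence_line(timestamps: list[datetime]) -> str:
    """Build the line a rep sees, from real visit timestamps (at least one)."""
    visits = count_sessions(timestamps)
    if visits <= 1:
        return "first visit"
    first = min(timestamps)
    return f"{visits} visits since {first.day} {MONTHS[first.month - 1]}"
```

The timestamps passed in should be the visit times carried in the event, not the time the flow ran, for the reason given above: reveal lags the visit by minutes.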

Enrichment and suppression

Each visit is cross-referenced against the CRM (the contact, and the company by domain independently, since the visiting person is often not the one in the CRM) and the warehouse. Two questions decide what happens next: is this company already a customer or active user, and is there an open deal at contact or company level. If Sales is mid-deal, the visitor is suppressed rather than cold-prospected. Suppression is the feature that makes the channel usable internally.
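The two questions reduce to a short decision function. This is a sketch only: the `CrmContext` fields are hypothetical names for what the CRM and warehouse lookups would return, and the reason strings are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrmContext:
    """Hypothetical result of the CRM and warehouse lookups for one visit."""
    is_customer: bool = False
    is_active_user: bool = False
    open_deal_on_contact: bool = False
    open_deal_on_company: bool = False


def suppression_reason(ctx: CrmContext) -> Optional[str]:
    """Return why cold outreach should be held back, or None if it may proceed."""
    if ctx.is_customer or ctx.is_active_user:
        return "existing customer or active user"
    # The contact and the company are checked independently, since the
    # visiting person is often not the one attached to the deal.
    if ctx.open_deal_on_contact or ctx.open_deal_on_company:
        return "open deal in progress"
    return None
```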

Relevance gate

A small LLM reads the visitor in context (title, seniority, and department, plus the company's industry and size) and the pages viewed, then returns a strict include or exclude with a one-line reason. This is the noise filter: visitor-ID tools are notorious for flooding a channel with thousands of low-fit accounts, so anyone clearly outside the audience never reaches a rep. Hard rules run before the model: internal addresses are dropped, and so is anyone with no actionable identity (no LinkedIn and no business email). The gate fails open: if the model is unavailable, the visitor is included, because missing a real buyer costs more than reviewing a borderline one. It governs only the human-facing alert. The full record is always kept.

The gate is model-agnostic: any one of several LLMs can run the judgment.
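The order of operations in the gate (hard rules first, then the model, failing open on error) can be sketched as below. The `judge` callable is a stand-in for whichever model is used, and the internal domain list, field names, and return shape are assumptions.

```python
from typing import Callable

# Hypothetical list of the company's own email domains.
INTERNAL_DOMAINS = {"example.com"}


def relevance_gate(visitor: dict, judge: Callable[[dict], dict]) -> dict:
    """Return {"include": bool, "reason": str} for one visitor."""
    email = (visitor.get("business_email") or "").strip().lower()
    linkedin = visitor.get("linkedin_url")

    # Hard rules run before the model is called at all.
    if email and email.rsplit("@", 1)[-1] in INTERNAL_DOMAINS:
        return {"include": False, "reason": "internal address"}
    if not email and not linkedin:
        return {"include": False, "reason": "no actionable identity"}

    try:
        verdict = judge(visitor)  # the model sees title, company, pages viewed
        return {"include": bool(verdict["include"]), "reason": str(verdict["reason"])}
    except Exception:
        # Fail open: a missed buyer costs more than a borderline review.
        return {"include": True, "reason": "model unavailable, failing open"}
```

Passing the model in as a plain callable keeps the gate independent of any one provider, and a malformed model response is treated the same as an outage.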

Real-time signal

Every visit that passes the gate fires a Slack alert in a dedicated channel, already enriched, so the team sees who is browsing in real time. A faithful reproduction of one alert:

Website Visitors APP 11:03

Priya Raman · Machine Learning Engineer

Apple · Consumer Electronics · 10001+ · Cupertino, CA

3 visits since 3 Jun

Pricing · 2h ago
Docs: ML pipelines & scheduling · 1d ago

LinkedIn Profile

Figure 2. The Slack alert: who, company size and industry, the intent pages, the recurrence line, one click to the profile. Customers and accounts in active pipeline are flagged so the rep backs off.

Scoring: a 0 to 100 read on every visit

The pages a visitor reads are the signal. A usage pattern is inferred from the docs and product areas they spend time on: a data profile, an AI profile, or an infra profile. That pattern, combined with seniority and intent (pricing, comparison, and getting-started pages outrank a single docs hit), feeds a score. Where the relevance gate only decides whether a visit is worth surfacing to the team, the score decides whether it enters outbound at all, and how.

The score comes from an LLM step that returns a structured object: a number from 0 to 100, a tier, one line of reasoning, an ICP-match summary, and a ready-to-send opener. It reasons over fit and intent together, and it is deliberately strict: existing customers and internal addresses score 0, clearly non-technical roles are capped low, and only a fraction of traffic clears the bar. Taking the visitor from the alert above:

{
  "score": 84,
  "tier": "priority",
  "reasoning": "ML engineer at a large enterprise, repeat pricing visits.",
  "icp_match": "AI / ML · Consumer Electronics · 10001+ · Pricing intent",
  "opener": "Hello Priya, I work with the AI / ML team and am always interested in feedback from people building in this space. Have you ever used it?"
}

The tier is the throttle. Scores map to do-not-contact, low, medium, high, or priority, and only medium and above is ever contacted, so outbound touches a deliberate slice of the traffic instead of all of it.

Figure 3. The 0 to 100 score maps to tiers, from do-not-contact up to priority. Only medium and above is ever contacted, so outbound touches a deliberate slice of the traffic.
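The tier mapping can be sketched as a threshold table. The cut-off values below are invented for illustration, since the article gives the tier names and the contact rule but not the boundaries; they are chosen only so that the example score of 84 lands in priority and a score of 0 lands in do-not-contact.

```python
# Hypothetical thresholds, highest first: (minimum score, tier).
TIER_FLOORS = [
    (80, "priority"),
    (60, "high"),
    (40, "medium"),
    (1, "low"),
]
CONTACTABLE_TIERS = {"medium", "high", "priority"}


def tier_for(score: int) -> str:
    """Map a 0 to 100 score onto a tier name."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for floor, tier in TIER_FLOORS:
        if score >= floor:
            return tier
    return "do-not-contact"  # a score of 0, e.g. customers and internal addresses


def is_contactable(score: int) -> bool:
    """Only medium and above ever enters outbound."""
    return tier_for(score) in CONTACTABLE_TIERS
```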

Routing: who, and how

The same object then drives routing on two axes. Who, from the inferred profile: an AI visitor like Priya goes to an AI engineer, a data visitor to a data engineer, an infra visitor to an infra engineer. How, from the contact data on file: a phone number can route an immediate call task, otherwise it is a LinkedIn opener, and a higher tier earns a faster, more human touch. Score, channel, and angle all come from that one object.
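The two routing axes can be sketched as one function. The lead field names (`profile`, `tier`, `phone`) and the channel labels are assumptions for illustration; the profile-to-specialist mapping and the phone-or-LinkedIn rule follow the text.

```python
from typing import Optional

# Who: the inferred profile picks the matching specialist.
SPECIALIST_BY_PROFILE = {
    "ai": "AI engineer",
    "data": "data engineer",
    "infra": "infra engineer",
}
CONTACTABLE_TIERS = {"medium", "high", "priority"}


def route(lead: dict) -> Optional[dict]:
    """Return the owner and channel for a lead, or None if it stays out of outbound."""
    if lead.get("tier") not in CONTACTABLE_TIERS:
        return None
    owner = SPECIALIST_BY_PROFILE.get(lead.get("profile", ""))
    if owner is None:
        return None  # no inferred profile, so no specialist to route to
    # How: a phone number on file allows a call task, otherwise a LinkedIn opener.
    channel = "call_task" if lead.get("phone") else "linkedin_opener"
    return {"owner": owner, "channel": channel}
```

This sketch leaves out the speed dimension, where a higher tier earns a faster touch; that would be an extra field on the returned object, such as a due time derived from the tier.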

For the LinkedIn path, leads above the bar are enrolled automatically into a sequence on lemlist, routed to the engineer whose specialty matches the inferred pattern. The message reads engineer to engineer, a peer asking about a concrete problem, not a rep running a template. Behind the scenes those engineer accounts are operated by Sales from one shared inbox (a unibox), so a single rep handles every reply in one place and turns it into a booked meeting. The sequence itself stays light: visit the profile, send a connection request, a short opener on accept, and one soft follow-up if there is no reply.

This is where the conversion lives. Industry benchmarks put signal-triggered reply rates at 5-18% against roughly 3% for cold outreach. The trigger here is a real visit to a buying-intent page, answered by the right specialist while it is still warm, and on the top tier fast enough to mean a phone call within minutes. The data tells you who is in-market right now. The only question worth engineering is how fast, and how relevantly, you answer it.