Look at any unprotected public site and 30–60% of its incoming traffic is not human: search crawlers, monitoring bots, price scrapers, domain checkers, competitors' parsers, and a long tail of weirdness. Some of it is harmless or useful (Googlebot), some is annoying (scrapers), and some is actively harmful: flooding lead forms, clicking ads, burning budget.

Antibot is not a single switch. It is four separate layers, each catching its own class of bots. A dumb bot dies on layer 1, a smarter one on layer 2, and so on. Here is how each layer works in plain English, no architectural deep-dive.
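To make "dies on layer 1, a smarter one on layer 2" concrete, here is a minimal sketch of a short-circuiting pipeline. Everything in it is hypothetical: the Click fields, the layer functions, and the 0.5 threshold are illustrative stand-ins, not our production checks. In reality layer 4 runs after the visitor has landed; it is folded into one loop here only to show the ordering.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Click:
    # Hypothetical, pre-extracted signals for a single click.
    ip_is_datacenter: bool = False
    user_agent: str = "Mozilla/5.0"
    fingerprint_known_bad: bool = False
    behavior_score: float = 1.0  # 0.0 = bot-like, 1.0 = human-like


# A layer returns a rejection reason, or None to let the click through.
Layer = Callable[[Click], Optional[str]]


def layer1_ip(click: Click) -> Optional[str]:
    return "datacenter IP" if click.ip_is_datacenter else None


def layer2_http(click: Click) -> Optional[str]:
    return "scripted user-agent" if "python-requests" in click.user_agent else None


def layer3_fingerprint(click: Click) -> Optional[str]:
    return "known bot fingerprint" if click.fingerprint_known_bad else None


def layer4_behavior(click: Click) -> Optional[str]:
    return "low behavior score" if click.behavior_score < 0.5 else None


LAYERS: List[Tuple[str, Layer]] = [
    ("layer 1", layer1_ip),
    ("layer 2", layer2_http),
    ("layer 3", layer3_fingerprint),
    ("layer 4", layer4_behavior),
]


def classify(click: Click) -> str:
    # Stop at the first layer that rejects; later layers never see the click.
    for name, check in LAYERS:
        reason = check(click)
        if reason is not None:
            return f"blocked at {name}: {reason}"
    return "passed"
```

A click that trips several layers is attributed only to the first one, which is why per-layer counts depend on the order the layers run in.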

TL;DR

Four layers: IP reputation catches ~40% of the dumbest bots, user-agent and HTTP signals add ~20%, JS fingerprint takes it to ~95%, behavioral signals handle the rest. CAPTCHA works but hurts UX and in 2026 is no longer required for most cases — four layers are usually enough.

Why IP lists alone stopped working

In the 2010s the classic move was to buy a database of bad IPs (or download a list from MaxMind / AbuseIPDB) and block requests from them. It worked reasonably well: botnets sat on the same ranges, proxy services overlapped, and one big database caught about 70% of the garbage.

Today everything is different. Residential proxy networks (“clean” IPs of real residential users, sublet through shady browser extensions) give bad bots an IP that is indistinguishable from that of a real human in Chicago. Cost: $0.50 per gigabyte of traffic. A bot that would once have died on an IP filter now arrives at your site as a regular visitor.

IP reputation is still useful as the first layer — it catches lazy bots, cheap servers, known botnets and scanners. You just cannot rely on it alone.

Four layers in order

Layer 1. IP reputation and network context

The first thing we check on arrival: where the request came from. We look at:

  • Network type — residential, datacenter, mobile, hosting, VPN. Datacenter with no obvious reason is suspicious, especially for ad traffic.
  • ASN — the specific provider. Known “dirty” ASNs (hosting providers that botnets deploy on) are blacklisted wholesale.
  • History of this IP — any suspicious behavior in the last 30 days.
  • Geo coherence — does the country from the IP match the timezone and browser language?

This layer is essentially free latency-wise: the check takes 2–3 ms because the entire database lives in memory. It catches about 35–45% of bot traffic in an average ad campaign. The rest of the bots get to the next layer.
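A rough sketch of what such an in-memory check could look like, using only the Python standard library. The data sets, the Request fields, and the flag names are all made up for illustration (the IPs and ASN come from ranges reserved for documentation), and the geo check uses only the Accept-Language header, since that is available before any JS runs.

```python
import ipaddress
from dataclasses import dataclass
from typing import List

# Hypothetical in-memory data; a real system would load these from a reputation feed.
DATACENTER_NETS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range
BLOCKED_ASNS = {64500}                                      # documentation ASN
FLAGGED_LAST_30_DAYS = {"198.51.100.7"}
# Languages we would plausibly expect per country (deliberately crude).
COUNTRY_LANGUAGES = {"US": {"en", "es"}, "DE": {"de", "en"}}


@dataclass
class Request:
    ip: str
    asn: int
    ip_country: str       # country resolved from the IP
    accept_language: str  # raw Accept-Language header


def primary_language(accept_language: str) -> str:
    # "de-DE,de;q=0.9,en;q=0.8" -> "de"
    first = accept_language.split(",")[0].split(";")[0]
    return first.split("-")[0].strip().lower()


def ip_flags(req: Request) -> List[str]:
    flags = []
    addr = ipaddress.ip_address(req.ip)
    if any(addr in net for net in DATACENTER_NETS):
        flags.append("datacenter")
    if req.asn in BLOCKED_ASNS:
        flags.append("blocked_asn")
    if req.ip in FLAGGED_LAST_30_DAYS:
        flags.append("bad_history")
    expected = COUNTRY_LANGUAGES.get(req.ip_country)
    if expected is not None and primary_language(req.accept_language) not in expected:
        flags.append("geo_mismatch")
    return flags
```

Set and dictionary lookups like these are constant-time, which is the reason a check of this kind adds almost no latency. Note that the function returns flags rather than a verdict: a geo mismatch alone is weak evidence (travelers and expats exist), so how flags combine into a block decision is a separate policy question.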

Layer 2. User-agent and HTTP signals

A bot can arrive from a clean residential IP and still forget to fake its headers. We look at:

  • UA — known bad signatures (python-requests, curl, HeadlessChrome, PhantomJS) are rejected on sight.
  • Header completeness — a real browser sends Accept, Accept-Language, Accept-Encoding, Upgrade-Insecure-Requests, Sec-CH-UA (on Chromium-based browsers), and sometimes DNT, in a consistent order. A bot often sends three out of ten.
  • TLS fingerprint (JA3/JA4) — the TLS handshake of a real Chrome looks one way, Firefox another, python-requests a third. You can fake it, but 90% of bots do not bother.
  • HTTP/2 or HTTP/3 — modern browsers use them by default on any public site. A bot on HTTP/1.1 in 2026 is a flag.

This layer catches another 15–25% — bots that made it past the IP check but tripped on headers.
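The header-level checks can be sketched in a few lines. This is an illustration under assumptions: the function name, the expected-header list, and the "two or more missing" threshold are ours for this example only. The TLS fingerprint check is left out because it needs access to the raw handshake, which a plain header dictionary does not carry.

```python
from typing import Dict, List

# Substrings of user-agents named in the list above, lowercased for matching.
BAD_UA_MARKERS = ("python-requests", "curl", "headlesschrome", "phantomjs")

# Headers most mainstream browsers send on a top-level navigation.
EXPECTED_HEADERS = (
    "accept",
    "accept-language",
    "accept-encoding",
    "upgrade-insecure-requests",
)


def http_flags(headers: Dict[str, str], http_version: str) -> List[str]:
    # Header names are case-insensitive, so normalize before checking.
    lowered = {name.lower(): value for name, value in headers.items()}
    user_agent = lowered.get("user-agent", "").lower()

    flags = []
    if not user_agent or any(marker in user_agent for marker in BAD_UA_MARKERS):
        flags.append("bad_ua")

    missing = [name for name in EXPECTED_HEADERS if name not in lowered]
    if len(missing) >= 2:  # illustrative threshold, not a tuned value
        flags.append("sparse_headers")

    if http_version == "1.1":
        flags.append("http1")
    return flags
```

For example, a bare python-requests call over HTTP/1.1 with no extra headers would collect all three flags, while a request carrying the full header set over HTTP/2 with a normal Chrome user-agent would collect none.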

Layer 3. JavaScript fingerprint

The most interesting layer and the trickiest for bots. Once the page has loaded, we run a small JS that collects a dozen signals about the browser: engine version, screen resolution, available fonts, supported WebGL extensions, Canvas render timings, audio context, sensor availability, and so on.

These signals add up to a fingerprint: a near-unique browser signature. Real Chrome 131 on a Mac has one, Headless Chrome 131 has a slightly different one (under automation it reports navigator.webdriver = true, renders Canvas pixels slightly differently, lacks some fonts). Selenium-Chrome has a third. A bare Node bot has a fourth.

In TDS we keep a database of 2.3 billion known fingerprint patterns, recomputed every 48 hours on fresh traffic. If your fingerprint matches one that showed bot behavior yesterday, that is a flag. If it is identical to the fingerprint of thousands of other visitors, which is what happens with automation frameworks, that is also a flag.
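On the server side, the matching logic can be pictured roughly like this. It is a toy sketch, not the TDS implementation: the signal names, the FingerprintStore class, and the default threshold are hypothetical, and it counts clicks per fingerprint rather than distinct visitors to keep the example short.

```python
import hashlib
import json
from collections import Counter
from typing import Optional


def fingerprint_id(signals: dict) -> str:
    # Serialize with sorted keys so the same signals always hash the same way.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FingerprintStore:
    def __init__(self, crowd_threshold: int = 1000):
        self.known_bad = set()   # fingerprints that showed bot behavior before
        self.seen = Counter()    # how often each fingerprint has appeared
        self.crowd_threshold = crowd_threshold

    def check(self, signals: dict) -> Optional[str]:
        fp = fingerprint_id(signals)
        self.seen[fp] += 1
        if signals.get("webdriver") is True:
            return "webdriver"
        if fp in self.known_bad:
            return "known_bad"
        if self.seen[fp] > self.crowd_threshold:
            return "crowded"  # too many identical fingerprints
        return None
```

The "crowded" rule needs care in practice: popular phone models legitimately share very similar fingerprints, so a real threshold would have to account for how common a device configuration is.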

What we see in practice

The most common rookie mistake we see: bot operators forget to randomize between requests. A Selenium bot with no viewport, language, or sensor randomization produces exactly one fingerprint across 50,000 clicks. That is not a fingerprint — that is a signature. We see this in logs once a week.

This layer pushes coverage to 92–96%. What remains are the high-quality bots — those masquerading as regular users, on residential IPs, with faked headers and randomized fingerprints. Rare beasts, expensive, and they mostly show up in targeted attacks (a competitor in performance marketing actively click-burning your campaign).

Layer 4. Behavioral signals

The final layer. Runs after the user lands and starts doing something. We watch:

  • time between page load and first mouse movement (bot: 0 ms, or exactly 100 ms);
  • movement trajectory (humans are jittery; bots are too straight, or noisy along too perfect a curve);
  • scroll speed, pauses between scrolls;
  • touch events (a real mobile user has them; an emulator usually does not);
  • time on page, hover on the CTA before the click.

Behavioral signals are aggregated over 3–5 seconds into a behavior-score. If the score is low, the next click from this fingerprint is pre-flagged as suspicious. Layer 4 teaches layer 3 — the fingerprint goes into a local watchlist.
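As an illustration of how a few of these signals could roll up into one number, here is a small sketch. The Session fields, the penalty weights, and the cutoffs are invented for the example; a real score would be tuned on labeled traffic and use many more signals.

```python
import math
import statistics
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Session:
    first_move_ms: Optional[float]    # None if the mouse never moved
    path: List[Tuple[float, float]]   # sampled cursor positions
    scroll_pauses_ms: List[float]     # pauses between scroll events


def straightness(path: List[Tuple[float, float]]) -> float:
    # Ratio of straight-line distance to distance travelled; 1.0 = a perfect line.
    if len(path) < 2:
        return 1.0
    travelled = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    if travelled == 0:
        return 1.0
    return math.dist(path[0], path[-1]) / travelled


def behavior_score(session: Session) -> float:
    points = 100  # integer points avoid floating-point drift in the sum
    if session.first_move_ms is None or session.first_move_ms in (0, 100):
        points -= 40  # instant or suspiciously round reaction time
    if straightness(session.path) > 0.99:
        points -= 40  # humans do not move in perfectly straight lines
    pauses = session.scroll_pauses_ms
    if len(pauses) >= 3 and statistics.pstdev(pauses) < 5:
        points -= 20  # metronome-like scrolling
    return points / 100
```

A session with a zero-millisecond first move, a perfectly straight cursor path, and identical scroll pauses scores 0.0 here; a session with a delayed first move, a wandering path, and irregular pauses keeps the full 1.0.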

What this looks like on 100,000 clicks

Real summary from one ad campaign over a week:

Total inbound clicks:           100,000   (100.0%)
├─ Filtered by layer 1 (IP):     38,400   ( 38.4%)
├─ Filtered by layer 2 (UA):     18,200   ( 18.2%)
├─ Filtered by layer 3 (FP):     29,800   ( 29.8%)
├─ Filtered by layer 4 (beh.):    8,500   (  8.5%)
└─ Passed all layers (humans):    5,100   (  5.1%)

The numbers in this campaign were extreme: it ran in Tier-3 geos, where bot traffic is historically ~95%. On Tier-1 campaigns the bot share is usually 25–50%, and the per-layer breakdown shifts. But the overall shape holds in most cases: layers 1 and 3 remove the most, layer 2 less, and layer 4 only the small remainder.
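To see how the summary above relates to cumulative coverage, the arithmetic can be redone in a few lines. The figures are copied from the table; the variable names are ours.

```python
total = 100_000
filtered = {
    "layer 1 (IP)": 38_400,
    "layer 2 (UA)": 18_200,
    "layer 3 (FP)": 29_800,
    "layer 4 (beh.)": 8_500,
}

bots = sum(filtered.values())  # 94,900 clicks filtered in total
humans = total - bots          # 5,100 clicks passed every layer

running = 0
cumulative = {}
for layer, count in filtered.items():
    running += count
    # (share of all traffic removed so far, share of all bots removed so far)
    cumulative[layer] = (running / total, running / bots)
    print(f"{layer:<15} {running / total:6.1%} of traffic, {running / bots:6.1%} of bots")
```

After layer 3 the cumulative figure is 86.4% of all traffic, or about 91% of the bots in this sample. Whether a coverage number is quoted against all traffic or against bots only changes it noticeably, so it is worth stating which one is meant.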

Why we do not use CAPTCHA by default

Off-the-shelf CAPTCHA (reCAPTCHA, hCaptcha, Turnstile) is a working fifth layer. But it has three downsides:

  • UX cost. Every CAPTCHA drops conversion by 5–15%. Nobody loves picking buses out of an image grid.
  • AI agents solve them. By 2026, LLM agents like GPT-4o with computer use solve reCAPTCHA v2 at 87% accuracy. CAPTCHA has stopped being the last line of defense.
  • Third-party data leak. The CAPTCHA provider (Google, Cloudflare, and so on) sees all your clicks. Acceptable for ad analytics, not always acceptable for redirects with private traffic.

CAPTCHA is justified in two cases: (a) when four layers are no longer enough (targeted attacks, click-burn by competitors), (b) on high-stakes forms (payment page, bank signup) where a false negative costs more than UX loss. Otherwise four layers are plenty.

Conclusion

Antibot in 2026 is four layers running in sequence: IP reputation → HTTP signals → JS fingerprint → behavior. One layer alone will not do it — you need all four. CAPTCHA is optional and often counterproductive.

If you want to dig deeper into layer 3 and understand exactly how fingerprints are collected and matched, we have a separate architecture piece: JS-fingerprint vs IP blocking.