In 2022 our postback latency was 280ms p95. In 2025 it's 28ms. This post is about how we went from one number to the other, what we tried along the way, and what didn't work.
Three things gave us the biggest wins: pushing processing to the edge (Cloudflare Workers plus our own nodes), switching from REST to UDP-batched postbacks, and rewriting the hot path in Rust. Smaller but noticeable wins came from kernel-bypass logging and killing every JSON parser on the request path.
Why postback latency matters at all
A postback is an HTTP request that an affiliate network fires at our server when a conversion happens. We catch it, map the click_id back to the original click, and record the conversion.
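To make that concrete, here is a minimal Python sketch of catching a GET postback and mapping it back to a click. It is an illustration, not our production handler: the URL, the `payout` parameter, the `CLICKS` store and the status codes are all assumptions made for the example.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical in-memory click store; a real tracker would hit a cache or database.
CLICKS = {"c-1001": {"campaign": "summer", "source": "push"}}
CONVERSIONS = []


def handle_postback(url):
    """Parse a GET postback URL and attribute it to the original click."""
    query = parse_qs(urlsplit(url).query)
    click_id = query.get("click_id", [None])[0]
    if click_id is None:
        return 400  # malformed postback: no click_id at all
    click = CLICKS.get(click_id)
    if click is None:
        return 404  # click_id we never issued
    payout = float(query.get("payout", ["0"])[0])
    CONVERSIONS.append(
        {"click_id": click_id, "campaign": click["campaign"], "payout": payout}
    )
    return 200


status = handle_postback(
    "https://tracker.example/postback?click_id=c-1001&payout=2.50"
)
```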
When you respond slowly, bad things happen. Affiliate networks set timeouts (usually 1-3 seconds) and queue postbacks behind them. Under heavy load those queues grow, postbacks start dropping, and your stats stop matching theirs. One of our customers in 2022 ended up with a 4% discrepancy over a month - about $12k in their case.
On top of that, slow postbacks mean offer caps are checked late. If your offer has a 100-conversion daily cap and the postback takes 300ms, you can easily overspend 5-10% past the cap.
Where the 280ms came from
Profiling the 2022 request path showed:
partner → DNS (15ms) → TLS handshake (40ms) → nginx (8ms)
→ Laravel/PHP (110ms) → MySQL (45ms) → ClickHouse (38ms)
→ response (15ms) = ~270ms p95
More than half the budget (about 155 of the ~270ms) was burned on the PHP stack and MySQL. ClickHouse for attribution is heavy, but it's the only sane way to store billions of clicks at a reasonable cost.
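The budget arithmetic is easy to check. The sketch below just sums the per-stage numbers from the trace above and prints each stage's share; the stage names and values come from the trace, and the helper name `share` is made up for the example.

```python
# Per-stage latencies from the 2022 trace, in milliseconds.
stages_ms = {
    "DNS": 15,
    "TLS handshake": 40,
    "nginx": 8,
    "Laravel/PHP": 110,
    "MySQL": 45,
    "ClickHouse": 38,
    "response": 15,
}

total_ms = sum(stages_ms.values())


def share(*names):
    """Fraction of the total budget spent in the named stages."""
    return sum(stages_ms[name] for name in names) / total_ms


php_mysql = share("Laravel/PHP", "MySQL")

# Print stages from most to least expensive.
for name, ms in sorted(stages_ms.items(), key=lambda kv: -kv[1]):
    print(f"{name:<14} {ms:>4} ms  {ms / total_ms:6.1%}")
print(f"PHP + MySQL share: {php_mysql:.1%} of {total_ms} ms")
```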
The first thing we tried was caching the click_id → conversion_id attribution in Redis. That removed half the MySQL queries and shaved off about 30ms, which brought us down to 240ms p95. Decent, but not enough.
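The pattern here is plain cache-aside: check the cache, fall back to the database on a miss, then store the result. Below is a minimal Python sketch with an in-process dictionary standing in for Redis. The `TTLCache` class, the 300-second TTL and `query_mysql` are assumptions for illustration, not our actual code.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


stats = {"db_queries": 0}


def query_mysql(click_id):
    """Hypothetical slow lookup; counts calls so the saving is visible."""
    stats["db_queries"] += 1
    return f"conv-{click_id}"


cache = TTLCache(ttl_seconds=300)


def lookup_conversion(click_id):
    cached = cache.get(click_id)
    if cached is not None:
        return cached  # cache hit: no database round-trip
    value = query_mysql(click_id)
    cache.set(click_id, value)
    return value


first = lookup_conversion("c-1001")
second = lookup_conversion("c-1001")  # served from the cache
```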
Step 1: push to the edge
In early 2023 we started handling postbacks directly on edge nodes - ours and Cloudflare Workers. The idea is simple: instead of dragging a request through DNS, load balancer, nginx, PHP-FPM, we accept it in one of 50 geographically distributed data centers and write it to a local buffer immediately.
This killed DNS resolution overhead (most affiliate networks were resolving us every 60 seconds) and TLS overhead (Cloudflare keeps sessions on the edge). Latency dropped to 110ms p95.
The catch: click_id attribution lives on central nodes, not on the edge. We solved it with async replication: the edge accepts the postback, drops it into local Redis with a 5-minute TTL, and responds 200 OK. In the background a separate worker replicates that to central. If central is down, the postback is still recorded and not lost.
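The accept-then-replicate flow can be sketched in a few lines of Python. This is an illustration under assumptions: a deque stands in for the local Redis buffer, `send_to_central` is a hypothetical callable, and the 5-minute TTL is left out to keep the example short. With a TTL in place, an outage longer than the TTL would need separate handling.

```python
from collections import deque


class EdgeNode:
    """Accepts postbacks locally and forwards them to central in the background."""

    def __init__(self, send_to_central):
        self.buffer = deque()
        self.send_to_central = send_to_central

    def accept(self, postback):
        # Acknowledge as soon as the postback is in the local buffer.
        self.buffer.append(postback)
        return 200

    def replicate_once(self):
        """Forward buffered postbacks in order; stop at the first failure."""
        sent = 0
        while self.buffer:
            try:
                self.send_to_central(self.buffer[0])
            except ConnectionError:
                break  # central is down: keep everything for the next run
            self.buffer.popleft()
            sent += 1
        return sent


central = []
central_up = {"value": False}


def send(postback):
    if not central_up["value"]:
        raise ConnectionError("central unreachable")
    central.append(postback)


edge = EdgeNode(send)
status = edge.accept({"click_id": "c-1"})
sent_while_down = edge.replicate_once()  # nothing leaves the edge
buffered_while_down = len(edge.buffer)
central_up["value"] = True
sent_after_recovery = edge.replicate_once()
```

An event is removed from the buffer only after the send succeeds, so a crash between the two steps can produce a duplicate but not a loss. That is one reason the central side has to tolerate duplicates.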
We tried running PHP handlers directly on the edge via FrankenPHP. Nice idea, but we hit problems with local extensions (mbstring wasn't identical everywhere) and latency didn't drop. Rolled it back after a month.
Step 2: UDP batching
When we rewrote partner ingest as gRPC streaming in 2024 (for the partners that support it), p50 latency fell below 50ms. For everyone else - the partners who still send classic GET postbacks - we added an internal trick: UDP batching.
An HTTP request comes in, the edge immediately responds 200 OK, then writes the payload to a shared-memory queue. Every 10ms a separate process gathers everything in the queue into a single UDP packet and fires it at the central collector.
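Here is a rough Python sketch of the flush step. The framing (a 2-byte length prefix per event), the 1200-byte datagram limit and the function names are assumptions for the example. `send` is any callable, so in a real process it would wrap `socket.sendto` on a UDP socket and `flush` would run on a 10ms timer.

```python
import struct
from collections import deque

MAX_DATAGRAM_BYTES = 1200  # assumption: stay under a typical path MTU


def pack_batches(payloads, max_bytes=MAX_DATAGRAM_BYTES):
    """Pack length-prefixed events into as few datagrams as fit the limit."""
    batches, current, size = [], [], 0
    for payload in payloads:
        frame_len = 2 + len(payload)
        if frame_len > max_bytes:
            raise ValueError("single event exceeds the datagram limit")
        if current and size + frame_len > max_bytes:
            batches.append(b"".join(current))
            current, size = [], 0
        current.append(struct.pack("!H", len(payload)) + payload)
        size += frame_len
    if current:
        batches.append(b"".join(current))
    return batches


def unpack_batch(datagram):
    """Inverse of pack_batches for a single datagram (collector side)."""
    payloads, offset = [], 0
    while offset < len(datagram):
        (length,) = struct.unpack_from("!H", datagram, offset)
        offset += 2
        payloads.append(datagram[offset:offset + length])
        offset += length
    return payloads


def flush(queue, send):
    """Drain everything queued since the last tick and send it."""
    pending = []
    while queue:
        pending.append(queue.popleft())
    datagrams = pack_batches(pending)
    for datagram in datagrams:
        send(datagram)
    return len(datagrams)


queue = deque([b"click_id=c-1&payout=2.50", b"click_id=c-2&payout=1.00"])
sent = []
count = flush(queue, sent.append)  # both events travel in one datagram
```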
Pros: client-side latency is bounded only by the TLS handshake and one network round-trip. Cons: you need to be ready for UDP packet loss (about 0.001% in our measurements), so every event has a dedup key and the central side processes them idempotently.
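Idempotent processing on the collector side looks roughly like this. It is a Python sketch, not our collector: the `dedup_key` field name and the unbounded in-memory set are assumptions, and a real implementation would expire old keys. The point is that an event can be sent twice without being counted twice.

```python
class Collector:
    """Records each event at most once, keyed by its dedup key."""

    def __init__(self):
        self.seen = set()
        self.conversions = []

    def process(self, event):
        key = event["dedup_key"]
        if key in self.seen:
            return False  # duplicate: already recorded, ignore
        self.seen.add(key)
        self.conversions.append(event)
        return True


collector = Collector()
event = {"dedup_key": "edge7:000123", "click_id": "c-1001"}

first_seen = collector.process(event)
resent = collector.process(event)  # same event again, e.g. after a resend
```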
After this, p95 dropped to 45ms.
Step 3: Rust on the hot path
The hot path is the code that runs on every request. In our case: URL parsing, click_id lookup, signature validation, queueing. Until 2024 this was a Go program. In 2025 we rewrote it in Rust.
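Of those steps, signature validation is the easiest to get subtly wrong. Below is a Python sketch of one common scheme, HMAC-SHA256 over the sorted query parameters. The scheme, the `sig` parameter name and the shared secret are assumptions for illustration; partners differ in how they sign postbacks.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl

SECRET = b"demo-shared-secret"  # hypothetical per-partner secret


def sign(params, secret=SECRET):
    """HMAC-SHA256 over the parameters in sorted key order, hex-encoded."""
    canonical = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    return hmac.new(secret, canonical.encode(), hashlib.sha256).hexdigest()


def verify(query, secret=SECRET):
    """Return the parameters if the signature matches, otherwise None."""
    params = dict(parse_qsl(query, keep_blank_values=True))
    given = params.pop("sig", "")
    expected = sign(params, secret)
    # compare_digest avoids leaking the match position through timing.
    if hmac.compare_digest(given.encode(), expected.encode()):
        return params
    return None


params = {"click_id": "c-1001", "payout": "2.50"}
query = f"click_id=c-1001&payout=2.50&sig={sign(params)}"

ok = verify(query)
tampered = verify(query.replace("2.50", "9.99"))  # payout changed in transit
```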
What the rewrite gave us:
- No GC pauses. Go was giving us 2-4ms pauses under load. That showed up on p99.
- Zero-copy parsing. serde_urlencoded reads the query string straight from &[u8] and can borrow values that need no percent-decoding, so the common case skips allocation.
- Tokio + io_uring for async I/O. Cut network stack overhead by 30-40%.
After the rewrite p95 fell to 32ms, p99 to 80ms. The final optimisation was killing the JSON parser in logging. We were dumping logs as JSON, then ELK parsed them. We switched to a binary format (Cap'n Proto), the parser went away, and p95 dropped to 28ms.
Today's numbers
As of April 2026, on roughly 4 billion clicks per month:
- p50: 12ms
- p95: 28ms
- p99: 78ms
- p99.9: 220ms (usually the tail from CIS traffic on bad ISPs)
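For readers who want to compute the same kind of numbers from their own logs, here is a minimal Python sketch of the nearest-rank percentile. It is one of several percentile definitions and is an assumption for the example: monitoring systems often interpolate or use histograms instead, so their figures can differ slightly.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Toy data: 100 requests with latencies of 1..100 ms.
latencies_ms = list(range(1, 101))

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```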
Going further is hard - we're starting to hit physics. The Europe → US-East round trip alone is 80-90ms, and for a couple of large partners we can't go faster because they don't have a US endpoint. We're considering a proxy on our side in their DC.
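The physics floor is a back-of-the-envelope calculation. The Python sketch below assumes a rough 6,500 km great-circle distance between Frankfurt and Virginia and a fiber refractive index of about 1.47. Both are approximations, and real cable routes are longer than the great circle.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47  # assumption: typical value for silica fiber


def min_rtt_ms(distance_km):
    """Lower bound on round-trip time from propagation delay alone."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    return 2 * distance_km / speed_in_fiber * 1000


frankfurt_to_virginia_km = 6_500  # assumption: rough great-circle distance
floor_ms = min_rtt_ms(frankfurt_to_virginia_km)
print(f"propagation floor: {floor_ms:.0f} ms round trip")
```

Under these assumptions the floor lands in the low 60s of milliseconds, so an observed 80-90ms leaves little room once routing and equipment delays are added.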
If you want to copy this
The best payoff for the effort was pushing to the edge - about 130ms gone in one iteration (240ms → 110ms p95). If you're on a classic nginx + backend setup, try this first. Cloudflare Workers plus edge Redis are cheap and quick to roll out.
Rust makes sense when you've already squeezed the architecture and you're fighting for milliseconds. If your latency is 500ms today, a Rust rewrite won't help - the problem isn't the language.