Average latency hides the slow scrapes. In a nightly job of 10,000 product URLs, most calls can finish in 1–2 s while a handful sit at 20 s after a CAPTCHA retry. The mean still looks fine; your deadline does not.
Percentiles (p50, p75, p90, p95, p99) show how response times are distributed. They answer the question: "What time did X% of scrapes finish within?"
Track them per target in Scraper APIs, and compare Crawler, Rendering, and WebUnlocker modes before blaming a site.
What each percentile means
pN = N% of requests finished in that time or faster.
| Percentile | Meaning |
|---|---|
| p50 | Median: half of scrapes were this fast or faster. |
| p75 | 3 out of 4 scrapes finished within this window. |
| p90 | 9 out of 10. |
| p95 | 19 out of 20. |
| p99 | 99 out of 100, the slow tail that breaks batch jobs. |
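The definition translates directly into code. Below is a minimal sketch using the nearest-rank method, which always returns a latency that was actually observed. NumPy's `percentile` interpolates between samples by default, so dashboards built on it can report slightly different numbers on small samples. The `percentile` helper and the sample latencies are illustrative, not part of any Scraper API.

```python
import math

def percentile(latencies, n):
    """Nearest-rank pN: the smallest observed latency such that at least
    n% of the samples are less than or equal to it."""
    if not latencies:
        raise ValueError("need at least one latency sample")
    if not 0 < n <= 100:
        raise ValueError("n must be in (0, 100]")
    ordered = sorted(latencies)
    rank = math.ceil(n * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]

# Ten illustrative scrape latencies in seconds.
samples = [0.8, 1.2, 0.9, 3.0, 1.5, 0.7, 6.0, 1.1, 0.8, 18.0]
for n in (50, 75, 90, 95, 99):
    print(f"p{n} = {percentile(samples, n)} s")
```

With only ten samples, p95 and p99 both land on the single slowest scrape. Tail percentiles need many samples before they mean more than "the max".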
Scraping examples
Same price-monitoring job, different pages:
| Percentile | Example | Latency |
|---|---|---|
| p50 | Static product page, Crawler mode | 0.8 s |
| p75 | Category listing with pagination | 1.5 s |
| p90 | React product detail page (PDP), Rendering mode waiting for the price selector | 3 s |
| p95 | Job board listing after one 403 retry + proxy rotation | 6 s |
| p99 | Marketplace behind DataDome, WebUnlocker + CAPTCHA | 18 s |
A p50 of 0.8 s next to a p99 of 18 s is normal in scraping. The median tells you the typical cost per page; the tail tells you whether the job finishes on time.
Why the average lies
Take 1,000 scrapes: 900 × 1 s, 90 × 8 s (CAPTCHA retry), 10 × 45 s (timeout). The mean is 2,070 s / 1,000 ≈ 2.1 s, which looks healthy, yet 10% of scrapes waited 8 s or more. Percentiles surface that gap immediately: p95 is already 8 s.
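The arithmetic can be checked directly. This sketch rebuilds the 1,000-scrape scenario and compares the mean with nearest-rank percentiles. The `percentile` helper is illustrative, and other percentile methods (interpolating ones, for example) can give slightly different tail values.

```python
import math
from statistics import mean

def percentile(values, n):
    """Nearest-rank pN of a non-empty list."""
    ordered = sorted(values)
    return ordered[math.ceil(n * len(ordered) / 100) - 1]

# 900 clean scrapes, 90 CAPTCHA retries, 10 timeouts (seconds).
latencies = [1.0] * 900 + [8.0] * 90 + [45.0] * 10

print(f"mean = {mean(latencies):.2f} s")  # 2,070 s / 1,000 scrapes = 2.07 s
for n in (50, 90, 95, 99):
    print(f"p{n} = {percentile(latencies, n)} s")  # 1.0, 1.0, 8.0, 8.0
slow = sum(1 for x in latencies if x >= 8.0)
print(f"{slow / len(latencies):.0%} waited 8 s or more")  # 10%
```

Nearest-rank p99 is still 8 s here. The ten 45 s timeouts are exactly the slowest 1%, so they only show up above p99, which is a reason to chart the max alongside p99 for jobs with hard timeouts.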
What to watch in production
Slice latency by domain, HTTP status, and scrape mode (1 / 2 / 3 credits). Plot p50 and p99 together: if p50 is flat but p99 rises, you likely hit new anti-bot rules or broken selectors, not a global slowdown.
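A sketch of the slice-and-compare check follows. It assumes each scrape is logged as a dict with hypothetical `domain`, `status`, `mode`, and `latency_s` fields. `tail_regressions` flags slices where p50 held steady but p99 jumped. The 10% p50 tolerance and the 1.5× p99 growth threshold are placeholders to tune, not recommendations.

```python
import math
from collections import defaultdict

def percentile(values, n):
    """Nearest-rank pN of a non-empty list."""
    ordered = sorted(values)
    return ordered[math.ceil(n * len(ordered) / 100) - 1]

def by_slice(records):
    """Group latencies by (domain, HTTP status, scrape mode)."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["domain"], r["status"], r["mode"])].append(r["latency_s"])
    return groups

def tail_regressions(baseline, current, p50_tolerance=0.10, p99_growth=1.5):
    """Return slices whose p50 moved by at most p50_tolerance (relative)
    while their p99 grew by a factor of p99_growth or more."""
    base, cur = by_slice(baseline), by_slice(current)
    flagged = []
    for key in sorted(base.keys() & cur.keys()):  # only slices seen in both windows
        b50, b99 = percentile(base[key], 50), percentile(base[key], 99)
        c50, c99 = percentile(cur[key], 50), percentile(cur[key], 99)
        if abs(c50 - b50) <= p50_tolerance * b50 and c99 >= p99_growth * b99:
            flagged.append((key, b99, c99))
    return flagged

def fake_records(latencies):
    """Hypothetical log records for one slice."""
    return [{"domain": "shop.example", "status": 200, "mode": "crawler", "latency_s": s}
            for s in latencies]

yesterday = fake_records([0.8] * 95 + [2.0] * 5)
today = fake_records([0.8] * 95 + [9.0] * 5)
print(tail_regressions(yesterday, today))
```

With fewer than about 100 samples per slice, nearest-rank p99 is effectively the slowest scrape. In practice you would likely require a minimum sample count per slice before alerting.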
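Batch jobs: size per-request timeouts from p99 and write SLAs on p95, since a timeout set at p95 would by definition cut off about 1 in 20 scrapes. Example SLA: "p95 ≤ 5 s over 24 h, successful scrapes only."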
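The example SLA can be checked mechanically. This sketch assumes hypothetical log records with a timezone-aware `ts`, an HTTP `status`, and `latency_s`. It treats any 2xx status as a successful scrape, which is an assumption; adjust that filter to however your pipeline defines success.

```python
import math
from datetime import datetime, timedelta, timezone

def percentile(values, n):
    """Nearest-rank pN of a non-empty list."""
    ordered = sorted(values)
    return ordered[math.ceil(n * len(ordered) / 100) - 1]

def meets_sla(records, now, target_s=5.0, pct=95, window=timedelta(hours=24)):
    """Check 'pN <= target_s over the window, successful scrapes only'.

    Returns (passed, observed_pN), or None if the window has no successful scrapes.
    """
    ok = [
        r["latency_s"]
        for r in records
        if timedelta(0) <= now - r["ts"] <= window  # inside the SLA window
        and 200 <= r["status"] < 300                # assumption: 2xx == success
    ]
    if not ok:
        return None
    observed = percentile(ok, pct)
    return observed <= target_s, observed

# Illustrative data: 100 successful scrapes in the last hour, plus one failed 503.
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
records = [
    {"ts": now - timedelta(hours=1), "status": 200, "latency_s": s}
    for s in [1.0] * 96 + [7.0] * 4
]
records.append({"ts": now - timedelta(hours=2), "status": 503, "latency_s": 30.0})
print(meets_sla(records, now))  # (True, 1.0): the 503 is excluded, p95 lands on a 1 s scrape
```

Filtering to successful scrapes keeps the SLA about speed. Tracking the failure rate separately matters too, or a site that starts returning fast 403s would look like an improvement.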
Takeaways
- p50 = typical scrape cost; p99 = why the nightly job missed its window.
- Tail spikes come from retries, JS rendering, anti-bot challenges, and cold proxy sessions: rare, but expensive.
- Fix the median with Crawler; fix the tail with WebUnlocker or tighter per-domain caps.