
How HasData Scrapers Achieve 99.9% Uptime at Millions of Requests

Roman Milyushkevich
Last update: 25 Sept 2025

At HasData, reliability is something we engineer, monitor, and validate every single day (and we mean it). In this post, we’ll show you the systems behind our 99.9% uptime: synthetic testing, monitoring dashboards, proxy health checks, and infrastructure choices.

Synthetic Tests: Validating APIs Every Day

We continuously run synthetic tests across all our APIs. Several times per day, each API is exercised with at least 10 parameter variations.

Take our Google SERP API, for example. When we query q=coffee, a “healthy” response should include at least 7 organic results, a knowledge graph, a local pack, related questions, pagination, and more.

We validate each block individually. For organic results, for instance, we check that every entry includes a link, title, and snippet. If anything falls short, we know immediately.
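To make that concrete, here is a minimal sketch in Python of what a block-level check like this could look like. The response shape and field names (organic_results, knowledge_graph, and so on) are illustrative assumptions for the sketch, not our exact schema.

```python
# Minimal sketch of a block-level synthetic check (field names are illustrative).
def validate_serp_response(resp: dict) -> list[str]:
    """Return a list of human-readable failures; an empty list means healthy."""
    failures = []

    organic = resp.get("organic_results", [])
    if len(organic) < 7:
        failures.append(f"expected >= 7 organic results, got {len(organic)}")

    # Every organic entry must carry a link, title, and snippet.
    for i, item in enumerate(organic):
        for field in ("link", "title", "snippet"):
            if not item.get(field):
                failures.append(f"organic[{i}] missing '{field}'")

    # Structural blocks we expect for a broad query like q=coffee.
    for block in ("knowledge_graph", "local_pack", "related_questions", "pagination"):
        if block not in resp:
            failures.append(f"missing block '{block}'")

    return failures
```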

All results flow straight to Slack, so the team is alerted before customers ever feel the impact.
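For illustration, pushing a failed check into a channel can be as simple as a Slack incoming-webhook call. The webhook URL and message format below are placeholders for the sketch, not our production alerting code.

```python
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def alert_slack(api_name: str, failures: list[str]) -> None:
    """Post a synthetic-test failure summary to the monitoring channel."""
    text = f":rotating_light: {api_name} synthetic test failed:\n" + "\n".join(
        f"- {f}" for f in failures
    )
    requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10)
```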

Monitoring Dashboards: Success Rates and Latency

Synthetic tests catch regressions, but real-time visibility into production traffic is just as important. Every API has two key dashboards:

  • Success/Failed Requests Chart: tracks the ratio of successful responses to failures.
  • Latency Chart: measures p50, p80, p90, and p99 latencies.

If the failure rate or p99 latency rises, alerts go to our monitoring channel. From there, engineers can drill into the exact request ID, with full logs and cross-service traces.
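As a rough illustration of how such dashboards get fed, here is a sketch using the Python prometheus_client library. The metric names, labels, and buckets are hypothetical; they just show that a success/failure counter and a latency histogram are enough to drive both charts.

```python
import time
from prometheus_client import Counter, Histogram

# Hypothetical metric names: one counter for the success/failed chart,
# one histogram for the latency chart.
REQUESTS = Counter("scraper_requests_total", "API requests", ["api", "status"])
LATENCY = Histogram(
    "scraper_request_seconds", "End-to-end request latency", ["api"],
    buckets=(0.25, 0.5, 1, 2, 5, 10, 30),
)

def record_request(api: str, handler):
    """Wrap a request handler so every call updates both dashboards."""
    start = time.monotonic()
    try:
        result = handler()
        REQUESTS.labels(api=api, status="success").inc()
        return result
    except Exception:
        REQUESTS.labels(api=api, status="failed").inc()
        raise
    finally:
        LATENCY.labels(api=api).observe(time.monotonic() - start)
```

The p50/p80/p90/p99 series are then computed from the histogram buckets in PromQL (histogram_quantile), which is what the latency chart plots.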

Proxy Health: Monitoring the Hidden Layer

Much of our success rate comes down to the proxy networks powering our APIs. We monitor them just as closely as the APIs themselves:

  • Success rate per API per retry
  • Traffic volume per proxy
  • Median response size of scraped pages

If a proxy underperforms, we isolate and replace it before it affects users.
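Here is a simplified sketch of the per-proxy aggregation this implies. The record format and the success-rate threshold are assumptions for illustration, not our actual pipeline.

```python
import statistics
from collections import defaultdict

def proxy_health(records, min_success_rate=0.85):
    """records: iterable of dicts like
    {"proxy": "pool-a-17", "api": "serp", "ok": True, "bytes": 48213}.
    Returns (per-proxy stats, proxies that should be isolated)."""
    stats = defaultdict(lambda: {"total": 0, "ok": 0, "sizes": []})
    for r in records:
        s = stats[r["proxy"]]
        s["total"] += 1
        s["ok"] += int(r["ok"])
        s["sizes"].append(r["bytes"])

    report, to_isolate = {}, []
    for proxy, s in stats.items():
        success_rate = s["ok"] / s["total"]
        report[proxy] = {
            "success_rate": round(success_rate, 3),
            "traffic": s["total"],
            "median_response_bytes": statistics.median(s["sizes"]),
        }
        if success_rate < min_success_rate:
            to_isolate.append(proxy)
    return report, to_isolate
```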

Infrastructure and Observability

Our APIs run on a self-hosted Kubernetes cluster. Managing our own infrastructure gives us the control we need for performance and scaling.

  • Dedicated servers run our database instances
  • Dedicated servers power the monitoring stack
  • Grafana + Prometheus track metrics across the system
  • ClickHouse stores traces and logs for high-volume analysis

This setup scales to millions of requests while keeping costs predictable and visibility high.
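Because ClickHouse holds the traces and logs, drilling into a single request ID (as described above) comes down to a query. Here is a hedged sketch using the clickhouse-driver Python client; the host, table, and column names are hypothetical, not our real schema.

```python
from clickhouse_driver import Client  # pip install clickhouse-driver

client = Client(host="clickhouse.internal")  # placeholder host

def trace_for_request(request_id: str):
    """Pull every log/trace row for one request, across services, in time order."""
    return client.execute(
        """
        SELECT timestamp, service, span_name, duration_ms, status, message
        FROM traces
        WHERE request_id = %(request_id)s
        ORDER BY timestamp
        """,
        {"request_id": request_id},
    )
```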

How It Works in Practice: An Incident Flow

Here’s how these systems connect when an issue occurs:

  1. A synthetic test fails and alerts the team in Slack
  2. Latency charts confirm a spike in p99 latency
  3. Proxy monitoring shows a drop in success rate for one of the proxies
  4. Engineers reroute traffic, replace the failing proxy, and confirm resolution

The loop from detection to fix is fast and transparent because every layer is monitored and traceable.

Why This Matters to Our Customers

Everyone claims “99.9% uptime.” For us, it’s a 24/7 engineering process.

  • Failures are caught early, often before they reach production scale
  • Latency is continuously tracked across percentiles, not just averages
  • Infrastructure is built for scale, not just a minimum viable setup

With HasData, you don’t have to wonder if your requests will succeed — our monitoring, testing, and infrastructure ensure they do.

Roman Milyushkevich
I'm a big believer in automation and anything that has the potential to save a human's time. Every day I help companies extract data and make more informed business decisions to reach their goals.