HasData

Enrichment of lead-generation datasets with verified emails from Google SERP, achieving 4x coverage compared to Hunter and Clearbit


use case snapshot

01

Industry

Data-as-a-Service (DaaS), Web scraping

02

Use Case

Enriching lead-generation datasets with verified email addresses of professionals (lawyers, real-estate agents, dentists, chiropractors) sourced from Google SERP

03

Team Role

Data operations, product teams

04

Time to Value

A complete, enriched dataset of ~100,000 contacts is delivered in 1 day

05

Business Impact

Verified email addresses for ~80% of contacts in each dataset vs. ~20% with tools like Hunter or Clearbit

the strategic insight

Public professional directories such as Avvo, Realtor.com and WebMD provide rich profile data (names, phones, ratings, specialties), but they don’t expose email addresses: these are hidden behind a login or paywall, which is off-limits for scraping.

On top of that, about 80% of these professionals, including real-estate agents, dentists and chiropractors, rely on personal Gmail, Yahoo or Outlook inboxes rather than company-domain emails. Because traditional enrichment tools like Hunter and Clearbit focus only on business email addresses, they can find contact details for only about 20% of these professionals.

HasData bridges this gap by:

  • Scraping Google SERP results for every professional to surface emails published anywhere on the open web
  • Running a fuzzy-matching algorithm to confirm that each email likely belongs to that individual
  • Delivering publicly sourced, compliant addresses that can be marketed to immediately

The result: in just one day, you receive a complete, email-rich dataset that powers your product and equips your customers with verified contacts for their lead generation and marketing campaigns.

author

Sergey Ermakovich

Head of Marketing

Context

01

Who

HasData’s own data ops and product team responsible for building and selling contact datasets on hasdata.com


02

Trigger moment

Users demanded email-complete lists; directories provided none, and scraping behind logins would violate policy


What we tried and why it failed

Previous Approaches

  • Scraping the directory directly
  • Enrichment tools like Hunter and Clearbit
  • Manual data collection

Limitations

  • Email fields absent or hidden behind login/paywall
  • Focus exclusively on business-domain email addresses, covering less than 20% of target contacts
  • Non-scalable beyond a few hundred records

The scraping and enrichment workflow

01 Input record

Start with a baseline record collected from a public directory such as Avvo, Realtor.com or WebMD. Each record includes a professional’s full name, job role or specialty, city and state, phone number, website and other available fields.

02 Google SERP API request

  1. Compose a search string like: “{first} {last} {city} {role} email address”.
  2. Run the query via Google SERP API with num=100 to retrieve a parsed JSON response containing up to 100 organic results along with rich snippets.
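
The query-composition step above can be sketched as follows. This is a minimal illustration, not HasData's actual code: the `Contact` fields, the helper names and the `q` parameter name are assumptions made for the example; only the query template and `num=100` come from the text.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """Baseline directory record; field names are illustrative assumptions."""
    first: str
    last: str
    city: str
    state: str
    role: str


def build_query(contact: Contact) -> str:
    """Fill the "{first} {last} {city} {role} email address" template."""
    parts = [contact.first, contact.last, contact.city, contact.role, "email address"]
    # Drop empty fields and trim stray whitespace so the query stays clean.
    return " ".join(p.strip() for p in parts if p and p.strip())


def build_params(contact: Contact, num: int = 100) -> dict:
    """Hypothetical request parameters: `q` is an assumed name, `num` is from the text."""
    return {"q": build_query(contact), "num": num}
```

For a record such as `Contact("John", "Smith", "Austin", "TX", "lawyer")`, `build_query` returns `John Smith Austin lawyer email address`. The state is kept on the record because it may be needed later when the search is widened.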

03 Parse and push

  1. Extract every string matching an email pattern from each organic result snippet as well as rich snippets like Knowledge Graph.
  2. Push all raw emails and associated metadata into HasData’s internal enrichment engine via an API endpoint.
  3. If no email is found, automatically retry the Google SERP API request with:
    • Synonyms (e.g., “lawyer” → “attorney,” “realtor” → “real estate agent”)
    • Broader geography (expanding from city to state or state to country)
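
A rough sketch of the extraction and retry logic described above. The regular expression is deliberately simple and will not cover every valid address, and the order of the retries (synonym first, then wider geography) is an assumption; the function names are hypothetical.

```python
import re

# Simplified email pattern (an assumption; real-world matching may need more care).
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

# Role synonyms taken from the examples in the text; extend as needed.
ROLE_SYNONYMS = {"lawyer": "attorney", "realtor": "real estate agent"}


def extract_emails(snippets):
    """Return unique, lowercased emails in order of first appearance."""
    seen = []
    for text in snippets:
        for match in EMAIL_RE.findall(text or ""):
            email = match.lower()
            if email not in seen:
                seen.append(email)
    return seen


def fallback_queries(first, last, city, state, role):
    """Build retry queries: a role synonym first, then the city widened to the state."""
    template = "{first} {last} {place} {role} email address"
    queries = []
    synonym = ROLE_SYNONYMS.get(role.lower())
    if synonym:
        queries.append(template.format(first=first, last=last, place=city, role=synonym))
    queries.append(template.format(first=first, last=last, place=state, role=role))
    return queries
```

Here `snippets` stands for the text of each organic result and rich snippet pulled from the parsed JSON response; how those fields are named in the response is not shown in this sketch.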

04 Name-to-email matching algorithm

  1. Break the name into parts. Split the full name (for example, “John M Smith”) into first, middle and last name. Generate initials (“jms”).
  2. Extract email components. For johnsmith@gmail.com, split into username (johnsmith) and domain (gmail.com).
  3. Check for matches using exact and fuzzy rules:
    • Does the username contain the first, middle or last name?
    • Does the username include the person’s initials?
    • Is the username or domain very similar (edit distance ≤ 2) to any part of the name? (This check uses the fast-levenshtein npm package to detect small spelling differences.)
  4. Return true or false. If any rule passes, the email is flagged as likely belonging to that person. Otherwise, it is discarded.
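
The rules above can be sketched in a few lines. This is an illustrative approximation, not the production algorithm: the hand-written `levenshtein` function stands in for the fast-levenshtein npm package, and two guards are assumptions added for the example (one-letter name parts are skipped in the substring rule, and name parts of three characters or fewer are skipped in the fuzzy rule) so that very short strings do not match almost anything.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (stand-in for fast-levenshtein)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def email_matches_name(full_name: str, email: str, max_distance: int = 2) -> bool:
    """Return True if any exact or fuzzy rule links the email to the name."""
    parts = [p.strip(".").lower() for p in full_name.split() if p.strip(".")]
    initials = "".join(p[0] for p in parts)
    username, _, domain = email.lower().partition("@")
    domain_label = domain.split(".")[0]  # "gmail" from "gmail.com"

    # Rule 1: the username contains a name part (one-letter parts skipped; assumption).
    if any(len(p) > 1 and p in username for p in parts):
        return True
    # Rule 2: the username includes the person's initials.
    if len(initials) > 1 and initials in username:
        return True
    # Rule 3: the username or domain is within the edit-distance threshold of a name part.
    for part in parts:
        if len(part) <= 3:  # guard against trivially close short strings (assumption)
            continue
        for candidate in (username, domain_label):
            if candidate and levenshtein(part, candidate) <= max_distance:
                return True
    return False
```

With this sketch, `johnsmith@gmail.com` passes for “John M Smith” on the substring rule, `jms.law@yahoo.com` passes on initials, `info@smyth.com` passes on the fuzzy domain check, and `office@dentalcare.com` is rejected.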

05 QA

  1. All valid emails flow into a Google Sheet.
  2. A human analyst spot-checks a random selection; any false positives are flagged, and matching rules are adjusted as needed.
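
One way the spot-check could be supported in code is sketched below. The sample size, the optional seed and the function names are assumptions for illustration; the text only states that a random selection is reviewed by an analyst.

```python
import random


def sample_for_review(rows, k=200, seed=None):
    """Pick up to k rows at random for a human analyst to review."""
    rows = list(rows)
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(rows, min(k, len(rows)))


def false_positive_rate(flags):
    """Share of reviewed rows flagged as wrong; `flags` is a list of booleans."""
    return sum(flags) / len(flags) if flags else 0.0
```

Tracking the false-positive rate across batches gives a simple signal for when the matching rules need adjusting.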

06 Dataset assembly and delivery

  • The enrichment engine merges validated emails back into the baseline records.
  • The final output is a roughly 100,000-contact, email-rich dataset delivered as a CSV or Google Sheet.
  • Typical turnaround time: about 1 day from the first query to the final file.
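
The merge-and-export step might look roughly like this. The `id` key, the `email` column name and the helper names are hypothetical; the sketch only shows validated emails being joined back onto baseline records and written out as CSV.

```python
import csv
import io


def merge_emails(records, emails_by_id):
    """Return copies of the baseline records with an `email` field added."""
    merged = []
    for record in records:
        row = dict(record)  # copy so the baseline record is left untouched
        row["email"] = emails_by_id.get(record["id"], "")
        merged.append(row)
    return merged


def to_csv(rows):
    """Serialize rows to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def coverage(rows):
    """Fraction of rows that ended up with a non-empty email."""
    return sum(1 for r in rows if r["email"]) / len(rows) if rows else 0.0
```

The `coverage` helper corresponds to the headline metric of the case study: the share of contacts that end up with a verified email.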

outcomes

01

About 80% of the contact list was enriched with verified emails (tens of thousands of addresses), roughly four times what tools like Hunter and Clearbit delivered.

02

Customers received complete, email-rich datasets within days of payment, boosting satisfaction and driving repeat orders.

Why HasData

01

Legally compliant

No scraping of login-protected or paywalled content — all emails are collected only from publicly accessible pages, ensuring full legal and terms-of-service compliance.

02

Automated at scale

100,000 rows enriched in about 1 day, with manual effort limited to QA spot-checks.

03

Accuracy first

The fuzzy-matching algorithm filters out look-alike emails and false positives.

Who else can benefit?

Lead-generation agencies and SDR platforms

Access to fresh, accurate, and scalable data is key to building high-converting lead lists.


Recruiting and staffing firms sourcing niche talent

Finding specialized candidates often means going beyond LinkedIn and job boards.


B2B data providers and enrichment SaaS products (like Apollo and Clay)

Data enrichment tools thrive on comprehensive, up-to-date information.


Marketing & sales-intelligence teams seeking higher outreach hit-rates

Up-to-date, relevant data on companies and individuals improves targeting accuracy.


Testimonial

Using HasData internally to drive enrichment at scale validated the product’s reliability. The Google SERP API handled scale without breaking, the fuzzy-matching kept false positives low, and the automation let us process ~100k records in a day with no manual overhead. It’s the same infrastructure that powers both our operations and the value we deliver to customers.

author

Sergey Ermakovich

Head of Marketing