
Scrape ZoomInfo Without Blocking (2025)

Valentina Skakun
Last update: 10 Dec 2025

The most reliable way to scrape ZoomInfo is to extract the hidden JSON object (ng-state) using SeleniumBase in UC Mode.

Plain HTTP requests fail due to TLS fingerprinting, and CSS selectors are brittle.

Below is a complete, copy-pasteable solution that uses SeleniumBase to avoid the “Press & Hold” captcha and extract the hidden JSON. It works the same way for profile pages, company pages, and search pages.

from seleniumbase import SB
from selenium.webdriver.common.by import By
import time, json, random

# Base URL for the page (search, person, or company)
base_url = "https://www.zoominfo.com/people-search/industry-health-services-extra-eyJtYW5hZ2VtZW50TGV2ZWwiOlsiRGlyZWN0b3IiXX0%3D"  # or person/company URL
pages = 5  # for search pages set max 5 pages; for single profile/company set 1
all_data = []

with SB(uc=True) as sb:
    for page in range(1, pages + 1):
        url = f"{base_url}?pageNum={page}" if pages > 1 else base_url
        sb.uc_open_with_reconnect(url, 4)
       
        # Random delay to mimic human behavior and reduce anti-bot detection
        time.sleep(random.uniform(1, 3))
       
        try:
            # Find all JSON scripts containing page data
            scripts = sb.find_elements('script[type="application/json"]', by=By.CSS_SELECTOR)
           
            for el in scripts:
                # Extract and parse JSON content
                content = el.get_attribute("innerHTML")
                data = json.loads(content)
               
                # If JSON is empty, likely a temporary block/honeypot
                if not data:
                    print(f"Page {page}: Empty JSON, wait 30-60s and retry")
                    continue
               
                # Remove internal keys we don't need
                data.pop("__nghData__", None)
                data.pop("cta_config", None)
               
                # Store the cleaned data
                all_data.append(data)
       
        except Exception as e:
            # Catch any errors for this page
            err_msg = str(e).lower()
            if "429" in err_msg:
                print(f"Page {page}: Rate limited (429), wait 30-60s and retry")
                time.sleep(random.uniform(30, 60))
            elif "403" in err_msg:
                print(f"Page {page}: Forbidden (403), switch IP/proxy and retry next day")
            elif "503" in err_msg:
                print(f"Page {page}: Service unavailable (503), ZoomInfo outage, retry after 5 min")
                time.sleep(random.uniform(30, 60))
            else:
                print(f"Page {page} unknown error:", e)
                time.sleep(random.uniform(30, 60))

# Save all collected data to a JSON file
with open("zoominfo_data.json", "w", encoding="utf-8") as f:
    json.dump(all_data, f, ensure_ascii=False, indent=2)

Why This Method Works

Most ZoomInfo tutorials suggest parsing HTML elements like <h1 class="person-name">. This is a bad idea for several reasons:

  1. It’s brittle. ZoomInfo can change class names at any time and break your scraper.
  2. It’s incomplete. The HTML view shows only a small part of the real data.
  3. It’s heavy. The JSON has 199 fields for companies, 161 for profiles, and 61 for search pages. Mapping all of that through CSS selectors is slow and painful.
  4. It’s slower. Scanning the DOM for many selectors takes more work than loading one full JSON object in a single step.
  5. It’s hard to maintain. Large, deep DOM structures make long-term scraping fragile.

Instead, ZoomInfo hydrates its frontend using a massive JSON object embedded in a <script id="ng-state"> tag. This object contains structured data for profiles, companies, and search results.

By targeting this JSON, you get the full dataset without dealing with unstable HTML.
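
If you want to target the ng-state tag directly instead of looping over every application/json script, the minimal sketch below grabs just that element from a page already opened in a UC Mode session. The URL is only an example, and the snippet assumes the tag is present (a blocked page may not contain it):

import json
from seleniumbase import SB

url = "https://www.zoominfo.com/c/google-llc/16400573"  # any ZoomInfo page

with SB(uc=True) as sb:
    sb.uc_open_with_reconnect(url, 4)
    # Pull only the ng-state blob described above
    raw = sb.get_attribute("script#ng-state", "innerHTML")
    state = json.loads(raw)
    print(len(state), "top-level keys in ng-state")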

Prerequisites and Setup

ZoomInfo blocks basic HTTP requests with strict fingerprint checks (PerimeterX), so libraries like requests and httpx, and even standard browser automation with Selenium or Playwright, fail. SeleniumBase in UC Mode avoids these blocks and loads each page like a normal user.

Install SeleniumBase with:

pip install seleniumbase

It installs Selenium automatically.
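
To confirm the setup works, a quick smoke test like the sketch below should open a Chrome window in UC Mode and print the page title (the URL is just an example):

from seleniumbase import SB

# Smoke test: open a page in UC Mode and print its title
with SB(uc=True) as sb:
    sb.uc_open_with_reconnect("https://www.zoominfo.com/", 4)
    print(sb.get_title())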

Understanding ZoomInfo Pages

There are three main page types:

  1. Search pages: 

https://www.zoominfo.com/people-search/<filters>

  2. Person profile pages: 

https://www.zoominfo.com/p/<First-Last>/<id>

  3. Company profile pages: 

https://www.zoominfo.com/c/<company-name>/<id>

Each page has JSON in:

<script id="ng-state" type="application/json">

You can scrape only the first five search result pages without logging in. To get more data, break your search into smaller groups using different filters (e.g., by City or ZIP code) so each set returns fresh results.
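
One simple way to implement that split is to keep a list of narrower search URLs and run the same five-page loop against each of them. The filter URLs below are hypothetical placeholders, not real ZoomInfo filters:

# Hypothetical filter URLs; replace with real ZoomInfo search URLs for each segment
filtered_searches = [
    "https://www.zoominfo.com/people-search/<filters-for-city-A>",
    "https://www.zoominfo.com/people-search/<filters-for-city-B>",
    "https://www.zoominfo.com/people-search/<filters-for-zip-code>",
]

for base_url in filtered_searches:
    for page in range(1, 6):  # only the first five pages are available without login
        url = f"{base_url}?pageNum={page}"
        print(url)  # open this URL and extract ng-state exactly as in the script above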

The JSON contains people’s names, job titles, profile links, and company links. Emails and phone numbers are missing, photos are often empty, and many fields like location, title, or metadata may be blank or null. From this JSON, you can collect links to individual profiles and companies for deeper scraping.
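
The exact key layout of the search JSON isn’t documented, so a defensive way to collect those links is to walk the parsed object recursively and keep any string that looks like a /p/ or /c/ path. The helper below is a hedged sketch, not an official schema:

def collect_zoominfo_links(node, found=None):
    """Recursively collect person (/p/) and company (/c/) links from parsed ng-state JSON."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for value in node.values():
            collect_zoominfo_links(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_zoominfo_links(item, found)
    elif isinstance(node, str) and ("/p/" in node or "/c/" in node):
        found.add(node)
    return found

# Usage: links = collect_zoominfo_links(all_data)  # all_data comes from the search script below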

A person profile page contains the full ZoomInfo profile for an individual.

Person page JSON contains the full name, job title, biography, photo, masked email and phone, work and education history, employer information, profile and external links, similar profiles, colleagues, web mentions, and AI-generated signals. Most data is complete even without login, but some contact fields are masked and certain blocks may be empty or simplified.

A company profile page contains nearly all available ZoomInfo data.

Company page JSON contains the legal name, company size, number of employees, technologies used, financial summary, competitors, executives, address, social links, news, acquisitions, org charts, email patterns, hiring trends, intent signals, awards, and comparative analytics. Some data is partially hidden or incomplete: employee contacts are masked, historic financial numbers may be partial, email patterns show examples only, intent data is summarized, and fields like company details, tech used, or acquisitions may be missing or cut short.

Scraping Search Pages

All three ZoomInfo scrapers work the same way:

  1. Set the URL.
  2. Open the page in undetectable mode.
  3. Scrape pages from the list.
  4. Clean the JSON (remove unnecessary fields).
  5. Save to JSON file.

To scrape ZoomInfo search results, loop through the first five pages. 

from seleniumbase import SB
from selenium.webdriver.common.by import By
import time, json, random


# Base URL for the people search
base_url = "https://www.zoominfo.com/people-search/industry-health-services-extra-eyJtYW5hZ2VtZW50TGV2ZWwiOlsiRGlyZWN0b3IiXX0%3D"
all_data = []


with SB(uc=True) as sb:
    for page in range(1, 6):
        # Open each search results page
        url = f"{base_url}?pageNum={page}"
        sb.uc_open_with_reconnect(url, 4)
        # Random delay to mimic human behavior and reduce anti-bot detection
        time.sleep(random.uniform(1, 3)) 
 
        try:
            # Find all JSON scripts
            scripts = sb.find_elements('script[type="application/json"]', by=By.CSS_SELECTOR)
            for el in scripts:
                content = el.get_attribute("innerHTML")
                data = json.loads(content)
                # Remove unnecessary keys
                data.pop("__nghData__", None)
                data.pop("cta_config", None)
                all_data.append(data)
        except Exception as e:
            print(f"Page {page} error:", e)


# Save all collected data in one JSON file
with open("search_data.json", "w", encoding="utf-8") as f:
    json.dump(all_data, f, ensure_ascii=False, indent=2)

Data from all pages is saved in one JSON. To save each page separately, move the file writing inside the loop. 
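
A per-page save could look like the snippet below, placed inside the loop right after the cleaned data is appended (the filename pattern is just a suggestion):

# Inside the page loop, after cleaning `data` for the current page:
with open(f"search_page_{page}.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)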

Scraping a Person Profile Page

Use the same ZoomInfo scraping script but remove the loop since it’s a single page: 

from seleniumbase import SB
from selenium.webdriver.common.by import By
import time, json, random


# URL of the individual profile page
base_url = "https://www.zoominfo.com/p/Jason-Brosious/6548475613"
all_data = []


with SB(uc=True) as sb:
    # Open profile page using undetected Chrome
    sb.uc_open_with_reconnect(base_url, 4)
    # Random delay to mimic human behavior and reduce anti-bot detection
    time.sleep(random.uniform(1, 3))

    try:
        # Extract all JSON scripts
        scripts = sb.find_elements('script[type="application/json"]', by=By.CSS_SELECTOR)
        for el in scripts:
            content = el.get_attribute("innerHTML")
            data = json.loads(content)

            # Remove unnecessary internal ZoomInfo keys
            data.pop("__nghData__", None)
            data.pop("cta_config", None)

            all_data.append(data)

    except Exception as e:
        print("Error:", e)


# Save the collected profile JSON data
with open("profile_data.json", "w", encoding="utf-8") as f:
    json.dump(all_data, f, ensure_ascii=False, indent=2)

Since this page has personal data, check our article about the legality of web scraping before extracting ZoomInfo profile data.

Scraping a Single Company Profile

The same script works. Replace the URL with the company page:

from seleniumbase import SB
from selenium.webdriver.common.by import By
import time, json, random


# URL of the company page
base_url = "https://www.zoominfo.com/c/google-llc/16400573"
all_data = []


with SB(uc=True) as sb:
    # Open company page using undetected Chrome
    sb.uc_open_with_reconnect(base_url, 4)
    # Random delay to mimic human behavior and reduce anti-bot detection
    time.sleep(random.uniform(1, 3))

    try:
        # Find all JSON script tags
        scripts = sb.find_elements('script[type="application/json"]', by=By.CSS_SELECTOR)
        for el in scripts:
            content = el.get_attribute("innerHTML")
            data = json.loads(content)

            # Remove internal keys not needed
            data.pop("__nghData__", None)
            data.pop("cta_config", None)

            all_data.append(data)

    except Exception as e:
        print("Error:", e)


# Save all extracted company JSON data to a file
with open("company_data.json", "w", encoding="utf-8") as f:
    json.dump(all_data, f, ensure_ascii=False, indent=2)

To save ZoomInfo data as CSV or XLSX, use pandas to convert JSON to a DataFrame.
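
A minimal sketch of that conversion, assuming the nested JSON is flattened with json_normalize (the resulting columns depend on which page type you scraped):

import json

import pandas as pd

# Load the JSON produced by any of the scripts above
with open("company_data.json", "r", encoding="utf-8") as f:
    records = json.load(f)

# Flatten nested keys into columns
df = pd.json_normalize(records)
df.to_csv("company_data.csv", index=False)
# df.to_excel("company_data.xlsx", index=False)  # requires openpyxl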

ZoomInfo Anti-Scraping Measures

ZoomInfo uses enterprise-grade protection (Cloudflare & PerimeterX) that analyzes both your network reputation and browser fingerprint. A simple 200 OK doesn’t always mean success.

Here is how to interpret and handle the response codes:

Status Code        | Meaning                               | Recovery
200                | Success                               | Continue / check if empty
429                | Rate limited                          | Wait 30-60 s, then retry
403                | Forbidden (IP blocked)                | Switch IP/proxy, retry next day
503                | Service unavailable                   | ZoomInfo outage, retry after 5 min
Empty 200          | Honeypot (looks like 200 but no data) | Switch IP
Redirect to /error | Detected scraper                      | Add delay, rotate proxy

Scraping can trigger errors, captchas, refusals, or even bans. To stay safe, use tools that mask automation, add short random pauses between requests, and increase the delay if errors appear.


While SeleniumBase UC Mode successfully masks your Browser Fingerprint (Canvas, WebGL, Navigator), it cannot hide your Network Layer (IP Reputation).

ZoomInfo’s WAF (PerimeterX) performs real-time ASN (Autonomous System Number) lookups on every request. If your request originates from a known Datacenter ASN (AWS, Google Cloud, Azure, DigitalOcean), it is flagged as Non-Human and rejected immediately with a 403 Forbidden or an infinite CAPTCHA loop.

To bypass this, you must route traffic through a Residential Proxy Pool. These IPs belong to legitimate ISPs (Comcast, Verizon, AT&T), making your traffic indistinguishable from a real user.

SeleniumBase accepts the proxy argument directly in the constructor. Ensure your proxy string includes authentication if required.

# Format: "username:password@ip:port" or "ip:port"
# Note: Do not use 'http://' prefix inside the SB context argument
proxy_string = "user123:[email protected]:8000"


with SB(uc=True, proxy=proxy_string) as sb:  # <-- add proxy here

Author Tip: Do not rotate IPs during a session (e.g., between the search page and the profile page). ZoomInfo tracks session tokens against the IP. Rotate the proxy only when initializing a new browser instance.
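
A hedged way to follow that rule is to pin one proxy per browser instance: pick a proxy when you create the SB context and reuse it for every page in that session, only moving to the next proxy when you start a new context. The proxy strings and URLs below are placeholders:

import random

from seleniumbase import SB

# Placeholder residential proxies; replace with your own pool
proxy_pool = [
    "user123:pass456@203.0.113.10:8000",
    "user123:pass456@203.0.113.11:8000",
]

urls_for_one_session = [
    "https://www.zoominfo.com/people-search/<filters>",
    "https://www.zoominfo.com/p/<First-Last>/<id>",
]

proxy_string = random.choice(proxy_pool)  # chosen once per browser instance
with SB(uc=True, proxy=proxy_string) as sb:
    for url in urls_for_one_session:  # same IP for the whole session
        sb.uc_open_with_reconnect(url, 4)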

Takeaways

Scraping ZoomInfo is less about “parsing HTML” and more about evading detection. By shifting your strategy from DOM selectors to hidden data extraction, you solve the two biggest pain points: stability and maintenance.

  • Architecture Wins. The <script id="ng-state"> JSON blob provides a stable API-like structure that doesn’t break when ZoomInfo changes its CSS classes.
  • Decoupling. Always separate your extraction logic (getting the JSON) from your parsing logic (cleaning the data). This allows you to update your parser without re-running the expensive scraping process, as in the sketch below.
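
A minimal sketch of that decoupling, re-using the file written by the first script (the field names in parse_record are hypothetical placeholders; inspect your own JSON to find the real keys):

import json
from pathlib import Path

# Extraction step (expensive): raw ng-state blobs already saved by the scripts above
raw_records = json.loads(Path("zoominfo_data.json").read_text(encoding="utf-8"))

# Parsing step (cheap, re-runnable): pull out only the fields you care about
def parse_record(record):
    return {key: record.get(key) for key in ("name", "title", "company")}

parsed = [parse_record(r) for r in raw_records]
Path("parsed_data.json").write_text(
    json.dumps(parsed, ensure_ascii=False, indent=2), encoding="utf-8"
)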

If maintaining a headless browser fleet and rotating residential proxies becomes a bottleneck for your team, consider outsourcing the infrastructure to a dedicated scraping API that handles Cloudflare bypasses automatically.

Valentina Skakun
Valentina is a software engineer who builds data extraction tools before writing about them. With a strong background in Python, she also leverages her experience in JavaScript, PHP, R, and Ruby to reverse-engineer complex web architectures. If data renders in a browser, she will find a way to script its extraction.