
Scrape Emails from Any Websites: From Python to AI Tools

Valentina Skakun
Last update: 3 Jun 2025

Let’s go over a few ways to collect email addresses from websites. I’ll start with cases where you already have a list of sites you want to scrape, and later I’ll touch on how to expand beyond that list using Google Search or Google Maps. 

Email Scraping with Python

So, if you already have a list of websites and just need to scrape all available emails from them, the easiest way is to use an API that supports email extraction. We are using HasData’s web scraping API for this.

Code Example

Let’s jump straight into the code for those who don’t care about the details and just want something that works:

import requests
import json
import csv


api_key = "YOUR-API-KEY"
headers = {
    'Content-Type': 'application/json',
    'x-api-key': api_key
}


results = []


with open("urls.txt", "r", encoding="utf-8") as file:
    urls = [line.strip() for line in file if line.strip()]


for url in urls:
    payload = json.dumps({
        "url": url,
        "proxyType": "datacenter",
        "proxyCountry": "US",
        "jsRendering": True,
        "extractEmails": True,
    })


    try:
        response = requests.post("https://api.hasdata.com/scrape/web", headers=headers, data=payload)
        response.raise_for_status()
        data = response.json()
        emails = data.get("emails", [])


        results.append({
            "url": url,
            "emails": emails
        })


    except Exception as e:
        print(f"[error] {url}: {e}")
        results.append({
            "url": url,
            "emails": []
        })


with open("results.json", "w", encoding="utf-8") as json_file:
    json.dump(results, json_file, ensure_ascii=False, indent=2)


with open("results.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["url", "email"])  
    for result in results:
        for email in result["emails"]:
            writer.writerow([result["url"], email])

Replace the API key with your own HasData key, which you can get after signing up. Also, make sure you have a file called “urls.txt” in the same folder as the script. This file should have the list of domains you want to scrape.
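
For reference, urls.txt is just a plain text file with one site per line; the addresses below are placeholders:

https://example.com
https://www.example.org/contact
https://example.net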

Scrape Emails from a Website Using AI

The whole script works like this:

  1. Get a list of sites from the urls.txt file.
  2. Loop through the list.
  3. For each site, call the API, parse the response, and pull out emails.
  4. After the loop, save everything as JSON and CSV files.

You can check out the full script above, so I won’t repeat it here. Instead, let’s talk about how you can modify the script if you want to use the same web scraping API to get not just emails, but any other contact info, like phone numbers.

The API can use an LLM to extract data from the page based on a natural-language description. Right now, after running the script, the JSON and CSV files contain only each URL and the emails found on it.

Let’s add an AI extraction rule to the API request:

    payload = json.dumps({
        "url": url,
        "proxyType": "datacenter",
        "proxyCountry": "US",
        "jsRendering": True,
        "extractEmails": True,
        "aiExtractRules": {
            "address": {"description": "Physical address", "type": "string"},
            "phone": {"description": "Phone number", "type": "string"},
            "email": {"description": "Email addresses", "type": "string"},
            "companyName": {"description": "Company name", "type": "string"}
        }
    })

And handle the AI’s response when we get the JSON results:

        data = response.json()


        emails_list = data.get("emails", [])
        ai_resp = data.get("aiResponse", {})


        company = ai_resp.get("companyName", "-")
        address = ai_resp.get("address", "-")
        phone = ai_resp.get("phone", "-")
        email_ai = ai_resp.get("email", "")

Since we’re getting emails both from the standard API call and from the AI, we’ll combine them to avoid missing anything; just make sure to remove duplicates:

        all_emails = set(emails_list)
        if email_ai:
            all_emails.add(email_ai)


        email_combined = ", ".join(all_emails) if all_emails else ""

Also, we’ll need to change the CSV saving part a bit because the column names and content are hardcoded:

with open("results_ai.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["url", "company", "address", "phone", "emails"])
    for result in results:
        writer.writerow([
            result["url"],
            result["company"],
            result["address"],
            result["phone"],
            result["emails"]
        ])

The rest of the code stays the same, but here’s the full updated version just in case:

import requests
import json
import csv


api_key = "YOUR-API-KEY"


headers = {
    'Content-Type': 'application/json',
    'x-api-key': api_key
}


results = []


with open("urls.txt", "r", encoding="utf-8") as file:
    urls = [line.strip() for line in file if line.strip()]


for url in urls:
    payload = json.dumps({
        "url": url,
        "proxyType": "datacenter",
        "proxyCountry": "US",
        "jsRendering": True,
        "extractEmails": True,
        "aiExtractRules": {
            "address": {"description": "Physical address", "type": "string"},
            "phone": {"description": "Phone number", "type": "string"},
            "email": {"description": "Email addresses", "type": "string"},
            "companyName": {"description": "Company name", "type": "string"}
        }
    })


    try:
        response = requests.post("https://api.hasdata.com/scrape/web", headers=headers, data=payload)
        response.raise_for_status()
        data = response.json()


        emails_list = data.get("emails", [])
        ai_resp = data.get("aiResponse", {})


        company = ai_resp.get("companyName", "-")
        address = ai_resp.get("address", "-")
        phone = ai_resp.get("phone", "-")
        email_ai = ai_resp.get("email", "")


        all_emails = set(emails_list)
        if email_ai:
            all_emails.add(email_ai)


        email_combined = ", ".join(all_emails) if all_emails else ""


        results.append({
            "url": url,
            "company": company,
            "address": address,
            "phone": phone,
            "emails": email_combined
        })


    except Exception as e:
        print(f"[error] {url}: {e}")
        results.append({
            "url": url,
            "company": "-",
            "address": "-",
            "phone": "-",
            "emails": ""
        })


with open("results_ai.json", "w", encoding="utf-8") as json_file:
    json.dump(results, json_file, ensure_ascii=False, indent=2)


with open("results_ai.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["url", "company", "address", "phone", "emails"])
    for result in results:
        writer.writerow([
            result["url"],
            result["company"],
            result["address"],
            result["phone"],
            result["emails"]
        ])

Here’s an illustrative example of the kind of record you’ll end up with (the values below are placeholders, not real scraped data):
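
[
  {
    "url": "https://example.com",
    "company": "Example Company LLC",
    "address": "123 Main St, Springfield, IL 62701",
    "phone": "+1 (555) 010-0199",
    "emails": "info@example.com, sales@example.com"
  }
]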

If you need to extract more data later, just add a field with its description to aiExtractRules, and the LLM will handle it automatically, as in the sketch below.
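
For example, if you also wanted social media links, you could add one more rule. The socialLinks field name and its description are just an illustration, not part of any documented schema:

        "aiExtractRules": {
            "address": {"description": "Physical address", "type": "string"},
            "phone": {"description": "Phone number", "type": "string"},
            "email": {"description": "Email addresses", "type": "string"},
            "companyName": {"description": "Company name", "type": "string"},
            # hypothetical extra field: links to the company's social media profiles
            "socialLinks": {"description": "Links to social media profiles", "type": "string"}
        }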

Extracting Emails with Regex in Python

If you don’t want to use an API, you can try a more hardcore approach and scrape emails yourself without any third-party tools.  

Code Example

Here’s the final version of the script:

import requests
import re
import csv


found_emails = set()
output_file = "found_emails.csv"
file_path = "urls.txt"


with open(file_path, "r", encoding="utf-8") as file:
    websites = [line.strip() for line in file if line.strip()]


email_pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
for website in websites:
    # A failed request (timeout, DNS error, etc.) shouldn't stop the whole loop
    try:
        response = requests.get(website, timeout=10)
    except requests.RequestException as e:
        print(f"[error] {website}: {e}")
        continue
    if response.status_code == 200:
        emails = re.findall(email_pattern, response.text)
        for email in emails:
            found_emails.add((website, email))
    else:
        print(f"[{response.status_code}] {website}")


with open(output_file, "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Website", "Email"])
    for website, email in found_emails:
        writer.writerow([website, email])

This approach can work, but it’s not reliable across all websites. To improve success, you need to add a headless browser instead of just requests. For better results, use a library that supports undetectable mode, like SeleniumBase or Undetectable Browser. After integrating the library, you’ll also need to configure a proxy and a CAPTCHA-solving service, making the setup significantly more complex than using a scraping API, like HasData’s. 
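
If you want to experiment with that route, here’s a minimal sketch of the idea using SeleniumBase’s undetected-Chrome mode. It assumes SeleniumBase and Chrome are installed, the URL is a placeholder, and it’s only a starting point, not a drop-in replacement for every site:

from seleniumbase import SB
import re

email_pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

# Open the page in undetected, headless Chrome, then run the same regex on the rendered HTML
with SB(uc=True, headless=True) as sb:
    sb.open("https://example.com/contact")  # placeholder URL
    html = sb.get_page_source()
    print(set(re.findall(email_pattern, html)))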

Regex to Extract Email Addresses from a String

The script works similarly to the one we looked at before:

  1. It reads a list of websites from urls.txt.
  2. Loops through each one.
  3. Uses regular expressions to pull out emails.
  4. After it’s done, it saves everything to a file.

The main part to focus on is the regular expressions (regex). If you’re not familiar with them, they’re patterns used to find specific bits of text. In this case, emails.

There’s an official standard for email formats called RFC 5322 (Internet Message Format):

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|
"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|
\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")
@
(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+
[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|
[a-z0-9-]*[a-z0-9]:
(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|
\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)])

It’s super detailed, very complex, and honestly, no one uses the full version. Most of the time, a simpler version is good enough to catch nearly all real-world email addresses:

[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}

The one above is what we used in the script. 
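
If you want to see what the pattern actually catches, here’s a quick standalone check (the sample string is made up):

import re

email_pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

sample = 'Reach us at sales@example.com or <a href="mailto:press@example.co.uk">press</a>.'
print(re.findall(email_pattern, sample))
# ['sales@example.com', 'press@example.co.uk']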

Use a Contact Scraper to Get Emails Without Coding

This part of our guide is for those who need to scrape emails and other contact info from websites fast, without coding or messing with tools, just a simple file with the data.

You can use HasData’s Email Scraper. All you need is a list of sites to extract emails from.

After that, you can download the data as a CSV, JSON, or XLSX. Here’s an example of the data that you’ll get:

[
    {
        "url": "https://thecreganteam.com/",
        "xcom": [],
        "clutch": [],
        "emails": [
            "[email protected]"
        ],
        "dribbble": [],
        "facebook": [
            "https://www.facebook.com/thecreganteam"
        ],
        "linkedin": [
            "https://www.linkedin.com/in/john-cregan-218a608/"
        ],
        "instagram": [
            "https://www.instagram.com/lisa_and_john_palmbeach/"
        ],
        "phoneNumbers": [
            "+1.847.651.7210"
        ]
    },
  ...
]

This option works well if you need the data quickly and want not just emails, but whatever contact info is available.

Using Streamlit for Email Scraping

This last option is for those who not only need to scrape contact data from a site but also want help finding the right sites to begin with, whether through Google Maps, search results, or both. Or maybe the other methods didn’t work, and you just want something that works out of the box.

For that, we built an easy-to-use Email Scraping Tool.

You can pick one of three methods:

  1. List of URLs. Use this if you already have a list of sites to pull emails from.
  2. Google SERP Keywords. Use this if you want to search sites by keyword and then scrape emails and contact info from them.
  3. Google Maps Keywords. Use this if you want to search for places on Google Maps and then retrieve emails and contact information from those locations. If that’s not enough, we wrote a separate guide on scraping emails from Google Maps.

To run the scraper, enter your HasData API key and either a list of websites or keywords, depending on the method you choose, then start it.

When it’s done, you can copy the data table (or part of it) or download the data as JSON or CSV.
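
If you’d rather wire up something similar yourself, here’s a rough sketch of what a minimal Streamlit front end around the same scrape/web endpoint could look like. The widget layout and field names are just one possible arrangement, not HasData’s hosted tool:

import json
import requests
import pandas as pd
import streamlit as st

st.title("Email Scraper")  # minimal demo UI

api_key = st.text_input("HasData API key", type="password")
urls_raw = st.text_area("Websites (one per line)")

if st.button("Scrape") and api_key and urls_raw:
    headers = {"Content-Type": "application/json", "x-api-key": api_key}
    rows = []
    for url in [u.strip() for u in urls_raw.splitlines() if u.strip()]:
        payload = json.dumps({"url": url, "jsRendering": True, "extractEmails": True})
        resp = requests.post("https://api.hasdata.com/scrape/web", headers=headers, data=payload)
        data = resp.json() if resp.ok else {}
        rows.append({"url": url, "emails": ", ".join(data.get("emails", []))})
    df = pd.DataFrame(rows)
    st.dataframe(df)
    st.download_button("Download CSV", df.to_csv(index=False), "emails.csv")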

Other Tools: Browser Extensions and Desktop Crawlers Compared

Before we wrap up, it’s worth noting that there are a few other “no-API” roads people still take, like browser add-ons and old-school desktop crawlers. They both have their moments, but each comes with strings attached.

Browser extensions are the easiest win: most are free, live right in Chrome/Edge, and need zero setup. You click the icon, and any visible addresses on the current page pop out.

That simplicity is also a trap. You’re scraping in half-manual mode through your own browser tab. And at scale, this feels like panning for gold with a teaspoon, one page at a time, so pulling even a thousand emails is really difficult and time-consuming.

Stand-alone desktop crawlers feel more “pro.” You point them at a URL list, hit Start, and they process pages without requiring code. It’s nice until you realize they run from a single residential IP, hit CAPTCHAs after a few hundred requests, and need constant rule tweaks each time a site changes its markup.

Cross-platform support is unreliable, licenses can add up, and scaling requires additional hardware or renting virtual private servers (VPSs). In practice, desktop crawlers are suitable for small projects, but when you need reliable throughput and compliance safeguards, you’ll likely switch to a service that handles the complexities for you.

Before scraping or emailing, ensure you have a lawful basis for data processing. Under GDPR and CCPA, this typically means obtaining explicit consent or demonstrating “legitimate interest.” In the U.S., CAN-SPAM requires clear subject lines, a physical address, and an easy opt-out option in each email. 

Violations can lead to hefty fines. Mass-harvesting emails can also breach site terms, damage sender reputation, and be unwelcome. Collect only necessary contact information, respect opt-outs, and use the data responsibly.

When to Use Each Email Scraping Method (Quick Guide)

Let’s wrap this up with a quick guide on when to use each email scraping method we covered:

| Method | Code Needed | Setup Difficulty | Extracts | Best For |
|---|---|---|---|---|
| Regex Script (Python) | Yes | Medium | Emails only | Devs scraping basic pages |
| HasData API | Yes | Easy | Emails + address + phone | Devs who want scale and accuracy |
| Streamlit Tool | No | Easy | Emails + company + social links | Non-coders who want quick results |
| Browser Extension | No | Very Easy | Visible emails only | Beginners doing manual small-scale work |
| Desktop Crawler | No | Medium/Hard | Emails (limited at scale) | Offline batch scraping (limited volume) |

That’s it! We’ve gone through each of these methods here, so you can jump back to whichever one fits your needs best.
