Is Web Scraping Legal? Yes, If You Do It Right

Sergey Ermakovich
Last update: 24 Mar 2025

Web scraping is legal when collecting publicly available data. If information is open to anyone without login or technical barriers, scraping it is generally allowed. No law explicitly bans web scraping, and it is widely used for research, business, and innovation.

However, the legality of web scraping depends on how you do it and how you use the data. In general, scraping is legal if you:

  1. Only collect publicly available data without logging in or breaking user agreements.
  2. Respect copyright laws and use scraped content fairly, for example, for personal use or research.
  3. Do not collect private personal or sensitive information.
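Point 3 is the one you can partly enforce in code. Below is a minimal Python sketch, assuming your scraper produces each record as a dictionary: it drops fields you've decided count as personal data and redacts email addresses and phone numbers found in free text. The field names and regex patterns are illustrative assumptions. Pattern matching like this only catches the obvious cases, so it won't make a dataset privacy-compliant on its own.

```python
import re

# Hypothetical field names; replace them with whatever your own records use.
PERSONAL_FIELDS = {"name", "email", "phone", "address"}

# Deliberately simple patterns: they catch obvious emails and phone numbers
# and may also redact things that only look like them (dates, for example).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def strip_personal_data(record: dict) -> dict:
    """Return a copy of record without personal fields or obvious contact details."""
    cleaned = {}
    for key, value in record.items():
        if key.lower() in PERSONAL_FIELDS:
            continue  # never store these fields at all
        if isinstance(value, str):
            value = EMAIL_RE.sub("[redacted email]", value)
            value = PHONE_RE.sub("[redacted phone]", value)
        cleaned[key] = value
    return cleaned


if __name__ == "__main__":
    listing = {
        "title": "2-bed flat",
        "price": 1200,
        "email": "owner@example.com",
        "description": "Call +44 20 7946 0958 or write to owner@example.com",
    }
    print(strip_personal_data(listing))
```

Filtering at collection time, rather than cleaning up later, means personal data never lands in your storage in the first place.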

Public vs. Private Data

Before we dive into the laws around data scraping, let’s get one thing straight: the difference between public and private data.

Public data is available to everyone: no login or registration required, just open access. Search engines can index it, and you don’t need any special permissions to see it.

Private data, on the other hand, is behind some kind of barrier. If you need to log in, sign up, or subscribe to access it, that’s private data. It’s usually protected by passwords or other security measures, or it’s simply private user information. 
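In practice, the line between public and private often shows up in how a server responds. The sketch below assumes you already have the final status code and URL from your HTTP client (after redirects). It treats 401, 402, and 403 responses, as well as redirects to login or subscription pages, as private. The list of URL hints is a hypothetical heuristic, not a legal test. When the result is unclear, the safer choice is to treat the page as private.

```python
from urllib.parse import urlparse

# Hypothetical URL fragments that often signal a login or paywall redirect.
LOGIN_HINTS = ("login", "signin", "sign-in", "subscribe")


def classify_access(status_code: int, final_url: str) -> str:
    """Roughly classify a fetched page as "public", "private", or "unknown".

    status_code and final_url are what your HTTP client reports after
    following redirects. This is a heuristic, not a legal test.
    """
    if status_code in (401, 402, 403):
        # Credentials, payment, or permission required: treat as private.
        return "private"
    path = urlparse(final_url).path.lower()
    if any(hint in path for hint in LOGIN_HINTS):
        # A "successful" response that actually landed on a login/subscribe page.
        return "private"
    if 200 <= status_code < 300:
        return "public"
    return "unknown"  # errors, unfollowed redirects, etc.: don't assume it's public


print(classify_access(200, "https://example.com/products/42"))       # public
print(classify_access(200, "https://example.com/login?next=/feed"))  # private
```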

In most jurisdictions, scraping publicly accessible data is lawful, provided the process doesn’t violate related regulations, such as the U.S. Computer Fraud and Abuse Act (CFAA) or the European Union’s General Data Protection Regulation (GDPR).

Conversely, scraping private data almost universally violates legal standards. Such actions typically breach laws designed to protect privacy and data integrity, including the CFAA (U.S.), GDPR (EU), and similar regulations like the Personal Data Protection Act (PDPA, Singapore) or the Health Insurance Portability and Accountability Act (HIPAA) in the U.S.

So, what laws and regulations actually apply to working with digital data and determine whether you can collect it automatically using bots?   

Data Protection Laws

There are tons of laws protecting personal data, but in reality, it usually comes down to just two big ones:

  • GDPR (EU). The General Data Protection Regulation bans processing personal data, including collecting it, without a lawful basis (consent, contract, legitimate interest, etc.).
  • CCPA (California, USA). Similar to GDPR but focuses more on consumer rights, like the right to opt out of data sales.

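When people talk about web scraping and legality, these two are the ones that come up the most. As you can probably tell from their descriptions, both are about personal data: if you scrape only public information that doesn’t identify individuals (prices, product listings, weather), they generally don’t come into play. Keep in mind, though, that GDPR still covers personal data even when it’s publicly visible, so scraping names, profiles, or contact details needs a lawful basis. Explicit permission from the platform helps with its terms of service, but it doesn’t by itself satisfy GDPR.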

Most other laws in this category either copy or extend these two, depending on the industry or country. I’ll cover those later in a more relevant section.

Privacy & Cybersecurity Law

This area of law focuses on data security, preventing attacks, and liability for leaks. The key law here is the CFAA (USA, 1986), the Computer Fraud and Abuse Act, the main U.S. law against hacking and unauthorized access. Its scope was narrowed by the U.S. Supreme Court’s 2021 ruling in Van Buren v. United States.

This law can be used against web scrapers if they bypass technical barriers to access data. So, something to keep in mind. 

Intellectual Property Law

This area of law is all about controlling who can use original content, ideas, inventions, and brands. The law protects four main types of intellectual property:

  1. Copyright. Covers original works like text, images, videos, and software.
  2. Patents. Protects inventions and technical solutions (think algorithms, devices).
  3. Trademarks. Defends brands, logos, and slogans (like Apple’s logo or Nike’s “Just Do It”).
  4. Trade Secrets. Keeps confidential business info safe (like Coca-Cola’s secret recipe).

The key laws shaping this protection:

  • DMCA (Digital Millennium Copyright Act, 1998, US). Lets copyright holders demand takedowns of infringing content (DMCA Takedown).
  • EU Copyright Directive (2019). Forces platforms like YouTube and Facebook to filter content for copyright violations.
  • Madrid System. A trademark registration system that works across 130+ countries.
  • WIPO (World Intellectual Property Organization). Develops global IP protection rules and resolves international disputes.

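What isn’t protected by these laws? Public facts and data (exchange rates, weather, schedules) and many government documents, such as works of the U.S. federal government. Creative Commons content is still copyrighted, but its license lets you reuse it under the stated conditions (usually attribution). Pretty much everything else that is original can be copyrighted.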

When it comes to web scraping, legality depends on your intentions and how you use the scraped data. In the U.S., you can scrape copyrighted content for research purposes as long as your use qualifies as fair use. However, you cannot publish or distribute that content under your name or without the copyright holder’s permission. Even if the data is for personal or educational use, credit the original creator and don’t present the content as your own. Using copyrighted content for commercial purposes without consent is generally infringing unless a specific exception, such as fair use, applies.

Website Terms of Service (ToS) and Terms of Use (ToU)

Many websites have ToS and ToU that ban automated data collection. If you’ve ever checked a box without reading, you might’ve agreed to something like:

“You agree not to use automated tools (bots, scrapers) to access our data without written permission.”

There are two main ways websites get users to agree to their terms:

  • Clickwrap. This is when you have to physically click “I agree” before using a service. Since you’ve explicitly accepted the terms, courts tend to enforce them.
  • Browsewrap. This is when a website just says something like “By using this site, you agree to our terms” somewhere in the footer. If users aren’t forced to click anything, courts are less likely to enforce it.

In many cases, ToS work like a contract: once you’ve accepted them, you’re bound by the rules, even if you never read them. Maybe this will change in the future, but right now, website owners have the right to ban automated data scraping in their terms of service.

On the other hand, breaking ToS and ToU isn’t always illegal. Courts in the US disagree on this:

  1. In some cases, violating ToS is just a civil issue, not a crime.
  2. In others (like bypassing security measures), courts ruled it does violate the CFAA (Computer Fraud and Abuse Act).

The biggest example is LinkedIn vs. hiQ Labs. LinkedIn tried to block hiQ from scraping public profiles using ToS as an argument. The court ruled that scraping public data doesn’t break the CFAA, even if ToS say otherwise.

So, violating ToS won’t always land you in legal trouble, but it can still get your account banned, get you sued, or lead to other penalties.

Industry and National Regulations

When dealing with financial, medical, government, or other sensitive data, there are specific laws to consider. Some of them include:

  1. GLBA (Gramm-Leach-Bliley Act, US). Banks and financial institutions must protect customer data privacy.
  2. PSD2 (Payment Services Directive 2, EU). Requires banks to provide APIs for user data access, but only with user consent.
  3. HIPAA (Health Insurance Portability and Accountability Act, US). Strict rules for protecting medical data. Leaks can lead to fines or even criminal charges.
  4. ePrivacy Directive (EU). Regulates the use of cookies and online advertising.
  5. Cybersecurity Law (China). Tight restrictions on transferring data outside the country.
  6. PIPL (Personal Information Protection Law, China). Similar to GDPR but with strong government oversight.
  7. LGPD (Brazil). Brazil’s version of GDPR, covers personal data protection.

Laws vary by country and industry. If you’re working with data, you need to consider both national and industry-specific regulations to avoid legal trouble. 

That said, if the data you’re scraping isn’t related to finance or healthcare, doesn’t require login or a subscription, and isn’t coming from services based in China, you’re generally in the clear.

Notable Court Cases and Precedents

In the early days of the internet, courts were more likely to side with platforms and rule scraping illegal. This usually happened when scraping caused harm to a service. For example, in the eBay vs. Bidder’s Edge case (2000), the court ruled that scraping could be illegal if it overloaded a website’s servers.

But since 2015, there haven’t been any high-profile cases where scraping was ruled illegal. The biggest and most well-known lawsuits in this space were brought by LinkedIn and Meta. And in both cases, scraping was found to be legal.  

LinkedIn vs. hiQ Labs

In short, in hiQ Labs, Inc. v. LinkedIn Corp., the court ruled that collecting publicly available data from LinkedIn does not violate the Computer Fraud and Abuse Act (CFAA) because the information was open to the public.

This case became a key legal battle over data scraping and antitrust issues. hiQ Labs, a small analytics company, used bots to collect publicly available LinkedIn profile data despite LinkedIn explicitly banning this in its terms of service.

In 2017, LinkedIn sent hiQ Labs a cease-and-desist letter, claiming the scraping violated multiple U.S. laws, including the CFAA. In response, hiQ Labs sued LinkedIn, seeking a court order to stop LinkedIn from blocking their access. The lower court sided with hiQ Labs, ruling that scraping public data does not violate the CFAA.

LinkedIn appealed, but in 2019, the U.S. Ninth Circuit Court of Appeals upheld the ruling, stating that accessing public data cannot be considered “unauthorized access” under the CFAA. In 2021, the U.S. Supreme Court sent the case back for review in light of its ruling in Van Buren v. United States, which clarified how the CFAA should be interpreted. In April 2022, the appeals court once again ruled in favor of hiQ Labs. The CFAA question was settled, but the contract dispute wasn’t: later in 2022, the district court found that hiQ had breached LinkedIn’s User Agreement, and the two companies settled in December 2022.

Meta Platforms Inc. vs. Bright Data Ltd.

In Meta Platforms, Inc. v. Bright Data Ltd., a U.S. court ruled that scraping publicly available data from Facebook and Instagram doesn’t violate Meta’s Terms of Service, as long as it’s done without logging in. In other words, collecting public data while logged out didn’t breach Meta’s terms.

Bright Data used to have a licensing agreement with Meta, but in December 2022, Meta’s anti-scraping team told them to stop. Bright Data responded by shutting down its Meta accounts and canceling all contracts.

Meta argued that even after this, Bright Data was still breaking the rules by scraping public pages. They sued in California, and Bright Data fired back with a countersuit in Delaware.

The court made it clear: Meta’s Terms of Service apply to “registered users.” Since Bright Data no longer had accounts, they weren’t bound by those terms. Plus, they only scraped publicly accessible pages, no login required. That means the data wasn’t protected by authentication measures, so collecting it didn’t breach any agreement.

Meta tried to enforce a survival clause to extend the restrictions beyond contract termination. But the judge found it too vague, with no clear time or geographic limits, so it didn’t hold up.

As a result, the court ruled in favor of Bright Data, confirming that scraping public data doesn’t violate Meta’s terms.

Ryanair vs. PR Aviation 

PR Aviation scraped flight data from Ryanair’s website and used it for their own price comparison service. Ryanair sued them, claiming PR Aviation violated the site’s Terms of Service (ToS), which explicitly banned automated data collection.

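The Court of Justice of the EU ruled that when a database isn’t protected by copyright or by the sui generis database right under the Database Directive (96/9/EC), intellectual property law doesn’t stop others from reusing its data. But the Directive’s guaranteed user rights don’t apply either, which leaves the site owner free to restrict use by contract. So here’s the catch: if a user accepts the website’s ToS, they have to follow them.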

In the end, Ryanair won. The court found PR Aviation in violation of the ToS, leading to a ban on further data collection and possibly some financial penalties.

Authors Guild vs. Google

Google scanned millions of books for its Google Books project, displaying small excerpts in search results. The Authors Guild sued, arguing this was a copyright violation.

But the court saw it differently. It ruled that Google’s large-scale book scanning was transformative use, meaning it fell under fair use protections.

Despite the Authors Guild’s complaints, Google won the case. The project was deemed beneficial to the public and didn’t replace the original books, so there was no copyright infringement.

Ethical Web Scraping

If you’re still worried about whether your scraper is ethical, let’s go over this one more time. There are only two clear cases where scraping can be a no-go: if it’s private data or copyrighted content. Everything else? Well, that depends.

To figure it out, let’s break down what actually impacts the ethics of scraping:

  1. Type of data. If it’s publicly available and not locked behind a password or some other restriction, you’re good to go.
  2. Service policies. If you’ve checked the Terms of Use, Terms of Service, and robots.txt, and nothing explicitly says you can’t scrape this specific page, or scrape in general, then you’re in the clear.
  3. Intentions. Believe it or not, why you’re scraping matters too. If you’re not stealing intellectual property, cloning an entire service, or poaching users, then you’re probably fine.
  4. Impact on the server. If you’re spacing out requests, avoiding overloading the server, and ideally scraping at off-peak hours, you’re being considerate, so there are no ethical concerns here.
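Checking robots.txt (point 2) is easy to automate with Python’s standard-library urllib.robotparser. This minimal sketch parses an inline, made-up robots.txt so it runs offline. With a real site, you would call set_url() and read() instead. The user-agent string is a hypothetical name. The code only covers robots.txt; you still have to read the Terms of Service yourself.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt so the example runs offline. For a real site, use
# parser.set_url("https://example.com/robots.txt") followed by parser.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Crawl-delay: 5
"""

USER_AGENT = "example-research-bot"  # hypothetical bot name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(rp.can_fetch(USER_AGENT, "https://example.com/products/42"))       # True
    print(rp.can_fetch(USER_AGENT, "https://example.com/account/settings"))  # False
    print(rp.crawl_delay(USER_AGENT))                                        # 5
```

When you use read() against a live site, the parser treats a 401 or 403 response for robots.txt itself as “disallow everything”. That matches the cautious reading of a site that won’t even show you its rules.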

If you follow the site’s rules, stay away from personal and copyrighted data, and scrape responsibly without causing disruptions, then your scraping is as ethical as it gets.
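Spacing out requests can be as simple as enforcing a minimum gap between requests to the same host. This sketch assumes a 2-second default, which is an arbitrary choice rather than a standard. You can pass a larger Crawl-delay if the site’s robots.txt declares one. The clock and sleep functions are injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Callable, Dict, Optional
from urllib.parse import urlparse


class PoliteThrottle:
    """Enforce a minimum gap between requests to the same host.

    min_interval is an assumed default; a site's Crawl-delay overrides it
    when larger.
    """

    def __init__(self, min_interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request: Dict[str, float] = {}  # host -> time of last request

    def wait(self, url: str, crawl_delay: Optional[float] = None) -> float:
        """Sleep as long as needed before requesting url; return seconds slept."""
        host = urlparse(url).netloc
        interval = max(self.min_interval, crawl_delay or 0.0)
        last = self._last_request.get(host)
        slept = 0.0
        if last is not None:
            slept = max(0.0, interval - (self._clock() - last))
            if slept > 0:
                self._sleep(slept)
        self._last_request[host] = self._clock()
        return slept
```

In a crawl loop, you would call throttle.wait(url) right before each request, optionally passing the value of crawl_delay() from the robots.txt parser. Tracking the timing per host means a slow, polite crawl of one site doesn’t hold up requests to a different one.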

Sergey Ermakovich
I am a seasoned marketer with a passion for digital innovation. My expertise lies in creating innovative solutions to present data-driven insights, and I have extensive experience in the web scraping and data analysis industry.