August 21, 2023

Is Web Scraping Legal or Illegal?

Web scraping is not illegal by itself. Collecting publicly available data for analysis, research, or internal business use is widely accepted, but legality depends on what you scrape, how you scrape it, and what you do with the data. Since GDPR took effect in 2018, companies working with EU personal data have had to think harder about compliance. Below we cover the legal landscape, six practical rules for staying on the right side of the law, and the court cases that shaped the debate.

Before diving into the legal aspects, let's briefly cover what web scraping is and where it is used.

What is web scraping?

Web scraping is a technique for collecting content from the internet as data, usually saved to a local file so that it can be manipulated and analyzed as needed. It is used for many purposes, such as extracting product information, customer reviews, news articles, and social media posts. It involves two parts: a crawler and a scraper. A web crawler is a program that browses the web by following links to find the pages that hold the required data, while a scraper is a tool that extracts the data from a page's HTML and outputs it in a structured format. Scraping can be easy or challenging depending on the target site; some common challenges are listed below.
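To make the crawler and scraper split concrete, here is a minimal sketch using only Python's standard library. The class name, the sample HTML, and the example.com URLs are illustrative assumptions, and a real project would more likely use one of the tools listed later in this article.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collects outgoing links (crawler side) and the page title (scraper side)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(base_url, html):
    """Return the structured data (title) plus the links a crawler would follow next."""
    parser = LinkAndTitleParser(base_url)
    parser.feed(html)
    return {"title": parser.title.strip(), "links": parser.links}


# Hypothetical page content, used here instead of a live request.
SAMPLE_HTML = """
<html><head><title> Example shop </title></head>
<body><a href="/products?page=2">Next</a> <a href="https://example.org/about">About</a></body></html>
"""

result = parse_page("https://example.com/products", SAMPLE_HTML)
print(result)
```

The links list is what a crawler would queue for its next visits, while the title stands in for whatever fields the scraper is meant to extract.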

Challenges of Web Scraping

Anti-scraping mechanisms :

Many websites employ anti-scraping measures to block scraping bots, including CAPTCHAs, IP blocking, honeypot traps, dynamically loaded content, and login requirements. Web scrapers use various techniques to work around these obstacles, and each technique brings its own challenges, as described below.

Large Proxy infrastructures :

Web scrapers often use proxies to hide their real IP address and avoid being detected or blocked by a website. However, managing a large number of proxies can be both costly and complicated, so scrapers need to choose reliable, ethical proxy providers that offer high-quality, diverse IP addresses.

Geo-Specific Scraping :

Some websites block access from certain regions or display different content based on the user's location. Web scrapers need a geo-targeted proxy or a virtual private network (VPN) to access those websites and get the desired data.

Website Structure Changes :

Websites often change their content and layout to improve the user experience or add new features. These changes can break a scraper's ability to extract data from the HTML. Web scrapers need to monitor for such changes and update their extraction logic accordingly.

Large Scale Scraping or Distributed Scraping :

When web scrapers need large amounts of data or need to extract data from multiple websites, they need distributed systems that handle concurrency, scalability, fault tolerance, and load balancing. Scrapers also need to respect each website's crawl rate limits to avoid overloading its servers.
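One way to respect per-site rate limits, even when many sites are scraped at once, is to track the time of the last request for each host. The sketch below is a single-process illustration under that assumption; the class name and the injectable clock and sleep parameters are hypothetical, and a distributed setup would need shared state instead of an in-memory dictionary.

```python
import time
from urllib.parse import urlparse


class PerHostThrottle:
    """Enforces a minimum delay between requests to the same host."""

    def __init__(self, min_delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time at which the last request was allowed

    def wait(self, url):
        """Block until a request to this URL's host is allowed; return the seconds waited."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_delay - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last_request[host] = now + delay
        return delay


# Demo with a frozen clock and a no-op sleep so that it runs instantly.
throttle = PerHostThrottle(10, clock=lambda: 100.0, sleep=lambda seconds: None)
urls = ("https://example.com/a", "https://example.com/b", "https://example.org/a")
waits = [throttle.wait(url) for url in urls]
print(waits)
```

In real use you would construct PerHostThrottle with only the delay and call wait(url) before each request; the second request to the same host is held back, while a request to a different host goes through immediately.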

Quality of the data :

If scraping is not done properly, the extracted data can be incomplete, inaccurate, outdated, or irrelevant. Web scrapers need to make sure the data comes from reliable sources, then validate it, clean it, and remove irrelevant parts before storing it in a structured format.
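As a rough illustration of the validate-and-clean step, the sketch below normalizes whitespace, drops records with missing or unparseable fields, and removes duplicates. The field names (name, price) and the sample records are assumptions chosen for the example, not a fixed schema.

```python
def clean_records(records, required=("name", "price")):
    """Return validated, de-duplicated records; invalid ones are dropped."""
    seen = set()
    cleaned = []
    for record in records:
        # Collapse repeated whitespace in every string field.
        row = {
            key: " ".join(value.split()) if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Skip records that are missing a required field.
        if any(row.get(field) in (None, "") for field in required):
            continue
        # Accept both "12.50" and "12,50"; skip prices that are not numbers.
        try:
            row["price"] = float(str(row["price"]).replace(",", "."))
        except ValueError:
            continue
        # Treat the same name (case-insensitive) at the same price as a duplicate.
        key = (row["name"].lower(), row["price"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical raw output from a scraper.
raw = [
    {"name": "  Blue   Mug ", "price": "12,50"},
    {"name": "blue mug", "price": "12.50"},   # duplicate of the first record
    {"name": "", "price": "3.00"},            # missing name
    {"name": "Red Mug", "price": "n/a"},      # unparseable price
]

cleaned = clean_records(raw)
print(cleaned)
```

Which rules count as "valid" depends entirely on the dataset, so the checks here are a starting point rather than a complete validation layer.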

Tools used in web scraping :

Many tools are available for scraping web data, and the right choice depends on the scraper's preferences, needs, and skills. Some of the most widely used are:

  1. Piloterr : an API that handles proxies, browsers, and CAPTCHAs for scrapers. It can be used with any programming language or framework.
  2. ScrapeBox : desktop software designed for web scrapers. It bundles tools such as a keyword scraper, a link extractor, and an email scraper.
  3. Screaming Frog : desktop software that crawls websites and audits them for SEO. You can use it to extract metadata such as titles, meta tags, images, and hyperlinks.
  4. Scrapy : an open-source Python framework for crawling the web and scraping data. It is used to create spiders that can scrape multiple websites at the same time.
  5. Pyspider : another open-source Python framework, with a web-based UI that lets you write scripts, monitor tasks, and debug errors.
  6. Beautiful Soup : an open-source Python library that parses HTML and XML documents. It can extract data from pages using methods such as CSS selectors or regular expressions.
  7. Diffbot : an API that uses computer vision and natural language processing to extract structured data from websites. It can be used with any programming language or framework.
  8. Common Crawl : an open project that crawls the web at large scale and publishes the raw HTML data for anyone to access and analyze. It can be used to obtain data from millions of websites without scraping them yourself.

Importance of Web Scraping

Web scraping allows you to access and analyze large amounts of data from many websites. The reasons that make it important are:

Automation

Web scrapers automate the extraction of data from different websites, which saves time and resources. These tools and APIs can collect large amounts of data with very little manual effort.

Cost-Effectiveness

Web scraping can reduce the cost of data acquisition by eliminating manual data entry, or the need to hire a workforce for it, which is too costly for some organizations. You can use web scraping to obtain data that is otherwise not offered as a ready-made dataset or is too costly to access.

Easy Implementation

Web scraping is straightforward to implement with a wide choice of tools and techniques that suit your preferences and skills. You can use scraping software, frameworks, libraries, or APIs to extract web data in the programming language of your choice.

Low Maintenance

A reliable scraping tool or service minimizes the maintenance effort that data extraction requires: it helps you monitor website changes, handle errors, and update your scrapers accordingly.

Speed

Web scraping can extract data from websites quickly, especially on a distributed system that handles concurrency and scalability. You can use it to obtain large volumes of data in very little time.

Data Accuracy

Web scraping tools extract data directly from the source website, which helps with accuracy. You can then use techniques such as regular expressions or CSS selectors to validate and clean the data before storing it in a structured format.

Effective Management of Data

Web scraping helps you manage data effectively by letting you export it in formats such as CSV, JSON, or XML. You can also integrate it with other sources, databases, or APIs.
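As a small illustration of exporting to several formats, the sketch below serializes the same records to CSV and JSON with Python's standard library. The record fields and helper names are assumptions made for the example.

```python
import csv
import io
import json


def to_csv(records, fieldnames):
    """Serialize records to CSV text, ignoring any fields not listed in fieldnames."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize records to pretty-printed JSON text."""
    return json.dumps(records, ensure_ascii=False, indent=2)


# Hypothetical scraped records.
records = [
    {"name": "Blue Mug", "price": 12.5},
    {"name": "Red Mug", "price": 9.0},
]

csv_text = to_csv(records, ["name", "price"])
json_text = to_json(records)
print(csv_text)
print(json_text)
```

Both helpers return text, so the result can be written to a file, sent to an API, or loaded into a database, depending on where the data needs to go.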

Innovation

Web scraping can enable innovation by allowing you to create new products and services based on the data you collect. You can use it to gain insight into your local market, customers, and competitors, spot local trends, and watch the market closely.

Legal aspects of web scraping

In simple terms, web scraping is not illegal in itself. Publicly available data can generally be collected and used, but scrapers can face legal issues depending on what they collect and how they use it. The main risks are:

Breach of contract

Many websites prohibit scraping in their terms of service and restrict how their data may be used. Violating those terms can expose you to civil lawsuits for breach of contract, even when the data itself is public.

Copyright infringement

Websites often copyright their content. Scraping text, images, or databases and republishing them without permission can trigger copyright infringement claims. Extracting facts is usually fine; republishing creative expression is not.

Computer Fraud and Abuse Act (CFAA)

This US federal law prohibits unauthorized access to computers and networks. After the Supreme Court's Van Buren v. United States (2021) ruling, the CFAA applies mainly when you bypass technical access controls, not when you scrape data that is openly visible without logging in.

Trade secrets

Scraping confidential or proprietary information (customer lists, pricing algorithms, internal documents) and sharing it with others can lead to trade secret misappropriation claims.

Data protection regulations

Personal data is regulated separately from scraping itself. In the EU, the GDPR applies; in California, the CCPA. Collecting names, emails, or phone numbers without a lawful basis or consent can result in significant fines regardless of how the data was obtained.

6 rules for legal and compliant web scraping

Whether you scrape for market research, recruitment, or competitive intelligence, these six rules keep you on solid ground:

1. Scrape for a legitimate purpose

Collect data for your own analysis or internal use, not to republish it, harm the source site, or cause financial or reputational damage to its owner. Republishing scraped content commercially almost always requires permission from the copyright holder.

2. Stick to publicly available data

Only collect information that any visitor can see without logging in or bypassing a paywall. Data behind authentication walls, access codes, or subscription gates is not "public" in a legal sense, even if you can technically reach it.

3. Respect copyright

Before copying text, images, trademarks, or database contents, check whether they are protected. You can generally reuse facts and transform data into an original format; you cannot republish copyrighted material without consent.

4. Control your scraping rate

Aggressive scraping can overload servers and get your IP blocked. Check the site's robots.txt for Crawl-delay directives. When none is specified, a safe default is roughly one request every 10–15 seconds. Ignoring robots.txt is not illegal in most jurisdictions, but it is considered bad practice and often leads to blocks.
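As a sketch of how this rule might be applied in code, Python's standard urllib.robotparser can read Crawl-delay and Disallow rules. The robots.txt content, the bot name, and the 10-second fallback (the lower end of the range mentioned above) are assumptions for illustration; a real scraper would fetch the site's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Assumed fallback when robots.txt gives no Crawl-delay.
DEFAULT_DELAY_SECONDS = 10


def crawl_policy(robots_txt, user_agent, url):
    """Return whether the URL may be fetched and how long to wait between requests."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return {
        "allowed": parser.can_fetch(user_agent, url),
        "delay_seconds": delay if delay is not None else DEFAULT_DELAY_SECONDS,
    }


# Hypothetical robots.txt content, used here instead of a live request.
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

policy = crawl_policy(SAMPLE_ROBOTS, "example-bot", "https://example.com/products")
print(policy)
```

The scraper would then skip any URL where allowed is false and sleep for delay_seconds between requests to that site.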

5. Follow the same path as a normal visitor

Access pages the way a search engine crawler would: through public URLs, without breaking site structure or interfering with normal operation. This reduces the risk of both technical disruption and ToS violations tied to unauthorized access methods.

6. Identify your scraper

Set an honest User-Agent string that includes your organization name, a contact URL or email, and a brief description of your activity. Transparency makes it easier for site owners to reach you and often prevents escalation to legal action.
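A User-Agent along these lines could be assembled as follows. The exact layout is only one common convention, and the bot name, URL, and email address are placeholders, not real contacts.

```python
from urllib.request import Request


def build_user_agent(bot_name, version, contact_url, contact_email, purpose):
    """Build a descriptive User-Agent: name/version plus contact details and purpose."""
    return f"{bot_name}/{version} (+{contact_url}; {contact_email}; {purpose})"


# Placeholder identity; replace with your organization's real details.
user_agent = build_user_agent(
    "ExampleResearchBot",
    "1.0",
    "https://example.com/bot",
    "bot@example.com",
    "price research",
)

# The request is only constructed here, not sent.
request = Request("https://example.com/", headers={"User-Agent": user_agent})
print(request.get_header("User-agent"))
```

Pointing the contact URL at a page that explains what the bot does and how to opt out gives site owners a simple way to reach you before they resort to blocking.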

CFAA

The Computer Fraud and Abuse Act (CFAA) is a US federal law that prohibits unauthorized access to computers and networks. It was enacted in 1986 as an amendment to the existing computer fraud law, which had been included in the Comprehensive Crime Control Act of 1984. The CFAA covers various computer-based offenses, such as obtaining national security information, accessing a computer to obtain information, trespassing in a government computer, accessing a computer to defraud and obtain value, intentionally or recklessly damaging a computer through a knowing transmission, and trafficking in passwords. It also provides civil remedies for victims of these offenses. The law has been widely criticized as vague, broad, and outdated, although it has been amended several times over the years to address new forms of cybercrime and new technology.

GDPR

The General Data Protection Regulation (GDPR) is an EU law that regulates the collection and processing of personal data belonging to individuals in the EU or EEA. It applies to organizations inside and outside the EU. GDPR gives individuals control over their personal data and imposes penalties on non-compliant organizations. Scraping itself is not banned, but using scraped personal data (for example, harvesting names and emails to generate leads without consent) is restricted. Key requirements for scrapers:

  • Lawful basis : scrapers must have a valid legal reason for collecting and using personal data. GDPR provides six possible lawful bases: consent, contract, legal obligation, vital interests, public interest, and legitimate interests. Scrapers need to determine which basis applies to their activity and document it.
  • Transparency : scrapers must inform individuals about how their personal data is collected and how it will be used. GDPR requires clear and concise information about the scraper's identity, the purpose of the processing, the legal basis, recipients, retention period, and individual rights.
  • Data minimization : scrapers must limit collection and use to personal data that is relevant and necessary for a specific purpose. GDPR requires extraction to be limited to what is adequate and proportionate to that purpose.
  • Data quality : scrapers must keep personal data accurate and up to date. GDPR requires inaccurate data to be corrected or deleted without delay.
  • Data security : scrapers must protect personal data from unauthorized access and loss. GDPR requires appropriate technical and organizational measures that ensure a level of security matching the risks of the processing.
  • Data protection impact assessment (DPIA) : scrapers need to conduct a DPIA when the processing is likely to involve a high risk. A DPIA is a systematic process that evaluates the impact of the processing on individuals' rights and freedoms and identifies measures to mitigate those risks.
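The data minimization requirement above can be approximated in code with an allowlist that drops every field the stated purpose does not need. The field names below are assumptions for a hypothetical market research use case, and deciding which fields are actually necessary remains a legal and business judgment rather than a coding one.

```python
# Hypothetical allowlist: only the fields needed for the stated purpose.
ALLOWED_FIELDS = {"company", "job_title", "city"}


def minimize(record, allowed_fields=ALLOWED_FIELDS):
    """Keep only allowlisted fields so unneeded personal data is never stored."""
    return {key: value for key, value in record.items() if key in allowed_fields}


# Hypothetical scraped record containing more personal data than required.
scraped = {
    "company": "Acme",
    "job_title": "Engineer",
    "city": "Lyon",
    "email": "a@example.com",
    "phone": "+33 1 23 45 67 89",
}

stored = minimize(scraped)
print(stored)
```

Applying the filter before anything is written to disk means the email and phone number in this example are never retained at all.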

LGPD

The Lei Geral de Proteção de Dados (LGPD), Brazil's General Data Protection Law, regulates how the personal data of individuals in Brazil is collected and processed. Like GDPR, it applies to organizations both inside and outside Brazil.

Terms of Service

Terms of Service (ToS) are a legal agreement between website owners and users. For scraping, the relevant clauses are those that restrict automated access or limit how data may be used.

These terms and conditions matter because they can affect the civil liability of your scraping activity. Violating ToS does not automatically make scraping criminal, but site owners can sue for breach of contract. When in doubt, request written permission, especially for commercial use.

Notable platform policies:

  • Ryanair explicitly prohibits commercial scraping unless you have a written license agreement.
  • LinkedIn prohibits scraping profiles via crawlers, browser plugins, or any automated means, though courts have ruled that scraping public profiles does not violate the CFAA (see case study below).
  • Amazon requires written permission before using robots, spiders, or scrapers on its services.
  • Meta (Facebook, Instagram) prohibits automated data collection without prior permission.
  • X (Twitter) restricts access to its officially supported API and interfaces.
  • YouTube limits access to its own provided tools and interfaces.

Ethical Uses of web scraping

Web scraping is generally not considered illegal when done ethically, that is, when the data you scrape is publicly available, is not protected or restricted by laws and regulations, and is used only for legitimate, beneficial purposes. Some ethical use cases are:

  • Scraping data for academic research and educational purposes.
  • Scraping for market analysis or business intelligence.
  • Scraping for content aggregation and news curation.
  • Scraping for SEO or web analytics.

Prohibited or Illegal Use of Web Scraping

Web scraping becomes illegal when it is used for unethical purposes, such as publishing collected data to harm someone or mining confidential or non-public data. Some examples of illegal uses are:

  • Scraping personal data such as names, emails, phone numbers, or other contact information without consent or without complying with data protection regulations such as GDPR or CCPA.
  • Scraping copyrighted content such as books, images, articles, or music and reusing it without the owner's permission or a fair use justification.
  • Scraping confidential or proprietary information such as trade secrets, business strategies, or customer lists without authorization from the business that owns it.
  • Scraping data by bypassing security measures such as CAPTCHAs, IP blocking, or logins, or otherwise violating the CFAA and similar laws.
  • Scraping data in violation of Terms of Service or a robots.txt file that prohibits or limits web scraping.
  • Scraping data by overloading the web server or disrupting the functionality of a website.
  • Scraping data for spam, phishing, fraud, identity theft, or cyberattacks.

Case studies

Below are notable legal disputes involving web scraping, illustrating how courts have ruled on public data, ToS, and the CFAA.

hiQ Labs vs LinkedIn

hiQ Labs scraped publicly visible LinkedIn profile data to provide analytics services to employers. LinkedIn sent a cease-and-desist letter and blocked access, arguing CFAA and ToS violations.

The case went through multiple rounds:

  • 2019: The Ninth Circuit ruled that scraping publicly available data does not violate the CFAA.
  • 2021: The Supreme Court vacated that ruling after Van Buren v. United States, which narrowed the CFAA to unauthorized access, not ToS violations alone.
  • 2022: The Ninth Circuit reaffirmed that hiQ could scrape public profiles. LinkedIn's petition for Supreme Court review was denied.

In the end, courts found that hiQ had violated LinkedIn's user terms of service, but left no definitive ruling on when scraping itself is illegal. hiQ went out of business before the dispute was fully resolved.

Takeaway: Scraping public data is generally not a CFAA crime in the US, but violating a platform's ToS can still lead to civil breach-of-contract claims. LinkedIn's ToS explicitly prohibits scraping even when courts won't treat it as hacking.

LinkedIn vs Proxycurl and ProAPIs (2025)

LinkedIn's enforcement campaign did not stop with hiQ. As Bloomberg Law reported in December 2025, the platform has intensified its legal and technical fight against bot scrapers, especially as AI tools make large-scale extraction easier to run with fewer engineers.

Two recent cases illustrate the shift:

  • Proxycurl (2025): LinkedIn sued the Singapore-based startup for creating fake accounts to scrape profiles at scale. Proxycurl shut down in July 2025 rather than continue the fight in court.
  • ProAPIs (2025): In October, LinkedIn sued ProAPIs, alleging millions of fake accounts and scraping software marketed at hundreds of requests per second. The case (LinkedIn Corporation v. ProAPIs Inc, N.D. Cal., No. 3:25-cv-8393) was exploring an early settlement as of late 2025.

LinkedIn's filings describe a cat-and-mouse pattern: fake accounts are often detected within about a day, but each can scrape dozens of profiles before being restricted, and new accounts replace blocked ones faster than they can be caught.

What changed: Unlike the hiQ era, LinkedIn's recent wins rely less on CFAA arguments and more on fake account creation, ToS violations, and breach of access controls. Courts have also sided with scrapers when only publicly available data was collected (as in Bright Data's 2024 victory against Meta), but claims involving fake logins or password walls remain much harder to defend.

For scrapers, the lesson is clear: LinkedIn actively pursues large-scale operations, and the legal landscape around profile scraping (especially for AI training or resale) remains unsettled.

Meta Inc. vs BrandTotal LTD and Unimania Inc.

Two companies used browser extensions to scrape data from Facebook and Instagram, as well as from Twitter, YouTube, LinkedIn, and Amazon, without authorization. Meta sued for ToS violations and unauthorized data access. The case settled in 2022 with a permanent injunction and a significant financial penalty.

Ryanair Limited vs PR Aviation

PR Aviation scraped flight information from Ryanair's website to offer price comparisons. Ryanair sued for ToS and database protection violations. The EU Court of Justice ruled in Ryanair's favor in 2015, confirming that website owners can contractually restrict third-party scraping of their data.

Conclusion

Web scraping is legal when you collect publicly available data for legitimate purposes. To stay compliant:

  • Avoid scraping personal data protected by GDPR, CCPA, or similar laws without a lawful basis.
  • Respect copyright: extract facts, don't republish protected content.
  • Follow the site's ToS and robots.txt, throttle your requests, and identify your scraper.
  • Never bypass login walls, CAPTCHAs, or other access controls to reach restricted data.

Scraping becomes illegal when you use the data for fraud, spam, or harm, or when you violate copyright, data protection rules, or trade secret laws. When in doubt, consult a lawyer familiar with the jurisdictions where you operate and where the target site is hosted.
