Top List of User Agents for Web Scraping and Implementation Tips
When scraping large amounts of data, the main problem is the risk of being blocked and the question of how to avoid it. We have already discussed that you can use CAPTCHA-solving services, proxies, or even a web scraping API that takes care of these difficulties for you.
However, if you collect data by making simple HTTP requests and want to build your scraper from scratch, you cannot do without headers in general and User-Agents in particular.
In this article, we will tell you what User Agents are, why they are needed, what they mean, and where to get them. In addition, we will provide code examples for both setting and rotating User Agents in Python and NodeJS.
What Is a User-Agent String
User-Agent is a string a web browser sends to a server when requesting a web page. It contains information about the browser, operating system, and device making the request.
Regularly changing the User-Agent and proxy is a crucial strategy for avoiding blocks in web scraping. By changing the User-Agent header, you can emulate different devices and browsers, making it harder for websites to detect and block automated scraping requests.
The Importance of User Agents in Web Scraping
User-Agents play a crucial role in web scraping: they streamline the scraping process and help you avoid detection and blocking. This section explores why you should use User-Agents in your scraping scripts.
Avoiding IP blocking
Not all websites are bot-friendly. Many have implemented anti-bot measures to protect their content and prevent unauthorized access. So, setting and changing your User-Agent is crucial to avoid having your IP blocked when making automated requests to a website. Even though not every User-Agent belongs to a human, its absence in a request raises red flags and instantly screams bot.
For example, suppose your script retrieves data with simple HTTP requests rather than a headless browser. In that case, unless you set the headers explicitly, you will send either no User-Agent at all or your HTTP library's default one (for Python's Requests, something like python-requests/2.31.0), and either option immediately identifies the request as automated. Real browsers, on the other hand, transmit a full browser User-Agent every time a user visits a website.
Websites are wary of bots and actively block them to prevent malicious activities. Without a User-Agent, your IP address might be flagged and blocked, hindering your data collection efforts.
To avoid getting blocked, ensure your bot includes a User-Agent string in its requests. This simple step can make your bot appear more human-like and avoid website detection.
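To see the difference for yourself, here is a minimal sketch using Python's Requests library and httpbin.org (the same echo service used in the examples later in this article) that compares the default User-Agent with a browser-like one:

import requests

# Without a custom header, Requests sends its library default
# (something like "python-requests/2.31.0") - an obvious automation signal.
default_ua = requests.get('https://httpbin.org/headers').json()['headers']['User-Agent']
print(default_ua)

# The same request with a browser-like User-Agent blends in with normal traffic.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36'
}
browser_ua = requests.get('https://httpbin.org/headers', headers=headers).json()['headers']['User-Agent']
print(browser_ua)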
Mimicking different devices and browsers
User-Agent spoofing allows scrapers to mimic different devices and browsers, which can help access different versions of websites and content optimized for specific devices.
This is especially important when you want to access information that is only available to specific devices. For example, Google search results can vary significantly depending on the device type used to make the request.
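To illustrate, here is a minimal Python sketch (the User-Agent strings are examples reused from later in this article) that fetches the same page once as a desktop browser and once as a mobile one. Sites that serve device-specific versions will typically return noticeably different markup, though Google in particular may still apply other bot checks:

import requests

user_agents = {
    'desktop': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36',
    'mobile': 'Mozilla/5.0 (Linux; Android 10; HD1913) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.6099.210 Mobile Safari/537.36',
}

for device, ua in user_agents.items():
    # The server decides which version of the page to serve based on this header.
    response = requests.get('https://www.google.com/search?q=web+scraping',
                            headers={'User-Agent': ua})
    print(device, '->', len(response.text), 'bytes of HTML')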
User Agent Syntax
The User-Agent string is a specific format that contains information about the browser, operating system, and other parameters. In general, it looks like this:
User-Agent: <product> / <product-version> <comment>
Here, <product> is the product identifier (its name or code name), <product-version> is the product version number, and <comment> is additional information, such as sub-product details. For example, the curl command-line tool identifies itself simply as curl/8.4.0.
For browsers, the syntax expands to:
Mozilla/[version] ([system and browser information]) [platform] ([platform details]) [extensions]
Let’s take a closer look at each parameter and its meaning.
Understanding the components of a user agent
The general syntax of a User-Agent string includes the following components:
- Prefix and version: A prefix may be present at the beginning of the string, usually indicating the type of device or application and its version. For example, “Mozilla/5.0” is often used in browser User-Agent strings.
- System information: The operating system and device details appear in parentheses right after the prefix, for example “Windows NT 10.0; Win64; x64”.
- Platform details: Next comes the layout engine the browser uses to render web pages and its version, such as “AppleWebKit/537.36”.
- Browser name: The name and version of the browser making the request follow, for example “Chrome/121.0.6167.87”.
- Extensions: The User-Agent may contain other parameters, such as language information (e.g., “en-GB”) or screen resolution.
Let’s use this and compose a User-Agent string that specifies the Windows 10 operating system and Chrome version 121.0.6167.87.
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.6167.87 Safari/537.36
User agents for other devices can be composed following a similar pattern.
Common formats and variations
User-agent strings often follow standard formats, like the one shown in the example above. However, some User-Agent strings may contain additional parameters, such as information about browser plugins or unique device identifiers.
To make our examples more complete, let’s consider different variations of User-Agents for different devices:
- Linux:
Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0
- MacOS:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
- Mobile browsers:
Mozilla/5.0 (Linux; Android 10; HD1913) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.6099.210 Mobile Safari/537.36 EdgA/120.0.2210.126
Now that we’ve covered User-Agent syntax, let’s look at a list of up-to-date strings you can use in your projects.
List Of Latest User Agents For Web Scraping
Below are tables with constantly updated lists of common User-Agents for popular platforms. Our scrapers update these lists daily, so you can be sure you’re always using the latest versions.
Windows User Agents:
| OS & Browser | User-Agent |
| --- | --- |
| Chrome 127.0.0, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 |
| Edge 126.0.2592, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/126.0.2592.113 |
| Edge 44.18363.8131, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64; Xbox; Xbox One) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edge/44.18363.8131 |
| Firefox 128.0, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0 |
| Opera 113.0.0, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 OPR/113.0.0.0 |
| Opera 113.0.0, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; WOW64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 OPR/113.0.0.0 |
MacOS User Agents:
| OS & Browser | User-Agent |
| --- | --- |
| Chrome 127.0.0, Mac OS X 10.15.7 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 |
| Edge 126.0.2592, Mac OS X 10.15.7 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/126.0.2592.113 |
| Firefox 128.0, Mac OS X 14.5 | Mozilla/5.0 (Macintosh; Intel Mac OS X 14.5; rv:128.0) Gecko/20100101 Firefox/128.0 |
| Safari 17.5, Mac OS X 14.5 | Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15 |
| Opera 113.0.0, Mac OS X 14.5 | Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 OPR/113.0.0.0 |
Please note the browser version when choosing or composing a User-Agent. The best and most common User-Agents use the latest version of Chrome, since Chrome updates itself automatically on startup and most real users are therefore on the current release. Using custom User-Agents with the latest Chrome version helps your scraper blend in with that majority.
How to Set User Agent
The configuration of User Agents depends on the context in which you want to use them. Typically, this involves your scripts that make requests to different websites. Let’s look at how to set User-Agents in two popular programming languages.
We will make requests to https://httpbin.org/headers, which echoes back all the headers it receives, including the User-Agent:
- Python. We will use the Requests library to make the request:
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
}
response = requests.get('https://httpbin.org/headers', headers=headers)
print(response.text)
Output:
{
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Host": "httpbin.org",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "X-Amzn-Trace-Id": "Root=1-65c0adfb-7a198b2f3bf4dff157696ce2"
    }
}
- NodeJS. We will use fetch() to make the request:
fetch('https://httpbin.org/headers', {
    headers: {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    }
})
    .then(response => response.json())
    .then(data => console.log(data));
The response is similar to the previous one.
If you want to change your User-Agent not in a script but in your browser, you can override it using the browser’s developer tools (DevTools), for example in Chrome’s “Network conditions” panel or via device emulation. This can be useful for testing websites or web applications. In addition, there are browser extensions that let you switch User-Agents easily.
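If you prefer to drive a real browser from code rather than flip the switch in DevTools by hand, browser-automation tools can set the header for you. Here is a minimal sketch using Selenium with Chrome (assuming Selenium 4+, which manages the driver automatically):

from selenium import webdriver

options = webdriver.ChromeOptions()
# Chrome accepts a custom User-Agent as a command-line switch.
options.add_argument(
    '--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36'
)

driver = webdriver.Chrome(options=options)
driver.get('https://httpbin.org/headers')
print(driver.page_source)  # the echoed headers include the custom User-Agent
driver.quit()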
How to Rotate User Agents
User-Agent rotation is an important part of a strategy to avoid IP address blocking. It means constantly changing the User-Agent string your software sends with each request, which lets you shorten the intervals between requests without increasing the risk of being blocked.
Importance of rotating user agents
As we mentioned earlier, User-Agent rotation is a crucial mechanism for bypassing protection measures and ensuring the continuity of web scraping operations and automated processes on the Internet. In short, using User-Agent rotation allows you to:
- Increase the chances of avoiding IP address blocking.
- Mask requests more effectively.
- Increase the reliability of the scraper.
- Emulate requests from different devices and browsers.
In other words, User-Agent rotation allows you to mask requests, making them look more like regular requests made by human users, to access content optimized for specific platforms, or to test the compatibility of web pages on different devices. And if any User-Agent is temporarily blocked or stops working, you can switch to another one to continue scraping without downtime.
Techniques for rotating user agents in web scraping
Now that we have covered why User-Agent rotation is necessary, let’s look at simple examples in Python and NodeJS that implement this functionality.
We will use the previous examples as a basis and add a variable containing a list of User-Agents and a loop that will call different User-Agents from the list. Then, we will make a request to the website, which will return the contents of the headers, display it on the screen, and move on to the next User-Agent.
The algorithm we’ve considered can be implemented in Python as follows:
import requests

# List of User Agents
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux i686; rv:109.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0',
]

# Index to track the current User Agent
user_agent_index = 0

# Make a request with a rotated User Agent
def make_request(url):
    global user_agent_index
    headers = {'User-Agent': user_agents[user_agent_index]}
    response = requests.get(url, headers=headers)
    user_agent_index = (user_agent_index + 1) % len(user_agents)
    return response.text

# Example usage
url_to_scrape = 'https://httpbin.org/headers'
for _ in range(5):
    html_content = make_request(url_to_scrape)
    print(html_content)
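Round-robin rotation is predictable, which can itself become a detectable pattern. A common variation, sketched below reusing the user_agents list and requests import from the example above, picks a User-Agent at random on each call:

import random

def make_request_random(url):
    # Choose a random User-Agent from the list defined above
    headers = {'User-Agent': random.choice(user_agents)}
    response = requests.get(url, headers=headers)
    return response.text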
For NodeJS, you can use the following code:
const axios = require('axios');

// List of User Agents
const userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 14.2; rv:109.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (X11; Linux i686; rv:109.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0',
];

// Index to track the current User Agent
let userAgentIndex = 0;

// Function to make a request with a rotated User Agent
async function makeRequest(url) {
    const headers = {'User-Agent': userAgents[userAgentIndex]};
    // Advance the index before awaiting, so concurrent calls
    // each pick up a different User-Agent.
    userAgentIndex = (userAgentIndex + 1) % userAgents.length;
    const response = await axios.get(url, {headers});
    return response.data;
}

// Example usage
const urlToScrape = 'https://httpbin.org/headers';
for (let i = 0; i < 5; i++) {
    makeRequest(urlToScrape)
        .then(htmlContent => console.log(htmlContent))
        .catch(error => console.error(error));
}
Both of these options successfully handle User-Agent rotation, and if you find them useful, you are free to use and modify them according to your needs.
Best Practices and Tips
To increase your success in data scraping, we recommend following some guidelines that can help reduce the risk of getting blocked. While not mandatory, these tips can make your scraper noticeably more robust.
Updating user agents regularly
Regular User-Agent rotation helps to prevent blocking. Websites have more difficulty detecting and blocking bots that constantly change their User-Agent.
Additionally, it’s essential to keep your User-Agent up to date. Using outdated User-Agents (e.g., Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36) can also lead to blocking.
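If maintaining the list by hand is a burden, a third-party package can supply fresh strings. For example, the fake-useragent library for Python (an optional dependency, installed with pip install fake-useragent) serves current, real-world User-Agents:

from fake_useragent import UserAgent

ua = UserAgent()
print(ua.random)  # a random, up-to-date browser User-Agent
print(ua.chrome)  # an up-to-date Chrome User-Agent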
Keep Random Intervals Between Requests
Besides keeping User-Agents up-to-date, don’t forget to implement random delays between requests. Real users don’t interact with websites without pauses or a fixed delay (e.g., 5 seconds) between requests. This behavior is only typical for bots and is easily detectable.
Random delays between requests help to simulate typical human user behavior, making it harder to detect automated processes. Additionally, delays can reduce the load on the server and make scraping less suspicious.
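Here is a minimal sketch of such randomized pauses, using Python's standard time and random modules (the 2-7 second range and the URL list are arbitrary examples):

import random
import time

import requests

urls = ['https://httpbin.org/headers'] * 5  # placeholder list of pages to scrape

for url in urls:
    response = requests.get(url)
    print(url, response.status_code)
    # Pause for a random 2-7 seconds to mimic a human reading the page
    time.sleep(random.uniform(2, 7))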
Rotate User Agents
As mentioned, rotating User-Agents reduces the risk of IP blocking since each request appears to come from a different user. This is especially useful if a website has restrictions on the frequency of requests from the same User-Agent. By rotating User-Agents, you can bypass these restrictions and continue accessing the website without issues.
Conclusion and Takeaways
This article has provided an overview of User-Agents in the context of web scraping. We reviewed the reasons for using User-Agents, explored the basics of their syntax, offered a list of current User-Agents, and provided code examples for setting them in two popular programming languages.
In addition, we described how to improve the effectiveness of User-Agents by rotating them and explained the importance of this practice. Finally, we concluded the article with practical tips to help you reduce the risk of scraping blocking and effectively mimic the behavior of real users.