Top List of User Agents for Web Scraping and Implementation Tips

Valentina Skakun
Last update: 24 Jun 2024

When scraping large amounts of information, the main problem is the risk of blocking and how to avoid it. We have already discussed that you can use captcha-solving services, proxies, or even a web scraping API that takes care of your difficulties.

However, suppose you are collecting data with simple HTTP requests and want to build the scraper entirely yourself. In that case, you cannot do without headers in general and User-Agents in particular.

In this article, we will tell you what User Agents are, why they are needed, what they mean, and where to get them. In addition, we will provide code examples for both setting and rotating User Agents in Python and NodeJS.

What Is a User-Agent String

User-Agent is a string a web browser sends to a server when requesting a web page. It contains information about web browsers, operating systems, and devices.

Regularly changing the User-Agent and proxy is a crucial strategy to avoid blocking in web scraping. By changing the User-Agent header, you can emulate different devices and browsers, which makes it harder for websites to detect and block automated scraping requests.

The Importance of User Agents in Web Scraping

User-Agents play a crucial role in web scraping: they make the scraping process more reliable and help avoid detection and blocking. This section explores why you should use User-Agents in your scraping scripts.

Avoiding IP blocking

Not all websites are bot-friendly. Many websites have implemented anti-bot measures to protect their content and prevent unauthorized access. So, setting and changing your User-Agent is crucial to avoid blocking your IP when making automated website requests. Even though not every User-Agent belongs to a human, its absence in a request raises red flags and instantly screams bot.

For example, suppose your script relies on simple HTTP requests instead of a headless browser. In that case, unless you set it explicitly, the request carries either no User-Agent at all or the HTTP library’s default one (the Python Requests library, for instance, identifies itself as python-requests followed by its version), which websites easily recognize as automated. Real browsers, on the other hand, send a browser User-Agent with every request.

Websites are wary of bots and actively block them to prevent malicious activities. Without a User-Agent, your IP address might be flagged and blocked, hindering your data collection efforts.

To avoid getting blocked, ensure your bot includes a User-Agent string in its requests. This simple step can make your bot appear more human-like and avoid website detection.
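
To see why an unset User-Agent is easy to spot, it helps to look at what an HTTP client sends by default. The sketch below uses Python's standard-library urllib instead of Requests (an assumption made here so the example runs without extra packages or network access) and reads the default header value without sending anything. Requests behaves similarly and identifies itself as python-requests followed by its version.

```python
import urllib.request

# build_opener() returns an OpenerDirector whose default headers
# include the User-Agent that urllib sends when none is set explicitly.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)
default_user_agent = default_headers["User-agent"]

# Prints a value that starts with "Python-urllib/", which no real browser sends.
print(default_user_agent)
```

A server only needs to look for such library names to tell this client apart from a browser, which is why setting a browser-like User-Agent is the first step.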

Mimicking different devices and browsers

Spoofing the User-Agent header allows scrapers to mimic different devices and browsers, which can help them access different versions of websites and content optimized for specific devices.

This is especially important when you want to access information that is only available to specific devices. For example, Google search results can vary significantly depending on the device type used to make the request.

User Agent Syntax

The User-Agent string is a specific format that contains information about the browser, operating system, and other parameters. In general, it looks like this:

User-Agent: <product> / <product-version> <comment>

Here, <product> is the product identifier (its name or code name), <product-version> is the product version number, and <comment> is additional information, such as sub-product details.

For browsers, the syntax expands to:

Mozilla/[version] ([system and browser information]) [platform] ([platform details]) [extensions]

Let’s take a closer look at each parameter and its meaning.

Understanding the components of a user agent

The general syntax of a User-Agent string includes the following components:

  1. Prefix and version: The string usually starts with a product token and its version. For historical compatibility reasons, browser User-Agent strings typically start with “Mozilla/5.0”.

  2. System information: The parenthesized comment right after the prefix describes the operating system and architecture on which the request is made. This could be like “Windows NT 10.0; Win64; x64”.

  3. Platform details: Next comes the layout engine used by the browser to render web pages and its version, such as “AppleWebKit/537.36 (KHTML, like Gecko)”.

  4. Browser name: The name and version of the browser that makes the request follow the platform details. For example, “Chrome/121.0.6167.87”.

  5. Extensions: The User-Agent may end with other tokens, such as a compatibility token (“Safari/537.36”), a vendor token (“OPR/113.0.0.0”), or, in older strings, language information (e.g., “en-GB”).

Let’s use this to compose a User-Agent string that specifies the Windows 10 operating system and Chrome browser version 121.0.6167.87.

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.6167.87 Safari/537.36

User agents for other devices can be composed following a similar pattern.
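
The same composition can be sketched in code. The helper below is hypothetical (the name build_chrome_user_agent and its parameters are not from any library); it assumes a desktop Chrome string and only varies the system information and the Chrome version, keeping the other tokens fixed as in the example above.

```python
def build_chrome_user_agent(system_info: str, chrome_version: str) -> str:
    """Compose a desktop Chrome User-Agent string from its components."""
    return (
        f"Mozilla/5.0 ({system_info}) "            # prefix and system information
        "AppleWebKit/537.36 (KHTML, like Gecko) "  # platform details
        f"Chrome/{chrome_version} Safari/537.36"   # browser and compatibility token
    )


windows_chrome = build_chrome_user_agent("Windows NT 10.0; Win64; x64", "121.0.6167.87")
macos_chrome = build_chrome_user_agent("Macintosh; Intel Mac OS X 10_15_7", "120.0.0.0")

print(windows_chrome)
print(macos_chrome)
```

Other browsers such as Firefox or Safari use a different token layout, so a helper like this should not be reused for them without changes.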

Common formats and variations

User-agent strings often follow standard formats, like the one shown in the example above. However, some User-Agent strings may contain additional parameters, such as information about browser plugins or unique device identifiers.

To make our examples more complete, let’s consider different variations of User-Agents for different devices:

  1. Linux:
Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0
  2. MacOS:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
  3. Mobile browsers:
Mozilla/5.0 (Linux; Android 10; HD1913) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.6099.210 Mobile Safari/537.36 EdgA/120.0.2210.126
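
To check how such strings break down, a rough parser can split them into "name/version" product tokens and parenthesized comments. This is a minimal sketch based on the general syntax described earlier, not a complete User-Agent parser; the function name parse_user_agent is an assumption, and bare tokens without a version (such as "Mobile") are simply skipped.

```python
import re

# A comment is text in parentheses; a product token looks like "Name/Version".
TOKEN_PATTERN = re.compile(r"\(([^)]*)\)|([^\s/()]+)/([^\s()]+)")


def parse_user_agent(user_agent: str) -> dict:
    """Split a User-Agent string into product tokens and comments."""
    products, comments = [], []
    for match in TOKEN_PATTERN.finditer(user_agent):
        comment, name, version = match.groups()
        if comment is not None:
            comments.append(comment)
        else:
            products.append((name, version))
    return {"products": products, "comments": comments}


parsed = parse_user_agent(
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0"
)
print(parsed["products"])  # product tokens in order of appearance
print(parsed["comments"])  # system information from the parentheses
```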

Now that we’ve covered User-Agent syntax, let’s look at a list of up-to-date ones you can use in your projects.

List Of Latest User Agents For Web Scraping

Below are tables with lists of common User-Agents for popular platforms. Our scrapers automatically update the list of User-Agents on a daily basis, so you can be sure you’re always using the latest.

Windows User Agents:

OS & Browser | User-Agent
Chrome 127.0.0, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36
Edge 126.0.2592, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/126.0.2592.113
Edge 44.18363.8131, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64; Xbox; Xbox One) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edge/44.18363.8131
Firefox 128.0, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0
Opera 113.0.0, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 OPR/113.0.0.0
Opera 113.0.0, Windows 10/11 | Mozilla/5.0 (Windows NT 10.0; WOW64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 OPR/113.0.0.0

MacOS User Agents:

OS & Browser | User-Agent
Chrome 127.0.0, Mac OS X 10.15.7 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36
Edge 126.0.2592, Mac OS X 10.15.7 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/126.0.2592.113
Firefox 128.0, Mac OS X 14.5 | Mozilla/5.0 (Macintosh; Intel Mac OS X 14.5; rv:128.0) Gecko/20100101 Firefox/128.0
Safari 17.5, Mac OS X 14.5 | Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15
Opera 113.0.0, Mac OS X 14.5 | Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 OPR/113.0.0.0

Pay attention to the browser version when choosing or composing a User-Agent. Chrome updates itself automatically, so most of its users run a recent version. A User-Agent with the latest Chrome version is therefore both one of the most common and one of the least conspicuous choices for masking your scraper.

How to Set User Agent

The configuration of User Agents depends on the context in which you want to use them. Typically, this involves your scripts that make requests to different websites. Let’s look at how to set User-Agents in two popular programming languages.

We will make requests to the website https://httpbin.org/headers, which returns all headers, including the user agent header:

  1. Python. We will use the Requests library to make the request:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
}

response = requests.get('https://httpbin.org/headers', headers=headers)
print(response.text)

Output:

{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-65c0adfb-7a198b2f3bf4dff157696ce2"
  }
}
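  2. NodeJS. We will use fetch() to make the request:
fetch('https://httpbin.org/headers', {
    headers: {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    }
})
    .then(response => response.text())  // read the response body as text
    .then(body => console.log(body))
    .catch(error => console.error(error));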

The response is similar to the previous one.

If you want to change the User-Agent in your browser rather than in a script, you can override it in the browser’s developer tools (DevTools), for example in the “Network conditions” panel or the device emulation mode in Chrome. This can be useful for testing websites or web applications. In addition, there are special browser extensions that allow you to switch User-Agents easily.

How to Rotate User Agents

User-Agent rotation is an important part of a strategy to avoid blocking. It means changing the User-Agent string that your software sends, for example with each request, so that your traffic does not look like it comes from a single client. This can lower the risk of being blocked, although it does not replace sensible pauses between requests.

Importance of rotating user agents

As we mentioned earlier, User-Agent rotation is a crucial mechanism for bypassing protection measures and ensuring the continuity of web scraping operations and automated processes on the Internet. In short, using User-Agent rotation allows you to:

  1. Increase the chances of avoiding IP address blocking.

  2. More effectively mask requests.

  3. Increase the reliability of the scraper.

  4. Emulate requests from different devices and browsers.

In other words, User-Agent rotation allows you to mask requests, making them look more like regular requests made by human users, to access content optimized for specific platforms, or to test the compatibility of web pages on different devices. And if any User-Agent is temporarily blocked or stops working, you can switch to another one to continue scraping without downtime.
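
The fallback idea from the last sentence can be sketched as follows. This is only an illustration under stated assumptions: the function fetch_with_failover and the send callable are hypothetical, and treating HTTP 403 and 429 as "blocked" is a simplification, since sites signal blocking in different ways. The request itself is injected as a callable so that the example runs without network access.

```python
from typing import Callable, Iterable, Optional, Tuple

# Status codes treated as "blocked" here are an assumption; sites differ.
BLOCKED_STATUSES = {403, 429}


def fetch_with_failover(
    url: str,
    user_agents: Iterable[str],
    send: Callable[[str, str], Tuple[int, str]],
) -> Optional[Tuple[str, str]]:
    """Try each User-Agent in turn until one is not blocked.

    send(url, user_agent) is a hypothetical callable that performs the
    request and returns (status_code, body).
    Returns (user_agent, body) on success, or None if all were blocked.
    """
    for user_agent in user_agents:
        status, body = send(url, user_agent)
        if status not in BLOCKED_STATUSES:
            return user_agent, body
    return None


# Stand-in for a real HTTP call: pretends the outdated Chrome string is blocked.
def fake_send(url: str, user_agent: str) -> Tuple[int, str]:
    if "Chrome/51" in user_agent:
        return 403, ""
    return 200, f"fetched {url}"


result = fetch_with_failover(
    "https://httpbin.org/headers",
    [
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0",
    ],
    fake_send,
)
print(result)
```

With the Requests library, send could wrap requests.get(url, headers={'User-Agent': user_agent}) and return the response's status_code and text.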

Techniques for rotating user agents in web scraping

Now that we have covered why User-Agent rotation is necessary, let’s look at simple examples in Python and NodeJS that allow you to implement this functionality.

We will use the previous examples as a basis and add a variable containing a list of User-Agents and a loop that will call different User-Agents from the list. Then, we will make a request to the website, which will return the contents of the headers, display it on the screen, and move on to the next User-Agent.

The algorithm we’ve considered can be implemented in Python as follows:

import requests

# List of User Agents
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux i686; rv:109.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0',
]

# Index to track the current User Agent
user_agent_index = 0

# Make a request with a rotated User Agent
def make_request(url):
    global user_agent_index
    headers = {'User-Agent': user_agents[user_agent_index]}
    response = requests.get(url, headers=headers)
    user_agent_index = (user_agent_index + 1) % len(user_agents)
    return response.text

# Example usage
url_to_scrape = 'https://httpbin.org/headers'

for _ in range(5):
    html_content = make_request(url_to_scrape)
    print(html_content)

For NodeJS, you can use the following code:

const axios = require('axios');

// List of User Agents
const userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 14.2; rv:109.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (X11; Linux i686; rv:109.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0',
];

// Index to track the current User Agent
let userAgentIndex = 0;

// Function to make a request with a rotated User Agent
async function makeRequest(url) {
    // Pick the User Agent and advance the index before awaiting,
    // so overlapping calls do not all reuse the same User Agent
    const headers = {'User-Agent': userAgents[userAgentIndex]};
    userAgentIndex = (userAgentIndex + 1) % userAgents.length;
    const response = await axios.get(url, {headers});
    return response.data;
}

// Example usage
const urlToScrape = 'https://httpbin.org/headers';

(async () => {
    for (let i = 0; i < 5; i++) {
        try {
            // Await each request so they run one after another
            console.log(await makeRequest(urlToScrape));
        } catch (error) {
            console.error(error);
        }
    }
})();

Both of these options successfully handle User-Agent rotation, and if you find them useful, you are free to use and modify them according to your needs.
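
As a variation on the Python version, the rotation state can live in a small object instead of a global index. The class below is a sketch, not part of any library: the name UserAgentRotator and its shuffle and seed parameters are assumptions. It only produces headers, so it can be combined with whichever HTTP client you use.

```python
import itertools
import random


class UserAgentRotator:
    """Hand out User-Agents in round-robin order, without global state."""

    def __init__(self, user_agents, shuffle=False, seed=None):
        pool = list(user_agents)
        if not pool:
            raise ValueError("user_agents must not be empty")
        if shuffle:
            # Shuffle the copy so the caller's list is left untouched.
            random.Random(seed).shuffle(pool)
        self._cycle = itertools.cycle(pool)

    def next_headers(self):
        """Return request headers carrying the next User-Agent."""
        return {"User-Agent": next(self._cycle)}


rotator = UserAgentRotator([
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0",
])

# The fourth call wraps around to the first User-Agent again.
for _ in range(4):
    print(rotator.next_headers()["User-Agent"])
```

Each call to next_headers() can be passed as the headers argument of requests.get, which keeps the rotation logic separate from the request code.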

Best Practices and Tips

To increase your success in data scraping, we recommend following some guidelines that can help reduce the risk of getting blocked. While not mandatory, these tips can enhance your script.

Updating user agents regularly

Regular User-Agent rotation helps to prevent blocking. Websites have more difficulty detecting and blocking bots that constantly change their User-Agent.

Additionally, it’s essential to keep your User-Agent up to date. Using outdated User-Agents (e.g., Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36) can also lead to blocking.
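
One way to keep a pool fresh is to drop strings whose browser version is too old. The sketch below handles only Chrome-based strings, and the threshold of 120 is an arbitrary assumption chosen for the example; the function name is_outdated_chrome is hypothetical as well.

```python
import re

# Captures the major version from a token such as "Chrome/127.0.0.0".
CHROME_VERSION = re.compile(r"Chrome/(\d+)\.")


def is_outdated_chrome(user_agent: str, minimum_major: int) -> bool:
    """Return True if the string names a Chrome major version below minimum_major.

    Strings without a Chrome token return False, since this check cannot judge them.
    """
    match = CHROME_VERSION.search(user_agent)
    return match is not None and int(match.group(1)) < minimum_major


pool = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
]

# Keep only the strings that pass the (assumed) freshness threshold.
fresh_pool = [ua for ua in pool if not is_outdated_chrome(ua, minimum_major=120)]
print(fresh_pool)
```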

Keep Random Intervals Between Requests

Besides keeping User-Agents up to date, don’t forget to implement random delays between requests. Real users neither send requests back to back without pauses nor wait exactly the same time (e.g., 5 seconds) between them. Both patterns are typical of bots and are easily detectable.

Random delays between requests help to simulate typical human user behavior, making it harder to detect automated processes. Additionally, delays can reduce the load on the server and make scraping less suspicious.
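
A random pause can be sketched with the standard library alone. The helper name random_pause and the delay bounds are assumptions for illustration; the values in the loop are kept tiny so the example finishes quickly, whereas a real scraper would typically wait seconds rather than milliseconds.

```python
import random
import time


def random_pause(min_seconds: float, max_seconds: float, sleep=time.sleep) -> float:
    """Sleep for a random duration in [min_seconds, max_seconds] and return it."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


urls = ["https://httpbin.org/headers"] * 3
for url in urls:
    # The actual request (for example, make_request(url)) would go here.
    waited = random_pause(0.01, 0.05)
    print(f"waited {waited:.3f} s before the next request to {url}")
```

The sleep parameter exists only so the pause can be replaced in tests; in normal use the default time.sleep is what you want.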

Rotate User Agents

As mentioned, rotating User-Agents reduces the risk of blocking, since consecutive requests no longer share one identical browser signature. This is especially useful if a website restricts the frequency of requests from the same User-Agent. Keep in mind that the requests still come from the same IP address unless you also rotate proxies.

Conclusion and Takeaways

This article has provided an overview of User-Agents in the context of web scraping. We reviewed the reasons for using User-Agents, explored the basics of the syntax, and offered a list of current User-Agents and code examples for setting User-Agents in two popular programming languages.

In addition, we described how to improve the effectiveness of User-Agents by rotating them and explained the importance of this practice. Finally, we concluded the article with practical tips to help you reduce the risk of scraping blocking and effectively mimic the behavior of real users.
