How to Use Proxy with Python Requests
Proxies are intermediaries that can help you access the internet in various ways. They can bypass website blockages, circumvent IP-based restrictions, and improve your Python projects’ flexibility, security, and performance. By understanding how proxies work and how to use them effectively, you can unlock new possibilities for your projects.
In this article, you will learn the basics of using proxies with Python. By the end of this article, you will be able to use proxies to access blocked websites and content, bypass geo-blocks, and protect your privacy by hiding your real IP address behind the proxy's.
Basics of Using Proxies with Requests
Let’s see how to make a simple request using different proxy types. This will help you understand how to use proxies with the Python Requests library. However, before doing that, let’s make a basic request without a proxy.
Create a new file with the extension *.py and import the Requests library:
import requests
Then, create a variable to store the address of the website you will be accessing. For convenience, we will use a service that returns your IP address in its response; this will be useful later to verify that the proxies are working.
url = 'https://httpbin.org/ip'
Now, request the given URL and print the result to the screen:
response = requests.get(url)
print(response.text)
You will receive a response similar to this:
{
"origin": "151.115.44.26"
}
Now, let’s add a proxy to this basic request.
Setting Up HTTP/HTTPS Proxies
HTTP proxies are the most common and affordable type of proxy. However, they use an unencrypted connection, which makes them less secure. HTTPS proxies work the same way but encrypt the traffic between you and the proxy, making them more secure.
To use a proxy, create a dictionary that maps URL schemes to proxy URLs. The keys refer to the scheme of the target URL you are requesting, not of the proxy itself. For HTTP requests, it looks like this:
proxies = {
'http': 'http://45.95.147.106:8080',
}
And for an HTTPS proxy:
proxies = {
'https': 'https://37.187.17.89:3128',
}
Or, you can specify both types of proxies at the same time:
proxies = {
'http': 'http://45.95.147.106:8080',
'https': 'https://37.187.17.89:3128'
}
To use a proxy with Python Requests, specify the proxies parameter and set its value to the corresponding variable. This will ensure that the request is executed using the proxy.
response = requests.get(url, proxies=proxies)
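Putting the pieces together, here is a minimal end-to-end sketch. The proxy addresses are the placeholders from above and will almost certainly be offline by the time you read this, so substitute your own:
import requests

url = 'https://httpbin.org/ip'
proxies = {
    'http': 'http://45.95.147.106:8080',
    'https': 'http://37.187.17.89:3128'
}

response = requests.get(url, proxies=proxies)
print(response.text)  # should print the proxy's IP instead of yours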
Using HTTP/HTTPS proxies with the Python Requests library is relatively straightforward. Now, let's take a look at how to set up SOCKS proxies.
Using SOCKS Proxies
SOCKS proxies, especially SOCKS5, are more flexible and protocol-agnostic: they can tunnel almost any kind of traffic, not just HTTP, and support several authentication methods. They are often preferred for applications that need this broader protocol support.
To use SOCKS proxies, you need to install the additional requests[socks] package (in shells like zsh, quote the argument as "requests[socks]" so the brackets aren't expanded):
pip install requests[socks]
Now, you can specify and use the SOCKS proxy IP addresses in a variable in your code.
proxies = {
'http': 'socks5://24.249.199.4:41458',
'https': 'socks5://24.249.199.4:41458'
}
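One detail worth knowing: with the socks5:// scheme, DNS names are resolved on your machine, while socks5h:// delegates resolution to the proxy, so hostnames never leak locally. A quick sketch, reusing the placeholder address from above:
import requests

proxies = {
    'http': 'socks5h://24.249.199.4:41458',   # 'h' = resolve DNS on the proxy
    'https': 'socks5h://24.249.199.4:41458'
}

response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(response.text)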
Use SOCKS proxies when you need to tunnel more than plain HTTP traffic through your application.
Setting Proxies with Environment Variables
Environment variables are system-level settings that applications, including Python programs, read to configure their behavior. When configuring HTTP or HTTPS proxies for programs that use the Requests library, you can use environment variables to supply the proxy information instead of hardcoding it.
This approach lets you keep proxy configuration separate from your code, making it easier to manage proxy settings, especially in different environments or when sharing code.
You can set the HTTP/HTTPS proxy environment variables in your system settings or with the following shell commands:
export HTTP_PROXY=http://username:[email protected]:8080
export HTTPS_PROXY=https://username:[email protected]:8080
We have already written a detailed guide on environment variables, how to set them, and what they are used for. In case of any problems or questions, you can refer to our guide on scraping in PHP where we set them.
The main advantage of using environment variables is that you don’t need to specify the HTTP or HTTPS proxy in your code. They will be used automatically for all requests.
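Requests reads these variables automatically (its trust_env behavior is enabled by default), so the script itself needs no proxy-specific code. A minimal sketch, using a placeholder proxy address and setting the variables from within Python rather than the shell:
import os
import requests

# Placeholder proxy address - normally these would be set in the shell instead
os.environ['HTTP_PROXY'] = 'http://45.95.147.106:8080'
os.environ['HTTPS_PROXY'] = 'http://45.95.147.106:8080'

# No proxies= argument needed; Requests picks up the variables automatically
response = requests.get('https://httpbin.org/ip')
print(response.text)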
Authentication with Proxies
Protected and private proxies require a personal username and password. However, the authentication method differs depending on the proxy type. Let's take a look at them one by one.
HTTP/HTTPS Proxy Authentication
To authenticate to an HTTP/HTTPS proxy, you can simply specify the username and password as part of the proxy URL, for example:
http://{proxy_username}:{proxy_password}@{http_proxy_url}
You can then make requests as you did in the previous examples.
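For example, here is a minimal sketch with placeholder credentials and a placeholder proxy address (substitute your own):
import requests

# The username, password, and proxy address below are placeholders
proxies = {
    'http': 'http://proxy_username:[email protected]:8080',
    'https': 'http://proxy_username:[email protected]:8080',
}

response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(response.text)
If your password contains special characters such as @ or :, URL-encode it first (for example with urllib.parse.quote) so the proxy URL stays parseable.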
SOCKS Proxy Authentication
Authentication for SOCKS proxies uses the same URL-based scheme: embed the username and password directly in the proxy URL.
import requests

proxies = {
    'http': 'socks5://proxy_username:[email protected]:41458',
    'https': 'socks5://proxy_username:[email protected]:41458'
}

response = requests.get(target_url, proxies=proxies)
Note that the auth parameter of requests.get() and the session.auth attribute send credentials to the target website (HTTP Basic Auth), not to the proxy, so they will not satisfy a proxy that demands authentication. In other respects, the code is the same.
Common Issues: Handling Error 407
The 407 error, also known as “Proxy Authentication Required,” pops up when your request to a proxy server gets rejected because it’s missing authentication details. Simply put, the proxy wants you to provide the right credentials (username and password) before it lets you access the internet.
In my experience, there are typically four main culprits behind a 407 error:
Missing authentication in the request. You’re sending a request through the proxy but forgot to include your credentials.
Incorrect credentials. Maybe you mistyped your username or password (it happens to the best of us).
Proxy misconfiguration. This could mean using the wrong type of proxy (HTTP/HTTPS, SOCKS) or an incorrect address.
Proxy-side restrictions. Sometimes, the proxy has its own rules, like blocking requests that don’t align with its security policies.
Before you start blaming your code (or yourself), try running the same request with curl in your terminal. Here’s an example:
curl -x http://username:[email protected]:port http://httpbin.org/ip
If curl works and returns a response, the problem is likely with your Python code - maybe how it’s handling authentication or the request setup. But if curl also fails, you should double-check your proxy credentials and configuration.
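Back in Python, you can detect the 407 explicitly. A small sketch, with placeholder proxy details: for plain HTTP targets the 407 arrives as a normal response, while for HTTPS targets the failed CONNECT tunnel surfaces as a ProxyError:
import requests
from requests.exceptions import ProxyError

# Placeholder proxy address and credentials
proxies = {
    'http': 'http://username:password@proxy_ip:8080',
    'https': 'http://username:password@proxy_ip:8080',
}

try:
    response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=10)
    if response.status_code == 407:
        print('Proxy rejected the credentials - check username and password')
except ProxyError as err:
    print(f'Could not tunnel through the proxy (often a 407 in disguise): {err}')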
Performing HTTP Requests Through Proxies
Before we move on to session usage, let's look at the types of request methods that can be performed using the Requests library.
GET Method
This is the simplest and most commonly used type of request. It allows you to get any data located at the specified URL. In general, this request has the following form:
response = requests.get(target_url, proxies=proxies)
Use this method if you want to get the contents of a web page.
POST Method
The next method is POST. It allows you to send any data to the specified URL. However, this doesn’t mean you won’t receive any data in return. Typically, when you send data to a server using a POST request, you will receive a response from the server that may contain the needed data. Here is an example of a POST request:
response = requests.post(target_url, data=data, proxies=proxies)
This method is less commonly used but can be helpful when working with APIs.
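For instance, here is a minimal sketch that posts form data through a proxy to httpbin.org/post, which simply echoes back what it receives (the proxy address is a placeholder):
import requests

proxies = {
    'http': 'http://45.95.147.106:8080',
    'https': 'http://45.95.147.106:8080'
}
data = {'username': 'demo', 'query': 'proxies'}

response = requests.post('https://httpbin.org/post', data=data, proxies=proxies)
print(response.json()['form'])  # httpbin echoes the submitted form fields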
Other Methods
The remaining methods are rarely used, so for convenience, we will summarize their descriptions and usage examples in a table.
Method | Description | Example |
---|---|---|
PUT | Update data on a server | requests.put(target_url, data=data, proxies=proxies) |
DELETE | Remove data from a server | requests.delete(target_url, proxies=proxies) |
HEAD | Get only the headers for a resource located at a URL | requests.head(target_url, proxies=proxies) |
OPTIONS | Get the communication options a resource supports | requests.options(target_url, proxies=proxies) |
PATCH | Apply partial modifications to a resource | requests.patch(target_url, data=data, proxies=proxies) |
TRACE | Retrieve a diagnostic trace of the communication between client and server | requests.request('TRACE', target_url, proxies=proxies) |
CONNECT | Establish a tunnel through a proxy | No helper exists; Requests issues CONNECT automatically when tunneling HTTPS through an HTTP proxy |
As you can see, any of the methods discussed can be used with a proxy if necessary.
Managing Sessions with Proxies
Sessions are a convenient tool when you want to define settings once and reuse them across several requests. A session can also reuse the same underlying TCP connection instead of opening a new one each time, which speeds up repeated requests to the same host.
A session retains cookies, headers, proxies, and other settings between multiple requests, maintaining state and authentication across them. For example, if you log in to a website with one request, the session will keep you logged in for subsequent requests, and any proxy you set will apply to all of them.
To use a proxy with Python Requests for an entire session, you first need to create a session object and set the proxy IP addresses for it:
import requests
url = 'https://httpbin.org/ip'
session = requests.Session()
session.proxies = {
'http': 'http://45.95.147.106:8080',
'https': 'http://45.95.147.106:8080'
}
Now, when making a session request, you only need to specify the session and the URL. The proxies that we specified earlier will be used.
response = session.get(url)
After you are finished working with a session, you must close it:
session.close()
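Alternatively, a session can be used as a context manager, which closes it automatically even if an error occurs. A small sketch with the same placeholder proxy:
import requests

with requests.Session() as session:
    session.proxies = {
        'http': 'http://45.95.147.106:8080',
        'https': 'http://45.95.147.106:8080'
    }
    response = session.get('https://httpbin.org/ip')
    print(response.text)
# The session is closed automatically when the with block exits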
While using the Requests library, you can create multiple sessions and switch between them, configuring each set of connections the way you need.
Advanced Techniques for Proxy Management
In addition to the topics we've covered, there are other useful techniques for working with proxies in the Requests library. Let's take a look at how to rotate proxies and how to handle the SSL errors that often come up when doing so.
Rotating Proxies with Python
IP rotation and proxy pools are techniques for changing the IP address used for web requests in Python with the requests library. They are valuable in web scraping, data collection, and other tasks where developers must avoid IP bans and rate limits or access geographically restricted content.
To use rotating proxies, you can reuse the previous examples; just replace the specific proxy address with the endpoint of a rotating proxy service, which assigns a new IP address to each request:
import requests
proxies = {
'http': 'http://your-proxy-service-url.com',
'https': 'http://your-proxy-service-url.com'
}
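With such an endpoint, every request can exit through a different IP address, which you can verify against httpbin (the service URL above is a placeholder for your provider's gateway):
import requests

proxies = {
    'http': 'http://your-proxy-service-url.com',
    'https': 'http://your-proxy-service-url.com'
}

# Each request should print a different origin IP if rotation works
for _ in range(3):
    print(requests.get('https://httpbin.org/ip', proxies=proxies).text)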
Using a proxy pool means maintaining a list of proxy servers and rotating through them. You can either create or obtain a list of HTTP/HTTPS proxies and use them one by one for your requests, switching between them as needed:
proxy_pool = ['http://45.95.147.105:8080', 'http://45.95.147.106:8080', 'http://45.95.147.107:8080']

for proxy_url in proxy_pool:
    proxies = {'http': proxy_url, 'https': proxy_url}
    response = requests.get(url, proxies=proxies)
Alternatively, you can pick a completely random proxy from the pool for each request:
import random

proxy_pool = ['http://45.95.147.105:8080', 'http://45.95.147.106:8080', 'http://45.95.147.107:8080']

proxy_url = random.choice(proxy_pool)
proxies = {
    'http': proxy_url,
    'https': proxy_url
}
Maintaining a proxy pool or proxy rotation manually is a complex task that requires careful management, error handling, and monitoring to ensure that IP rotation runs smoothly and potential issues are handled promptly. It is also essential to obtain proxy servers from reliable sources to avoid security and reliability problems.
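To make that concrete, here is a minimal sketch of a rotation loop with basic error handling. The pool addresses are placeholders, and a production version would also track dead proxies and back off on repeated failures:
import requests
from requests.exceptions import ProxyError, Timeout

proxy_pool = ['http://45.95.147.105:8080', 'http://45.95.147.106:8080', 'http://45.95.147.107:8080']

def fetch_with_rotation(url, pool, timeout=10):
    # Try each proxy in turn until one succeeds
    for proxy_url in pool:
        proxies = {'http': proxy_url, 'https': proxy_url}
        try:
            return requests.get(url, proxies=proxies, timeout=timeout)
        except (ProxyError, Timeout):
            continue  # dead or blocked proxy - try the next one
    raise RuntimeError('All proxies in the pool failed')

response = fetch_with_rotation('https://httpbin.org/ip', proxy_pool)
print(response.text)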
Ignoring SSL Certificates in Rotating Proxies
When you’re rotating proxies to send HTTPS requests to numerous websites, you’ll inevitably bump into issues with invalid or self-signed SSL certificates. In my experience, these errors can be a real headache when they block your connection. One way to deal with this (albeit not the most secure) is to disable SSL certificate verification entirely, which lets you keep the data collection flowing without interruptions.
Python’s requests library makes it super simple to bypass SSL verification using the verify=False parameter. Here’s a quick example to show you how it works:
import requests
proxies = {
'http': 'http://username:password@proxy_ip:proxy_port',
'https': 'http://username:password@proxy_ip:proxy_port',
}
response = requests.get(
'https://example.com',
proxies=proxies,
verify=False # Disable SSL verification
)
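Note that with verify=False, urllib3 emits an InsecureRequestWarning on every request. If you have consciously accepted the risk, you can silence it with urllib3's warning helper:
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)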
However, there’s a catch, and it’s a big one. By disabling SSL certificate verification, you lose the ability to confirm that you’re actually connecting to the intended server. This opens up the risk of your data being intercepted or redirected to a malicious site.
Conclusion
In this article, we have explored the fundamentals of using proxies with Python’s Requests library. Whether you’re trying to scrape data, bypass geo-restrictions, protect your identity, or simply manage multiple IPs, proxies are a handy tool.
At their core, proxies act as intermediaries between your program and the internet, offering benefits like anonymity (they hide your real IP address), easier access to restricted content, and better performance for certain tasks. They’re especially useful for automating jobs like web scraping or managing multiple accounts.
If you need something more advanced, commercial proxy services can save time by handling features like IP rotation, user authentication, and secure connections (SSL). These services can be a good investment for large-scale or professional projects.