How To Retry Failed Python Requests
Errors during HTTP requests are something you’ll inevitably run into when scraping. In this article, I’ll share the main reasons requests fail, how to handle those failures, and how to implement retries when needed. Since Requests is the go-to library for HTTP in Python, most of the examples will focus on using it.
Common Reasons for Failed Requests
The try…except block is a useful tool for handling errors in Python. We'll cover the specific exceptions in their respective sections below; for now, let's focus on the try block:
import requests

url = "https://example.com"
try:
    response = requests.get(url)
    response.raise_for_status()
If you’re looking to catch all request errors, no matter the cause, you can use this snippet:
except requests.exceptions.RequestException as e:
    print(f"Request error: {e}")
Since we’ll be building on this script later, I won’t repeat this part every single time. It’s just here to set the foundation.
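For reference, here's that skeleton assembled into one runnable piece (a minimal sketch, using https://example.com as a stand-in URL):
import requests

url = "https://example.com"
try:
    response = requests.get(url)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx status codes
    print(response.status_code)
except requests.exceptions.RequestException as e:
    print(f"Request error: {e}")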
Timeout Errors
Let’s start with one of the most common issues: the timeout error. A timeout happens when the server doesn’t respond within the specified time limit. This could be due to server overload, network problems, or just a slow response time.
The catch is that the requests library doesn’t enforce a default limit, so if you don’t manually set a timeout, your request could theoretically hang forever. In reality, though, something will eventually interrupt your request – whether it’s the server, a router, or some other network component.
To avoid this, it’s a good idea to set a timeout explicitly, like this:
response = requests.get(url, timeout=5)
Now, what if you actually want to handle a timeout error? You can use a try-except block to catch it:
except requests.exceptions.Timeout:
    print("Timeout error")
Instead of logging the error, you could take action, like retrying the request. That way, your script doesn’t give up at the first sign of trouble. I’ll go over setting up retries later in this article.
HTTP Status Error Codes
Working with APIs or scraping data inevitably means running into unsuccessful HTTP requests. Whenever you send a request, you get a status code back; it can indicate success, a redirect, or an error. To keep things simple, I've summarized the key status codes in this handy table:
| Code | Category | Name | Description |
|---|---|---|---|
| 1xx | Informational | | |
| 100 | Informational | Continue | Request received; continue sending. |
| 101 | Informational | Switching Protocols | The server is switching to another protocol, as specified in the Upgrade header. |
| 102 | Informational | Processing (WebDAV) | The server is processing the request, but no response is available yet. |
| 2xx | Successful | | |
| 200 | Successful | OK | The request was successful. |
| 201 | Successful | Created | The request was successful, and a new resource was created. |
| 202 | Successful | Accepted | The request has been accepted for processing but not completed. |
| 203 | Successful | Non-Authoritative Information | The response was modified by an intermediary (e.g., a transforming proxy). |
| 204 | Successful | No Content | The request was successful, but there is no content in the response. |
| 205 | Successful | Reset Content | The client should reset the form or view. |
| 206 | Successful | Partial Content | Only part of the resource is being returned (used with range requests). |
| 3xx | Redirection | | |
| 300 | Redirection | Multiple Choices | The request has multiple possible responses. |
| 301 | Redirection | Moved Permanently | The resource has been permanently moved to a new URL. |
| 302 | Redirection | Found | The resource is temporarily located at a different URL. |
| 303 | Redirection | See Other | Use another URL for the request. |
| 304 | Redirection | Not Modified | The resource has not changed (used for caching). |
| 307 | Redirection | Temporary Redirect | Temporary redirection to another URL. |
| 308 | Redirection | Permanent Redirect | Permanent redirection to another URL. |
| 4xx | Client Errors | | |
| 400 | Client Error | Bad Request | The server could not understand the request. |
| 401 | Client Error | Unauthorized | Authentication is required. |
| 403 | Client Error | Forbidden | The server refuses to authorize the request. |
| 404 | Client Error | Not Found | The resource was not found. |
| 405 | Client Error | Method Not Allowed | The HTTP method is not allowed for the resource. |
| 406 | Client Error | Not Acceptable | The resource cannot generate content acceptable to the client. |
| 408 | Client Error | Request Timeout | The server timed out waiting for the request. |
| 409 | Client Error | Conflict | The request conflicts with the current state of the resource. |
| 410 | Client Error | Gone | The resource is no longer available. |
| 415 | Client Error | Unsupported Media Type | The media type of the request is not supported. |
| 429 | Client Error | Too Many Requests | The client has sent too many requests in a given time. |
| 5xx | Server Errors | | |
| 500 | Server Error | Internal Server Error | An internal server error occurred. |
| 501 | Server Error | Not Implemented | The server does not support the functionality required for the request. |
| 502 | Server Error | Bad Gateway | The server received an invalid response from an upstream server. |
| 503 | Server Error | Service Unavailable | The service is temporarily unavailable. |
| 504 | Server Error | Gateway Timeout | The gateway or proxy timed out waiting for a response. |
| 505 | Server Error | HTTP Version Not Supported | The HTTP version used in the request is not supported by the server. |
While the table covers the most common status codes, I'll focus on the ones that indicate errors: the 4xx and 5xx codes, as well as timeouts. Thankfully, Python's requests library allows us to handle these issues gracefully using exceptions. Here's an example of how you can process errors in your code:
except requests.exceptions.HTTPError as http_err:
    # error-handling code goes here
A quick note: the requests.exceptions.HTTPError exception only gets raised for status codes 400 and above because we’re calling response.raise_for_status() in the example.
403: Forbidden
A 403 error means the server is blocking your request. This usually happens because of missing permissions, incorrect authentication, or absent API keys or tokens. If you’re confident that none of these apply, try adding some headers to your request. Servers often expect headers like User-Agent, Authorization, or Referer.
Here’s an example:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
}
response = requests.get(url, headers=headers)
Check out this table if you’re looking for up-to-date User-Agent strings. We routinely update it, so we’ve got you covered.
Want to handle this error programmatically? Add an exception block like this:
except requests.exceptions.HTTPError as http_err:
    if http_err.response.status_code == 403:
        print("Error. 403 Forbidden")
This error usually won’t happen if your headers and authentication are set up correctly.
429: Too Many Requests
Ok, let’s look at the 429 error. If you’ve built or are building a scraper, you’ve probably encountered this one. It means you’ve sent too many requests too quickly.
Here’s how to get around it:
- Add delays between your requests.
- Use proxies to distribute the requests.
I’ve written a separate article on using proxies with Python’s requests, so I won’t dive into that here. If you prefer the delay approach, you’ll need the time library:
import time
time.sleep(5)
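For instance, here's a minimal sketch that spaces out a batch of requests (the URL list is hypothetical):
import time
import requests

# Hypothetical targets; replace with the pages you actually need
urls = ["https://example.com/page1", "https://example.com/page2"]

for url in urls:
    response = requests.get(url, timeout=5)
    print(url, response.status_code)
    time.sleep(5)  # pause between requests to stay under the rate limit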
To catch and handle this error, you can reuse the same code snippet as before:
except requests.exceptions.HTTPError as http_err:
    if http_err.response.status_code == 429:
        print("Error. 429 Too Many Requests.")
Pro tip: avoid sending too many requests from the same IP address, and you probably won’t have to deal with this error at all.
500: Internal Server Error
The 500 error is a server-side problem, meaning it’s not your fault. It usually indicates that the server failed to process your request due to some internal issue. Unfortunately, there’s not much you can do here except wait and try again later – or contact the website’s support team if it’s urgent.
To handle this error, use the following code:
except requests.exceptions.HTTPError as http_err:
    if http_err.response.status_code == 500:
        print("500 Internal Server Error.")
Keep in mind that this error is beyond your control to predict or correct.
502: Bad Gateway
A 502 error happens when there's a communication issue between a server and a gateway or proxy. It's usually a server problem, but if you're using proxies, you might want to double-check them.
You can catch this error using:
except requests.exceptions.HTTPError as http_err:
    if http_err.response.status_code == 502:
        print("502 Bad Gateway.")
If you’re feeling proactive, verify your proxy settings and any intermediary servers you’re working with. But in most cases, the issue lies with the target server.
503: Service Unavailable
A 503 error means the server is either overloaded or temporarily down for maintenance. These errors are often short-lived, so the best solution is to wait and try again after a delay.
Here’s how you can handle it:
except requests.exceptions.HTTPError as http_err:
    if http_err.response.status_code == 503:
        print("503 Service Unavailable.")
For peace of mind, you might also try visiting the page in your browser to confirm its status.
504: Gateway Timeout
The 504 error indicates that a gateway couldn’t get a timely response from another server. Network issues or a slow backend server can cause this. Unfortunately, your only options here are to check your network or wait and retry the failed requests later.
Catch this error with:
except requests.exceptions.HTTPError as http_err:
    if http_err.response.status_code == 504:
        print("504 Gateway Timeout")
Instead of logging the error, I recommend building a requests retry mechanism. We’ll explore that in detail in the next section.
Configuring Retries in Python Requests
Using retries in Python requests can be super helpful, especially when dealing with unreliable network conditions. But let's be real: too many retries can backfire. They can cause unnecessary delays or even get your client blocked by the server. That's why it's crucial not just to configure retries, but to set a sensible limit on how many times your Python requests retry.
The ideal retry limit depends on the type of request and the server you're targeting. Generally, three to five retries is a sensible cap before giving up altogether or introducing a longer pause between attempts. I'll break down some key aspects of implementing retries effectively.
Setting Timeout Values
Let’s start with something we’ve touched on before. Before diving any deeper, it’s crucial to set a timeout for your requests. Otherwise, a request might hang indefinitely, leaving your program stuck. In the earlier section, I showed you how to set a basic timeout limit, but you can make things more efficient by splitting the timeout into connection time and read time.
Here’s an example:
response = requests.get(url, timeout=(3, 10))
In this case, I’ve set:
- 3 seconds for the connection timeout.
- 10 seconds for the read timeout.
This approach gives you better control over how your requests behave and is just good practice overall.
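If you want to react to each phase differently, requests raises a distinct exception for each; here's a small sketch:
import requests

try:
    response = requests.get("https://example.com", timeout=(3, 10))
except requests.exceptions.ConnectTimeout:
    print("Could not connect within 3 seconds")  # connection phase timed out
except requests.exceptions.ReadTimeout:
    print("Server took longer than 10 seconds to respond")  # read phase timed out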
Handling Status Codes and Exceptions
Previously, I walked you through handling errors individually. This time, let’s look at a broader example that shows how to manage all status codes in one go.
Like in life, when writing scripts, it’s a good idea to anticipate errors and plan for them. For instance, you might want to specify fallback actions for unexpected errors like this:
except requests.exceptions.HTTPError as http_err:
    if http_err.response.status_code == 403:
        print("403 Forbidden.")
    elif http_err.response.status_code == 429:
        print("429 Too Many Requests.")
    elif http_err.response.status_code == 500:
        print("500 Internal Server Error.")
    elif http_err.response.status_code == 502:
        print("502 Bad Gateway.")
    elif http_err.response.status_code == 503:
        print("503 Service Unavailable.")
    elif http_err.response.status_code == 504:
        print("504 Gateway Timeout.")
    else:
        print(f"HTTP error: {http_err.response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"Request error: {e}")
That said, not all errors warrant retrying failed requests. Some can’t be “fixed” by simply reconnecting. This is especially true for 4xx errors, which typically result from something on the client side.
For example, if you’re getting a 403 (Forbidden), it’s worth double-checking your headers before retrying. For a 429 (Too Many Requests), you might need to wait or use a proxy to rotate your IP address.
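Many servers that return a 429 also tell you how long to wait via the Retry-After header. Here's a hedged sketch of honoring it (assuming the header carries a number of seconds, which is the common case; it can also be an HTTP date):
import time
import requests

try:
    response = requests.get("https://example.com")
    response.raise_for_status()
except requests.exceptions.HTTPError as http_err:
    if http_err.response.status_code == 429:
        # Fall back to a default pause if the header is missing
        wait = int(http_err.response.headers.get("Retry-After", 10))
        print(f"Rate limited. Waiting {wait} seconds before retrying...")
        time.sleep(wait)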
Avoiding Infinite Loops
Retry loops are helpful but can be dangerous if not handled properly. Infinite requests retry loops can cause resource leaks, hammer servers, and even get your IP banned. That’s why it’s critical to define clear exit conditions for retries. For example, you might want to stop retrying after reaching a specific status code or exceeding a time limit.
Here’s how you can implement that:
import time
import requests

MAX_RETRIES = 5
TIMEOUT = 5  # seconds
url = "https://example.com"

for attempt in range(1, MAX_RETRIES + 1):
    try:
        response = requests.get(url, timeout=TIMEOUT)
        print("Request succeeded:", response.status_code)
        break
    except requests.exceptions.RequestException as e:
        print(f"Connection attempt {attempt} failed: {e}")
        if attempt == MAX_RETRIES:
            print("Max retries exceeded. Exiting.")
            break
        time.sleep(2)
If you skip this step, retry loops can lead to more than just a temporary server block. They can overload your own network, leaving you unable to even Google your way out of trouble. So, be sure not to skip this step.
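By the way, the attempt count isn't your only possible exit condition. If you'd rather cap total elapsed time, here's a sketch using a wall-clock deadline (the 30-second limit is an arbitrary example):
import time
import requests

DEADLINE = 30  # give up after 30 seconds, regardless of attempt count
start = time.monotonic()

while time.monotonic() - start < DEADLINE:
    try:
        response = requests.get("https://example.com", timeout=5)
        print("Request succeeded:", response.status_code)
        break
    except requests.exceptions.RequestException as e:
        print(f"Attempt failed: {e}")
        time.sleep(2)
else:
    print("Deadline exceeded. Exiting.")  # runs only if the loop never breaks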
Using Backoff Strategies
Backoff strategies help prevent server overload, especially when dealing with frequent requests. A common choice is exponential backoff, where each retry is delayed by an increasing amount of time (e.g., 1 second, 2 seconds, 4 seconds, and so on).
You can implement this manually using time.sleep(), but honestly, it’s much easier to use a library like Tenacity.
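For reference, the manual calculation is just a few lines. The random jitter here is my own addition; it helps keep many clients from retrying in lockstep:
import random

def backoff_delay(attempt, base=1):
    # Exponential backoff: base * 2^attempt, plus a little random jitter
    return base * (2 ** attempt) + random.uniform(0, 1)

for attempt in range(4):
    print(f"Retry {attempt + 1} would wait about {backoff_delay(attempt):.1f} s")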
Here’s an example of how to implement backoff using Tenacity:
from tenacity import retry, stop_after_attempt, wait_exponential
import requests

@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=2, max=10))
def make_request():
    response = requests.get("https://example.com")
    response.raise_for_status()  # raise on 4xx/5xx so Tenacity retries those too
    print(response.status_code)
    return response

try:
    make_request()
except Exception as e:
    print("Request failed after retries:", e)
As you can see, this is far simpler than coding delays manually, and it’s more reliable, too.
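One caveat: by default, Tenacity retries on any exception, including bugs in your own code. To restrict it to request failures only, you can pass a retry condition, as in this sketch:
from tenacity import retry, retry_if_exception_type, stop_after_attempt
import requests

@retry(retry=retry_if_exception_type(requests.exceptions.RequestException),
       stop=stop_after_attempt(3))
def fetch():
    response = requests.get("https://example.com", timeout=5)
    response.raise_for_status()  # turn HTTP errors into exceptions so they get retried
    return response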
Configuring Retries with urllib3
Alright, let's dive into the simplest way to put everything we've discussed in this section together. To make this work, we'll need the Retry class from the urllib3 library, which ships with Requests. Here's the complete code:
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry  # the old requests.packages path is deprecated
import requests
retry_strategy = Retry(
total=5,
status_forcelist=[403, 429, 500, 502, 503, 504],
backoff_factor=1
)
adapter = HTTPAdapter(max_retries=retry_strategy)
session = requests.Session()
session.mount("https://", adapter)
session.mount("http://", adapter)
response = session.get("https://example.com")
print(response.status_code)
But if you’re like me and prefer to break things down step by step, let’s go through each part together.
First, we need to import the required libraries:
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry  # the old requests.packages path is deprecated
import requests
Next, we’ll set up the retry logic:
retry_strategy = Retry(
total=5,
status_forcelist=[403, 429, 500, 502, 503, 504],
backoff_factor=1
)
Here, we specify the total number of retries (total=5), define which status codes should trigger a retry, and set an exponential backoff factor. If you're unfamiliar with backoff, it's basically a delay that grows in a geometric progression. In our case, the factor is 1, so the delays will be roughly 1, 2, 4, 8 seconds, and so on. Simple, right?
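One more detail worth knowing: by default, urllib3 only retries idempotent methods (GET, HEAD, PUT, DELETE, OPTIONS, TRACE). If you also need POST requests retried, you have to opt in explicitly (allowed_methods requires urllib3 1.26+; older versions call the parameter method_whitelist):
retry_strategy = Retry(
    total=5,
    status_forcelist=[403, 429, 500, 502, 503, 504],
    backoff_factor=1,
    allowed_methods=frozenset(["GET", "POST"])  # explicitly allow retrying POST too
)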
After that, we’ll create an adapter that applies our retry strategy:
adapter = HTTPAdapter(max_retries=retry_strategy)
Now, let’s set up a session object to manage the connection throughout all retries:
session = requests.Session()
session.mount("https://", adapter)
session.mount("http://", adapter)
Finally, we make the actual request:
response = session.get("https://example.com")
Why use a session, you ask? Well, without it, a new connection would be created for each request. This can slow things down and put extra load on the server. That said, if you’d rather skip sessions, you can reuse the code from earlier examples; it works just fine without them too.
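One thing to keep in mind: when the retry budget runs out on one of the status_forcelist codes, requests raises requests.exceptions.RetryError rather than HTTPError, so it's worth catching. A sketch, reusing the session from above:
try:
    response = session.get("https://example.com")
    print(response.status_code)
except requests.exceptions.RetryError as e:
    # Raised when retries are exhausted on a status_forcelist code
    print(f"Gave up after repeated error responses: {e}")
except requests.exceptions.ConnectionError as e:
    print(f"Connection-level failure: {e}")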
Custom Retry Mechanisms
Now that we've covered the fundamentals and you have a clear understanding of how retry mechanisms function, let's attempt to write our own retry logic. Trust me, it's not as scary as it sounds. Here's the final version of the script we'll be building:
import time
import requests
def make_request_with_retry(url, retries=5, backoff_factor=2, timeout=5):
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()
            return response
        except requests.exceptions.RequestException as e:
            if attempt == retries:
                raise
            wait_time = backoff_factor ** attempt
            print(f"Error: {e}. Try {attempt}/{retries}. Retry after {wait_time} sec...")
            time.sleep(wait_time)

url = "https://example.com"
try:
    response = make_request_with_retry(url)
    print(response.text)
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
First, we’ll import the requests library and time for adding delays:
import time
import requests
Next, let's write a retry function that executes the request, applies the delay, and retries a set number of times if an error occurs:
def make_request_with_retry(url, retries=5, backoff_factor=2, timeout=5):
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()
            return response
        except requests.exceptions.RequestException as e:
            if attempt == retries:
                raise
            wait_time = backoff_factor ** attempt
            print(f"Error: {e}. Try {attempt}/{retries}. Retry after {wait_time} sec...")
            time.sleep(wait_time)
Finally, set the URL and try making the request through the function we just wrote. If it still fails after all retries, we catch the error at the top level:
url = "https://example.com"
try:
    response = make_request_with_retry(url)
    print(response.text)
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
Even though this code performs the same function as the previous one, it's usually better to rely on ready-made solutions like urllib3's Retry or Tenacity, since they tend to be more robust and handle edge cases better.
Conclusion
While I’ve already walked you through catching different types of errors during requests and setting up retries, I want to wrap things up with an important note: be really careful when designing your Python requests retry logic. If your scraper, or any other program, keeps firing off tons of requests without limits or proper conditions for retrying, it can lead to some pretty unpleasant consequences.
Endless retry attempts can easily create loops that hammer the network, bogging down performance. Worse, if you don’t configure retries thoughtfully, you might find yourself in a situation where your machine slows down significantly, the network becomes overloaded, and ultimately, you still receive no results. Frustrating.
That’s why it’s so crucial to think not just about how your program should handle retries when things go wrong, but also about when it should stop trying altogether. Setting clear limits on the number of retries, backoff timing, or the conditions under which retries happen can save you a lot of headaches. Trust me, a little upfront planning makes a significant difference!