The Complete Guide to Scraping Amazon Product Data using Python
Amazon is one of the largest and fastest-growing online marketplaces, attracting millions of monthly visitors worldwide according to Statista. For researchers, business analysts, and sellers, it’s a valuable source of data, but extracting that data isn’t as simple as it might seem.
In this article, I’ll walk you through three practical approaches: building Amazon scrapers with Python, using an Amazon scraping API, and using ready-made tools for those who’d rather avoid coding.
Our pre-built Amazon Product Scraper is designed to pull detailed product information, including reviews, prices, descriptions, images, and brands, from departments, categories, product pages, or Amazon searches. Download your data in JSON, CSV, or Excel format.
Our Amazon Reviews Scraper is the quickest, easiest way to gather customer reviews for any product on Amazon: with just a few clicks, you can collect reviews and export them in a variety of formats, including JSON, CSV, and Excel.
Scraping Amazon Product Page
Amazon is one of the biggest marketplaces out there, and while manually gathering data from it could take forever, the good news is we can automate the whole process.
Full Amazon Product Scraper
If you’re in a rush, here’s the complete script right off the bat:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import pandas as pd
import time

chrome_options = Options()
driver = webdriver.Chrome(options=chrome_options)

url = "https://www.amazon.com/weBoost-472120-Signal-Booster-Carriers/dp/B081BM99M9"
driver.get(url)
time.sleep(5)  # crude wait for the page to finish rendering

# Helpers that return None / [] instead of crashing when an element is missing
def safe_find_element(by, value, element=None):
    try:
        target = element if element else driver
        return target.find_element(by, value)
    except Exception:
        return None

def safe_find_elements(by, value, element=None):
    try:
        target = element if element else driver
        return target.find_elements(by, value)
    except Exception:
        return []

title = safe_find_element(By.ID, "productTitle").text.strip() if safe_find_element(By.ID, "productTitle") else "Title not found"
price = safe_find_element(By.CLASS_NAME, "priceToPay").text.strip() if safe_find_element(By.CLASS_NAME, "priceToPay") else "Price not found"
delivery_info = safe_find_element(By.ID, "amazonGlobal_feature_div").text.strip() if safe_find_element(By.ID, "amazonGlobal_feature_div") else "Delivery info not found"
about = safe_find_element(By.ID, "featurebullets_feature_div").text.strip() if safe_find_element(By.ID, "featurebullets_feature_div") else "About section not found"

# Rating and review count share a parent element that may be absent
review_link = safe_find_element(By.ID, "acrCustomerReviewLink")
if review_link:
    rating = safe_find_element(By.CLASS_NAME, "a-size-small.cm-cr-review-stars-rating-spacing").text.strip() if safe_find_element(By.CLASS_NAME, "a-size-small.cm-cr-review-stars-rating-spacing") else "No rating"
    review_count = safe_find_element(By.CLASS_NAME, "a-size-small.a-color-base.cm-cr-review-stars-text-sm").text.strip() if safe_find_element(By.CLASS_NAME, "a-size-small.a-color-base.cm-cr-review-stars-text-sm") else "No reviews"
else:
    rating = "Rating not found"
    review_count = "Review count not found"

# Extract the product features table into a DataFrame
features_table = None
try:
    features_section = safe_find_element(By.ID, "productOverview_feature_div")
    if features_section:
        features = features_section.find_element(By.TAG_NAME, "table")
        rows = features.find_elements(By.TAG_NAME, "tr")
        feature_data = []
        for row in rows:
            cols = row.find_elements(By.TAG_NAME, "td")
            if len(cols) == 2:
                feature_data.append([cols[0].text.strip(), cols[1].text.strip()])
        features_table = pd.DataFrame(feature_data, columns=["Feature", "Value"])
    else:
        features_table = pd.DataFrame(columns=["Feature", "Value"])
except Exception as e:
    features_table = pd.DataFrame(columns=["Feature", "Value"])
    print(f"Error extracting features table: {e}")

# Collect high-resolution gallery images, skipping thumbnails
high_quality_images = set()
try:
    li_elements = safe_find_elements(By.CSS_SELECTOR, "li")
    for li in li_elements:
        img = safe_find_element(By.TAG_NAME, "img", li)
        if img:
            img_url = img.get_attribute("data-a-hires") or img.get_attribute("src")
            if img_url and img_url.startswith("https://m.media-amazon.com"):
                if "AC_UF100" in img_url or "AC_UL100" in img_url:
                    continue  # skip low-resolution variants
                high_quality_images.add(img_url)
except Exception as e:
    print(f"Error extracting images: {e}")

data = {
    "Title": title,
    "Price": price,
    "Delivery Info": delivery_info,
    "About": about,
    "Rating": rating,
    "Review Count": review_count
}

main_data_df = pd.DataFrame([data])
main_data_df.to_csv("main_data.csv", index=False)

images_df = pd.DataFrame(list(high_quality_images), columns=["Image URL"])
images_df.to_csv("images.csv", index=False)

features_table.to_csv("features_table.csv", index=False)

print("Extracted Data:")
for key, value in data.items():
    print(f"{key}: {value}")

print("\nFeatures Table:")
print(features_table)

print("\nImages:")
print(f"{len(high_quality_images)} images found")
for img in high_quality_images:
    print(f" - {img}")

driver.quit()
A while back, scraping Amazon was a bit easier. You could get away with using simpler libraries like Beautiful Soup. But as Amazon’s tracking methods have gotten smarter, dealing with CAPTCHAs has become a real headache.
So, what’s the solution? Well, I had to level up and switch to a more advanced library that mimics real user behavior. It’s not as straightforward as the old methods, but it’s way more effective for tackling the increased complexity of scraping Amazon.
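For illustration, here is a minimal, hedged sketch of Chrome options people commonly use to make Selenium look less like automation. None of this guarantees you won’t hit a CAPTCHA; treat it as a starting point, and note that the user-agent string below is just an example to substitute with your own:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
# Hide the navigator.webdriver automation hint
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
# Drop the "Chrome is being controlled by automated test software" banner
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
# Present a realistic desktop user agent (example string - substitute your own)
chrome_options.add_argument(
    "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
driver = webdriver.Chrome(options=chrome_options)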
Step 1: Prerequisites
Before we dive into building our Amazon scraper in Python, let’s take a closer look at how product pages are structured. Understanding that is key to scraping effectively. Each Amazon product page contains a lot of information, some of which is specific to a particular category.
Let’s look at a product page and identify the elements we want to scrape:
Here we can highlight the following CSS selectors for main elements:
| Data Field | CSS Selector |
|---|---|
| Title | #productTitle |
| Price | .priceToPay |
| Delivery Info | #amazonGlobal_feature_div |
| Rating and Reviews | #acrCustomerReviewLink |
| Description (About) | #featurebullets_feature_div |
| Features Table | #productOverview_feature_div table |
| High-Quality Images | li img[data-a-hires^="https://m.media-amazon.com"] |
These elements are common to all product categories and can be extracted using the scraper. If there are additional elements on the page you need, feel free to update the resulting code to add them.
Step 2: Retrieve Page HTML
Let’s dive into the first steps of scraping Amazon. To kick things off, we’ll grab the entire page content. Start by creating a new Python file (with a .py extension, of course) and import the libraries you’ll need:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import pandas as pd
import time
Now, you might wonder: why pandas and not just the simpler csv module? Well, pandas gives you way more flexibility when it comes to processing and saving data later on.
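As a quick illustration of that flexibility, the same DataFrame can be exported to several formats with one-liners (the Excel export assumes the openpyxl package is installed):

import pandas as pd

df = pd.DataFrame([{"Title": "Example product", "Price": "$19.99"}])
df.to_csv("products.csv", index=False)         # plain CSV
df.to_json("products.json", orient="records")  # JSON records
df.to_excel("products.xlsx", index=False)      # requires openpyxl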
Next, we configure the WebDriver options and initialize the driver itself:
chrome_options = Options()
driver = webdriver.Chrome(options=chrome_options)
I’m using Chrome here because it’s popular and works well with Selenium, but feel free to swap it out for a different browser if you’ve got a favorite.
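For example, switching to Firefox is roughly a two-line change, assuming geckodriver is available (recent Selenium releases can download it for you):

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

firefox_options = Options()
driver = webdriver.Firefox(options=firefox_options)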
Once the driver is set up, let’s navigate to the Amazon page you want to scrape:
url = "https://www.amazon.com/weBoost-472120-Signal-Booster-Carriers/dp/B081BM99M9"
driver.get(url)
Now, here’s a small but important detail: web pages don’t load instantly. To make sure everything has time to render, let’s add a short delay:
time.sleep(5)
It’s not the best solution, since ideally you’d wait for specific elements to load rather than sleep for a fixed time, but we’ll keep it simple for now.
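If you do want something sturdier, Selenium’s explicit waits block until a specific element shows up instead of sleeping blindly. A minimal sketch, waiting for the product title:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the title to appear; raises TimeoutException otherwise
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "productTitle"))
)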
Step 3: Scrape Amazon Product Details
Before diving into the data scraping itself, let’s take a moment to set up two utility functions that will make our code cleaner and easier to manage. Here’s the thing: if we simply try to locate an element on a webpage and it doesn’t exist, the script could crash with an error. To avoid this, we could either wrap every element search in a try…except block, or we could write a single helper function to handle this for us. Spoiler alert: we’re going with the second option.
def safe_find_element(by, value, element=None):
    try:
        target = element if element else driver
        return target.find_element(by, value)
    except Exception:
        return None

def safe_find_elements(by, value, element=None):
    try:
        target = element if element else driver
        return target.find_elements(by, value)
    except Exception:
        return []
Now, you might think, “Why not just use perfect selectors from the start and skip all this code?” Good question! But the reality is, missing data isn’t always a selector problem. There are plenty of reasons why data might not show up as expected. Maybe there’s a hiccup in the page load, and some elements just don’t render. Or maybe the page itself loads incorrectly due to some server-side glitch.
From my experience, even with spot-on selectors, data issues can and will happen. That’s why having these fallback functions is a lifesaver - not just for keeping your script running smoothly, but also for saving you hours of debugging down the line.
Product Name
Now it’s time to put the selectors we talked about earlier to work. Let’s extract the product title using our helper function. Here’s the code snippet:
title = safe_find_element(By.ID, "productTitle").text.strip() if safe_find_element(By.ID, "productTitle") else "Title not found"
The “Title not found” part is yet another safety net. It’s always a good idea to include a default value like this - trust me, nothing derails your data pipeline faster than an unexpected NoneType error.
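One thing to note: the one-liner above looks the element up twice. If that bothers you, a small wrapper built on our safe_find_element helper does the lookup once and falls back to a default:

def safe_text(by, value, default):
    # Look the element up once; return its stripped text or the default
    element = safe_find_element(by, value)
    return element.text.strip() if element else default

title = safe_text(By.ID, "productTitle", "Title not found")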
Product Price
We’ll extract product price data in the same way:
price = safe_find_element(By.CLASS_NAME, "priceToPay").text.strip() if safe_find_element(By.CLASS_NAME, "priceToPay") else "Price not found"
If you need a different kind of price (the sale price, the price with delivery, or something else), just change the selector.
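For instance, Amazon pages often carry a machine-readable copy of the full price in a hidden .a-offscreen span inside the .a-price block. Treat the selector below as an assumption to verify in DevTools for your page; also note that hidden elements return an empty .text in Selenium, so we read textContent instead:

# Hypothetical alternative: the hidden full price string, e.g. "$104.99"
offscreen = safe_find_element(By.CSS_SELECTOR, ".a-price .a-offscreen")
price = offscreen.get_attribute("textContent").strip() if offscreen else "Price not found"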
Product Reviews and Rating
The product rating and reviews are actually in the same selector, which makes it pretty easy to extract both at the same time. So, the first step is to grab the parent element, and then we can pull the relevant values from it:
review_link = safe_find_element(By.ID, "acrCustomerReviewLink")
if review_link:
    rating = safe_find_element(By.CLASS_NAME, "a-size-small.cm-cr-review-stars-rating-spacing").text.strip() if safe_find_element(By.CLASS_NAME, "a-size-small.cm-cr-review-stars-rating-spacing") else "No rating"
    review_count = safe_find_element(By.CLASS_NAME, "a-size-small.a-color-base.cm-cr-review-stars-text-sm").text.strip() if safe_find_element(By.CLASS_NAME, "a-size-small.a-color-base.cm-cr-review-stars-text-sm") else "No reviews"
else:
    rating = "Rating not found"
    review_count = "Review count not found"
Keep in mind that the parent element might not be present for every product. So, it’s a good idea to account for that and avoid errors.
Product Description
The product description is an important and useful piece of information. Let’s extract it just like the other elements:
about = safe_find_element(By.ID, "featurebullets_feature_div").text.strip() if safe_find_element(By.ID, "featurebullets_feature_div") else "About section not found"
But honestly, product descriptions rarely hold critical information. Most of the time, the real value is in the product’s features. They tend to give you a better idea of what you’re dealing with, so don’t focus too much on the description.
Product Features
Different product categories can have additional descriptions and features, and these vary from category to category. So rather than targeting specific rows, let’s extract the full product features table and turn it into a pandas DataFrame:
features_table = None
try:
    features_section = safe_find_element(By.ID, "productOverview_feature_div")
    if features_section:
        features = features_section.find_element(By.TAG_NAME, "table")
        rows = features.find_elements(By.TAG_NAME, "tr")
        feature_data = []
        for row in rows:
            cols = row.find_elements(By.TAG_NAME, "td")
            if len(cols) == 2:
                feature_data.append([cols[0].text.strip(), cols[1].text.strip()])
        features_table = pd.DataFrame(feature_data, columns=["Feature", "Value"])
    else:
        features_table = pd.DataFrame(columns=["Feature", "Value"])
except Exception as e:
    features_table = pd.DataFrame(columns=["Feature", "Value"])
    print(f"Error extracting features table: {e}")
This approach will allow you to extract product features, no matter how many there are or which specific ones you’re looking for.
Delivery Information
Another important thing to scrape is the delivery information:
delivery_info = safe_find_element(By.ID, "amazonGlobal_feature_div").text.strip() if safe_find_element(By.ID, "amazonGlobal_feature_div") else "Delivery info not found"
However, this data can vary depending on your location or the region of the proxies you’re using.
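If you need delivery estimates for a specific region, one option is to route the browser through a proxy located there when you create the driver. A sketch, with a placeholder proxy address:

chrome_options = Options()
# Hypothetical proxy address - substitute your own
chrome_options.add_argument("--proxy-server=http://proxy.example.com:8080")
driver = webdriver.Chrome(options=chrome_options)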
Product Images
Finally, let’s extract the image data:
high_quality_images = set()
try:
    li_elements = safe_find_elements(By.CSS_SELECTOR, "li")
    for li in li_elements:
        img = safe_find_element(By.TAG_NAME, "img", li)
        if img:
            img_url = img.get_attribute("data-a-hires") or img.get_attribute("src")
            if img_url and img_url.startswith("https://m.media-amazon.com"):
                if "AC_UF100" in img_url or "AC_UL100" in img_url:
                    continue
                high_quality_images.add(img_url)
except Exception as e:
    print(f"Error extracting images: {e}")
Here, we’ve tried our best to identify the images as accurately as possible, so we’re only grabbing the higher-quality ones.
Step 4: Export to CSV
Now that we’ve extracted all the data we need, let’s store it in a variable for future saving:
data = {
    "Title": title,
    "Price": price,
    "Delivery Info": delivery_info,
    "About": about,
    "Rating": rating,
    "Review Count": review_count
}
As I mentioned earlier, using pandas will really help simplify the code and make it easy to save everything quickly. So, let’s go ahead and save the data into separate files. We’ll start with the main details:
main_data_df = pd.DataFrame([data])
main_data_df.to_csv("main_data.csv", index=False)
Next, we’ll save the images:
images_df = pd.DataFrame(list(high_quality_images), columns=["Image URL"])
images_df.to_csv("images.csv", index=False)
Then, save the product features:
features_table.to_csv("features_table.csv", index=False)
And finally, don’t forget to close the web driver when you’re done:
driver.quit()
This step frees up system resources and keeps orphaned browser processes from piling up on your machine.
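In a longer script, a safer pattern is to wrap the scraping work in try/finally so the browser is closed even when an extraction step raises an exception. A minimal sketch:

driver = webdriver.Chrome(options=chrome_options)
try:
    driver.get(url)
    # ... all the extraction and saving steps go here ...
finally:
    driver.quit()  # always runs, even if scraping failed halfway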
Scraping Amazon Product Listings
In this section, we’ll go over how to build a scraper for Amazon’s search results or product listings using Python. Like in the previous example, we’ll rely on Selenium to get the page’s source code and use selectors to extract the data we need.
Full Amazon Product Listings Scraper
So, if you’re just looking for the ready-made code, feel free to copy it right away:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
import csv

chrome_options = Options()
driver = webdriver.Chrome(options=chrome_options)

url = 'https://www.amazon.com/s?k=pen'
driver.get(url)
time.sleep(10)  # crude wait for the results page to render

# Grab every product card in the main results block
cards = driver.find_elements(By.CSS_SELECTOR, '.s-main-slot .s-result-item')
data = []

for card in cards:
    title_elem = card.find_elements(By.CSS_SELECTOR, 'h2 .a-link-normal span')
    title = title_elem[0].text if title_elem else 'Title not found'

    price_whole_elem = card.find_elements(By.CSS_SELECTOR, '.a-price-whole')
    price_fraction_elem = card.find_elements(By.CSS_SELECTOR, '.a-price-fraction')
    if price_whole_elem and price_fraction_elem:
        price = f"{price_whole_elem[0].text}.{price_fraction_elem[0].text}"
    else:
        price = 'Price not found'

    rating_elem = card.find_elements(By.CSS_SELECTOR, 'div[data-cy="reviews-block"] .a-icon-alt')
    rating = rating_elem[0].get_attribute('aria-label') if rating_elem else 'Rating not found'

    reviews_elem = card.find_elements(By.CSS_SELECTOR, 'div[data-csa-c-content-id="alf-customer-ratings-count-component"]')
    reviews = reviews_elem[0].text if reviews_elem else 'Reviews not found'

    image_elem = card.find_elements(By.CSS_SELECTOR, '.s-image')
    image_url = image_elem[0].get_attribute('src') if image_elem else 'Image not found'

    # Skip ad slots and separators that have no title
    if title == "Title not found":
        continue

    data.append({
        'title': title,
        'price': price,
        'rating': rating,
        'reviews': reviews,
        'image_url': image_url
    })

if data:  # guard against an empty result set
    keys = data[0].keys()
    with open('amazon_data.csv', 'w', newline='', encoding='utf-8') as output_file:
        dict_writer = csv.DictWriter(output_file, fieldnames=keys)
        dict_writer.writeheader()
        dict_writer.writerows(data)

driver.quit()
Now, if you’d like a bit more explanation: this time around, we’re skipping pandas and going with the good old csv module instead. Honestly, it’s more than enough for what we need.
Step 1: Analyze a Page
First things first, let’s take a quick look at what an Amazon search results page with relevant products looks like:
Next, we’ll dive into DevTools (just hit F12 or right-click on the page and select Inspect) to identify the key selectors we’ll need. These are the bits of HTML that correspond to the product titles, prices, images, and so on. Here’s a quick table of the selectors we’re focusing on:
| Data Field | CSS Selector |
|---|---|
| Title | h2 .a-link-normal span |
| Price | .a-price-whole and .a-price-fraction |
| Rating | div[data-cy="reviews-block"] .a-icon-alt |
| Reviews | div[data-csa-c-content-id="alf-customer-ratings-count-component"] |
| Image URL | .s-image |
Now, let’s move on to building the script.
Step 2: Scrape Search Results with Python
The first part of the code will stay mostly the same, with a few changes. Instead of using a product page URL, we’ll use a search results URL. Also, we’ll need to import the csv library to save our data later.
Here’s the updated part:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
import csv
chrome_options = Options()
driver = webdriver.Chrome(options=chrome_options)
url = 'https://www.amazon.com/s?k=pen'
driver.get(url)
time.sleep(10)
Now, we’ll gather all the product cards on the search results page and save the data into a variable.
cards = driver.find_elements(By.CSS_SELECTOR, '.s-main-slot .s-result-item')
data = []
Next, we’ll loop through each product card and extract the relevant details like title, price, rating, reviews, and image URL. Here’s how to do it:
for card in cards:
    title_elem = card.find_elements(By.CSS_SELECTOR, 'h2 .a-link-normal span')
    title = title_elem[0].text if title_elem else 'Title not found'

    price_whole_elem = card.find_elements(By.CSS_SELECTOR, '.a-price-whole')
    price_fraction_elem = card.find_elements(By.CSS_SELECTOR, '.a-price-fraction')
    if price_whole_elem and price_fraction_elem:
        price = f"{price_whole_elem[0].text}.{price_fraction_elem[0].text}"
    else:
        price = 'Price not found'

    rating_elem = card.find_elements(By.CSS_SELECTOR, 'div[data-cy="reviews-block"] .a-icon-alt')
    rating = rating_elem[0].get_attribute('aria-label') if rating_elem else 'Rating not found'

    reviews_elem = card.find_elements(By.CSS_SELECTOR, 'div[data-csa-c-content-id="alf-customer-ratings-count-component"]')
    reviews = reviews_elem[0].text if reviews_elem else 'Reviews not found'

    image_elem = card.find_elements(By.CSS_SELECTOR, '.s-image')
    image_url = image_elem[0].get_attribute('src') if image_elem else 'Image not found'

    # Skip ad slots and separators that have no title
    if title == "Title not found":
        continue

    data.append({
        'title': title,
        'price': price,
        'rating': rating,
        'reviews': reviews,
        'image_url': image_url
    })

driver.quit()
At this point, we have a data list that holds all the information we need. You can either add pagination handling to scrape more pages, or, if you’re happy with the data on the current page, you can save it to a CSV.
Step 3: Handle Pagination
Instead of manually visiting each search results page, we can take advantage of the pagination block. By collecting the links it contains, we can build a list of pages to visit, ensuring we get all the data for a given query.
def get_pagination_links():
    page_links = []
    pagination_items = driver.find_elements(By.CSS_SELECTOR, '.s-pagination-item')
    for item in pagination_items:
        # Only anchor tags carry links; the current page is rendered as a <span>
        if item.tag_name == 'a':
            page_link = item.get_attribute('href')
            if page_link:  # get_attribute('href') already returns an absolute URL
                page_links.append(page_link)
    return page_links

page_links = get_pagination_links()
for url in page_links:
    pass  # put your per-page scraping code here
You can then wrap the code we’ve discussed earlier in a loop to go through each link in the array. Just don’t forget to change the file save mode from w (which overwrites the file) to a (which appends data to the file if it already exists).
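Here’s a rough sketch of what that loop could look like, writing the header only once so appended pages don’t repeat it. Note that scrape_page is a hypothetical helper standing in for the card-processing loop from Step 2, assumed to return the list of dicts:

import os

for page_url in page_links:
    driver.get(page_url)
    time.sleep(5)  # crude wait, as before
    page_data = scrape_page()  # hypothetical helper wrapping the card loop above
    if not page_data:
        continue
    write_header = not os.path.exists('amazon_data.csv')
    # 'a' appends to the file instead of overwriting it on every page
    with open('amazon_data.csv', 'a', newline='', encoding='utf-8') as output_file:
        dict_writer = csv.DictWriter(output_file, fieldnames=page_data[0].keys())
        if write_header:
            dict_writer.writeheader()
        dict_writer.writerows(page_data)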
Step 4: Export to CSV
We’ve already gone through how to save data using pandas, so this time let’s use the CSV library to save it instead:
if data:  # make sure we actually collected something
    keys = data[0].keys()
    with open('amazon_data.csv', 'w', newline='', encoding='utf-8') as output_file:
        dict_writer = csv.DictWriter(output_file, fieldnames=keys)
        dict_writer.writeheader()
        dict_writer.writerows(data)
As you can see, it’s pretty simple too. In the past, you could also scrape product reviews the same way. However, Amazon has tightened its policy, and now, in order to view more than 8 reviews for a product, you’ll need to sign in.
Scraping Data Using Amazon API
As already mentioned, we can use a web scraping API to handle these tasks. It performs the requests for you, which means you don’t have to worry about your IP address being blocked, and there’s no need for proxy servers to bypass CAPTCHAs. In addition, our web scraping API lets you quickly collect data with CSS selectors via extraction rules.
Getting API key
To get started, you need to get an API key. You can find it in your account after signing up on HasData. In addition, you will receive 1,000 free credits when you register to test our features.
Save it, as you will need this API Key later.
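A small optional tip: rather than hard-coding the key into every script, you can read it from an environment variable. The variable name below is just a convention of mine:

import os

# Expects the key to be set beforehand, e.g.:
#   export HASDATA_API_KEY="your-key-here"
api_key = os.environ.get("HASDATA_API_KEY")
if not api_key:
    raise RuntimeError("HASDATA_API_KEY environment variable is not set")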
Scrape Product Details with Amazon API
Now let’s get the same data as in the previous examples, but using the web scraping API. We’ll also save everything we get to a CSV file:
import requests
import csv

api_key = 'YOUR-API-KEY'
asin = 'B08Z5NYG12'
url = f"https://api.hasdata.com/scrape/amazon/product?asin={asin}"

headers = {
    'Content-Type': 'application/json',
    'x-api-key': api_key
}

response = requests.get(url, headers=headers)
product_data = response.json()

fields = [
    "asin", "url", "title", "price", "usedPrice", "isAvailable",
    "brand", "material", "product_dimensions", "care_instructions", "upc",
    "totalReviews", "rating", "features", "mainImage"
]

rows = [
    [
        product_data['product'].get('asin'),
        product_data['product'].get('url'),
        product_data['product'].get('title'),
        product_data['product'].get('price'),
        product_data['product'].get('usedPrice'),
        product_data['product'].get('isAvailable'),
        product_data['product']['overview'].get('brand'),
        product_data['product']['overview'].get('material'),
        product_data['product']['overview'].get('product dimensions'),
        product_data['product']['overview'].get('product care instructions'),
        product_data['product']['overview'].get('upc'),
        product_data['product']['reviews'].get('totalReviews'),
        product_data['product']['reviews'].get('rating'),
        "; ".join(product_data['product'].get('features', [])),
        product_data['product'].get('mainImage')
    ]
]

csv_file = "product_data.csv"
with open(csv_file, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(fields)
    writer.writerows(rows)

print(f"Data saved to {csv_file}")
Now, let’s walk through the main stages of the script in detail. First, we start by importing the necessary libraries:
import requests
import csv
Next, we define the variables for the query parameters that we might want to tweak later. To keep things organized, I recommend placing them right at the top of the script:
api_key = 'YOUR-API-KEY'
asin = 'B08Z5NYG12'
url = f"https://api.hasdata.com/scrape/amazon/product?asin={asin}"
Once that’s done, we set up the actual request parameters and execute the request:
headers = {
    'Content-Type': 'application/json',
    'x-api-key': api_key
}
response = requests.get(url, headers=headers)
Finally, we parse the data we get back and save it to a file.
product_data = response.json()

fields = [
    "asin", "url", "title", "price", "usedPrice", "isAvailable",
    "brand", "material", "product_dimensions", "care_instructions", "upc",
    "totalReviews", "rating", "features", "mainImage"
]

rows = [
    [
        product_data['product'].get('asin'),
        product_data['product'].get('url'),
        product_data['product'].get('title'),
        product_data['product'].get('price'),
        product_data['product'].get('usedPrice'),
        product_data['product'].get('isAvailable'),
        product_data['product']['overview'].get('brand'),
        product_data['product']['overview'].get('material'),
        product_data['product']['overview'].get('product dimensions'),
        product_data['product']['overview'].get('product care instructions'),
        product_data['product']['overview'].get('upc'),
        product_data['product']['reviews'].get('totalReviews'),
        product_data['product']['reviews'].get('rating'),
        "; ".join(product_data['product'].get('features', [])),
        product_data['product'].get('mainImage')
    ]
]

csv_file = "product_data.csv"
with open(csv_file, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(fields)
    writer.writerows(rows)

print(f"Data saved to {csv_file}")
And that’s it! You now have the data ready to use however you like.
Scrape Search Results with Amazon API
For those just looking for results, here’s the code ready to go:
import requests
import csv

query = "Laptop"
page = 1
api_key = "YOUR-API-KEY"
output_file = "amazon_search_results.csv"

api_url = f"https://api.hasdata.com/scrape/amazon/search?q={query}&page={page}"

headers = {
    'Content-Type': 'application/json',
    'x-api-key': api_key
}

response = requests.get(api_url, headers=headers)
response_data = response.json()

if 'productResults' in response_data:
    products = response_data['productResults']
    with open(output_file, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['Position', 'Title', 'URL', 'ASIN', 'Image', 'Total Reviews', 'Rating', 'Price'])
        for product in products:
            writer.writerow([
                product.get('position'),
                product.get('title'),
                product.get('url'),
                product.get('asin'),
                product.get('image'),
                product.get('reviews', {}).get('totalReviews'),
                product.get('reviews', {}).get('rating'),
                product.get('price', {}).get('currentPrice')
            ])
else:
    print("No product results found.")
As you can see, this script is quite similar to the previous one. So, let’s walk through what you’d change to scrape a different endpoint with different parameters and fields.
First, update the URL and any relevant parameters. For example, if the endpoint or query options change, you’ll need to reflect that here:
import requests
import csv
query = "Laptop"
page = 1
api_key = "YOUR-API-KEY"
output_file = "amazon_search_results.csv"
api_url = f"https://api.hasdata.com/scrape/amazon/search?q={query}&page={page}"
Make a request:
headers = {
    'Content-Type': 'application/json',
    'x-api-key': api_key
}

response = requests.get(api_url, headers=headers)
response_data = response.json()
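Before wiring up the parsing, it can help to pretty-print the raw response once and check which fields are actually present:

import json

# One-off inspection: dump the first part of the response structure
print(json.dumps(response_data, indent=2)[:2000])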
Next, adjust the parsing logic to match the new data structure. If you’re saving to a file, it’s worth taking a moment to verify that everything lines up correctly.
if 'productResults' in response_data:
    products = response_data['productResults']
    with open(output_file, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['Position', 'Title', 'URL', 'ASIN', 'Image', 'Total Reviews', 'Rating', 'Price'])
        for product in products:
            writer.writerow([
                product.get('position'),
                product.get('title'),
                product.get('url'),
                product.get('asin'),
                product.get('image'),
                product.get('reviews', {}).get('totalReviews'),
                product.get('reviews', {}).get('rating'),
                product.get('price', {}).get('currentPrice')
            ])
else:
    print("No product results found.")
Once you’ve made these changes, your script should work seamlessly for the new task.
No-Code Amazon Scraping
Now let’s take a look at the easiest way to scrape Amazon. The nice thing about this approach is that it lets you get the data not only in JSON or CSV format but also in a way that’s ready to be imported into your Shopify store.
If you’re like me and really don’t want to mess around with proxies or setting up a captcha-solving service, this could be a solid option for you. And even if you’re totally not into coding, don’t worry – this method is completely code-free.
Scrape Amazon Product Pages
Let’s start with the Amazon Product no-code scraper. You can find it in your account on our website under the “no-code scrapers” tab. Here’s what it looks like:
To use it, simply provide the links to the product pages you want to scrape and hit the start button. Once the scraper finishes, you can download the results in your preferred format from the right side of the screen.
Example data:
As you can see, we got the same data as before, but even more of it, and the best part is we didn’t have to write any code. The results were ready almost instantly.
Scrape Amazon Search Results
Let’s now take a look at the next no-code scraper – the Amazon Search Results scraper. Let’s head over to the scraper’s page and see what needs to be filled out:
There are a few more fields here, but they’re all important. You’ll need to specify the number of search results you want to get for your query, as well as a list of keywords. You don’t necessarily have to specify the Amazon domain unless it’s something that matters to you.
As a result, you’ll get data in the following format:
The file actually had a lot more data, but it didn’t all fit into the screenshot.
Scrape Amazon Best Sellers
Finally, let’s talk about a no-code scraper that we haven’t covered an equivalent for in Python: the Best Sellers Amazon no-code scraper. Let’s head over to the scraper’s page:
To use it, all you need to do is input the category links, and you’ll get a list of top products in those categories:
As you can see, getting this data was also easy.
Conclusion
In this article, we covered several ways of collecting data from Amazon pages, along with the problems you’ll run into when running your own scraper and ways to get around them.
You can use the web scraping API to avoid issues such as IP blocking and dealing with dynamic content. And if you’d rather skip the difficulties entirely, our ready-made no-code Amazon scrapers will give you Amazon product details in a convenient format.