How to Scrape Indeed Jobs Data in 2024

Valentina Skakun
Last update: 17 Jun 2024

Gathering job data from job boards is crucial for job seekers and HR companies. In a previous article, we discussed how to collect such data from the professional social network LinkedIn. Today, we’ll focus on Indeed, one of the largest job search platforms. 

Indeed allows users to post resumes, subscribe to job alerts, search for job openings, save them, and apply directly. This article will explore various methods for automating job data collection from this platform, enabling you to acquire the necessary information regardless of your programming expertise.  

Use Cases for Indeed Job Scraping

Before we dive into the practical aspects, let’s explore the specific scenarios where gathering data from Indeed can be particularly useful. In a nutshell, there are several key reasons to do so:

  • Analyze job market trends and identify in-demand skills. By analyzing job postings on Indeed, you can gain valuable insights into the current job market landscape, including the most sought-after skills and emerging trends.

  • Research salary ranges for specific job titles and locations. Indeed provides salary data for a wide range of job titles and locations.

  • Gather data for recruitment and talent acquisition purposes. The platform serves as a rich source of talent data, allowing you to identify qualified candidates, filter based on specific criteria, and reach out to potential hires.

  • Track competitor hiring activities and identify potential job candidates. Monitoring your competitors’ hiring activities on Indeed can provide valuable insights into their talent needs and strategies.

  • Conduct research on company culture, employee reviews, and workplace environment. Indeed provides access to employee reviews and company information, allowing you to gain insights into the company culture, work environment, and employee satisfaction levels.

Let’s delve into each of these areas in more detail. 

Labor Market Analysis

Regular labor market analysis helps you consistently track changes in hiring and job openings. It shows which specialists are in the highest demand at the moment, reveals trends that depend on various conditions, and highlights changes in job requirements and the skills sought after by employers.

This approach helps job seekers not only better understand what employers require of them but also assess the supply and demand for specific positions and professions.

Salary Research

Another reason for collecting data from Indeed is to obtain salary information from job postings. This can help you compare salaries across different regions and industries and for different levels of experience and company sizes.

This can help you find potential benchmarks for salary negotiations during job interviews. Or, if you are an HR professional, it can help you understand the relevant salary for the criteria you need, which will be helpful when recruiting a new employee.

Recruitment and Talent Acquisition

As an HR professional, Indeed can be a valuable tool to streamline your recruitment process. You can efficiently identify and source potential candidates by leveraging its automation features, significantly reducing manual tasks.

Indeed allows you to create targeted candidate lists based on specific job requirements and skills. This lets you focus on the most qualified individuals, saving time and effort.

Competitive Intelligence

Monitoring job openings is beneficial for understanding your competitors’ requirements for their employees. Tracking your competitors’ recruitment activities helps you identify new vacancies and talent acquisition strategies.

To improve your potential candidate search process, you can try adopting the practices of your competitors. Analyze their job descriptions to understand the profile of their ideal candidates and consider how this could help improve your own.

Overview of Indeed

Now, let's take a closer look at Indeed and see what data we can get. If we're talking about collecting job seeker data, we need to go to the resume search page, set a location and a keyword, which can be a skill or a job title, and then go to the results page. 

Indeed resume search page

As you can see, if you're not authorized, you won't be able to view detailed resume information, and the limited information you do receive may not be sufficient. Therefore, we won't pursue scraping candidate data further.

Let’s proceed to the job search page and find all available developer positions.

Indeed Jobs Listing

The presented job list offers a comprehensive overview of available positions. On the left, you'll find a summary of all openings, while the right side provides in-depth details for the currently selected job.

Suppose you need to gather data on specific positions. In that case, you currently face two options: manually navigate to each job, click on it, and collect detailed information individually, or limit yourself to the brief information in the general list of vacancies.

Choosing a Scraping Method

The method you choose for collecting data from Indeed will depend on your specific needs, programming skills, data volume requirements, and desired data acquisition speed. Here's a breakdown of the four main options:

  1. Manual Data Collection. This method involves manually copying and pasting data from Indeed job listings into a spreadsheet or other data storage format. It’s suitable for small, one-time data collection tasks and requires no programming expertise. However, it’s time-consuming, tedious, prone to errors, and not scalable for large datasets.

  2. Using a Ready-Made Indeed No-Code Scraper. Numerous web scraping services offer pre-built Indeed scrapers that allow you to extract data without coding. These scrapers provide a user-friendly interface and deliver structured data files. However, they lack flexibility and customization options, and you’re limited to the data the scraper can extract.

  3. Creating Your Own Indeed Scraper. Building your own scraper using programming languages like Python or Node.js offers complete control and customization. You can tailor the scraper to your specific data needs and modify it as Indeed’s website changes. However, it requires programming skills and ongoing maintenance.

  4. Utilizing Indeed API. While Indeed’s official API doesn’t directly provide data scraping functionality, you can leverage third-party Indeed scraping APIs that connect to Indeed and extract data based on your criteria. These APIs deliver structured data in JSON format, eliminating the need for coding or maintaining a scraper.

The best method depends on your specific circumstances. Manual data collection or a ready-made scraper are suitable options for small, one-time data collection tasks. Meanwhile, frequent data collection or specific data needs call for creating your own scraper or using a third-party API, both of which offer more flexibility and control. For ongoing data collection as a developer, using a third-party API or building a scraper with maintenance in mind are efficient choices.

Method 1: Scrape Indeed Data Without Code

Let’s start with the simplest method and explore how to quickly gather job data using HasData’s Indeed no-code scraper. To utilize this tool, sign up on our website and navigate to your dashboard’s “No-Code Scrapers” tab. Locate the “Indeed Scraper” and click on it. 

Find Indeed no-code scraper

Let’s look at this page closely:

Set your parameters at Indeed scraper page

To make the process easier, we’ve numbered the different fields:

  1. Results Limit. Enter 0 here to get the maximum number of results for each query.

  2. Job title, keywords, company. Enter the keywords you want to use to search for jobs. You can enter multiple keywords, each on a new line.

  3. City, state, zip code. Enter the city where you want to find jobs.

  4. Indeed Domain. Select the Indeed domain corresponding to the country in which you are searching for jobs.

  5. Run Scraper. Click this button to start the scraping process.

  6. Scraper Tasks. This section shows the scraping tasks. Once the tasks are complete, you can download the results in a convenient format.

Let’s download the resulting file and see what fields it includes:

Research the results

Since the file is too large, the image shows only a portion of it. As you can see, the no-code scraper gathers all available information for each job posting, including its full description.

The Indeed no-code scraper is a good fit if you need data quickly and don't want to write your own tool. Moreover, since it collects all available information for each job posting, it yields the most comprehensive dataset.

Method 2: Scrape Indeed Job Data with Python

Now, let's create a simple tool, starting with the Requests library and ending with an example that uses a headless browser. We've chosen Python as the programming language because it's easy enough for beginners.

We’ll also discuss how integrating an API can significantly simplify scraper creation. In the meantime, let’s focus on preparing for data scraping. 

Setting Up Your Development Environment

To get started, you will need to install an interpreter and either a code editor or a full-fledged development environment. We have previously written about how to do this for Python.

You will need Python version 3.10 or higher to run the code we will be using. Let’s install the libraries that we will be using with the package manager:

pip install requests bs4 selenium

We've listed all the required third-party libraries here. Note that requests often comes pre-installed with your Python distribution; however, if you choose to use a fresh virtual environment, you'll need to install all of these libraries there. The time module we'll use later is part of Python's standard library and needs no installation.

To run Selenium correctly, you’ll also need a separate web driver, depending on your browser. We’ve covered how to use Selenium in a separate article and provided links to all the web drivers (not required for the latest Selenium versions).

Get Data with Requests and BS4

Let’s start with a basic approach: using the requests library to fetch the HTML code of the job listings page and then parsing it with BeautifulSoup. We’ll store the extracted data in a variable for easy saving to a file.

You can directly view and run the ready-made script in Google Colaboratory. However, Indeed does not return data for requests made this way, so collecting data with this approach won't be possible.

If you still want to try and see for yourself, create a new file with a .py extension and import the libraries:

import requests
from bs4 import BeautifulSoup

Next, define variables to hold dynamic parameters such as job title or keyword and city:

base_url = "https://www.indeed.com/jobs"
query = "developer"
location = "New York, NY"

Next, set headers to make the request look more human-like:

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.5",
    "Referer": "https://www.google.com/",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive"
}

You can get the latest User Agent from our regularly updated table. 

Next, generate a link based on the specified parameters.

url = f"{base_url}?q={query}&l={location.replace(' ', '+').replace(',', '%2C')}"
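The manual replace() calls work, but they're easy to get wrong; the standard library's urllib.parse.urlencode produces the same query string more reliably. A minimal sketch:

```python
from urllib.parse import urlencode

base_url = "https://www.indeed.com/jobs"
params = {"q": "developer", "l": "New York, NY"}

# urlencode uses quote_plus by default, so spaces become '+'
# and commas become '%2C', matching the hand-built URL above.
url = f"{base_url}?{urlencode(params)}"
print(url)
# https://www.indeed.com/jobs?q=developer&l=New+York%2C+NY
```

This also scales cleanly if you later add more query parameters.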

Make a request and check its status:

response = requests.get(url, headers=headers)
response.raise_for_status()

As a result, you will get an error:

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://www.indeed.com/jobs?q=developer&l=New+York%2C+NY

Unfortunately, the issue can only be resolved by using a headless browser instead of the requests library, or by employing a ready-made Indeed scraping API.

Handling Dynamic Content with Selenium

Let's address the issue from the previous example and rewrite the code using the Selenium library. The completed script is also available on Google Colaboratory. However, since Colaboratory doesn't support running headless browsers, you'll need to download the script and run it on your local machine for testing.

Create a new file with the extension *.py and import the required modules:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time

We also imported the time library to introduce a delay after starting the browser. Next, set the parameters and compose the link:

base_url = "https://www.indeed.com/jobs"
query = "developer"
location = "New York, NY"

url = f"{base_url}?q={query}&l={location.replace(' ', '+').replace(',', '%2C')}"

To interact with web pages, you first need to create an instance of a web driver and configure its settings. Once the web driver is initialized, you can launch it and navigate to the desired URL:

chrome_options = Options()
driver = webdriver.Chrome(options=chrome_options)
driver.get(url)
time.sleep(5)

Get the page source:

page_source = driver.page_source

Now, create a variable to store the data. Then, locate the block containing the job postings and extract the relevant information. Finally, display the extracted data and store it in the variable:

output_data = []
job_listings = driver.find_elements(By.CLASS_NAME, 'result')

for job in job_listings:
    try:
        company_location = job.find_element(By.CLASS_NAME, 'company_location')
        company_name = company_location.find_element(By.CSS_SELECTOR, 'span[data-testid="company-name"]').text
        job_location = company_location.find_element(By.CSS_SELECTOR, 'div[data-testid="text-location"]').text
        print(f"Company Name: {company_name}")
        print(f"Job Location: {job_location}")
        job_metadata = job.find_element(By.CLASS_NAME, 'jobMetaDataGroup')

        posted_date = job_metadata.find_element(By.CSS_SELECTOR, 'span[data-testid="myJobsStateDate"]').text.splitlines()[1].strip()
        print(f"Posted Date: {posted_date}")

        job_data = {
            "Company Name": company_name,
            "Job Location": job_location,
            "Posted Date": posted_date
        }

        output_data.append(job_data)
    except Exception as e:
        print(f"Error extracting job information: {e}")

    print("\n" + "-"*100 + "\n")

Close the webdriver:

driver.quit()

As a result, you will get data like in this example:

Test the script

This script can be enhanced to efficiently gather detailed information from each job listing. You can achieve this by iterating over all available listings and extracting relevant data using appropriate selectors. Additionally, you can customize the script to focus on the specific information that aligns with your needs.
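Note that BeautifulSoup was imported above but never used: instead of Selenium's locators, you can parse driver.page_source with it, which makes the extraction logic testable without a running browser. A sketch over a trimmed, simplified sample of the job-card markup (the class names and data-testid attributes are the same ones targeted above; the real page structure may differ):

```python
from bs4 import BeautifulSoup

# A trimmed sample of the job-card markup; in the real script you would
# pass driver.page_source here instead of this string.
sample_html = """
<div class="result">
  <div class="company_location">
    <span data-testid="company-name">Acme Corp</span>
    <div data-testid="text-location">New York, NY</div>
  </div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
jobs = []
for card in soup.select("div.result"):
    jobs.append({
        "Company Name": card.select_one('span[data-testid="company-name"]').get_text(strip=True),
        "Job Location": card.select_one('div[data-testid="text-location"]').get_text(strip=True),
    })
print(jobs)
```

Keeping the parsing separate from the browser automation this way also makes the script easier to maintain when Indeed changes its markup.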

Method 3: Scrape Indeed using API

Now, let’s explore how to integrate Indeed APIs into your scripts to eliminate the need for headless browsers and job page parsing to retrieve data. We’ll delve into two Indeed APIs:

  1. Indeed Listing API. Retrieves data from job search result pages. It provides less comprehensive data but covers all listed jobs.

  2. Indeed Properties API. Fetches detailed information about a specific job posting.

You can find and try them (set parameters, get code, or even make a request) in our APIs Playground:

Find Indeed APIs

These APIs cater to different data retrieval needs. If the Indeed Listing API doesn't provide the level of detail you require, you can leverage the Indeed Properties API to gather in-depth data for specific job listings.

Scrape Listings using Indeed API

First, let’s retrieve a list of job openings and their corresponding brief details. You can review and run a prepared script in Google Colaboratory. To use it, remember to specify your HasData API key and the job search query parameters.

Create a new file and import the necessary libraries:

import requests
import json

Define the variables and the endpoint for HasData's Indeed Listing API:

base_url = "https://api.hasdata.com/scrape/indeed/listing"
keyword = "software engineer"
location = "New York, NY"
domain = "www.indeed.com"

Compose the link:

url = f"{base_url}?keyword={keyword.replace(' ', '+')}&location={location.replace(' ', '+').replace(',', '%2C')}&domain={domain}"
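As an alternative to hand-encoding the query string, the requests library can build it from a params dictionary. A small sketch that prints the final URL without sending anything; in the real call, you would simply pass params=params (together with your headers) to requests.get:

```python
import requests

base_url = "https://api.hasdata.com/scrape/indeed/listing"
params = {
    "keyword": "software engineer",
    "location": "New York, NY",
    "domain": "www.indeed.com",
}

# PreparedRequest exposes the fully encoded URL without making a request.
prepared = requests.Request("GET", base_url, params=params).prepare()
print(prepared.url)
```

This avoids the manual .replace() calls entirely and encodes every value consistently.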

Specify the request headers, including your API key:

headers = {
    'Content-Type': 'application/json',
    'x-api-key': 'PUT-YOUR-API-KEY'
}

Make the request and display the result on the screen:

response = requests.get(url, headers=headers)
if response.status_code == 200:
    data = response.json()
    print(json.dumps(data, indent=2))
else:
    print(f"Error: {response.status_code} - {response.text}")

As a result, we will get data of the following kind:

An example of JSON response

This data should be sufficient to provide an overview of different job openings. However, if you require more detailed information about a job opening, refer to the following endpoint.
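If you plan to follow up on individual listings, you can feed each job URL from the listing response into the properties endpoint described next. The field names in this sketch are hypothetical; check the structure of the JSON you actually receive before relying on them:

```python
from urllib.parse import quote

def build_detail_url(job_url, base_url="https://api.hasdata.com/scrape/indeed/job"):
    # safe='' percent-encodes everything, so the job URL's own query
    # string doesn't break the outer one.
    return f"{base_url}?url={quote(job_url, safe='')}"

# Hypothetical shape of a listing response entry.
listing = {"jobs": [{"url": "https://www.indeed.com/viewjob?jk=cf56b58ab740db2a"}]}
detail_urls = [build_detail_url(job["url"]) for job in listing["jobs"]]
print(detail_urls[0])
```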

Scrape Properties using Indeed API

This endpoint is ideal if you already have a list of job openings for which you want to gather more detailed information. You can also find this script in Google Colaboratory.

In this example, we'll also use the urllib library to process the job listing URL. It's part of Python's standard library, so it doesn't require a separate installation.

To begin, import the necessary libraries into your script:

import requests
import json
import urllib.parse

Next, we’ll provide the URLs for the API endpoint and the desired job listing and construct a general URL for getting data:

base_url = "https://api.hasdata.com/scrape/indeed/job"
job_url = "https://www.indeed.com/viewjob?jk=cf56b58ab740db2a"
encoded_job_url = urllib.parse.quote(job_url, safe='')
url = f"{base_url}?url={encoded_job_url}"

Set request headers, including your HasData’s API key:

headers = {
    'Content-Type': 'application/json',
    'x-api-key': 'YOUR-API-KEY'
}

Execute the request and display the results:

response = requests.get(url, headers=headers)

if response.status_code == 200:
    data = response.json()
    print(json.dumps(data, indent=2))
else:
    print(f"Error: {response.status_code} - {response.text}")

Get the full data, including job descriptions (in this example, they are shortened):

Indeed Job response

Since the response includes the full job description text, you get access to all the information available for that specific vacancy.

Data Processing and Storage

You'll need the appropriate libraries to save data to files in Python. Both json and csv are part of Python's standard library, so no installation is required.

Next, import the libraries into your Python script:

import json
import csv

Now, add the logic for saving data to files. Let's use the previously created output_data variable as the data source. To save as a CSV file:

csv_file = 'jobs_data.csv'

with open(csv_file, 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['Company Name', 'Job Location', 'Posted Date']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for job in output_data:
        writer.writerow(job)

To save as a JSON file:

json_file = 'jobs_data.json'
with open(json_file, 'w', encoding='utf-8') as jsonfile:
    json.dump(output_data, jsonfile, ensure_ascii=False, indent=4)

This will save the previously obtained data to files, allowing you to access them in a format suitable for further processing.
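Scraped records don't always contain every field. csv.DictWriter can tolerate that if you set restval for missing keys and extrasaction='ignore' for unexpected ones; a small sketch using an in-memory buffer:

```python
import csv
import io

rows = [
    {"Company Name": "Acme", "Job Location": "New York, NY", "Posted Date": "Just posted"},
    {"Company Name": "Globex"},  # fields are often missing in scraped data
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf,
    fieldnames=["Company Name", "Job Location", "Posted Date"],
    restval="",              # value written for missing keys
    extrasaction="ignore",   # silently drop keys not in fieldnames
)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Without these two arguments, a single incomplete or over-full record would raise an error and abort the export.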

Conclusion

Scraping job data from Indeed offers valuable insights into the job market, enhancing recruitment strategies and maintaining a competitive edge in hiring practices. This article delved into the primary methods for gathering data from this platform. We provided a detailed breakdown of each approach, from utilizing no-code scrapers to building custom scrapers from scratch.

Additionally, we’ve made all the scripts discussed in this article available on Google Colaboratory. If you’re more interested in the results than the theory, you can access the corresponding pages directly and view the ready-made scripts. Links to these scripts are provided at the beginning of each relevant subsection.

Overall, we hope this article clarifies the benefits of scraping Indeed and provides you with valuable takeaways that you can implement in your own endeavors.
