How to Read & Parse JSON with Python
JSON, or JavaScript Object Notation, is a lightweight, text-based data-interchange format. It is easy for humans to read and write and easy for programs to parse and generate.
JSON is widely used in web development to transmit data between web browsers and servers, and many APIs use it to send data to clients. Therefore, it is frequently encountered during web scraping.
Understanding JSON and Its Structure
JSON organizes data with two structures: objects, which are unordered collections of key-value pairs, and arrays, which are ordered lists of values. Keys must be strings. Values can be strings, numbers, booleans (true or false), null, arrays, or other objects.
Because it is simple and compact, JSON is used in REST APIs, AJAX requests, and WebSocket messages. A minimal JSON object contains a single key-value pair, such as:
{
    "title": "Example"
}
This approach allows JSON to easily structure data, which has helped to make it a popular format. Here are some of the benefits of using JSON:
- JSON data is human-readable, which makes it easy to understand and debug.
- JSON is simple for machines to parse and generate.
- JSON is self-describing: the structure of the data is clear from the data itself, without a separate schema.
JSON supports several data types, including strings, numbers, booleans, null, arrays, and objects. For example, the following JSON object has three properties:
{
    "title": "Example",
    "year": 2023,
    "article": {
        "title": "JSON parsing",
        "language": "Python"
    }
}
The article property is nested within the main object in the example above. This allows us to store more complex data in a JSON object.
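In Python, which the rest of this article covers, a nested object like article becomes a dictionary inside a dictionary, so you reach nested values by chaining keys. A minimal sketch that parses the example above with the built-in json module (the variable names are illustrative):

```python
import json

# The nested example from above, as a JSON string
text = """
{
    "title": "Example",
    "year": 2023,
    "article": {
        "title": "JSON parsing",
        "language": "Python"
    }
}
"""

data = json.loads(text)

# The nested "article" object becomes a nested dict
article = data["article"]
print(type(article).__name__)  # dict

# Chain keys to reach values inside the nested object
print(data["article"]["language"])  # Python
```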
Python JSON Parsing Basics
JSON parsing, also known as deserialization, is the process of converting a JSON string or file into a data structure that a program can manipulate. Conversely, JSON serialization is the process of converting a data structure into a JSON string or file. In other words, parsing reads JSON data, while serialization writes it.
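A minimal round-trip sketch with the built-in json module: json.loads() parses JSON text into Python objects and json.dumps() serializes them back. The sample string is illustrative:

```python
import json

# Parsing (deserialization): JSON text -> Python object
text = '{"name": "John Doe", "tags": ["a", "b"], "active": true}'
obj = json.loads(text)
print(obj)  # {'name': 'John Doe', 'tags': ['a', 'b'], 'active': True}

# Serialization: Python object -> JSON text
back = json.dumps(obj)
print(back)  # {"name": "John Doe", "tags": ["a", "b"], "active": true}
```

Note how the JSON literal true becomes Python's True on the way in and turns back into true on the way out.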
JSON and Python data types are similar but not identical. The table below shows how Python's json module maps JSON types to Python types, and which types exist on only one side:
JSON Data Type | Python Data Type |
---|---|
Object | dict |
Object (empty) | {} (empty dict) |
Array | list |
Array (empty) | [] (empty list) |
String | str |
Number (integer) | int |
Number (with fraction or exponent) | float |
Boolean: true / false | bool: True / False |
Null | None |
NaN, Infinity, -Infinity (not valid JSON, but accepted by the json module) | float('nan'), float('inf'), float('-inf') |
Undefined (not a JSON type) | No equivalent |
No equivalent | set (convert to a list before serializing) |
No equivalent | bytes, bytearray (decode or Base64-encode before serializing) |
No equivalent | enum.Enum (serialize the member's value) |
Python values without a JSON counterpart, such as sets, bytes, and custom objects, cannot be serialized directly: json.dumps() raises a TypeError unless you convert them first or pass a conversion function through the default parameter.
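The sketch below checks a few rows of the table with the json module: the Python type each JSON value becomes, NaN handling, and what happens with a set. Passing sorted as the default function is one illustrative way to turn sets into lists, not the only one:

```python
import json

# Each JSON type maps to a built-in Python type when parsed
parsed = json.loads('{"i": 1, "f": 1.5, "s": "x", "b": true, "n": null, "a": [], "o": {}}')
for key, value in parsed.items():
    print(key, type(value).__name__)

# NaN is not valid JSON, but Python's json module accepts it by default
nan = json.loads("NaN")
print(nan != nan)  # True: NaN never equals itself

# Sets have no JSON counterpart, so serializing one raises TypeError
try:
    json.dumps({"ids": {1, 2, 3}})
except TypeError as err:
    print("TypeError:", err)

# A default function converts unsupported types; here sets become sorted lists
print(json.dumps({"ids": {3, 1, 2}}, default=sorted))  # {"ids": [1, 2, 3]}
```

The default function is only called for objects the encoder cannot handle on its own, so ordinary dicts, lists, and strings are unaffected.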
Python's built-in json module (JSON encoder and decoder) reads JSON data. It provides two functions for converting JSON data to Python objects:
- loads() converts a JSON string (str, bytes, or bytearray) to a Python object.
- load() reads JSON from a file object and converts it to a Python object.
The result is usually a dictionary or a list, but any JSON value is allowed at the top level, so a document such as 42 or "text" parses to an int or a str.
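The two functions differ only in their input: a string versus a file-like object. A short sketch, using io.StringIO as an in-memory stand-in for a real file:

```python
import io
import json

# loads() takes a str (or bytes); load() takes any object with a .read() method
from_string = json.loads("[1, 2, 3]")
from_file = json.load(io.StringIO("[1, 2, 3]"))
print(from_string == from_file)  # True

# Top-level values need not be objects or arrays
print(repr(json.loads('"hello"')))  # 'hello'
print(json.loads("42") + 1)  # 43
```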
Parsing JSON Strings in Python
Let's look at an example of using the json module to parse a string. To do this, create a new .py file and import the module:
import json
Next, we will create a variable to store the JSON string as text:
json_string = '{"name": "John Doe", "age": 30}'
Then, use json.loads() to parse the string into a Python dictionary:
json_object = json.loads(json_string)
json_object is now a regular Python dictionary, so you can access its values by key:
# Get the name
name = json_object["name"]
# Get the age
age = json_object["age"]
You can also use keys to change values in the dictionary, as described in the Modifying JSON data section.
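Putting these steps together, here is a runnable sketch that also guards against malformed input. json.loads() raises json.JSONDecodeError (a subclass of ValueError) for invalid JSON; the helper name parse_or_none is hypothetical:

```python
import json


def parse_or_none(text):
    """Parse a JSON string, returning None (and reporting why) if it is malformed."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # msg, lineno and colno describe where parsing failed
        print(f"Invalid JSON: {err.msg} (line {err.lineno}, column {err.colno})")
        return None


json_object = parse_or_none('{"name": "John Doe", "age": 30}')
name = json_object["name"]
# .get() returns a default instead of raising KeyError when a key is missing
city = json_object.get("city", "unknown")
# Keys are also used to change a value in place
json_object["age"] = 31
print(name, json_object["age"], city)  # John Doe 31 unknown

# Single quotes are not valid JSON: prints an error message, then None
print(parse_or_none("{'name': 'John Doe'}"))
```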
Parsing JSON File
Now, let’s look at how to parse a JSON file. To do this, we will create a file called “data.json” and add the JSON code from the previous example. Then, we will update the previous example to read the data from the file:
with open("data.json", "r", encoding="utf-8") as f:
    json_object = json.load(f)
As a result, the json_object variable holds the same data as in the previous example.
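A self-contained version of the file example. It first creates data.json in a temporary directory (the temporary directory is only there so the sketch can run anywhere without touching real files):

```python
import json
import tempfile
from pathlib import Path

# Work in a throwaway directory so the example does not touch real files
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.json"
    path.write_text('{"name": "John Doe", "age": 30}', encoding="utf-8")

    # JSON text is normally UTF-8, so state the encoding explicitly
    with open(path, "r", encoding="utf-8") as f:
        json_object = json.load(f)

print(json_object["name"])  # John Doe
```

Passing encoding="utf-8" avoids depending on the platform's default encoding, which is not UTF-8 everywhere.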
JSON Serialization in Python
As mentioned earlier, serialization converts a Python object into JSON text. The json module provides two functions for this:
- dump() writes a Python object as JSON to a file.
- dumps() converts a Python object to a JSON string.
Let's take a look at each of these functions with an example.
Converting a Python Object to a JSON String
Let's serialize the dictionary stored in json_object from the previous example. To convert it to a JSON string, use json.dumps():
json_string = json.dumps(json_object)
The resulting string will be identical to the one we specified in the first example.
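The strings match because Python dictionaries preserve insertion order and json.dumps() uses ", " and ": " as separators by default. A short sketch of that, plus two dumps() options: separators for more compact output and ensure_ascii for non-ASCII text (the sample city value is illustrative):

```python
import json

json_object = {"name": "John Doe", "age": 30}

# Default separators are ", " and ": ", matching the original string
default_text = json.dumps(json_object)
print(default_text)  # {"name": "John Doe", "age": 30}

# Tighter separators drop the spaces for a more compact payload
compact_text = json.dumps(json_object, separators=(",", ":"))
print(compact_text)  # {"name":"John Doe","age":30}

# Non-ASCII characters are escaped unless ensure_ascii=False
print(json.dumps({"city": "Zürich"}))                      # {"city": "Z\u00fcrich"}
print(json.dumps({"city": "Zürich"}, ensure_ascii=False))  # {"city": "Zürich"}
```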
Writing a Python Object to a File
To create a file and save the data as JSON, use json.dump():
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(json_object, f)
Because the file is opened in "w" mode, this overwrites the contents of data.json with the data from json_object.
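A self-contained sketch of writing a file and reading it back, using a temporary directory so it does not overwrite a real data.json. The indent argument is optional and only makes the file easier to read:

```python
import json
import tempfile
from pathlib import Path

json_object = {"name": "Jane Doe", "age": 30, "city": "New York"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.json"

    # Mode "w" truncates any existing file before writing
    with open(path, "w", encoding="utf-8") as f:
        json.dump(json_object, f, indent=2)

    # Reading the file back yields an equal dictionary
    with open(path, "r", encoding="utf-8") as f:
        restored = json.load(f)

    text = path.read_text(encoding="utf-8")

print(restored == json_object)  # True
```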
Modifying JSON data
While working with JSON data in Python, it’s often necessary to modify the content of a JSON object. This section will explore various ways to add, update, and delete data within a JSON object.
Add Data to JSON Object
To add data to a JSON object, assign a value to a new key:
json_object["city"] = "New York"
It will be created if the key does not exist in the object.
Update Data in JSON Object
Updating data is very similar to assigning new data. The only difference is that you use a key that already exists in the JSON object’s structure:
json_object["name"] = "Jane Doe"
This changes the value of the name key from "John Doe" to "Jane Doe".
Delete Data from JSON Object
To delete data from a JSON object, setting its value to an empty value such as None or "" is not enough, because the key remains. Instead, remove the key-value pair with Python's del statement:
del json_object["city"]
Be careful: del raises a KeyError if the key does not exist, and the deleted value cannot be recovered unless you kept a copy of the data.
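The three operations side by side, in a short sketch. It also uses dict.pop() with a default, which removes a key without raising KeyError when it is missing, and copy.deepcopy() to keep a backup, since deleted data cannot otherwise be recovered:

```python
import copy
import json

json_object = json.loads('{"name": "John Doe", "age": 30}')

# Keep a deep copy if you might need the original data after deleting keys
backup = copy.deepcopy(json_object)

json_object["city"] = "New York"  # add: the key did not exist yet
json_object["name"] = "Jane Doe"  # update: the key already exists
del json_object["city"]           # delete: raises KeyError if the key is missing

# pop() with a default removes a key without raising when it is absent
removed = json_object.pop("country", None)

print(json.dumps(json_object))  # {"name": "Jane Doe", "age": 30}
print(backup["name"])           # John Doe
```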
Advanced JSON Parsing Techniques
In more complex scenarios, you may encounter advanced JSON parsing requirements. This section covers techniques that go beyond the basics and help you handle complex data structures and unusual situations.
Python Pretty Printing for JSON
Although JSON is easy for programs to process, it can be hard for people to read. For example, a complex JSON object printed on a single line is difficult to scan.
Pretty-printing solves this by formatting JSON with line breaks and indentation, which is especially useful for large or deeply nested data.
Python's json.dumps() function accepts an indent parameter that sets the number of spaces used for each nesting level, which significantly improves the readability of the output:
pretty_json = json.dumps(json_object, indent=4)
pretty_json now holds the JSON text indented with 4 spaces per level; pass it to print() to display it.
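A sketch using the nested example object from earlier. It also shows sort_keys=True, an optional dumps() argument that orders keys alphabetically, which helps when comparing output:

```python
import json

json_object = {"title": "Example", "year": 2023,
               "article": {"title": "JSON parsing", "language": "Python"}}

# indent=4 puts each key on its own line, nested levels 4 spaces deeper
pretty_json = json.dumps(json_object, indent=4)
print(pretty_json)

# sort_keys=True orders keys alphabetically at every level
sorted_json = json.dumps(json_object, indent=2, sort_keys=True)
print(sorted_json)
```

For files on disk, the standard library's command-line tool python -m json.tool data.json prints the same kind of indented output.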
Working with JSON Data from External Sources
Applications often fetch JSON data from external sources such as web APIs or databases. The third-party Requests library (installed with pip install requests) sends HTTP requests in Python and can decode JSON responses directly.
Let's send a request to httpbin.org, which returns a JSON response containing the IP address the request was made from:
import requests
response = requests.get("https://httpbin.org/ip")
Now you can parse JSON data from the response:
json_data = response.json()
As a result, json_data is a Python dictionary whose contents you can access directly. For example, your IP address is stored in json_data["origin"].
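If you would rather avoid the third-party dependency, the standard library's urllib.request can fetch the same URL. A sketch under that assumption; the helper names parse_origin and fetch_ip are hypothetical, and the parsing step is separated out so it can be tested without a network connection:

```python
import json
import urllib.request


def parse_origin(body: bytes) -> str:
    """Extract the "origin" field from an httpbin.org/ip response body."""
    # json.loads() accepts bytes and detects UTF-8/16/32 automatically
    return json.loads(body)["origin"]


def fetch_ip(url: str = "https://httpbin.org/ip", timeout: float = 10.0) -> str:
    """Fetch the caller's public IP address using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # urlopen raises HTTPError for error status codes such as 404 or 500
        return parse_origin(response.read())
```

Calling print(fetch_ip()) requires network access. When you do use Requests, passing timeout= to requests.get() and calling response.raise_for_status() before response.json() gives similar protection against hanging requests and error responses.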
Handling Duplicate Keys
The JSON specification (RFC 8259) says that keys within an object SHOULD be unique, but it does not strictly forbid duplicates, and real-world data doesn't always follow this rule. When dealing with JSON data that contains duplicate keys, it's essential to know how they are handled.
Python's json module does not raise an exception when it encounters duplicate keys. By default, it silently keeps the last value for each key:
json_data = json.loads('{"name": "John", "name": "Jane"}')  # {'name': 'Jane'}
Note that the strict parameter of json.loads() has nothing to do with duplicate keys: strict=False only allows control characters, such as literal newlines, inside strings. To detect or merge duplicate keys, pass a function through the object_pairs_hook parameter, which receives each object's key-value pairs in the order they appear.
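Here is a sketch of two such hooks: one that rejects duplicate keys with a ValueError and one that keeps every value. The names reject_duplicates and collect_duplicates are illustrative, not part of the json module. The hook runs for every object, including nested ones, so collect_duplicates wraps every value in a list even when a key appears only once:

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises ValueError on a repeated key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


def collect_duplicates(pairs):
    """object_pairs_hook that maps every key to the list of all its values."""
    result = {}
    for key, value in pairs:
        result.setdefault(key, []).append(value)
    return result


text = '{"name": "John", "name": "Jane", "age": 30}'

print(json.loads(text))  # {'name': 'Jane', 'age': 30}  (last value wins)
print(json.loads(text, object_pairs_hook=collect_duplicates))
# {'name': ['John', 'Jane'], 'age': [30]}

try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate key: 'name'
```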
Convert JSON to CSV in Python
CSV (Comma-Separated Values) is one of the most popular file formats for storing tabular data and is supported by virtually all spreadsheet applications. Unlike XLSX, a CSV file stores tabular data as plain text, with values separated by commas.
To save a JSON object to a CSV file in Python, you can use either the built-in csv module or the pandas library. Let's look at both. The csv module ships with Python, so only pandas needs to be installed:
pip install pandas
Now import the csv module into the script:
import csv
To write the data, open the file, write a header row, and then write the data rows. This example assumes json_object holds the nested JSON object from the beginning of the article (with title, year, and article keys). Because CSV column names must be unique, the nested title is written to its own article_title column:
with open('data.csv', 'w', newline='', encoding='utf-8') as csvfile:
    # Set the headers; every column name must be unique
    fieldnames = ['title', 'year', 'article_title', 'language']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    # Flatten the nested "article" object into a single row
    writer.writerow({
        'title': json_object['title'],
        'year': json_object['year'],
        'article_title': json_object['article']['title'],
        'language': json_object['article']['language']
    })
As a result, data.csv contains a table with a header row and one row of data.
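APIs often return a JSON array of objects rather than a single object. The same DictWriter approach extends to that case with a loop. A sketch with two hypothetical sample records, writing to an in-memory io.StringIO buffer instead of data.csv so it can run anywhere:

```python
import csv
import io
import json

# Hypothetical sample data: a JSON array where each object becomes one CSV row
records = json.loads("""
[
    {"title": "Example", "year": 2023, "article": {"title": "JSON parsing", "language": "Python"}},
    {"title": "Second", "year": 2024, "article": {"title": "CSV export", "language": "Python"}}
]
""")

fieldnames = ["title", "year", "article_title", "language"]

# io.StringIO stands in for a real file opened with newline=""
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
for record in records:
    # Flatten the nested "article" object into top-level columns
    writer.writerow({
        "title": record["title"],
        "year": record["year"],
        "article_title": record["article"]["title"],
        "language": record["article"]["language"],
    })

print(buffer.getvalue())
```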
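To simplify the process, you can use a pandas DataFrame. Import the library into the script:
import pandas as pd
Then create a DataFrame from the JSON object. pd.json_normalize() turns the record into a row and flattens nested objects into dotted column names such as article.title:
df = pd.json_normalize(json_object)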
And save the data to CSV:
df.to_csv("data.csv", index=False)
If you want to save the data to an XLSX file instead, use the DataFrame's to_excel() method. It requires an Excel writer engine such as openpyxl (pip install openpyxl). Specify the desired file name and set index=False to exclude the default index column:
df.to_excel("data.xlsx", index=False)
This code will create a ready-made Excel file from your JSON object, with the column names corresponding to the key names in your JSON data. It’s a convenient way to export structured data to Excel for further analysis or sharing.
Conclusion
This tutorial covered various aspects of working with JSON data in Python. JSON, or JavaScript Object Notation, is a ubiquitous format for exchanging and storing data. Understanding the structure, parsing, and serialization of JSON is essential for web developers, data scientists, and anyone who works with data in Python.
We covered the basics of JSON, including how to parse JSON strings and files and how to serialize Python objects back to JSON. We also covered modifying JSON data by adding, updating, and deleting key-value pairs, as well as more advanced techniques such as pretty printing for improved readability, working with data from external sources such as web APIs, handling duplicate keys, and converting JSON to CSV and Excel.
By mastering these skills, you will be well-prepared to work with JSON data in Python, whether building web applications, analyzing data, or working with data from various sources.