How to parse JSON files with Python JSONPath

JSON is currently one of the most important formats for exchanging data between applications, especially online. JSONPath is an expression language that can be used to read specific data from JSON objects. Here, we’ll look at the Python implementation of JSONPath and where to use it using easy-to-understand examples.

What is Python JSONPath?

JSON is a cross-system file format that can be used to facilitate the exchange of structured data between applications. JSON files consist of listed key-value pairs. The values can accept different types of data, both primitive values and objects. Objects, in turn, can contain their own key-value pairs. Since JSON is understood by almost all modern systems, it can be used for data exchange between any type of application, both locally on the machine and over the internet.

But not every application needs all its data captured in a JSON file. In these cases, JSONPath is a good choice. JSONPath is an expression language that can be used to read specific information from JSON objects. In most programming languages, JSONPath needs to be imported from an external library. Because these libraries need to be implemented separately for each language, the libraries or implementations may differ slightly.

The Python jsonpath-ng module

jsonpath-ng is probably the most common Python implementation of JSONPath. There are also other JSONPath implementations for Python, such as jsonpath and jsonpath-rw. But these are less popular and comprehensive, so we’ll focus exclusively on jsonpath-ng here.

Installation

You can install jsonpath-ng very easily from your shell. Simply enter the command pip install jsonpath-ng to start.

Note

The installation is done via the package manager pip, which is used by default for Python. In case you don’t have this package manager installed, you’ll need to download it first. More information can be found on the Pip website.

Syntax

JSONPath can be used to execute complex queries on JSON objects. There are several methods, operators, and atomic expressions in the module that can be used to select and query specific data. The two most important JSONPath methods are parse() and find(). With parse() you can define queries, which can then be referenced and repeated as often as you like. With find() you can run these queries on JSON data to extract concrete values. The following example explains it further.

import json
import jsonpath_ng as jp
raw_data = '''
{
    "name": "John",
    "age": 30,
    "place of residence": "New York"
}
'''
json_object = json.loads(raw_data)
name_query = jp.parse("$.name")
result = name_query.find(json_object)
print(result[0].value) # output: John
Python

In the example above, JSON data in the form of a string was converted into a dictionary object using json.loads. This is the format that Python works best with. When creating name_query, the request $.name was defined, which should return the value of name. This was then applied to the JSON object with find(). The result of the request was stored in the variable result and read out with result[0].value.

Note

For Python to be able to read JSON data from a string or from a JSON file, the Python module json should be included, like in the example above. Strings and files can then be converted into a Python-readable format using loads() or load().

The find method doesn’t just return the requested value, but also other contextual information, like the path to the searched value. This information is returned in the form of a list, where the value you are looking for has an index of 0. Now you can use result[0].value to output the value you are looking for.

In the above example, the dollar sign was used when setting the request. This is an atomic expression that is used to refer to the root object of the JSON. All operators and atomic expressions are listed in the following table.

Expression/Operator Meaning Example Explanation
$ Root-object $.marcus.age Searches the value of the ‘age’ key from the ‘marcus’ object.
. Field of an object $.marcus Searches ‘marcus’, where ‘marcus’ is a field of the root object.
.. Recursive search for a field. Fields in sub-objects are also searched. $.people..age Returns all occurrences of the ‘age’ field in people and its subobjects.
[x] Element in an array $.people[5] Searches the sixth element (at index 5) in the ‘people’ array.
\* Number placeholder, mostly used in connection with for loops ‘$.people[*]’ Searches a field in ‘people’. Combined with a for loop, each field is returned in turn.

In addition to expressions and operators, there are filters that you can use to make your search even more specific. In the Python implementation of JSONPath, these can be executed with Python operators. All symbols that can be used with filters are shown with examples in the following table.

Symbols Meaning Example Explanation
.[?(filter)] General syntax for filters. Round brackets can be left out. $.people[?(@.name == "Anne")] Searches for people whose name is ‘Anne’.
@ Object currently being searched, often used in connection with for loops. $.people[?(@.age < 50)] Searches fields in ‘people’ whose value for ‘age’ is less than 50.
<, >, <=, >=, == und != Comparison operators that can be used to filter out specific search results. $.people[@.age < 50 & @.age > 20] Searches for people who are between 20 and 50 years old.
& Logical AND. $.people[?(@.place of residence == Newark & @.age > 40)] Searches for people who are older than 40 and live in Newark.
Note

If you want to use filters, you need to include the module jsonpath_ng.ext and reference it when calling parse().

Use case for Python JSONPath

import json
import jsonpath_ng as jp
import json
import jsonpath_ng as jp
# JSON data as string
data = """
{
    "cities": [
        {
            "name": "Trenton",
            "state": "New Jersey",
            "residents": 90048,
            "iscapital": true,
            "neighborhood Central West": {
                "residents": 1394    
            }
        },
        {
            "name": "Hamburg",
            "state": "Hamburg",
            "residents": 1841000,
            "iscapital": false
        },
        {
            "name": "New York City",
            "state": "New York",
            "residents ": 8804190
            "iscapital": false
        },
        {
            "name": "Los Angeles",
            "state": "California",
            "residents": 3898767
        }
    ]
}
"""
# Convert data from String to dictionary
json_data = json.loads(data)
# Inquiry: Names of all cities
query1 = jp.parse("cities[*].name")
for match in query1.find(json_data):
    print(match.value)     # output: Trenton, Hamburg, New York City, Los Angeles
# jsonpath_ng.ext import to apply filters
import jsonpath_ng.ext as jpx
# Anfrage: Names of all cities with less than 1.5 million residents 
query2 = jpx.parse("$.cities[?@.residents < 1500000].name")
for match in query2.find(json_data):
    print(match.value)     # output: Trenton
# All fields labelled ‘residents’ 
query3 = jp.parse("$.cities..residents")
match = query3.find(json_data)
for i in match:
    print(i.value)     # output: 1394, 1841000, 8804190, 3898767
# The names of all cities that are not called ‘Trenton’
query4 = jpx.parse('$.cities[?(@.name != "Trenton")].name')
for match in query4.find(json_data):
    print(match.value)     # output: Hamburg, New York City, Los Angeles
Python

In this example, JSON data is specified as a string and then converted to a dictionary object using loads(). There is only a single array in the root object, which in turn contains 4 cities. Each city has 4 fields that contain the following data:

  • City name
  • State of the city
  • Number of inhabitants
  • Whether the city is the capital or not

New Jersey has as an additional field called ‘Central West’, which also has a number of inhabitants.

After the data has been converted into a suitable format, 4 different queries are executed. Their functions and outputs are left as comments in the example. You might notice that five values came back on the third request. That’s because the ‘..’ operator recursively searches for matching fields. This means all objects are searched, as well as all children of these objects. Accordingly, the number of inhabitants of Central West is listed next to the number of inhabitants of the cities.

Tip

In combination, JSON and Python make a versatile tool for internet programming. If you have a web application that you want to publish quickly, easily, and directly from Git, Deploy Now from IONOS could be an ideal solution.

Was this article helpful?
Page top