Convert JSON to a Pandas Dataframe

This article will discuss how to convert JSON to pandas Dataframe.

JSON stands for JavaScript Object Notation that stores the data in key-value pair format, inside the list/dictionary data structure. A DataFrame is a data structure that keeps the data in rows and columns.

We can convert JSON string or a JSON file to Pandas Dataframe using two techniques,

Table of Contents

Convert JSON to Dataframe using read_json() method

In Python, the Pandas module provides a method read_json() to convert JSON to a Dataframe. It can read the JSON contents from a file or use a JSON string directly and transform them into the dataframe.

Advertisements

Syntax is as follows

pandas.read_json('string/path/file_name', orient)

Parameters:

  • First Parameter : ‘string/path/file_name’
    • It accepts the JSON input as the first parameter. It can be in three formats,
    • 1. A JSON string: It can convert JSON string into pandas dataframe.
    • 2. A path to the JSON file: We can specify the JSON file name along with the path.
    • 3. The JSON file’s name: If the JSON file is in the current directory, we can specify its name only.
  • Second parameter: orient
    • Specify the orientation of the JSON string i.e. the JSON string format.

Now let’s see how we can take JSON input as a string and convert it into a Pandas Dataframe.

Convert record-oriented JSON string into Pandas Dataframe

Here, we will use a JSON string that stores the data as a dictionary of values surrounded by list. For such type of data we will use the ‘records’ orientation.

Syntax:

pandas.read_json('string/path/file_name.json', orient='records')

Example:

In this example, we created the json string with three key-value pairs in 4 rows. Let’s convert this json string to pandas dataframe

import pandas as pd

# Create json data with student details
json_data = '''
[
    { "id": "1", "name": "sravan","age":22 },
    { "id": "2", "name": "harsha","age":22 },
    { "id": "3", "name": "deepika","age":21 },
    { "id": "4", "name": "jyothika","age":23 }
] '''

# Convert JSON string with records orient to a Dataframe
df = pd.read_json(json_data, orient ='records')

# Display the Dataframe
print(df)

Output:

   id      name  age
0   1    sravan   22
1   2    harsha   22
2   3   deepika   21
3   4  jyothika   23

Instead of JSON string as a first parameter, we can also provide the json file path. In that case it will read the JSON string from file and convert it into a Pandas Dataframe.

Convert index oriented JSON string into Pandas Dataframe

Here, we will consider the json string which contains the data in a dictionary of values surrounded by dictionary with key as index. For such type of data we will use the ‘index’ orientation.

Syntax:

pandas.read_json('string/path/file_name.json', orient='index')

Example:

In this example, we created the json string with three key-value pairs in 4 rows with each row specified by index.
Let’s convert this json string to pandas dataframe. we are specifying index as “student-1” to “student-4”

import pandas as pd

# Create JSON data with student details
json_data = '''
{
    "student-1": { "id": "1", "name": "sravan","age":22 },
    "student-2":{ "id": "2", "name": "harsha","age":22 },
    "student-3": { "id": "3", "name": "deepika","age":21 },
    "student-4": { "id": "4", "name": "jyothika","age":23 }
} '''

# Convert JSON file with index orient
df = pd.read_json(  json_data,
                    orient ='index')

# Display the Dataframe
print(df)

Output:

           id      name  age
student-1   1    sravan   22
student-2   2    harsha   22
student-3   3   deepika   21
student-4   4  jyothika   23

Here , the indices will be the rows in the dataframe.

Convert columns oriented json string into Pandas Dataframe

Here, we will consider the json string which contains the data in a dictionary of values surrounded by dictionary with key as index. That index will be the columns in the dataframe. For such type of data we will use the ‘ columns ‘ orientation.

Syntax:

pandas.read_json('string/path/file_name.json', orient='columns')

Example:

In this example, we created the json string with three key-value pairs in 4 rows with each row specified by index.
Let’s convert this json string to pandas dataframe. we are specifying index as “student-1” to “student-4”

import pandas as pd

# Create JSON data with student details
json_data = '''
{
    "student-1": { "id": "1", "name": "sravan","age":22 },
    "student-2":{ "id": "2", "name": "harsha","age":22 },
    "student-3": { "id": "3", "name": "deepika","age":21 },
    "student-4": { "id": "4", "name": "jyothika","age":23 }
}
'''

# Convert JSON file with columns orient
df = pd.read_json(  json_data,
                    orient ='columns')

# Display the Dataframe
print(df)

Output:

     student-1 student-2 student-3 student-4
id           1         2         3         4
name    sravan    harsha   deepika  jyothika
age         22        22        21        23

Here , the column indices will be the columns in the dataframe

Convert values oriented JSON string into Pandas Dataframe

Here, we will consider the json string which contains the data in a list of values surrounded by a list. For such type of data we will use the ‘values ‘ orientation.

Syntax:

pandas.read_json('string/path/file_name.json', orient='values')

Example:

In this example, we created the json string with three key-value pairs in 4. Let’s convert this json string to pandas dataframe.

import pandas as pd

# Create JSON data with student details
json_data = '''
[
    [ "1", "sravan",22 ],
    [ "2", "harsha",22 ],
    [ "3", "deepika",21 ],
    [ "4", "jyothika",23 ]
]
'''

# convert JSON file with values orientation to Dataframe
df = pd.read_json(  json_data,
                    orient ='values')

# Display the Dataframe
print(df)

Output:

   0         1   2
0  1    sravan  22
1  2    harsha  22
2  3   deepika  21
3  4  jyothika  23

Here , the columns and rows by default starts with 0.

Convert JSON to Dataframe using json_normalize()

This method first normalize the json data and then converts into the pandas dataframe. We have to import this method from the pandas module.

Steps are as follows,

Step 1 – Load the json data

We can do this by using json.loads() function. Syntax is as follows,

json.loads(json_string)

Step 2 – Pass the loaded data into json_normalize() method

json_normalize(json.loads(json_string))

Example:

In this example, we will create 4 students details of json string and convert into pandas dataframe.

import pandas as pd
import json

# Create json string
# with student details
json_string = '''
[
    { "id": "1", "name": "sravan","age":22 },
    { "id": "2", "name": "harsha","age":22 },
    { "id": "3", "name": "deepika","age":21 },
    { "id": "4", "name": "jyothika","age":23 }
]
'''

# Load json data and convert to Dataframe 
df = pd.json_normalize(json.loads(json_string)) 

# Display the Dataframe
print(df)

Output:

  id      name  age
0  1    sravan   22
1  2    harsha   22
2  3   deepika   21
3  4  jyothika   23

Here the row positions will starts with 0 by default.

Summary

In this article we used read_json() and json_normalize() methods to convert json to Pandas DataFrame.

Advertisements

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top