How to Manage Nested JSON Objects as a DataFrame in Pandas?

DDD
2024-10-24 14:07:02

Reading Nested JSON with Nested Objects as a Pandas DataFrame

When JSON data contains nested objects or arrays, loading it directly into a DataFrame leaves the nested values packed inside single cells. Pandas provides json_normalize to flatten such data into ordinary rows and columns.

Expanding the Array into Columns

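Suppose myJson.json holds a single object with the top-level fields date, number, and name, plus a locations array of objects. To expand the locations array into separate columns, use json_normalize as follows: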

<code class="python">import json
import pandas as pd

with open('myJson.json') as data_file:
    data = json.load(data_file)

df = pd.json_normalize(data, 'locations', ['date', 'number', 'name'], record_prefix='locations_')

print(df)</code>

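This creates a DataFrame with one row per location. The location fields carry the locations_ prefix, and the top-level fields are repeated on every row: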

  locations_arrTime locations_arrTimeDiffMin locations_depTime  \
0                                                        06:32   
1             06:37                        1             06:40   
2             08:24                        1                     

  locations_depTimeDiffMin           locations_name locations_platform  \
0                        0  Spital am Pyhrn Bahnhof                  2   
1                        0  Windischgarsten Bahnhof                  2   
2                                    Linz/Donau Hbf               1A-B   

  locations_stationIdx locations_track number    name        date  
0                    0          R 3932         R 3932  01.10.2016  
1                    1                         R 3932  01.10.2016  
2                   22                         R 3932  01.10.2016 
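
The article does not show the contents of myJson.json. As a minimal sketch, assuming the file holds a single object shaped like the output above, the same call can be tried without any file. The dict below is an abridged, hypothetical stand-in built from the field names in that output, not the original file:

```python
import pandas as pd

# Hypothetical stand-in for myJson.json: field names are taken from the
# output above, but only a few of the location fields are included.
data = {
    "date": "01.10.2016",
    "number": "",
    "name": "R 3932",
    "locations": [
        {"name": "Spital am Pyhrn Bahnhof", "depTime": "06:32", "platform": "2"},
        {"name": "Windischgarsten Bahnhof", "depTime": "06:40", "platform": "2"},
        {"name": "Linz/Donau Hbf", "depTime": "", "platform": "1A-B"},
    ],
}

# record_path names the nested list that becomes the rows;
# meta lists the top-level fields to repeat on every row;
# record_prefix keeps the nested "name" from clashing with the top-level "name".
df = pd.json_normalize(
    data,
    record_path="locations",
    meta=["date", "number", "name"],
    record_prefix="locations_",
)

print(df.shape)
print(sorted(df.columns))
```

Without record_prefix, the nested name field and the top-level name field would collide, which is why the prefix matters for this particular structure.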

Handling Multiple JSON Objects

For JSON files containing multiple objects, the approach depends on the desired data structure.

Keep Individual Columns

To keep one row per object, with the columns date, number, and name plus a single locations column holding the joined location names, use read_json followed by groupby:

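<code class="python">df = pd.read_json('myJson.json')
# Each cell in 'locations' holds one location dict; keep only its 'name' field
df['locations'] = pd.DataFrame(df['locations'].tolist())['name']
# Collapse back to one row per object, joining the location names with commas
df = df.groupby(['date', 'name', 'number'])['locations'].apply(','.join).reset_index()

print(df)</code>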

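This groups the rows by object and concatenates the location names. Note that read_json parses the date column automatically, so 01.10.2016 appears here as 2016-01-10: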

        date    name number                                          locations
0  2016-01-10  R 3932         Spital am Pyhrn Bahnhof,Windischgarsten Bahnho...
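
When the data is a list of several such objects, each object already occupies one row, so the names can be joined per row without a groupby. The following is a sketch under that assumption; the records list and the second train name are hypothetical and used for illustration only:

```python
import pandas as pd

# Hypothetical records: two objects, each with its own nested locations list.
records = [
    {"date": "01.10.2016", "number": "", "name": "R 3932",
     "locations": [{"name": "Spital am Pyhrn Bahnhof"}, {"name": "Linz/Donau Hbf"}]},
    {"date": "02.10.2016", "number": "", "name": "R 3934",
     "locations": [{"name": "Windischgarsten Bahnhof"}]},
]

# One row per object; the 'locations' column holds a list of dicts.
df = pd.DataFrame(records)

# Replace each list of location dicts with a comma-separated string of names.
df["locations"] = df["locations"].apply(
    lambda locs: ",".join(loc["name"] for loc in locs)
)

print(df)
```

This keeps the date column as the original string, since the DataFrame is built from Python objects rather than through read_json.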

Flatten the Data Structure

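If you prefer a flattened data structure with one row per location, use json_normalize with a dotted record_prefix and convert the date column explicitly (read_json on its own leaves the nested locations unexpanded):

<code class="python"># Assumes `data` is the dict loaded with json.load() in the first example
df = pd.json_normalize(data, record_path='locations', meta=['number', 'date', 'name'], record_prefix='locations.')
df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y')  # parse as day.month.year

print(df)</code>

This outputs the data in a single table (abridged here, with the top-level fields shown first; the actual column order may differ):

   number        date    name  ...  locations.arrTimeDiffMin  locations.depTimeDiffMin  locations.platform
0          2016-10-01  R 3932  ...                                                   0                   2
1          2016-10-01  R 3932  ...                         1                         0                   2
2          2016-10-01  R 3932  ...                         1                                          1A-B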
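
json_normalize also accepts a list of objects, which is the usual shape of a file containing multiple records. As a sketch, assuming a hypothetical list of two objects (the second train name and date are invented for illustration), every location from every object ends up in one table:

```python
import pandas as pd

# Hypothetical records: two objects, each with its own nested locations list.
records = [
    {"date": "01.10.2016", "number": "", "name": "R 3932",
     "locations": [
         {"name": "Spital am Pyhrn Bahnhof", "platform": "2"},
         {"name": "Linz/Donau Hbf", "platform": "1A-B"},
     ]},
    {"date": "02.10.2016", "number": "", "name": "R 3934",
     "locations": [
         {"name": "Windischgarsten Bahnhof", "platform": "2"},
     ]},
]

# Each location becomes a row; the parent object's fields are repeated
# on every row that came from that object.
flat = pd.json_normalize(
    records,
    record_path="locations",
    meta=["date", "number", "name"],
    record_prefix="locations.",
)

# Parse the day.month.year strings explicitly to avoid ambiguous inference.
flat["date"] = pd.to_datetime(flat["date"], format="%d.%m.%Y")

print(flat)
```

Passing an explicit format to to_datetime is a deliberate choice here: a string such as 01.10.2016 is ambiguous, and stating the format avoids a silent month-first interpretation.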
