How Can I Efficiently Create a Pandas DataFrame from a Nested Dictionary with Hierarchical Data?

Linda Hamilton
2024-12-14 10:58:12

Constructing Pandas DataFrames from Nested Dictionary Items

Given a nested dictionary with UserId at the top level, Category at the second level, and assorted attributes at the third level, the goal is to create a pandas DataFrame with a hierarchical index. Each (UserId, Category) pair should form one row of a two-level MultiIndex, while the attribute names form the columns.

Passing such a dictionary directly to pd.DataFrame assigns the index and columns incorrectly: the UserIds become columns, the Categories become the index, and each cell holds an entire attribute dictionary. Two approaches avoid this.
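
Before looking at the fixes, the problem itself can be sketched as follows. The sample data and the variable name naive are illustrative assumptions; the point is only that the outer keys end up as columns and the innermost dictionaries are left as opaque cell values.

```python
import pandas as pd

# Sample data (illustrative): UserId -> Category -> attributes
user_dict = {
    12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
         'Category 2': {'att_1': 23, 'att_2': 'another'}},
    15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
         'Category 2': {'att_1': 30, 'att_2': 'bar'}},
}

# Naive construction: outer keys (UserIds) become columns,
# second-level keys (Categories) become the index.
naive = pd.DataFrame(user_dict)

print(list(naive.columns))  # the UserIds
print(list(naive.index))    # the Categories

# Each cell still holds a whole attribute dictionary.
print(type(naive.loc['Category 1', 12]))
```

The attributes never become columns here, which is why the dictionary has to be reshaped or split before it reaches pandas.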

1. Reshaping the Dictionary:

One solution reshapes the dictionary so that each key is a (UserId, Category) tuple and each value is the corresponding attribute dictionary. Passing the result to pd.DataFrame.from_dict with orient='index' turns the tuple keys into a two-level MultiIndex:

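import pandas as pd

user_dict = {
    12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
         'Category 2': {'att_1': 23, 'att_2': 'another'}},
    15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
         'Category 2': {'att_1': 30, 'att_2': 'bar'}}
}

# Flatten the two outer levels into (UserId, Category) tuple keys;
# each tuple becomes one row label of the resulting MultiIndex.
df = pd.DataFrame.from_dict({(user_id, category): attrs
                             for user_id, categories in user_dict.items()
                             for category, attrs in categories.items()},
                            orient='index')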
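
As a sketch of what the reshaping step produces, the block below builds the tuple-keyed dictionary separately, names the two index levels, and selects the rows for one user. The names reshaped, 'UserId', and 'Category' are assumptions chosen for illustration and do not come from the original code.

```python
import pandas as pd

user_dict = {
    12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
         'Category 2': {'att_1': 23, 'att_2': 'another'}},
    15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
         'Category 2': {'att_1': 30, 'att_2': 'bar'}},
}

# Flatten the first two levels into (UserId, Category) tuple keys.
reshaped = {
    (user_id, category): attrs
    for user_id, categories in user_dict.items()
    for category, attrs in categories.items()
}

df = pd.DataFrame.from_dict(reshaped, orient='index')

# Label the index levels (hypothetical names) for readable selection.
df.index.names = ['UserId', 'Category']

# All categories for user 12; the UserId level is dropped from the result.
user_12 = df.loc[12]
print(user_12)

# A single value addressed by the full (UserId, Category) key.
print(df.loc[(12, 'Category 2'), 'att_2'])
```

Keeping the reshaped dictionary in its own variable makes it easy to inspect the keys before pandas is involved at all.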

2. Concatenating DataFrames:

Alternatively, build one DataFrame per user, with categories as rows and attributes as columns, then concatenate the frames using the UserIds as keys for the outer index level:

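# Assumes pandas is imported as pd and user_dict is defined as above.
user_ids = []
frames = []

for user_id, d in user_dict.items():
    user_ids.append(user_id)
    # One frame per user: categories as rows, attributes as columns.
    frames.append(pd.DataFrame.from_dict(d, orient='index'))

# keys= adds the UserIds as the outer level of the MultiIndex.
df = pd.concat(frames, keys=user_ids)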
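
A more compact variant of the concatenation approach, offered as a sketch: pd.concat also accepts a dictionary of frames and uses its keys as the outer index level, and its names argument labels the levels. The level names 'UserId' and 'Category' are illustrative assumptions.

```python
import pandas as pd

user_dict = {
    12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
         'Category 2': {'att_1': 23, 'att_2': 'another'}},
    15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
         'Category 2': {'att_1': 30, 'att_2': 'bar'}},
}

# One frame per user: rows are categories, columns are attributes.
frames = {
    user_id: pd.DataFrame.from_dict(categories, orient='index')
    for user_id, categories in user_dict.items()
}

# The dict keys become the outer index level; names labels both levels.
df = pd.concat(frames, names=['UserId', 'Category'])
print(df)
```

This removes the need to maintain the user_ids and frames lists in parallel, at the cost of holding all per-user frames in one dictionary.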

Both approaches produce a DataFrame with the desired hierarchical index and column structure:

               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
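
The claim that both approaches agree can be checked with a short sketch. It builds the frame both ways and compares their contents by converting each back to a tuple-keyed dictionary with to_dict(orient='index'); the variable names df_reshaped and df_concat are illustrative.

```python
import pandas as pd

user_dict = {
    12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
         'Category 2': {'att_1': 23, 'att_2': 'another'}},
    15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
         'Category 2': {'att_1': 30, 'att_2': 'bar'}},
}

# Approach 1: tuple keys passed to from_dict.
df_reshaped = pd.DataFrame.from_dict(
    {(u, c): attrs for u, cats in user_dict.items() for c, attrs in cats.items()},
    orient='index',
)

# Approach 2: one frame per user, concatenated with UserIds as keys.
df_concat = pd.concat(
    [pd.DataFrame.from_dict(cats, orient='index') for cats in user_dict.values()],
    keys=list(user_dict),
)

# Compare the row labels and the cell contents of the two frames.
same_index = list(df_reshaped.index) == list(df_concat.index)
same_values = (df_reshaped.to_dict(orient='index')
               == df_concat.to_dict(orient='index'))
print(same_index, same_values)
```

Comparing via plain Python structures sidesteps dtype details; for a stricter check, pandas.testing.assert_frame_equal compares dtypes and index metadata as well.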
