How Can I Efficiently Concatenate Multiple CSV Files into a Single Pandas DataFrame and Track Data Provenance?
The goal is a concise, reliable way to combine multiple CSV files into a single DataFrame. A common stumbling block is the concatenation loop itself.
The following snippet avoids the loop and concatenates the CSV files directly:
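from pathlib import Path

import pandas as pd

path = r'C:\DRO\DCL_rawdata_files'
# sorted() turns the one-shot glob generator into a list, so all_files can be reused
all_files = sorted(Path(path).glob('*.csv'))

# Read each file in turn and stack the rows into one DataFrame
df = pd.concat((pd.read_csv(f) for f in all_files), ignore_index=True)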
This code uses a generator expression to read each CSV file in turn and passes the resulting DataFrames to pd.concat, which stacks them into a single DataFrame. Setting ignore_index=True discards each file's own row index and gives the result a fresh, continuous index running from 0 to n-1.
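To make the effect of ignore_index concrete, here is a minimal, self-contained sketch. The temporary directory, the file names a.csv and b.csv, and their contents are hypothetical sample data, not part of the original question.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create two small CSV files in a temporary directory (hypothetical sample data).
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "a.csv").write_text("id,value\n1,10\n2,20\n")
(tmp_dir / "b.csv").write_text("id,value\n3,30\n")

# sorted() gives a reusable list in a deterministic order.
csv_files = sorted(tmp_dir.glob("*.csv"))

# Without ignore_index, each file keeps its own 0-based index, so labels repeat.
kept = pd.concat(pd.read_csv(f) for f in csv_files)

# With ignore_index=True, the result gets a fresh 0..n-1 index.
combined = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)

print(kept.index.tolist())      # [0, 1, 0]
print(combined.index.tolist())  # [0, 1, 2]
```

Duplicate index labels are legal in pandas, but they make label-based lookups such as df.loc[0] return several rows, which is usually not what is wanted after stacking files.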
It is often useful to add a column to the concatenated DataFrame that records which source file each row came from. Any of the following approaches works:
Option 1: Add Filename as a New Column
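# all_files: a reusable list of Path objects (a glob generator is exhausted after one pass)
dfs = []
for f in all_files:
    data = pd.read_csv(f)
    data['file'] = f.stem  # file name without the .csv extension
    dfs.append(data)

df = pd.concat(dfs, ignore_index=True)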
Option 2: Add Generic File Source as a New Column
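# all_files: a reusable list of Path objects (a glob generator is exhausted after one pass)
dfs = []
for i, f in enumerate(all_files):
    data = pd.read_csv(f)
    data['file'] = f'File {i}'  # generic label based on position, starting at 0
    dfs.append(data)

df = pd.concat(dfs, ignore_index=True)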
Option 3: Add File Source Using List Comprehension
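import numpy as np

# all_files: a reusable list of Path objects (a glob generator is exhausted after one pass)
dfs = [pd.read_csv(f) for f in all_files]
df = pd.concat(dfs, ignore_index=True)
# Repeat each label once per row of its file; concat keeps file order, so labels line up
df['Source'] = np.repeat([f'S{i}' for i in range(len(dfs))], [len(d) for d in dfs])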
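Option 3 depends on how np.repeat behaves when it is given one repeat count per element. The short sketch below isolates that step; the labels and row counts are made-up values chosen for illustration, including an empty file with zero rows.

```python
import numpy as np

# Hypothetical labels and per-file row counts (the second file is empty).
labels = ["S0", "S1", "S2"]
row_counts = [2, 0, 3]

# Each label is repeated as many times as its file has rows.
source = np.repeat(labels, row_counts)

print(source.tolist())  # ['S0', 'S0', 'S2', 'S2', 'S2']
print(len(source))      # 5, the total number of rows across all files
```

Because the repeated array has exactly one entry per row, it can be assigned as a column only if the DataFrames were concatenated in the same order as the labels were generated.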
Option 4: Single-Line Solution with .assign()
df = pd.concat(
    (pd.read_csv(f).assign(filename=f.stem) for f in all_files),
    ignore_index=True,
)
With any of these options, each row of the concatenated DataFrame carries a label identifying the file it came from.
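Once a provenance column exists, it can be used to check the load and to pull out the rows of a single file. The sketch below uses the .assign() approach on in-memory text instead of real files so that it runs anywhere; the names jan and feb and their contents are hypothetical.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for CSV files, keyed by file stem.
sample_csvs = {
    "jan": "id,value\n1,10\n2,20\n",
    "feb": "id,value\n3,30\n",
}

# Tag each frame with its source name while concatenating.
df = pd.concat(
    (
        pd.read_csv(io.StringIO(text)).assign(filename=name)
        for name, text in sample_csvs.items()
    ),
    ignore_index=True,
)

# Count rows per source to confirm that every file contributed what was expected.
rows_per_file = df["filename"].value_counts().to_dict()

# Select only the rows that came from one source.
feb_rows = df[df["filename"] == "feb"]

print(rows_per_file)
print(feb_rows)
```

Comparing rows_per_file against the known size of each input is a quick way to spot a file that was skipped or read twice.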