How to Flatten a List of DataFrames in Python
When working with pandas in Python, you frequently encounter scenarios where data is stored as a list of DataFrames - for example, when reading multiple CSV files, splitting data by groups, or collecting results from batch operations. Flattening these into a single DataFrame makes the data easier to analyze, query, and export.
This guide covers how to flatten (combine) a list of separate DataFrames into one, as well as how to flatten list-like values stored within DataFrame columns.
Combining a List of DataFrames into One
The most common "flattening" task is merging multiple DataFrames into a single DataFrame. The pd.concat() function is the standard and most efficient way to do this.
Using pd.concat() (Recommended)
import pandas as pd
# Simulate a list of DataFrames (e.g., from reading multiple files)
df1 = pd.DataFrame({"name": ["Alice", "Bob"], "score": [85, 92]})
df2 = pd.DataFrame({"name": ["Charlie", "Diana"], "score": [78, 95]})
df3 = pd.DataFrame({"name": ["Eve"], "score": [88]})
list_of_dfs = [df1, df2, df3]
# Flatten into a single DataFrame
combined = pd.concat(list_of_dfs, ignore_index=True)
print(combined)
Output:
name score
0 Alice 85
1 Bob 92
2 Charlie 78
3 Diana 95
4 Eve 88
pd.concat(list_of_dfs)stacks all DataFrames vertically (row-wise) by default.ignore_index=Trueresets the index to a continuous sequence (0, 1, 2, ...) instead of preserving original indices.
ignore_index=TrueIf each DataFrame has its own index (e.g., all starting from 0), the combined DataFrame will have duplicate index values without ignore_index=True:
# Without ignore_index: duplicate indices
combined = pd.concat(list_of_dfs)
print(combined.index.tolist())
# Output: [0, 1, 0, 1, 0]
# With ignore_index: clean sequential index
combined = pd.concat(list_of_dfs, ignore_index=True)
print(combined.index.tolist())
# Output: [0, 1, 2, 3, 4]
Handling DataFrames with Different Columns
If the DataFrames have different columns, pd.concat() aligns them and fills missing values with NaN:
import pandas as pd
df1 = pd.DataFrame({"name": ["Alice"], "score": [85]})
df2 = pd.DataFrame({"name": ["Bob"], "grade": ["A"]})
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
Output:
name score grade
0 Alice 85.0 NaN
1 Bob NaN A
To only keep columns that appear in all DataFrames, use join="inner":
combined = pd.concat([df1, df2], ignore_index=True, join="inner")
print(combined)
Output:
name
0 Alice
1 Bob
Flattening List Values Inside DataFrame Columns
A different type of flattening involves expanding list-like values stored within a column so that each element gets its own row.
Using explode() (Recommended)
The explode() method (available in pandas 0.25+) is the simplest way to flatten a column containing lists:
import pandas as pd
df = pd.DataFrame({
"payments": [[300, 400, 500, 600], [300, 322, 333, 233]],
"name": ["sravan", "bobby"]
})
print("Original DataFrame:")
print(df)
print()
# Flatten the 'payments' column
flattened = df.explode("payments", ignore_index=True)
print("Flattened DataFrame:")
print(flattened)
Output:
Original DataFrame:
payments name
0 [300, 400, 500, 600] sravan
1 [300, 322, 333, 233] bobby
Flattened DataFrame:
payments name
0 300 sravan
1 400 sravan
2 500 sravan
3 600 sravan
4 300 bobby
5 322 bobby
6 333 bobby
7 233 bobby
Each element in the payments list gets its own row, and the corresponding name value is repeated for each entry.
Exploding Multiple Columns
Starting with pandas 1.3.0, you can explode multiple columns simultaneously (they must have lists of the same length in each row):
import pandas as pd
df = pd.DataFrame({
"months": [["Jan", "Feb", "Mar"], ["Jan", "Feb", "Mar"]],
"payments": [[300, 400, 500], [100, 200, 150]],
"name": ["Alice", "Bob"]
})
flattened = df.explode(["months", "payments"], ignore_index=True)
print(flattened)
Output:
months payments name
0 Jan 300 Alice
1 Feb 400 Alice
2 Mar 500 Alice
3 Jan 100 Bob
4 Feb 200 Bob
5 Mar 150 Bob
Manual Approach (Pre-pandas 0.25)
If you're working with an older version of pandas that doesn't have explode(), you can flatten list columns manually using iteration:
import pandas as pd
df = pd.DataFrame({
"payments": [[300, 400, 500, 600], [300, 322, 333, 233]],
"name": ["sravan", "bobby"]
})
# Manual flattening
flat_data = pd.DataFrame(
[(idx, value) for idx, values in df["payments"].items() for value in values],
columns=["index", "payments"]
).set_index("index")
result = df.drop("payments", axis=1).join(flat_data)
print(result)
Output:
name payments
0 sravan 300
0 sravan 400
0 sravan 500
0 sravan 600
1 bobby 300
1 bobby 322
1 bobby 333
1 bobby 233
The manual approach is more verbose and harder to maintain. Use explode() whenever possible - it's faster, cleaner, and handles edge cases like empty lists and NaN values automatically.
Flattening a DataFrame's Values to a 1D Array
If you need to flatten an entire DataFrame into a one-dimensional array (all values in a single flat sequence), use NumPy's .flatten() on the underlying values:
import pandas as pd
df = pd.DataFrame({
"A": [1, 2, 3],
"B": [4, 5, 6]
})
flat_array = df.values.flatten()
print(flat_array)
print(type(flat_array))
Output:
[1 4 2 5 3 6]
<class 'numpy.ndarray'>
To control the flattening order:
# Row-major order (default): row by row
print(df.values.flatten(order="C")) # [1 4 2 5 3 6]
# Column-major order: column by column
print(df.values.flatten(order="F")) # [1 2 3 4 5 6]
Practical Example: Reading and Combining Multiple CSV Files
A real-world use case for flattening a list of DataFrames is loading multiple files into a single dataset:
import pandas as pd
from pathlib import Path
# Read all CSV files from a directory
data_dir = Path("data/reports/")
csv_files = list(data_dir.glob("*.csv"))
# Load each file into a DataFrame
dfs = [pd.read_csv(file) for file in csv_files]
# Flatten into a single DataFrame
combined = pd.concat(dfs, ignore_index=True)
print(f"Loaded {len(csv_files)} files with {len(combined)} total rows")
print(combined.head())
For very large numbers of files, consider adding a source column to track which file each row came from:
dfs = []
for file in csv_files:
df = pd.read_csv(file)
df["source_file"] = file.name
dfs.append(df)
combined = pd.concat(dfs, ignore_index=True)
Quick Reference
| Task | Method | Best For |
|---|---|---|
| Combine list of DataFrames | pd.concat(dfs, ignore_index=True) | Merging multiple DataFrames |
| Flatten list column to rows | df.explode("column") | Expanding list values (pandas 0.25+) |
| Flatten multiple list columns | df.explode(["col1", "col2"]) | Parallel list expansion (pandas 1.3+) |
| Flatten all values to 1D array | df.values.flatten() | Converting entire DataFrame to array |
Conclusion
Flattening DataFrames in Python typically falls into two categories:
- combining multiple DataFrames into one using
pd.concat(), - or expanding list-like values within columns using
explode().
Both operations are essential for data preparation and analysis. Use pd.concat() with ignore_index=True for clean merging, and explode() for transforming nested list data into a flat, row-per-value format that's ready for analysis.