
How to Flatten a List of DataFrames in Python

When working with pandas in Python, you frequently encounter scenarios where data is stored as a list of DataFrames - for example, when reading multiple CSV files, splitting data by groups, or collecting results from batch operations. Flattening these into a single DataFrame makes the data easier to analyze, query, and export.

This guide covers how to flatten (combine) a list of separate DataFrames into one, as well as how to flatten list-like values stored within DataFrame columns.

Combining a List of DataFrames into One

The most common "flattening" task is merging multiple DataFrames into a single DataFrame. The pd.concat() function is the standard and most efficient way to do this.

import pandas as pd

# Simulate a list of DataFrames (e.g., from reading multiple files)
df1 = pd.DataFrame({"name": ["Alice", "Bob"], "score": [85, 92]})
df2 = pd.DataFrame({"name": ["Charlie", "Diana"], "score": [78, 95]})
df3 = pd.DataFrame({"name": ["Eve"], "score": [88]})

list_of_dfs = [df1, df2, df3]

# Flatten into a single DataFrame
combined = pd.concat(list_of_dfs, ignore_index=True)
print(combined)

Output:

      name  score
0    Alice     85
1      Bob     92
2  Charlie     78
3    Diana     95
4      Eve     88

  • pd.concat(list_of_dfs) stacks all DataFrames vertically (row-wise) by default.
  • ignore_index=True resets the index to a continuous sequence (0, 1, 2, ...) instead of preserving original indices.

When to use ignore_index=True

If each DataFrame has its own index (e.g., all starting from 0), the combined DataFrame will have duplicate index values without ignore_index=True:

# Without ignore_index: duplicate indices
combined = pd.concat(list_of_dfs)
print(combined.index.tolist())
# Output: [0, 1, 0, 1, 0]

# With ignore_index: clean sequential index
combined = pd.concat(list_of_dfs, ignore_index=True)
print(combined.index.tolist())
# Output: [0, 1, 2, 3, 4]
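As an alternative to resetting the index, pd.concat() also accepts a keys argument, which labels each source DataFrame with an outer index level (a MultiIndex). This is handy when you still need to know which DataFrame each row came from. A small sketch reusing two of the example DataFrames from above (the "first"/"second" labels are arbitrary):

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Alice", "Bob"], "score": [85, 92]})
df2 = pd.DataFrame({"name": ["Charlie", "Diana"], "score": [78, 95]})

# keys= adds an outer index level labeling each source DataFrame
combined = pd.concat([df1, df2], keys=["first", "second"])
print(combined)

# Select all rows that came from the first DataFrame
print(combined.loc["first"])
```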

Handling DataFrames with Different Columns

If the DataFrames have different columns, pd.concat() aligns them and fills missing values with NaN:

import pandas as pd

df1 = pd.DataFrame({"name": ["Alice"], "score": [85]})
df2 = pd.DataFrame({"name": ["Bob"], "grade": ["A"]})

combined = pd.concat([df1, df2], ignore_index=True)
print(combined)

Output:

    name  score  grade
0  Alice   85.0    NaN
1    Bob    NaN      A

Note:

To only keep columns that appear in all DataFrames, use join="inner":

combined = pd.concat([df1, df2], ignore_index=True, join="inner")
print(combined)

Output:

    name
0  Alice
1    Bob

Flattening List Values Inside DataFrame Columns

A different type of flattening involves expanding list-like values stored within a column so that each element gets its own row.

The explode() method (available in pandas 0.25+) is the simplest way to flatten a column containing lists:

import pandas as pd

df = pd.DataFrame({
    "payments": [[300, 400, 500, 600], [300, 322, 333, 233]],
    "name": ["sravan", "bobby"]
})

print("Original DataFrame:")
print(df)
print()

# Flatten the 'payments' column
flattened = df.explode("payments", ignore_index=True)

print("Flattened DataFrame:")
print(flattened)

Output:

Original DataFrame:
               payments    name
0  [300, 400, 500, 600]  sravan
1  [300, 322, 333, 233]   bobby

Flattened DataFrame:
   payments    name
0       300  sravan
1       400  sravan
2       500  sravan
3       600  sravan
4       300   bobby
5       322   bobby
6       333   bobby
7       233   bobby

Each element in the payments list gets its own row, and the corresponding name value is repeated for each entry.
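One caveat worth knowing: explode() keeps the column's original object dtype, so the exploded values are not automatically numeric. If you plan to aggregate or do arithmetic on them, convert the column afterwards with pd.to_numeric() (or astype()), as sketched here on the same example data:

```python
import pandas as pd

df = pd.DataFrame({
    "payments": [[300, 400, 500, 600], [300, 322, 333, 233]],
    "name": ["sravan", "bobby"]
})

flattened = df.explode("payments", ignore_index=True)
print(flattened["payments"].dtype)  # object, not a numeric dtype

# Convert to a numeric dtype before aggregating
flattened["payments"] = pd.to_numeric(flattened["payments"])
print(flattened.groupby("name")["payments"].sum())
```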

Exploding Multiple Columns

Starting with pandas 1.3.0, you can explode multiple columns simultaneously (they must have lists of the same length in each row):

import pandas as pd

df = pd.DataFrame({
    "months": [["Jan", "Feb", "Mar"], ["Jan", "Feb", "Mar"]],
    "payments": [[300, 400, 500], [100, 200, 150]],
    "name": ["Alice", "Bob"]
})

flattened = df.explode(["months", "payments"], ignore_index=True)
print(flattened)

Output:

   months  payments   name
0     Jan       300  Alice
1     Feb       400  Alice
2     Mar       500  Alice
3     Jan       100    Bob
4     Feb       200    Bob
5     Mar       150    Bob
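If the lists in a row don't have matching lengths across the exploded columns, pandas raises a ValueError rather than silently misaligning the data. A quick sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "months": [["Jan", "Feb"]],
    "payments": [[300, 400, 500]],  # one more element than months
})

# Exploding columns whose lists differ in length raises an error
try:
    df.explode(["months", "payments"])
except ValueError as err:
    print(f"ValueError: {err}")
```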

Manual Approach (Pre-pandas 0.25)

If you're working with an older version of pandas that doesn't have explode(), you can flatten list columns manually using iteration:

import pandas as pd

df = pd.DataFrame({
    "payments": [[300, 400, 500, 600], [300, 322, 333, 233]],
    "name": ["sravan", "bobby"]
})

# Manual flattening
flat_data = pd.DataFrame(
    [(idx, value) for idx, values in df["payments"].items() for value in values],
    columns=["index", "payments"]
).set_index("index")

result = df.drop("payments", axis=1).join(flat_data)
print(result)

Output:

     name  payments
0  sravan       300
0  sravan       400
0  sravan       500
0  sravan       600
1   bobby       300
1   bobby       322
1   bobby       333
1   bobby       233

Warning:

The manual approach is more verbose and harder to maintain. Use explode() whenever possible - it's faster, cleaner, and handles edge cases like empty lists and NaN values automatically.
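To illustrate those edge cases: with explode(), an empty list becomes a single row containing NaN, and an existing NaN passes through unchanged. The extra sample rows here are made up for illustration:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "payments": [[300, 400], [], np.nan],
    "name": ["sravan", "bobby", "carol"],
})

flattened = df.explode("payments", ignore_index=True)
print(flattened)
# bobby's empty list and carol's NaN each produce one NaN row
```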

Flattening a DataFrame's Values to a 1D Array

If you need to flatten an entire DataFrame into a one-dimensional array (all values in a single flat sequence), call NumPy's .flatten() on the underlying array. The example below uses the classic .values attribute; in modern pandas, df.to_numpy() returns the same array and is the recommended accessor:

import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6]
})

flat_array = df.values.flatten()
print(flat_array)
print(type(flat_array))

Output:

[1 4 2 5 3 6]
<class 'numpy.ndarray'>

Note:

To control the flattening order:

# Row-major order (default): row by row
print(df.values.flatten(order="C")) # [1 4 2 5 3 6]

# Column-major order: column by column
print(df.values.flatten(order="F")) # [1 2 3 4 5 6]

Practical Example: Reading and Combining Multiple CSV Files

A real-world use case for flattening a list of DataFrames is loading multiple files into a single dataset:

import pandas as pd
from pathlib import Path

# Read all CSV files from a directory
data_dir = Path("data/reports/")
csv_files = list(data_dir.glob("*.csv"))

# Load each file into a DataFrame
dfs = [pd.read_csv(file) for file in csv_files]

# Flatten into a single DataFrame
combined = pd.concat(dfs, ignore_index=True)

print(f"Loaded {len(csv_files)} files with {len(combined)} total rows")
print(combined.head())

Tip:

When combining many files, it's often useful to add a source column so each row can be traced back to the file it came from:

dfs = []
for file in csv_files:
    df = pd.read_csv(file)
    df["source_file"] = file.name
    dfs.append(df)

combined = pd.concat(dfs, ignore_index=True)

Quick Reference

Task                             Method                               Best For
Combine a list of DataFrames     pd.concat(dfs, ignore_index=True)    Merging multiple DataFrames
Flatten a list column to rows    df.explode("column")                 Expanding list values (pandas 0.25+)
Flatten multiple list columns    df.explode(["col1", "col2"])         Parallel list expansion (pandas 1.3+)
Flatten all values to 1D array   df.values.flatten()                  Converting entire DataFrame to array

Conclusion

Flattening DataFrames in Python typically falls into two categories:

  • combining multiple DataFrames into one using pd.concat(),
  • or expanding list-like values within columns using explode().

Both operations are essential for data preparation and analysis. Use pd.concat() with ignore_index=True for clean merging, and explode() for transforming nested list data into a flat, row-per-value format that's ready for analysis.