Python Pandas: How to Merge Multiple DataFrames in Pandas
Combining data from multiple DataFrames is one of the most essential operations in data analysis. Whether you are joining customer records with their orders, appending monthly sales reports, or aligning datasets by index, Pandas provides three core methods to accomplish this: merge(), concat(), and join(). This guide explains how each method works, when to use it, and how to apply it to more than two DataFrames at once.
Understanding the Three Methods
Before diving into examples, here is a quick overview of when to use each approach:
| Method | Combines On | Best For |
|---|---|---|
merge() | Shared column values | SQL-style joins (inner, left, right, outer) on one or more key columns |
concat() | Position (rows or columns) | Stacking DataFrames vertically or horizontally without matching keys |
join() | Index values | Combining DataFrames side by side using their index |
Merging DataFrames Using merge()
The merge() function joins two DataFrames based on one or more shared columns, similar to SQL joins. To merge more than two DataFrames, chain multiple merge() calls or use functools.reduce().
Basic Merge on a Common Column
import pandas as pd
df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'id': [2, 3, 4], 'age': [25, 30, 35]})
result = pd.merge(df1, df2, on='id', how='inner')
print(result)
Output:
id name age
0 2 Bob 25
1 3 Charlie 30
Only rows with matching id values in both DataFrames are kept (inner join).
Merging Three or More DataFrames
To merge multiple DataFrames, nest merge() calls:
import pandas as pd
df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'id': [3, 4, 5], 'age': [25, 30, 35]})
df3 = pd.DataFrame({'id': [5, 6, 7], 'city': ['New York', 'Los Angeles', 'Chicago']})
result = pd.merge(pd.merge(df1, df2, on='id', how='outer'), df3, on='id', how='outer')
print(result)
Output:
id name age city
0 1 Alice NaN NaN
1 2 Bob NaN NaN
2 3 Charlie 25.0 NaN
3 4 NaN 30.0 NaN
4 5 NaN 35.0 New York
5 6 NaN NaN Los Angeles
6 7 NaN NaN Chicago
The outer join keeps all rows from every DataFrame, filling missing values with NaN.
Cleaner Approach with functools.reduce()
When merging many DataFrames, nesting calls becomes hard to read. Use functools.reduce() instead:
import pandas as pd
from functools import reduce
df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'id': [2, 3, 4], 'score': [88, 92, 75]})
df3 = pd.DataFrame({'id': [1, 3, 5], 'city': ['NYC', 'LA', 'Chicago']})
dfs = [df1, df2, df3]
result = reduce(lambda left, right: pd.merge(left, right, on='id', how='outer'), dfs)
print(result)
Output:
id name score city
0 1 Alice NaN NYC
1 2 Bob 88.0 NaN
2 3 Charlie 92.0 LA
3 4 NaN 75.0 NaN
4 5 NaN NaN Chicago
Use functools.reduce() when merging more than two or three DataFrames. It scales cleanly and avoids deeply nested function calls.
Merge Join Types
The how parameter controls which rows are kept:
| Join Type | Keeps |
|---|---|
inner (default) | Only rows with matching keys in both DataFrames |
left | All rows from the left DataFrame, matching rows from the right |
right | All rows from the right DataFrame, matching rows from the left |
outer | All rows from both DataFrames |
Concatenating DataFrames Using concat()
The concat() function stacks DataFrames either vertically (adding rows) or horizontally (adding columns). Unlike merge(), it does not require shared key columns - it simply lines up DataFrames by position or index.
Vertical Concatenation (Adding Rows)
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})
result = pd.concat([df1, df2, df3], axis=0, ignore_index=True)
print(result)
Output:
A B
0 1 3
1 2 4
2 5 7
3 6 8
4 9 11
5 10 12
Setting ignore_index=True resets the index to a clean 0-based sequence instead of preserving the original indices from each DataFrame.
Vertical Concatenation with Mismatched Columns
When DataFrames have different columns, concat() fills missing values with NaN:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'D': [11, 12]})
result = pd.concat([df1, df2, df3], axis=0, join='outer', ignore_index=True)
print(result)
Output:
A B C D
0 1 3.0 NaN NaN
1 2 4.0 NaN NaN
2 5 NaN 7.0 NaN
3 6 NaN 8.0 NaN
4 9 NaN NaN 11.0
5 10 NaN NaN 12.0
Use join='inner' to keep only columns that exist in all DataFrames:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'D': [11, 12]})
result_inner = pd.concat([df1, df2, df3], axis=0, join='inner', ignore_index=True)
print(result_inner)
Output:
A
0 1
1 2
2 5
3 6
4 9
5 10
Horizontal Concatenation (Adding Columns)
Set axis=1 to place DataFrames side by side:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3]})
df2 = pd.DataFrame({'B': [4, 5, 6]})
df3 = pd.DataFrame({'C': [7, 8, 9]})
result = pd.concat([df1, df2, df3], axis=1)
print(result)
Output:
A B C
0 1 4 7
1 2 5 8
2 3 6 9
When using axis=1, Pandas aligns DataFrames by their index. If the indices don't match, you'll get NaN values in the result. Make sure to reset indices with reset_index(drop=True) before concatenating if needed.
Joining DataFrames Using join()
The join() method combines DataFrames based on their index rather than column values. It is essentially a shortcut for index-based merging and supports joining multiple DataFrames in a single call.
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'C': [7, 8, 9], 'D': [10, 11, 12]}, index=['a', 'b', 'c'])
df3 = pd.DataFrame({'E': [13, 14, 15], 'F': [16, 17, 18]}, index=['a', 'b', 'c'])
result = df1.join([df2, df3])
print(result)
Output:
A B C D E F
a 1 4 7 10 13 16
b 2 5 8 11 14 17
c 3 6 9 12 15 18
A key advantage of join() is that it accepts a list of DataFrames, making it easy to combine many at once without chaining or nesting.
Joining with Mismatched Indices
When indices don't perfectly overlap, use the how parameter to control which rows are kept:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'B': [3, 4]}, index=['b', 'c'])
# Left join (default): keeps all rows from df1
print("Left join:")
print(df1.join(df2, how='left'))
print("\nOuter join:")
print(df1.join(df2, how='outer'))
Output:
Left join:
A B
a 1 NaN
b 2 3.0
Outer join:
A B
a 1.0 NaN
b 2.0 3.0
c NaN 4.0
Common Mistake: Using merge() Without Specifying the Key Column
When two DataFrames share a column name but you forget to specify on, Pandas merges on all shared columns by default, which may not be your intent:
import pandas as pd
df1 = pd.DataFrame({'id': [1, 2], 'value': [10, 20]})
df2 = pd.DataFrame({'id': [1, 2], 'value': [30, 40]})
# WRONG: merges on both 'id' AND 'value': likely produces unexpected results
result = pd.merge(df1, df2)
print(result)
Output:
Empty DataFrame
Columns: [id, value]
Index: []
The result is empty because no rows match on both id and value simultaneously.
The correct approach is to explicitly specify which column to join on:
import pandas as pd
df1 = pd.DataFrame({'id': [1, 2], 'value': [10, 20]})
df2 = pd.DataFrame({'id': [1, 2], 'value': [30, 40]})
# CORRECT: specify the key column and let Pandas suffix the overlapping columns
result = pd.merge(df1, df2, on='id', suffixes=('_left', '_right'))
print(result)
Output:
id value_left value_right
0 1 10 30
1 2 20 40
Always specify the on parameter when merging DataFrames that share column names beyond the intended key. Without it, Pandas joins on all shared columns, often producing empty or incorrect results.
Quick Reference: Choosing the Right Method
| Scenario | Method | Example |
|---|---|---|
| Combine on shared column values (SQL-style) | merge() | pd.merge(df1, df2, on='id') |
| Stack rows from similar DataFrames | concat() with axis=0 | pd.concat([df1, df2], axis=0) |
| Place DataFrames side by side | concat() with axis=1 | pd.concat([df1, df2], axis=1) |
| Combine on index values | join() | df1.join([df2, df3]) |
| Merge many DataFrames on a shared key | reduce() + merge() | reduce(lambda l, r: pd.merge(l, r, on='id'), dfs) |
Each method serves a distinct purpose.
- Use
merge()for key-based joins concat()for positional stackingjoin()for index-based combinations.
For merging more than two DataFrames, functools.reduce() paired with merge() provides the cleanest solution.