Skip to main content

Python Pandas: How to Merge Multiple DataFrames in Pandas

Combining data from multiple DataFrames is one of the most essential operations in data analysis. Whether you are joining customer records with their orders, appending monthly sales reports, or aligning datasets by index, Pandas provides three core methods to accomplish this: merge(), concat(), and join(). This guide explains how each method works, when to use it, and how to apply it to more than two DataFrames at once.

Understanding the Three Methods

Before diving into examples, here is a quick overview of when to use each approach:

MethodCombines OnBest For
merge()Shared column valuesSQL-style joins (inner, left, right, outer) on one or more key columns
concat()Position (rows or columns)Stacking DataFrames vertically or horizontally without matching keys
join()Index valuesCombining DataFrames side by side using their index

Merging DataFrames Using merge()

The merge() function joins two DataFrames based on one or more shared columns, similar to SQL joins. To merge more than two DataFrames, chain multiple merge() calls or use functools.reduce().

Basic Merge on a Common Column

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'id': [2, 3, 4], 'age': [25, 30, 35]})

result = pd.merge(df1, df2, on='id', how='inner')
print(result)

Output:

   id     name  age
0 2 Bob 25
1 3 Charlie 30

Only rows with matching id values in both DataFrames are kept (inner join).

Merging Three or More DataFrames

To merge multiple DataFrames, nest merge() calls:

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'id': [3, 4, 5], 'age': [25, 30, 35]})
df3 = pd.DataFrame({'id': [5, 6, 7], 'city': ['New York', 'Los Angeles', 'Chicago']})

result = pd.merge(pd.merge(df1, df2, on='id', how='outer'), df3, on='id', how='outer')
print(result)

Output:

   id     name   age         city
0 1 Alice NaN NaN
1 2 Bob NaN NaN
2 3 Charlie 25.0 NaN
3 4 NaN 30.0 NaN
4 5 NaN 35.0 New York
5 6 NaN NaN Los Angeles
6 7 NaN NaN Chicago

The outer join keeps all rows from every DataFrame, filling missing values with NaN.

Cleaner Approach with functools.reduce()

When merging many DataFrames, nesting calls becomes hard to read. Use functools.reduce() instead:

import pandas as pd
from functools import reduce

df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'id': [2, 3, 4], 'score': [88, 92, 75]})
df3 = pd.DataFrame({'id': [1, 3, 5], 'city': ['NYC', 'LA', 'Chicago']})

dfs = [df1, df2, df3]
result = reduce(lambda left, right: pd.merge(left, right, on='id', how='outer'), dfs)
print(result)

Output:

   id     name  score     city
0 1 Alice NaN NYC
1 2 Bob 88.0 NaN
2 3 Charlie 92.0 LA
3 4 NaN 75.0 NaN
4 5 NaN NaN Chicago
tip

Use functools.reduce() when merging more than two or three DataFrames. It scales cleanly and avoids deeply nested function calls.

Merge Join Types

The how parameter controls which rows are kept:

Join TypeKeeps
inner (default)Only rows with matching keys in both DataFrames
leftAll rows from the left DataFrame, matching rows from the right
rightAll rows from the right DataFrame, matching rows from the left
outerAll rows from both DataFrames

Concatenating DataFrames Using concat()

The concat() function stacks DataFrames either vertically (adding rows) or horizontally (adding columns). Unlike merge(), it does not require shared key columns - it simply lines up DataFrames by position or index.

Vertical Concatenation (Adding Rows)

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})

result = pd.concat([df1, df2, df3], axis=0, ignore_index=True)
print(result)

Output:

    A   B
0 1 3
1 2 4
2 5 7
3 6 8
4 9 11
5 10 12

Setting ignore_index=True resets the index to a clean 0-based sequence instead of preserving the original indices from each DataFrame.

Vertical Concatenation with Mismatched Columns

When DataFrames have different columns, concat() fills missing values with NaN:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'D': [11, 12]})

result = pd.concat([df1, df2, df3], axis=0, join='outer', ignore_index=True)
print(result)

Output:

    A    B    C     D
0 1 3.0 NaN NaN
1 2 4.0 NaN NaN
2 5 NaN 7.0 NaN
3 6 NaN 8.0 NaN
4 9 NaN NaN 11.0
5 10 NaN NaN 12.0

Use join='inner' to keep only columns that exist in all DataFrames:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'D': [11, 12]})

result_inner = pd.concat([df1, df2, df3], axis=0, join='inner', ignore_index=True)
print(result_inner)

Output:

    A
0 1
1 2
2 5
3 6
4 9
5 10

Horizontal Concatenation (Adding Columns)

Set axis=1 to place DataFrames side by side:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3]})
df2 = pd.DataFrame({'B': [4, 5, 6]})
df3 = pd.DataFrame({'C': [7, 8, 9]})

result = pd.concat([df1, df2, df3], axis=1)
print(result)

Output:

   A  B  C
0 1 4 7
1 2 5 8
2 3 6 9
warning

When using axis=1, Pandas aligns DataFrames by their index. If the indices don't match, you'll get NaN values in the result. Make sure to reset indices with reset_index(drop=True) before concatenating if needed.

Joining DataFrames Using join()

The join() method combines DataFrames based on their index rather than column values. It is essentially a shortcut for index-based merging and supports joining multiple DataFrames in a single call.

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'C': [7, 8, 9], 'D': [10, 11, 12]}, index=['a', 'b', 'c'])
df3 = pd.DataFrame({'E': [13, 14, 15], 'F': [16, 17, 18]}, index=['a', 'b', 'c'])

result = df1.join([df2, df3])
print(result)

Output:

   A  B  C   D   E   F
a 1 4 7 10 13 16
b 2 5 8 11 14 17
c 3 6 9 12 15 18

A key advantage of join() is that it accepts a list of DataFrames, making it easy to combine many at once without chaining or nesting.

Joining with Mismatched Indices

When indices don't perfectly overlap, use the how parameter to control which rows are kept:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'B': [3, 4]}, index=['b', 'c'])

# Left join (default): keeps all rows from df1
print("Left join:")
print(df1.join(df2, how='left'))

print("\nOuter join:")
print(df1.join(df2, how='outer'))

Output:

Left join:
A B
a 1 NaN
b 2 3.0

Outer join:
A B
a 1.0 NaN
b 2.0 3.0
c NaN 4.0

Common Mistake: Using merge() Without Specifying the Key Column

When two DataFrames share a column name but you forget to specify on, Pandas merges on all shared columns by default, which may not be your intent:

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'value': [10, 20]})
df2 = pd.DataFrame({'id': [1, 2], 'value': [30, 40]})

# WRONG: merges on both 'id' AND 'value': likely produces unexpected results
result = pd.merge(df1, df2)
print(result)

Output:

Empty DataFrame
Columns: [id, value]
Index: []

The result is empty because no rows match on both id and value simultaneously.

The correct approach is to explicitly specify which column to join on:

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'value': [10, 20]})
df2 = pd.DataFrame({'id': [1, 2], 'value': [30, 40]})

# CORRECT: specify the key column and let Pandas suffix the overlapping columns
result = pd.merge(df1, df2, on='id', suffixes=('_left', '_right'))
print(result)

Output:

   id  value_left  value_right
0 1 10 30
1 2 20 40
danger

Always specify the on parameter when merging DataFrames that share column names beyond the intended key. Without it, Pandas joins on all shared columns, often producing empty or incorrect results.

Quick Reference: Choosing the Right Method

ScenarioMethodExample
Combine on shared column values (SQL-style)merge()pd.merge(df1, df2, on='id')
Stack rows from similar DataFramesconcat() with axis=0pd.concat([df1, df2], axis=0)
Place DataFrames side by sideconcat() with axis=1pd.concat([df1, df2], axis=1)
Combine on index valuesjoin()df1.join([df2, df3])
Merge many DataFrames on a shared keyreduce() + merge()reduce(lambda l, r: pd.merge(l, r, on='id'), dfs)

Each method serves a distinct purpose.

  • Use merge() for key-based joins
  • concat() for positional stacking
  • join() for index-based combinations.

For merging more than two DataFrames, functools.reduce() paired with merge() provides the cleanest solution.