Python Pandas: How to Merge Multiple DataFrames in Pandas

Combining data from multiple DataFrames is one of the most essential operations in data analysis. Whether you are joining customer records with their orders, appending monthly sales reports, or aligning datasets by index, Pandas provides three core methods to accomplish this: merge(), concat(), and join(). This guide explains how each method works, when to use it, and how to apply it to more than two DataFrames at once.

Understanding the Three Methods

Before diving into examples, here is a quick overview of when to use each approach:

Method	Combines On	Best For
`merge()`	Shared column values	SQL-style joins (inner, left, right, outer) on one or more key columns
`concat()`	Position (rows or columns)	Stacking DataFrames vertically or horizontally without matching keys
`join()`	Index values	Combining DataFrames side by side using their index

Merging DataFrames Using `merge()`

The merge() function joins two DataFrames based on one or more shared columns, similar to SQL joins. To merge more than two DataFrames, chain multiple merge() calls or use functools.reduce().

Basic Merge on a Common Column

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'id': [2, 3, 4], 'age': [25, 30, 35]})

result = pd.merge(df1, df2, on='id', how='inner')
print(result)

Output:

   id     name  age
0   2      Bob   25
1   3  Charlie   30

Only rows with matching id values in both DataFrames are kept (inner join).

Merging Three or More DataFrames

To merge multiple DataFrames, nest merge() calls:

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'id': [3, 4, 5], 'age': [25, 30, 35]})
df3 = pd.DataFrame({'id': [5, 6, 7], 'city': ['New York', 'Los Angeles', 'Chicago']})

result = pd.merge(pd.merge(df1, df2, on='id', how='outer'), df3, on='id', how='outer')
print(result)

Output:

   id     name   age         city
 1    Alice   NaN          NaN
 2      Bob   NaN          NaN
 3  Charlie  25.0          NaN
 4      NaN  30.0          NaN
 5      NaN  35.0     New York
 6      NaN   NaN  Los Angeles
 7      NaN   NaN      Chicago

The outer join keeps all rows from every DataFrame, filling missing values with NaN.

Cleaner Approach with `functools.reduce()`

When merging many DataFrames, nesting calls becomes hard to read. Use functools.reduce() instead:

import pandas as pd
from functools import reduce

df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'id': [2, 3, 4], 'score': [88, 92, 75]})
df3 = pd.DataFrame({'id': [1, 3, 5], 'city': ['NYC', 'LA', 'Chicago']})

dfs = [df1, df2, df3]
result = reduce(lambda left, right: pd.merge(left, right, on='id', how='outer'), dfs)
print(result)

Output:

   id     name  score     city
 1    Alice    NaN      NYC
 2      Bob   88.0      NaN
 3  Charlie   92.0       LA
 4      NaN   75.0      NaN
 5      NaN    NaN  Chicago

tip

Use functools.reduce() when merging more than two or three DataFrames. It scales cleanly and avoids deeply nested function calls.

Merge Join Types

The how parameter controls which rows are kept:

Join Type	Keeps
`inner` (default)	Only rows with matching keys in both DataFrames
`left`	All rows from the left DataFrame, matching rows from the right
`right`	All rows from the right DataFrame, matching rows from the left
`outer`	All rows from both DataFrames

Concatenating DataFrames Using `concat()`

The concat() function stacks DataFrames either vertically (adding rows) or horizontally (adding columns). Unlike merge(), it does not require shared key columns - it simply lines up DataFrames by position or index.

Vertical Concatenation (Adding Rows)

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})

result = pd.concat([df1, df2, df3], axis=0, ignore_index=True)
print(result)

Output:

Setting ignore_index=True resets the index to a clean 0-based sequence instead of preserving the original indices from each DataFrame.

Vertical Concatenation with Mismatched Columns

When DataFrames have different columns, concat() fills missing values with NaN:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'D': [11, 12]})

result = pd.concat([df1, df2, df3], axis=0, join='outer', ignore_index=True)
print(result)

Output:

    A    B    C     D
 1  3.0  NaN   NaN
 2  4.0  NaN   NaN
 5  NaN  7.0   NaN
 6  NaN  8.0   NaN
 9  NaN  NaN  11.0
10  NaN  NaN  12.0

Use join='inner' to keep only columns that exist in all DataFrames:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'D': [11, 12]})

result_inner = pd.concat([df1, df2, df3], axis=0, join='inner', ignore_index=True)
print(result_inner)

Output:

Horizontal Concatenation (Adding Columns)

Set axis=1 to place DataFrames side by side:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3]})
df2 = pd.DataFrame({'B': [4, 5, 6]})
df3 = pd.DataFrame({'C': [7, 8, 9]})

result = pd.concat([df1, df2, df3], axis=1)
print(result)

Output:

warning

When using axis=1, Pandas aligns DataFrames by their index. If the indices don't match, you'll get NaN values in the result. Make sure to reset indices with reset_index(drop=True) before concatenating if needed.

Joining DataFrames Using `join()`

The join() method combines DataFrames based on their index rather than column values. It is essentially a shortcut for index-based merging and supports joining multiple DataFrames in a single call.

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'C': [7, 8, 9], 'D': [10, 11, 12]}, index=['a', 'b', 'c'])
df3 = pd.DataFrame({'E': [13, 14, 15], 'F': [16, 17, 18]}, index=['a', 'b', 'c'])

result = df1.join([df2, df3])
print(result)

Output:

   A  B  C   D   E   F
a  1  4  7  10  13  16
b  2  5  8  11  14  17
c  3  6  9  12  15  18

A key advantage of join() is that it accepts a list of DataFrames, making it easy to combine many at once without chaining or nesting.

Joining with Mismatched Indices

When indices don't perfectly overlap, use the how parameter to control which rows are kept:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'B': [3, 4]}, index=['b', 'c'])

# Left join (default): keeps all rows from df1
print("Left join:")
print(df1.join(df2, how='left'))

print("\nOuter join:")
print(df1.join(df2, how='outer'))

Output:

Left join:
   A    B
a  1  NaN
b  2  3.0

Outer join:
     A    B
a  1.0  NaN
b  2.0  3.0
c  NaN  4.0

Common Mistake: Using `merge()` Without Specifying the Key Column

When two DataFrames share a column name but you forget to specify on, Pandas merges on all shared columns by default, which may not be your intent:

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'value': [10, 20]})
df2 = pd.DataFrame({'id': [1, 2], 'value': [30, 40]})

# WRONG: merges on both 'id' AND 'value': likely produces unexpected results
result = pd.merge(df1, df2)
print(result)

Output:

Empty DataFrame
Columns: [id, value]
Index: []

The result is empty because no rows match on both id and value simultaneously.

The correct approach is to explicitly specify which column to join on:

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'value': [10, 20]})
df2 = pd.DataFrame({'id': [1, 2], 'value': [30, 40]})

# CORRECT: specify the key column and let Pandas suffix the overlapping columns
result = pd.merge(df1, df2, on='id', suffixes=('_left', '_right'))
print(result)

Output:

   id  value_left  value_right
0   1          10           30
1   2          20           40

danger

Always specify the on parameter when merging DataFrames that share column names beyond the intended key. Without it, Pandas joins on all shared columns, often producing empty or incorrect results.

Quick Reference: Choosing the Right Method

Scenario	Method	Example
Combine on shared column values (SQL-style)	`merge()`	`pd.merge(df1, df2, on='id')`
Stack rows from similar DataFrames	`concat()` with `axis=0`	`pd.concat([df1, df2], axis=0)`
Place DataFrames side by side	`concat()` with `axis=1`	`pd.concat([df1, df2], axis=1)`
Combine on index values	`join()`	`df1.join([df2, df3])`
Merge many DataFrames on a shared key	`reduce()` + `merge()`	`reduce(lambda l, r: pd.merge(l, r, on='id'), dfs)`

Each method serves a distinct purpose.

Use merge() for key-based joins
concat() for positional stacking
join() for index-based combinations.

For merging more than two DataFrames, functools.reduce() paired with merge() provides the cleanest solution.

Understanding the Three Methods​

Merging DataFrames Using merge()​

Basic Merge on a Common Column​

Merging Three or More DataFrames​

Cleaner Approach with functools.reduce()​

Merge Join Types​

Concatenating DataFrames Using concat()​

Vertical Concatenation (Adding Rows)​

Vertical Concatenation with Mismatched Columns​

Horizontal Concatenation (Adding Columns)​

Joining DataFrames Using join()​

Joining with Mismatched Indices​

Common Mistake: Using merge() Without Specifying the Key Column​

Quick Reference: Choosing the Right Method​

Table of Contents

Understanding the Three Methods

Merging DataFrames Using `merge()`

Basic Merge on a Common Column

Merging Three or More DataFrames

Cleaner Approach with `functools.reduce()`

Merge Join Types

Concatenating DataFrames Using `concat()`

Vertical Concatenation (Adding Rows)

Vertical Concatenation with Mismatched Columns

Horizontal Concatenation (Adding Columns)

Joining DataFrames Using `join()`

Joining with Mismatched Indices

Common Mistake: Using `merge()` Without Specifying the Key Column

Quick Reference: Choosing the Right Method