Skip to main content

Python Pandas: How to Merge DataFrames of Different Lengths

When working with real-world data in Python, you'll frequently need to combine DataFrames that have different numbers of rows. For example, you might have a customer table with 1,000 entries and an orders table with 5,000 entries, and you need to bring them together for analysis. Pandas provides several powerful functions to handle this - merge(), concat(), and join() - each suited for different merging strategies.

In this guide, you'll learn how to merge DataFrames of different lengths using various join types, understand when to use each method, and avoid common mistakes that lead to unexpected results.

Understanding Join Types

Before diving into code, it's important to understand the four main join types available in Pandas, as they determine how mismatched rows are handled:

Join TypeBehavior
Inner (default)Keeps only rows with matching keys in both DataFrames
LeftKeeps all rows from the left DataFrame; fills unmatched right rows with NaN
RightKeeps all rows from the right DataFrame; fills unmatched left rows with NaN
OuterKeeps all rows from both DataFrames; fills all unmatched values with NaN

Merging with merge() on a Shared Column

The merge() function combines two DataFrames based on one or more shared columns (keys). It's the most flexible and commonly used approach for merging DataFrames of different lengths.

Setting Up Sample Data

import pandas as pd

df1 = pd.DataFrame({
'ID': [1, 2, 3, 4],
'Name': ['Alice', 'Bob', 'Charlie', 'Diana']
})

df2 = pd.DataFrame({
'ID': [1, 2, 5],
'Age': [25, 30, 22]
})

print("df1:")
print(df1)
print("\ndf2:")
print(df2)

Output:

df1:
ID Name
0 1 Alice
1 2 Bob
2 3 Charlie
3 4 Diana

df2:
ID Age
0 1 25
1 2 30
2 5 22

Notice that df1 has 4 rows and df2 has 3 rows. IDs 3 and 4 exist only in df1, while ID 5 exists only in df2.

Left Join

A left join keeps all rows from the left DataFrame and brings in matching data from the right:

import pandas as pd

df1 = pd.DataFrame({
'ID': [1, 2, 3, 4],
'Name': ['Alice', 'Bob', 'Charlie', 'Diana']
})

df2 = pd.DataFrame({
'ID': [1, 2, 5],
'Age': [25, 30, 22]
})

result = df1.merge(df2, how='left', on='ID')
print(result)

Output:

   ID     Name   Age
0 1 Alice 25.0
1 2 Bob 30.0
2 3 Charlie NaN
3 4 Diana NaN

IDs 3 and 4 have no match in df2, so their Age values are filled with NaN.

Right Join

A right join keeps all rows from the right DataFrame:

import pandas as pd

df1 = pd.DataFrame({
'ID': [1, 2, 3, 4],
'Name': ['Alice', 'Bob', 'Charlie', 'Diana']
})

df2 = pd.DataFrame({
'ID': [1, 2, 5],
'Age': [25, 30, 22]
})


result = df1.merge(df2, how='right', on='ID')
print(result)

Output:

   ID   Name  Age
0 1 Alice 25
1 2 Bob 30
2 5 NaN 22

ID 5 has no match in df1, so its Name value is NaN.

Outer Join

An outer join keeps all rows from both DataFrames:

result = df1.merge(df2, how='outer', on='ID')
print(result)

Output:

   ID     Name   Age
0 1 Alice 25.0
1 2 Bob 30.0
2 3 Charlie NaN
3 4 Diana NaN
4 5 NaN 22.0

Every row from both DataFrames is preserved, with NaN filling in wherever there's no match.

Inner Join (Default)

An inner join keeps only the rows that have matching keys in both DataFrames:

import pandas as pd

df1 = pd.DataFrame({
'ID': [1, 2, 3, 4],
'Name': ['Alice', 'Bob', 'Charlie', 'Diana']
})

df2 = pd.DataFrame({
'ID': [1, 2, 5],
'Age': [25, 30, 22]
})

result = df1.merge(df2, how='inner', on='ID')
print(result)

Output:

   ID   Name  Age
0 1 Alice 25
1 2 Bob 30

Only IDs 1 and 2 exist in both DataFrames, so the result contains just 2 rows.

tip

When merging DataFrames of different lengths, an inner join will always produce a result with the fewest rows (only matching keys), while an outer join will produce the most rows (all keys from both sides). Choose the join type based on whether you can afford to lose data.

Concatenating with concat()

The concat() function is useful when you want to stack DataFrames either vertically (row-wise) or horizontally (column-wise) without matching on a specific key column.

Column-wise Concatenation (axis=1)

When concatenating along columns, Pandas aligns rows by their index. If the DataFrames have different lengths, missing positions are filled with NaN:

import pandas as pd

df1 = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie'],
'City': ['New York', 'Boston', 'Chicago']
})

df2 = pd.DataFrame({
'Age': [25, 30],
'Score': [88, 92]
})

result = pd.concat([df1, df2], axis=1)
print(result)

Output:

      Name      City   Age  Score
0 Alice New York 25.0 88.0
1 Bob Boston 30.0 92.0
2 Charlie Chicago NaN NaN
caution

Column-wise concat() aligns by index position, not by any logical key. If your DataFrames don't share a meaningful index, the alignment may produce incorrect pairings. Use merge() instead when you need to match rows by a specific column.

Row-wise Concatenation (axis=0)

When stacking DataFrames vertically, concat() appends rows from one DataFrame below the other:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'Diana', 'Eve'], 'Age': [35, 28, 22]})

result = pd.concat([df1, df2], ignore_index=True)
print(result)

Output:

      Name  Age
0 Alice 25
1 Bob 30
2 Charlie 35
3 Diana 28
4 Eve 22
info

Setting ignore_index=True resets the index of the resulting DataFrame to 0, 1, 2, .... Without it, the original indices from each DataFrame are preserved, which can lead to duplicate index values.

Using join() for Index-Based Merging

The join() method merges DataFrames based on their index rather than a column. It's a convenient shorthand when your data is already indexed appropriately:

import pandas as pd

df1 = pd.DataFrame(
{'Name': ['Alice', 'Bob', 'Charlie']},
index=[1, 2, 3]
)

df2 = pd.DataFrame(
{'Age': [25, 30]},
index=[1, 2]
)

result = df1.join(df2, how='left')
print(result)

Output:

      Name   Age
1 Alice 25.0
2 Bob 30.0
3 Charlie NaN

Common Mistake: Unexpected Duplicate Rows After Merge

A frequent issue when merging DataFrames of different lengths is getting more rows than expected due to duplicate keys:

import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 2], 'Name': ['Alice', 'Bob', 'Bob2']})
df2 = pd.DataFrame({'ID': [2, 2], 'Score': [85, 90]})

# ❌ This produces a cartesian product for ID=2
result = df1.merge(df2, on='ID', how='inner')
print(result)

Output:

   ID  Name  Score
0 2 Bob 85
1 2 Bob 90
2 2 Bob2 85
3 2 Bob2 90

ID 2 appears twice in df1 and twice in df2, producing 4 rows (2 × 2) instead of the expected 2. This is a many-to-many join.

To avoid this, ensure your key columns are unique, or use validate to catch the issue early:

import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 2], 'Name': ['Alice', 'Bob', 'Bob2']})
df2 = pd.DataFrame({'ID': [2, 2], 'Score': [85, 90]})

# ✅ Detect many-to-many joins before they cause problems
try:
result = df1.merge(df2, on='ID', how='inner', validate='one_to_one')
except pd.errors.MergeError as e:
print(f"Merge error: {e}")

Output:

Merge error: Merge keys are not unique in either left or right dataset; not a one-to-one merge

Deduplicate your data before merging, or use validate='many_to_one' or validate='one_to_many' depending on your expected relationship.

Summary

MethodMerges OnBest For
df.merge()Shared columns or indexCombining DataFrames by a logical key (most common)
pd.concat()Index alignmentStacking DataFrames vertically or horizontally
df.join()IndexQuick index-based merging

When merging DataFrames of different lengths, the join type you choose determines how unmatched rows are handled. Use left or outer joins to preserve all data, inner joins to keep only matches, and always watch out for duplicate keys that can silently inflate your result set.

Validate your merge with the validate parameter to catch unexpected many-to-many relationships before they cause downstream issues.