Python Pandas: How to Merge DataFrames of Different Lengths

When working with real-world data in Python, you'll frequently need to combine DataFrames that have different numbers of rows. For example, you might have a customer table with 1,000 entries and an orders table with 5,000 entries, and you need to bring them together for analysis. Pandas provides several powerful functions to handle this - merge(), concat(), and join() - each suited for different merging strategies.

In this guide, you'll learn how to merge DataFrames of different lengths using various join types, understand when to use each method, and avoid common mistakes that lead to unexpected results.

Understanding Join Types

Before diving into code, it's important to understand the four main join types available in Pandas, as they determine how mismatched rows are handled:

Join Type	Behavior
Inner (default)	Keeps only rows with matching keys in both DataFrames
Left	Keeps all rows from the left DataFrame; fills unmatched right rows with `NaN`
Right	Keeps all rows from the right DataFrame; fills unmatched left rows with `NaN`
Outer	Keeps all rows from both DataFrames; fills all unmatched values with `NaN`

Merging with `merge()` on a Shared Column

The merge() function combines two DataFrames based on one or more shared columns (keys). It's the most flexible and commonly used approach for merging DataFrames of different lengths.

Setting Up Sample Data

import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana']
})

df2 = pd.DataFrame({
    'ID': [1, 2, 5],
    'Age': [25, 30, 22]
})

print("df1:")
print(df1)
print("\ndf2:")
print(df2)

Output:

df1:
   ID     Name
0   1    Alice
1   2      Bob
2   3  Charlie
3   4    Diana

df2:
   ID  Age
0   1   25
1   2   30
2   5   22

Notice that df1 has 4 rows and df2 has 3 rows. IDs 3 and 4 exist only in df1, while ID 5 exists only in df2.

Left Join

A left join keeps all rows from the left DataFrame and brings in matching data from the right:

import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana']
})

df2 = pd.DataFrame({
    'ID': [1, 2, 5],
    'Age': [25, 30, 22]
})

result = df1.merge(df2, how='left', on='ID')
print(result)

Output:

   ID     Name   Age
 1    Alice  25.0
 2      Bob  30.0
 3  Charlie   NaN
 4    Diana   NaN

IDs 3 and 4 have no match in df2, so their Age values are filled with NaN.

Right Join

A right join keeps all rows from the right DataFrame:

import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana']
})

df2 = pd.DataFrame({
    'ID': [1, 2, 5],
    'Age': [25, 30, 22]
})


result = df1.merge(df2, how='right', on='ID')
print(result)

Output:

   ID   Name  Age
 1  Alice   25
 2    Bob   30
 5    NaN   22

ID 5 has no match in df1, so its Name value is NaN.

Outer Join

An outer join keeps all rows from both DataFrames:

result = df1.merge(df2, how='outer', on='ID')
print(result)

Output:

   ID     Name   Age
 1    Alice  25.0
 2      Bob  30.0
 3  Charlie   NaN
 4    Diana   NaN
 5      NaN  22.0

Every row from both DataFrames is preserved, with NaN filling in wherever there's no match.

Inner Join (Default)

An inner join keeps only the rows that have matching keys in both DataFrames:

import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana']
})

df2 = pd.DataFrame({
    'ID': [1, 2, 5],
    'Age': [25, 30, 22]
})

result = df1.merge(df2, how='inner', on='ID')
print(result)

Output:

   ID   Name  Age
0   1  Alice   25
1   2    Bob   30

Only IDs 1 and 2 exist in both DataFrames, so the result contains just 2 rows.

tip

When merging DataFrames of different lengths, an inner join will always produce a result with the fewest rows (only matching keys), while an outer join will produce the most rows (all keys from both sides). Choose the join type based on whether you can afford to lose data.

Concatenating with `concat()`

The concat() function is useful when you want to stack DataFrames either vertically (row-wise) or horizontally (column-wise) without matching on a specific key column.

Column-wise Concatenation (`axis=1`)

When concatenating along columns, Pandas aligns rows by their index. If the DataFrames have different lengths, missing positions are filled with NaN:

import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'City': ['New York', 'Boston', 'Chicago']
})

df2 = pd.DataFrame({
    'Age': [25, 30],
    'Score': [88, 92]
})

result = pd.concat([df1, df2], axis=1)
print(result)

Output:

      Name      City   Age  Score
  Alice  New York  25.0   88.0
    Bob    Boston  30.0   92.0
Charlie   Chicago   NaN    NaN

warning

Column-wise concat() aligns by index position, not by any logical key. If your DataFrames don't share a meaningful index, the alignment may produce incorrect pairings. Use merge() instead when you need to match rows by a specific column.

Row-wise Concatenation (`axis=0`)

When stacking DataFrames vertically, concat() appends rows from one DataFrame below the other:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'Diana', 'Eve'], 'Age': [35, 28, 22]})

result = pd.concat([df1, df2], ignore_index=True)
print(result)

Output:

      Name  Age
  Alice   25
    Bob   30
Charlie   35
  Diana   28
    Eve   22

info

Setting ignore_index=True resets the index of the resulting DataFrame to 0, 1, 2, .... Without it, the original indices from each DataFrame are preserved, which can lead to duplicate index values.

Using `join()` for Index-Based Merging

The join() method merges DataFrames based on their index rather than a column. It's a convenient shorthand when your data is already indexed appropriately:

import pandas as pd

df1 = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie']},
    index=[1, 2, 3]
)

df2 = pd.DataFrame(
    {'Age': [25, 30]},
    index=[1, 2]
)

result = df1.join(df2, how='left')
print(result)

Output:

      Name   Age
  Alice  25.0
    Bob  30.0
Charlie   NaN

Common Mistake: Unexpected Duplicate Rows After Merge

A frequent issue when merging DataFrames of different lengths is getting more rows than expected due to duplicate keys:

import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 2], 'Name': ['Alice', 'Bob', 'Bob2']})
df2 = pd.DataFrame({'ID': [2, 2], 'Score': [85, 90]})

# ❌ This produces a cartesian product for ID=2
result = df1.merge(df2, on='ID', how='inner')
print(result)

Output:

   ID  Name  Score
 2   Bob     85
 2   Bob     90
 2  Bob2     85
 2  Bob2     90

ID 2 appears twice in df1 and twice in df2, producing 4 rows (2 × 2) instead of the expected 2. This is a many-to-many join.

To avoid this, ensure your key columns are unique, or use validate to catch the issue early:

import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 2], 'Name': ['Alice', 'Bob', 'Bob2']})
df2 = pd.DataFrame({'ID': [2, 2], 'Score': [85, 90]})

# ✅ Detect many-to-many joins before they cause problems
try:
    result = df1.merge(df2, on='ID', how='inner', validate='one_to_one')
except pd.errors.MergeError as e:
    print(f"Merge error: {e}")

Output:

Merge error: Merge keys are not unique in either left or right dataset; not a one-to-one merge

Deduplicate your data before merging, or use validate='many_to_one' or validate='one_to_many' depending on your expected relationship.

Summary

Method	Merges On	Best For
`df.merge()`	Shared columns or index	Combining DataFrames by a logical key (most common)
`pd.concat()`	Index alignment	Stacking DataFrames vertically or horizontally
`df.join()`	Index	Quick index-based merging

When merging DataFrames of different lengths, the join type you choose determines how unmatched rows are handled. Use left or outer joins to preserve all data, inner joins to keep only matches, and always watch out for duplicate keys that can silently inflate your result set.

Validate your merge with the validate parameter to catch unexpected many-to-many relationships before they cause downstream issues.

Understanding Join Types​

Merging with merge() on a Shared Column​

Setting Up Sample Data​

Left Join​

Right Join​

Outer Join​

Inner Join (Default)​

Concatenating with concat()​

Column-wise Concatenation (axis=1)​

Row-wise Concatenation (axis=0)​

Using join() for Index-Based Merging​

Common Mistake: Unexpected Duplicate Rows After Merge​

Summary​

Table of Contents

Understanding Join Types

Merging with `merge()` on a Shared Column

Setting Up Sample Data

Left Join

Right Join

Outer Join

Inner Join (Default)

Concatenating with `concat()`

Column-wise Concatenation (`axis=1`)

Row-wise Concatenation (`axis=0`)

Using `join()` for Index-Based Merging

Common Mistake: Unexpected Duplicate Rows After Merge

Summary