Skip to main content

Python Pandas: How to Combine Two DataFrames in Pandas

When working with real-world data, information is often spread across multiple sources - one DataFrame might contain customer profiles while another holds their transaction history. To analyze this data together, you need to combine these DataFrames into a single, unified structure.

Pandas provides two primary methods for combining DataFrames: concat() for stacking data vertically or horizontally, and merge() for joining data based on shared columns (similar to SQL joins).

In this guide, you'll learn how to use both methods effectively, understand their differences, and know when to choose one over the other.

Combining DataFrames with concat()

The concat() function combines DataFrames by stacking them - either adding rows on top of each other (vertically) or placing columns side by side (horizontally).

Stacking DataFrames Vertically (Adding Rows)

Vertical concatenation is the most common use of concat(). It appends one DataFrame's rows below another's:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

combined = pd.concat([df1, df2])
print(combined)

Output:

      Name  Age
0 Alice 25
1 Bob 30
0 Charlie 35
1 David 40

Notice that the index values are duplicated (both DataFrames have indices 0 and 1). To reset the index and get a clean sequential order, use ignore_index=True:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

combined = pd.concat([df1, df2], ignore_index=True)
print(combined)

Output:

      Name  Age
0 Alice 25
1 Bob 30
2 Charlie 35
3 David 40
tip

Always use ignore_index=True when stacking DataFrames vertically unless you specifically need to preserve the original indices. Duplicate indices can cause confusion and unexpected behavior in later operations.

Stacking DataFrames Horizontally (Adding Columns)

To place DataFrames side by side, set axis=1:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'City': ['New York', 'Los Angeles'], 'Salary': [70000, 80000]})

combined = pd.concat([df1, df2], axis=1)
print(combined)

Output:

    Name  Age         City  Salary
0 Alice 25 New York 70000
1 Bob 30 Los Angeles 80000
Important: Horizontal Concatenation Aligns by Index

When concatenating horizontally, pandas aligns rows by their index values, not by position. If the indices don't match, you'll get NaN values:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob']}, index=[0, 1])
df2 = pd.DataFrame({'City': ['New York', 'Chicago']}, index=[1, 2])

combined = pd.concat([df1, df2], axis=1)
print(combined)

Output:

    Name      City
0 Alice NaN
1 Bob New York
2 NaN Chicago

Make sure your indices align properly, or reset them with reset_index() before concatenating.

Handling Mismatched Columns

When vertically stacking DataFrames with different columns, concat() fills missing values with NaN:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie'], 'Salary': [50000]})

combined = pd.concat([df1, df2], ignore_index=True)
print(combined)

Output:

      Name   Age   Salary
0 Alice 25.0 NaN
1 Bob 30.0 NaN
2 Charlie NaN 50000.0

Combining DataFrames with merge()

The merge() function combines DataFrames based on shared column values, similar to SQL JOIN operations. This is the right tool when you need to match rows from two DataFrames based on a common key.

Inner Join (Default)

An inner join keeps only the rows where the key column has matching values in both DataFrames:

import pandas as pd

df1 = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]
})

df2 = pd.DataFrame({
'Name': ['Alice', 'Bob', 'David'],
'Salary': [50000, 60000, 70000]
})

merged = pd.merge(df1, df2, on='Name')
print(merged)

Output:

    Name  Age  Salary
0 Alice 25 50000
1 Bob 30 60000

Only Alice and Bob appear because they exist in both DataFrames. Charlie (only in df1) and David (only in df2) are excluded.

Outer Join

An outer join keeps all rows from both DataFrames, filling NaN where there's no match:

import pandas as pd

df1 = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]
})

df2 = pd.DataFrame({
'Name': ['Alice', 'Bob', 'David'],
'Salary': [50000, 60000, 70000]
})

merged = pd.merge(df1, df2, on='Name', how='outer')
print(merged)

Output:

      Name   Age   Salary
0 Alice 25.0 50000.0
1 Bob 30.0 60000.0
2 Charlie 35.0 NaN
3 David NaN 70000.0

Left Join

A left join keeps all rows from the left DataFrame and adds matching data from the right:

import pandas as pd

df1 = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]
})

df2 = pd.DataFrame({
'Name': ['Alice', 'Bob', 'David'],
'Salary': [50000, 60000, 70000]
})

merged = pd.merge(df1, df2, on='Name', how='left')
print(merged)

Output:

      Name  Age   Salary
0 Alice 25 50000.0
1 Bob 30 60000.0
2 Charlie 35 NaN

All three names from df1 are preserved. Charlie has NaN for Salary because there's no match in df2.

Right Join

A right join keeps all rows from the right DataFrame and adds matching data from the left:

import pandas as pd

df1 = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]
})

df2 = pd.DataFrame({
'Name': ['Alice', 'Bob', 'David'],
'Salary': [50000, 60000, 70000]
})

merged = pd.merge(df1, df2, on='Name', how='right')
print(merged)

Output:

    Name   Age  Salary
0 Alice 25.0 50000
1 Bob 30.0 60000
2 David NaN 70000

Merging on Different Column Names

When the key columns have different names in each DataFrame, use left_on and right_on:

import pandas as pd

employees = pd.DataFrame({
'emp_name': ['Alice', 'Bob', 'Charlie'],
'department': ['Engineering', 'Marketing', 'Sales']
})

salaries = pd.DataFrame({
'worker': ['Alice', 'Bob', 'David'],
'salary': [90000, 75000, 65000]
})

merged = pd.merge(employees, salaries, left_on='emp_name', right_on='worker')
print(merged)

Output:

  emp_name   department worker  salary
0 Alice Engineering Alice 90000
1 Bob Marketing Bob 75000

Merging on Multiple Columns

You can merge on multiple columns simultaneously by passing a list:

import pandas as pd

df1 = pd.DataFrame({
'First': ['Alice', 'Bob', 'Alice'],
'Last': ['Smith', 'Jones', 'Brown'],
'Age': [25, 30, 28]
})

df2 = pd.DataFrame({
'First': ['Alice', 'Bob', 'Alice'],
'Last': ['Smith', 'Jones', 'White'],
'Salary': [50000, 60000, 55000]
})

merged = pd.merge(df1, df2, on=['First', 'Last'])
print(merged)

Output:

   First   Last  Age  Salary
0 Alice Smith 25 50000
1 Bob Jones 30 60000

Only rows where both First and Last match are included.

When to Use concat() vs. merge()

ScenarioUse
Stack DataFrames with the same columns (add more rows)concat()
Place DataFrames side by side (add more columns)concat(axis=1)
Combine multiple DataFrames at onceconcat()
Join DataFrames on shared key columnsmerge()
Replicate SQL-style joins (inner, outer, left, right)merge()
Match rows based on values in one or more columnsmerge()

Comprehensive Comparison

Featureconcat()merge()
PurposeStack or concatenate along an axisCombine based on shared columns
DirectionVertical (axis=0) or horizontal (axis=1)Joins based on column values
Join typesN/AInner, outer, left, right
Multiple DataFrames✅ Accepts a list of many DataFrames❌ Two DataFrames at a time
Key column required❌ No✅ Yes
Best forAppending similar datasetsRelating datasets with shared keys

Conclusion

Pandas gives you two powerful tools for combining DataFrames:

  • concat() is your go-to for stacking DataFrames - appending rows from multiple sources with the same structure or placing columns side by side. It's simple, fast, and can handle any number of DataFrames at once.

  • merge() is your go-to for joining DataFrames based on shared column values - just like SQL joins. It gives you fine-grained control with inner, outer, left, and right join types.

  • Choose concat() when your DataFrames share the same structure and you want to stack them.

  • Choose merge() when your DataFrames share a key column and you need to combine related information.

Together, these two methods cover virtually every data combination scenario you'll encounter in pandas.