Python Pandas: How to Combine Two DataFrames in Pandas

When working with real-world data, information is often spread across multiple sources - one DataFrame might contain customer profiles while another holds their transaction history. To analyze this data together, you need to combine these DataFrames into a single, unified structure.

Pandas provides two primary methods for combining DataFrames: concat() for stacking data vertically or horizontally, and merge() for joining data based on shared columns (similar to SQL joins).

In this guide, you'll learn how to use both methods effectively, understand their differences, and know when to choose one over the other.

Combining DataFrames with `concat()`

The concat() function combines DataFrames by stacking them - either adding rows on top of each other (vertically) or placing columns side by side (horizontally).

Stacking DataFrames Vertically (Adding Rows)

Vertical concatenation is the most common use of concat(). It appends one DataFrame's rows below another's:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

combined = pd.concat([df1, df2])
print(combined)

Output:

      Name  Age
  Alice   25
    Bob   30
Charlie   35
  David   40

Notice that the index values are duplicated (both DataFrames have indices 0 and 1). To reset the index and get a clean sequential order, use ignore_index=True:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

combined = pd.concat([df1, df2], ignore_index=True)
print(combined)

Output:

      Name  Age
  Alice   25
    Bob   30
Charlie   35
  David   40

tip

Always use ignore_index=True when stacking DataFrames vertically unless you specifically need to preserve the original indices. Duplicate indices can cause confusion and unexpected behavior in later operations.

Stacking DataFrames Horizontally (Adding Columns)

To place DataFrames side by side, set axis=1:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'City': ['New York', 'Los Angeles'], 'Salary': [70000, 80000]})

combined = pd.concat([df1, df2], axis=1)
print(combined)

Output:

    Name  Age         City  Salary
0  Alice   25     New York   70000
1    Bob   30  Los Angeles   80000

Important: Horizontal Concatenation Aligns by Index

When concatenating horizontally, pandas aligns rows by their index values, not by position. If the indices don't match, you'll get NaN values:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob']}, index=[0, 1])
df2 = pd.DataFrame({'City': ['New York', 'Chicago']}, index=[1, 2])

combined = pd.concat([df1, df2], axis=1)
print(combined)

Output:

    Name      City
Alice       NaN
  Bob  New York
  NaN   Chicago

Make sure your indices align properly, or reset them with reset_index() before concatenating.

Handling Mismatched Columns

When vertically stacking DataFrames with different columns, concat() fills missing values with NaN:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie'], 'Salary': [50000]})

combined = pd.concat([df1, df2], ignore_index=True)
print(combined)

Output:

      Name   Age   Salary
  Alice  25.0      NaN
    Bob  30.0      NaN
Charlie   NaN  50000.0

Combining DataFrames with `merge()`

The merge() function combines DataFrames based on shared column values, similar to SQL JOIN operations. This is the right tool when you need to match rows from two DataFrames based on a common key.

Inner Join (Default)

An inner join keeps only the rows where the key column has matching values in both DataFrames:

import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

df2 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'David'],
    'Salary': [50000, 60000, 70000]
})

merged = pd.merge(df1, df2, on='Name')
print(merged)

Output:

    Name  Age  Salary
0  Alice   25   50000
1    Bob   30   60000

Only Alice and Bob appear because they exist in both DataFrames. Charlie (only in df1) and David (only in df2) are excluded.

Outer Join

An outer join keeps all rows from both DataFrames, filling NaN where there's no match:

import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

df2 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'David'],
    'Salary': [50000, 60000, 70000]
})

merged = pd.merge(df1, df2, on='Name', how='outer')
print(merged)

Output:

      Name   Age   Salary
  Alice  25.0  50000.0
    Bob  30.0  60000.0
Charlie  35.0      NaN
  David   NaN  70000.0

Left Join

A left join keeps all rows from the left DataFrame and adds matching data from the right:

import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

df2 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'David'],
    'Salary': [50000, 60000, 70000]
})

merged = pd.merge(df1, df2, on='Name', how='left')
print(merged)

Output:

      Name  Age   Salary
  Alice   25  50000.0
    Bob   30  60000.0
Charlie   35      NaN

All three names from df1 are preserved. Charlie has NaN for Salary because there's no match in df2.

Right Join

A right join keeps all rows from the right DataFrame and adds matching data from the left:

import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

df2 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'David'],
    'Salary': [50000, 60000, 70000]
})

merged = pd.merge(df1, df2, on='Name', how='right')
print(merged)

Output:

    Name   Age  Salary
Alice  25.0   50000
  Bob  30.0   60000
David   NaN   70000

Merging on Different Column Names

When the key columns have different names in each DataFrame, use left_on and right_on:

import pandas as pd

employees = pd.DataFrame({
    'emp_name': ['Alice', 'Bob', 'Charlie'],
    'department': ['Engineering', 'Marketing', 'Sales']
})

salaries = pd.DataFrame({
    'worker': ['Alice', 'Bob', 'David'],
    'salary': [90000, 75000, 65000]
})

merged = pd.merge(employees, salaries, left_on='emp_name', right_on='worker')
print(merged)

Output:

  emp_name   department worker  salary
0    Alice  Engineering  Alice   90000
1      Bob    Marketing    Bob   75000

Merging on Multiple Columns

You can merge on multiple columns simultaneously by passing a list:

import pandas as pd

df1 = pd.DataFrame({
    'First': ['Alice', 'Bob', 'Alice'],
    'Last': ['Smith', 'Jones', 'Brown'],
    'Age': [25, 30, 28]
})

df2 = pd.DataFrame({
    'First': ['Alice', 'Bob', 'Alice'],
    'Last': ['Smith', 'Jones', 'White'],
    'Salary': [50000, 60000, 55000]
})

merged = pd.merge(df1, df2, on=['First', 'Last'])
print(merged)

Output:

   First   Last  Age  Salary
0  Alice  Smith   25   50000
1    Bob  Jones   30   60000

Only rows where both First and Last match are included.

When to Use `concat()` vs. `merge()`

Scenario	Use
Stack DataFrames with the same columns (add more rows)	`concat()`
Place DataFrames side by side (add more columns)	`concat(axis=1)`
Combine multiple DataFrames at once	`concat()`
Join DataFrames on shared key columns	`merge()`
Replicate SQL-style joins (inner, outer, left, right)	`merge()`
Match rows based on values in one or more columns	`merge()`

Comprehensive Comparison

Feature	`concat()`	`merge()`
Purpose	Stack or concatenate along an axis	Combine based on shared columns
Direction	Vertical (`axis=0`) or horizontal (`axis=1`)	Joins based on column values
Join types	N/A	Inner, outer, left, right
Multiple DataFrames	✅ Accepts a list of many DataFrames	❌ Two DataFrames at a time
Key column required	❌ No	✅ Yes
Best for	Appending similar datasets	Relating datasets with shared keys

Conclusion

Pandas gives you two powerful tools for combining DataFrames:

concat() is your go-to for stacking DataFrames - appending rows from multiple sources with the same structure or placing columns side by side. It's simple, fast, and can handle any number of DataFrames at once.
merge() is your go-to for joining DataFrames based on shared column values - just like SQL joins. It gives you fine-grained control with inner, outer, left, and right join types.
Choose concat() when your DataFrames share the same structure and you want to stack them.
Choose merge() when your DataFrames share a key column and you need to combine related information.

Together, these two methods cover virtually every data combination scenario you'll encounter in pandas.

Combining DataFrames with concat()​

Stacking DataFrames Vertically (Adding Rows)​

Stacking DataFrames Horizontally (Adding Columns)​

Handling Mismatched Columns​

Combining DataFrames with merge()​

Inner Join (Default)​

Outer Join​

Left Join​

Right Join​

Merging on Different Column Names​

Merging on Multiple Columns​

When to Use concat() vs. merge()​

Comprehensive Comparison​

Conclusion​

Table of Contents

Combining DataFrames with `concat()`

Stacking DataFrames Vertically (Adding Rows)

Stacking DataFrames Horizontally (Adding Columns)

Handling Mismatched Columns

Combining DataFrames with `merge()`

Inner Join (Default)

Outer Join

Left Join

Right Join

Merging on Different Column Names

Merging on Multiple Columns

When to Use `concat()` vs. `merge()`

Comprehensive Comparison

Conclusion