Skip to main content

Python Pandas: What Is the Difference Between pd.merge() and pd.concat() in Pandas

Pandas offers two primary ways to combine DataFrames: pd.merge() and pd.concat(). While both produce a single DataFrame from multiple inputs, they serve fundamentally different purposes. pd.merge() links data relationally using shared keys, much like a SQL JOIN. pd.concat() stacks data physically along an axis, like gluing spreadsheets together. Choosing the wrong one leads to incorrect results, duplicated rows, or unexpected NaN values.

This guide explains when and how to use each method, with clear examples and side-by-side comparisons.

The Core Difference at a Glance

Featurepd.merge()pd.concat()
Mental modelLinking rows by shared keysStacking rows or columns together
Operation typeRelational join (SQL-like)Physical append along an axis
Primary use caseCombining related but different tablesCombining structurally similar tables
AnalogyDatabase JOINPiling spreadsheets on top of each other

pd.merge(): Relational Joins on Shared Keys

Use pd.merge() when two DataFrames share a common column (or set of columns) and you want to look up or match rows based on those keys. This is conceptually identical to a SQL JOIN:

import pandas as pd

users = pd.DataFrame({
'user_id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Charlie']
})

orders = pd.DataFrame({
'user_id': [2, 2, 3],
'product': ['Laptop', 'Mouse', 'Keyboard']
})

result = pd.merge(users, orders, on='user_id', how='left')
print(result)

Output:

   user_id     name   product
0 1 Alice NaN
1 2 Bob Laptop
2 2 Bob Mouse
3 3 Charlie Keyboard

Alice has no matching orders, so her product value is NaN. Bob appears twice because he has two orders. This is exactly how a SQL LEFT JOIN behaves.

Understanding Join Types

The how parameter controls which rows are kept when keys do not fully overlap between the two DataFrames:

import pandas as pd

users = pd.DataFrame({
'user_id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Charlie']
})

orders = pd.DataFrame({
'user_id': [2, 2, 3],
'product': ['Laptop', 'Mouse', 'Keyboard']
})

# Inner: only rows where user_id exists in BOTH DataFrames
inner = pd.merge(users, orders, on='user_id', how='inner')
print("Inner join:")
print(inner)

# Left: all rows from users, matching rows from orders
left = pd.merge(users, orders, on='user_id', how='left')
print("\nLeft join:")
print(left)

# Right: all rows from orders, matching rows from users
right = pd.merge(users, orders, on='user_id', how='right')
print("\nRight join:")
print(right)

# Outer: all rows from both, NaN where no match
outer = pd.merge(users, orders, on='user_id', how='outer')
print("\nOuter join:")
print(outer)

Output:

Inner join:
user_id name product
0 2 Bob Laptop
1 2 Bob Mouse
2 3 Charlie Keyboard

Left join:
user_id name product
0 1 Alice NaN
1 2 Bob Laptop
2 2 Bob Mouse
3 3 Charlie Keyboard

Right join:
user_id name product
0 2 Bob Laptop
1 2 Bob Mouse
2 3 Charlie Keyboard

Outer join:
user_id name product
0 1 Alice NaN
1 2 Bob Laptop
2 2 Bob Mouse
3 3 Charlie Keyboard
tip

When in doubt, start with how='inner' to see only rows that have matches in both DataFrames. This makes it easy to verify your join key is correct before switching to left or outer.

pd.concat(): Stacking Data Along an Axis

Use pd.concat() when you have structurally similar DataFrames that you want to combine by appending rows or placing columns side by side. There is no key-based matching involved.

Vertical Stacking (Adding Rows)

This is the most common use case: combining multiple DataFrames that share the same columns, like monthly reports or daily CSV files:

import pandas as pd

jan = pd.DataFrame({'sales': [100, 150], 'region': ['East', 'West']})
feb = pd.DataFrame({'sales': [120, 180], 'region': ['East', 'West']})
mar = pd.DataFrame({'sales': [130, 160], 'region': ['East', 'West']})

yearly = pd.concat([jan, feb, mar], ignore_index=True)
print(yearly)

Output:

   sales region
0 100 East
1 150 West
2 120 East
3 180 West
4 130 East
5 160 West

The ignore_index=True parameter resets the index to a clean 0-based sequence. Without it, you would get duplicate index values (0 and 1 repeated three times), which can cause confusion later.

Horizontal Stacking (Adding Columns)

Setting axis=1 places DataFrames side by side, aligning them by their index:

import pandas as pd

names = pd.DataFrame({'name': ['Alice', 'Bob']})
scores = pd.DataFrame({'score': [85, 92]})

combined = pd.concat([names, scores], axis=1)
print(combined)

Output:

    name  score
0 Alice 85
1 Bob 92
warning

Horizontal concatenation with axis=1 aligns on the index, not on any column value. If the two DataFrames have different index values, you will get NaN in the mismatched positions. Make sure the indexes correspond to the same entities before using this approach.

Visual Comparison

Understanding the difference is easier when you visualize what each operation does:

pd.merge() links rows by key:

┌─────────┐ ┌─────────┐ ┌──────────────┐
│ Users │ │ Orders │ │ Users+Orders │
│ (id) │ ───► │ (id) │ ==> │ (matched) │
└─────────┘ └─────────┘ └──────────────┘


pd.concat() stacks data physically:

┌─────────┐
│ Jan │
├─────────┤ ┌─────────┐
│ Feb │ axis=0 ==> │ All │
├─────────┤ │ Months │
│ Mar │ └─────────┘
└─────────┘

Common Mistake: Using concat When You Need merge

A frequent error is using pd.concat() to combine two DataFrames that should be joined on a key. This results in duplicated or misaligned data instead of a proper relational lookup:

import pandas as pd

users = pd.DataFrame({
'user_id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Charlie']
})

orders = pd.DataFrame({
'user_id': [2, 3],
'product': ['Laptop', 'Keyboard']
})

# Wrong: concat just stacks them, ignoring the relationship
wrong = pd.concat([users, orders], ignore_index=True)
print("Wrong (concat):")
print(wrong)

# Right: merge links them on the shared key
right = pd.merge(users, orders, on='user_id', how='left')
print("\nRight (merge):")
print(right)

Output:

Wrong (concat):
user_id name product
0 1 Alice NaN
1 2 Bob NaN
2 3 Charlie NaN
3 2 NaN Laptop
4 3 NaN Keyboard

Right (merge):
user_id name product
0 1 Alice NaN
1 2 Bob Laptop
2 3 Charlie Keyboard

The concat version creates five rows with NaN scattered everywhere because it simply stacked the two DataFrames without considering the user_id relationship. The merge version correctly pairs each user with their orders.

When to Use Each Method

ScenarioUse
Add customer names to an orders table via a shared IDpd.merge()
Combine daily CSV files with identical columns into onepd.concat()
Look up product prices from a catalog tablepd.merge()
Stack quarterly financial reportspd.concat()
Join employee records with department informationpd.merge()
Place independently computed columns side by sidepd.concat(axis=1)

The deciding question is simple: Are you matching rows by a shared key? If yes, use merge(). Are you appending similar data together? If yes, use concat().

Quick Reference

# merge: relational join on a shared key
pd.merge(left, right, on='key', how='left')

# merge: when key columns have different names
pd.merge(left, right, left_on='id', right_on='user_id', how='inner')

# concat: vertical stack (add rows)
pd.concat([df1, df2, df3], ignore_index=True)

# concat: horizontal stack (add columns)
pd.concat([df1, df2], axis=1)

Summary

  • Use pd.merge() when you need to link data from different tables based on shared keys, just like a SQL JOIN. It matches rows relationally and supports inner, left, right, and outer join types.
  • Use pd.concat() when you need to stack structurally similar DataFrames together, either vertically (adding rows with axis=0) or horizontally (adding columns with axis=1).

Think of merge as a database operation that connects related information, and concat as physical assembly that combines datasets sharing the same structure.