Python Pandas: What Is the Difference Between pd.merge() and pd.concat() in Pandas
Pandas offers two primary ways to combine DataFrames: pd.merge() and pd.concat(). While both produce a single DataFrame from multiple inputs, they serve fundamentally different purposes. pd.merge() links data relationally using shared keys, much like a SQL JOIN. pd.concat() stacks data physically along an axis, like gluing spreadsheets together. Choosing the wrong one leads to incorrect results, duplicated rows, or unexpected NaN values.
This guide explains when and how to use each method, with clear examples and side-by-side comparisons.
The Core Difference at a Glance
| Feature | pd.merge() | pd.concat() |
|---|---|---|
| Mental model | Linking rows by shared keys | Stacking rows or columns together |
| Operation type | Relational join (SQL-like) | Physical append along an axis |
| Primary use case | Combining related but different tables | Combining structurally similar tables |
| Analogy | Database JOIN | Piling spreadsheets on top of each other |
pd.merge(): Relational Joins on Shared Keys
Use pd.merge() when two DataFrames share a common column (or set of columns) and you want to look up or match rows based on those keys. This is conceptually identical to a SQL JOIN:
import pandas as pd
users = pd.DataFrame({
'user_id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Charlie']
})
orders = pd.DataFrame({
'user_id': [2, 2, 3],
'product': ['Laptop', 'Mouse', 'Keyboard']
})
result = pd.merge(users, orders, on='user_id', how='left')
print(result)
Output:
user_id name product
0 1 Alice NaN
1 2 Bob Laptop
2 2 Bob Mouse
3 3 Charlie Keyboard
Alice has no matching orders, so her product value is NaN. Bob appears twice because he has two orders. This is exactly how a SQL LEFT JOIN behaves.
Understanding Join Types
The how parameter controls which rows are kept when keys do not fully overlap between the two DataFrames:
import pandas as pd
users = pd.DataFrame({
'user_id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Charlie']
})
orders = pd.DataFrame({
'user_id': [2, 2, 3],
'product': ['Laptop', 'Mouse', 'Keyboard']
})
# Inner: only rows where user_id exists in BOTH DataFrames
inner = pd.merge(users, orders, on='user_id', how='inner')
print("Inner join:")
print(inner)
# Left: all rows from users, matching rows from orders
left = pd.merge(users, orders, on='user_id', how='left')
print("\nLeft join:")
print(left)
# Right: all rows from orders, matching rows from users
right = pd.merge(users, orders, on='user_id', how='right')
print("\nRight join:")
print(right)
# Outer: all rows from both, NaN where no match
outer = pd.merge(users, orders, on='user_id', how='outer')
print("\nOuter join:")
print(outer)
Output:
Inner join:
user_id name product
0 2 Bob Laptop
1 2 Bob Mouse
2 3 Charlie Keyboard
Left join:
user_id name product
0 1 Alice NaN
1 2 Bob Laptop
2 2 Bob Mouse
3 3 Charlie Keyboard
Right join:
user_id name product
0 2 Bob Laptop
1 2 Bob Mouse
2 3 Charlie Keyboard
Outer join:
user_id name product
0 1 Alice NaN
1 2 Bob Laptop
2 2 Bob Mouse
3 3 Charlie Keyboard
When in doubt, start with how='inner' to see only rows that have matches in both DataFrames. This makes it easy to verify your join key is correct before switching to left or outer.
pd.concat(): Stacking Data Along an Axis
Use pd.concat() when you have structurally similar DataFrames that you want to combine by appending rows or placing columns side by side. There is no key-based matching involved.
Vertical Stacking (Adding Rows)
This is the most common use case: combining multiple DataFrames that share the same columns, like monthly reports or daily CSV files:
import pandas as pd
jan = pd.DataFrame({'sales': [100, 150], 'region': ['East', 'West']})
feb = pd.DataFrame({'sales': [120, 180], 'region': ['East', 'West']})
mar = pd.DataFrame({'sales': [130, 160], 'region': ['East', 'West']})
yearly = pd.concat([jan, feb, mar], ignore_index=True)
print(yearly)
Output:
sales region
0 100 East
1 150 West
2 120 East
3 180 West
4 130 East
5 160 West
The ignore_index=True parameter resets the index to a clean 0-based sequence. Without it, you would get duplicate index values (0 and 1 repeated three times), which can cause confusion later.
Horizontal Stacking (Adding Columns)
Setting axis=1 places DataFrames side by side, aligning them by their index:
import pandas as pd
names = pd.DataFrame({'name': ['Alice', 'Bob']})
scores = pd.DataFrame({'score': [85, 92]})
combined = pd.concat([names, scores], axis=1)
print(combined)
Output:
name score
0 Alice 85
1 Bob 92
Horizontal concatenation with axis=1 aligns on the index, not on any column value. If the two DataFrames have different index values, you will get NaN in the mismatched positions. Make sure the indexes correspond to the same entities before using this approach.
Visual Comparison
Understanding the difference is easier when you visualize what each operation does:
pd.merge() links rows by key:
┌─────────┐ ┌─────────┐ ┌──────────────┐
│ Users │ │ Orders │ │ Users+Orders │
│ (id) │ ───► │ (id) │ ==> │ (matched) │
└─────────┘ └─────────┘ └──────────────┘
pd.concat() stacks data physically:
┌─────────┐
│ Jan │
├─────────┤ ┌─────────┐
│ Feb │ axis=0 ==> │ All │
├─────────┤ │ Months │
│ Mar │ └─────────┘
└─────────┘
Common Mistake: Using concat When You Need merge
A frequent error is using pd.concat() to combine two DataFrames that should be joined on a key. This results in duplicated or misaligned data instead of a proper relational lookup:
import pandas as pd
users = pd.DataFrame({
'user_id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Charlie']
})
orders = pd.DataFrame({
'user_id': [2, 3],
'product': ['Laptop', 'Keyboard']
})
# Wrong: concat just stacks them, ignoring the relationship
wrong = pd.concat([users, orders], ignore_index=True)
print("Wrong (concat):")
print(wrong)
# Right: merge links them on the shared key
right = pd.merge(users, orders, on='user_id', how='left')
print("\nRight (merge):")
print(right)
Output:
Wrong (concat):
user_id name product
0 1 Alice NaN
1 2 Bob NaN
2 3 Charlie NaN
3 2 NaN Laptop
4 3 NaN Keyboard
Right (merge):
user_id name product
0 1 Alice NaN
1 2 Bob Laptop
2 3 Charlie Keyboard
The concat version creates five rows with NaN scattered everywhere because it simply stacked the two DataFrames without considering the user_id relationship. The merge version correctly pairs each user with their orders.
When to Use Each Method
| Scenario | Use |
|---|---|
| Add customer names to an orders table via a shared ID | pd.merge() |
| Combine daily CSV files with identical columns into one | pd.concat() |
| Look up product prices from a catalog table | pd.merge() |
| Stack quarterly financial reports | pd.concat() |
| Join employee records with department information | pd.merge() |
| Place independently computed columns side by side | pd.concat(axis=1) |
The deciding question is simple: Are you matching rows by a shared key? If yes, use merge(). Are you appending similar data together? If yes, use concat().
Quick Reference
# merge: relational join on a shared key
pd.merge(left, right, on='key', how='left')
# merge: when key columns have different names
pd.merge(left, right, left_on='id', right_on='user_id', how='inner')
# concat: vertical stack (add rows)
pd.concat([df1, df2, df3], ignore_index=True)
# concat: horizontal stack (add columns)
pd.concat([df1, df2], axis=1)
Summary
- Use
pd.merge()when you need to link data from different tables based on shared keys, just like a SQL JOIN. It matches rows relationally and supports inner, left, right, and outer join types. - Use
pd.concat()when you need to stack structurally similar DataFrames together, either vertically (adding rows withaxis=0) or horizontally (adding columns withaxis=1).
Think of merge as a database operation that connects related information, and concat as physical assembly that combines datasets sharing the same structure.