Python Pandas: What Is the Difference Between pd.merge() and pd.concat() in Pandas

Pandas offers two primary ways to combine DataFrames: pd.merge() and pd.concat(). While both produce a single DataFrame from multiple inputs, they serve fundamentally different purposes. pd.merge() links data relationally using shared keys, much like a SQL JOIN. pd.concat() stacks data physically along an axis, like gluing spreadsheets together. Choosing the wrong one leads to incorrect results, duplicated rows, or unexpected NaN values.

This guide explains when and how to use each method, with clear examples and side-by-side comparisons.

The Core Difference at a Glance

Feature	`pd.merge()`	`pd.concat()`
Mental model	Linking rows by shared keys	Stacking rows or columns together
Operation type	Relational join (SQL-like)	Physical append along an axis
Primary use case	Combining related but different tables	Combining structurally similar tables
Analogy	Database JOIN	Piling spreadsheets on top of each other

pd.merge(): Relational Joins on Shared Keys

Use pd.merge() when two DataFrames share a common column (or set of columns) and you want to look up or match rows based on those keys. This is conceptually identical to a SQL JOIN:

import pandas as pd

users = pd.DataFrame({
    'user_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
})

orders = pd.DataFrame({
    'user_id': [2, 2, 3],
    'product': ['Laptop', 'Mouse', 'Keyboard']
})

result = pd.merge(users, orders, on='user_id', how='left')
print(result)

Output:

   user_id     name   product
      1    Alice       NaN
      2      Bob    Laptop
      2      Bob     Mouse
      3  Charlie  Keyboard

Alice has no matching orders, so her product value is NaN. Bob appears twice because he has two orders. This is exactly how a SQL LEFT JOIN behaves.

Understanding Join Types

The how parameter controls which rows are kept when keys do not fully overlap between the two DataFrames:

import pandas as pd

users = pd.DataFrame({
    'user_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
})

orders = pd.DataFrame({
    'user_id': [2, 2, 3],
    'product': ['Laptop', 'Mouse', 'Keyboard']
})

# Inner: only rows where user_id exists in BOTH DataFrames
inner = pd.merge(users, orders, on='user_id', how='inner')
print("Inner join:")
print(inner)

# Left: all rows from users, matching rows from orders
left = pd.merge(users, orders, on='user_id', how='left')
print("\nLeft join:")
print(left)

# Right: all rows from orders, matching rows from users
right = pd.merge(users, orders, on='user_id', how='right')
print("\nRight join:")
print(right)

# Outer: all rows from both, NaN where no match
outer = pd.merge(users, orders, on='user_id', how='outer')
print("\nOuter join:")
print(outer)

Output:

Inner join:
   user_id     name   product
0        2      Bob    Laptop
1        2      Bob     Mouse
2        3  Charlie  Keyboard

Left join:
   user_id     name   product
0        1    Alice       NaN
1        2      Bob    Laptop
2        2      Bob     Mouse
3        3  Charlie  Keyboard

Right join:
   user_id     name   product
0        2      Bob    Laptop
1        2      Bob     Mouse
2        3  Charlie  Keyboard

Outer join:
   user_id     name   product
0        1    Alice       NaN
1        2      Bob    Laptop
2        2      Bob     Mouse
3        3  Charlie  Keyboard

tip

When in doubt, start with how='inner' to see only rows that have matches in both DataFrames. This makes it easy to verify your join key is correct before switching to left or outer.

pd.concat(): Stacking Data Along an Axis

Use pd.concat() when you have structurally similar DataFrames that you want to combine by appending rows or placing columns side by side. There is no key-based matching involved.

Vertical Stacking (Adding Rows)

This is the most common use case: combining multiple DataFrames that share the same columns, like monthly reports or daily CSV files:

import pandas as pd

jan = pd.DataFrame({'sales': [100, 150], 'region': ['East', 'West']})
feb = pd.DataFrame({'sales': [120, 180], 'region': ['East', 'West']})
mar = pd.DataFrame({'sales': [130, 160], 'region': ['East', 'West']})

yearly = pd.concat([jan, feb, mar], ignore_index=True)
print(yearly)

Output:

   sales region
  100   East
  150   West
  120   East
  180   West
  130   East
  160   West

The ignore_index=True parameter resets the index to a clean 0-based sequence. Without it, you would get duplicate index values (0 and 1 repeated three times), which can cause confusion later.

Horizontal Stacking (Adding Columns)

Setting axis=1 places DataFrames side by side, aligning them by their index:

import pandas as pd

names = pd.DataFrame({'name': ['Alice', 'Bob']})
scores = pd.DataFrame({'score': [85, 92]})

combined = pd.concat([names, scores], axis=1)
print(combined)

Output:

    name  score
0  Alice     85
1    Bob     92

warning

Horizontal concatenation with axis=1 aligns on the index, not on any column value. If the two DataFrames have different index values, you will get NaN in the mismatched positions. Make sure the indexes correspond to the same entities before using this approach.

Visual Comparison

Understanding the difference is easier when you visualize what each operation does:

pd.merge() links rows by key:

┌─────────┐      ┌─────────┐       ┌──────────────┐
│ Users   │      │ Orders  │       │ Users+Orders │
│ (id)    │ ───► │ (id)    │  ==>  │ (matched)    │
└─────────┘      └─────────┘       └──────────────┘


pd.concat() stacks data physically:

┌─────────┐
│   Jan   │
├─────────┤                ┌─────────┐
│   Feb   │   axis=0 ==>  │   All   │
├─────────┤               │ Months  │
│   Mar   │               └─────────┘
└─────────┘

Common Mistake: Using concat When You Need merge

A frequent error is using pd.concat() to combine two DataFrames that should be joined on a key. This results in duplicated or misaligned data instead of a proper relational lookup:

import pandas as pd

users = pd.DataFrame({
    'user_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
})

orders = pd.DataFrame({
    'user_id': [2, 3],
    'product': ['Laptop', 'Keyboard']
})

# Wrong: concat just stacks them, ignoring the relationship
wrong = pd.concat([users, orders], ignore_index=True)
print("Wrong (concat):")
print(wrong)

# Right: merge links them on the shared key
right = pd.merge(users, orders, on='user_id', how='left')
print("\nRight (merge):")
print(right)

Output:

Wrong (concat):
   user_id     name   product
0        1    Alice       NaN
1        2      Bob       NaN
2        3  Charlie       NaN
3        2      NaN    Laptop
4        3      NaN  Keyboard

Right (merge):
   user_id     name   product
0        1    Alice       NaN
1        2      Bob    Laptop
2        3  Charlie  Keyboard

The concat version creates five rows with NaN scattered everywhere because it simply stacked the two DataFrames without considering the user_id relationship. The merge version correctly pairs each user with their orders.

When to Use Each Method

Scenario	Use
Add customer names to an orders table via a shared ID	`pd.merge()`
Combine daily CSV files with identical columns into one	`pd.concat()`
Look up product prices from a catalog table	`pd.merge()`
Stack quarterly financial reports	`pd.concat()`
Join employee records with department information	`pd.merge()`
Place independently computed columns side by side	`pd.concat(axis=1)`

The deciding question is simple: Are you matching rows by a shared key? If yes, use merge(). Are you appending similar data together? If yes, use concat().

Quick Reference

# merge: relational join on a shared key
pd.merge(left, right, on='key', how='left')

# merge: when key columns have different names
pd.merge(left, right, left_on='id', right_on='user_id', how='inner')

# concat: vertical stack (add rows)
pd.concat([df1, df2, df3], ignore_index=True)

# concat: horizontal stack (add columns)
pd.concat([df1, df2], axis=1)

Summary

Use pd.merge() when you need to link data from different tables based on shared keys, just like a SQL JOIN. It matches rows relationally and supports inner, left, right, and outer join types.
Use pd.concat() when you need to stack structurally similar DataFrames together, either vertically (adding rows with axis=0) or horizontally (adding columns with axis=1).

Think of merge as a database operation that connects related information, and concat as physical assembly that combines datasets sharing the same structure.

The Core Difference at a Glance​

pd.merge(): Relational Joins on Shared Keys​

Understanding Join Types​

pd.concat(): Stacking Data Along an Axis​

Vertical Stacking (Adding Rows)​

Horizontal Stacking (Adding Columns)​

Visual Comparison​

Common Mistake: Using concat When You Need merge​

When to Use Each Method​

Quick Reference​

Summary​

Table of Contents

The Core Difference at a Glance

pd.merge(): Relational Joins on Shared Keys

Understanding Join Types

pd.concat(): Stacking Data Along an Axis

Vertical Stacking (Adding Rows)

Horizontal Stacking (Adding Columns)

Visual Comparison

Common Mistake: Using concat When You Need merge

When to Use Each Method

Quick Reference

Summary