How to Merge Multiple CSV Files into a Single Pandas DataFrame in Python
When working with data analysis projects, it's common to receive data split across multiple CSV files - whether by date, region, department, or any other logical partition. Before you can analyze this data, you need to combine all these files into a single Pandas DataFrame. Manually copying and pasting data is error-prone and impractical for large datasets, so Python and Pandas offer efficient, automated solutions.
In this guide, you'll learn how to merge multiple CSV files using pd.concat(), automate file discovery with glob, and handle common issues like mismatched columns and duplicate headers.
Sample CSV files for the following examples:

sales_jan.csv

```
OrderID,Product,Quantity,Price
1001,Widget,5,29.99
1002,Gadget,3,49.99
1003,Widget,7,29.99
1004,Gizmo,2,19.99
```

sales_feb.csv

```
OrderID,Product,Quantity,Price
1005,Widget,1,29.99
1006,Gadget,4,49.99
1007,Gizmo,3,19.99
1008,Widget,6,29.99
```

sales_mar.csv

```
OrderID,Product,Quantity,Price
1009,Widget,2,29.99
1010,Gadget,5,49.99
1011,Gizmo,1,19.99
1012,Widget,3,29.99
```

names.csv

```
ID,Name
1,Alice
2,Bob
3,Charlie
4,David
```

scores.csv

```
ID,Score
1,85
2,92
3,78
4,88
```
Method 1: Merging Specific CSV Files by Name
The simplest approach is to explicitly list the CSV files you want to merge and use pd.concat() combined with map() to read and concatenate them in one step.
How It Works
- `pd.read_csv()` reads each CSV file into a DataFrame.
- `map()` applies `pd.read_csv()` to every file path in the list.
- `pd.concat()` stitches the resulting DataFrames together along the row axis.
```python
import pandas as pd

# List the CSV files to merge
csv_files = ['sales_jan.csv', 'sales_feb.csv']

# Read and concatenate all files
df = pd.concat(map(pd.read_csv, csv_files), ignore_index=True)
print(df)
```
Output (example):

```
   OrderID Product  Quantity  Price
0     1001  Widget         5  29.99
1     1002  Gadget         3  49.99
2     1003  Widget         7  29.99
3     1004   Gizmo         2  19.99
4     1005  Widget         1  29.99
5     1006  Gadget         4  49.99
6     1007   Gizmo         3  19.99
7     1008  Widget         6  29.99
```
Setting ignore_index=True resets the index to a continuous sequence (0, 1, 2, ...). Without it, each file's original index is preserved, which can lead to duplicate index values.
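A minimal sketch of the difference, using small inline DataFrames in place of the CSV files:

```python
import pandas as pd

# Two small frames standing in for two CSV files
df1 = pd.DataFrame({'OrderID': [1001, 1002], 'Quantity': [5, 3]})
df2 = pd.DataFrame({'OrderID': [1005, 1006], 'Quantity': [1, 4]})

# Without ignore_index, each frame keeps its own 0-based index
stacked = pd.concat([df1, df2])
print(list(stacked.index))  # [0, 1, 0, 1] -- duplicate labels

# With ignore_index=True, the result is renumbered 0..n-1
clean = pd.concat([df1, df2], ignore_index=True)
print(list(clean.index))  # [0, 1, 2, 3]
```

Duplicate index labels are not an error in Pandas, but they make `.loc[]` lookups ambiguous, which is why resetting the index during the merge is the safer default.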
Step-by-Step Alternative
If you prefer more explicit, readable code, you can read the files individually and then concatenate:
```python
import pandas as pd

df1 = pd.read_csv('sales_jan.csv')
df2 = pd.read_csv('sales_feb.csv')
df3 = pd.read_csv('sales_mar.csv')

df = pd.concat([df1, df2, df3], ignore_index=True)
print(f"Total rows: {len(df)}")
print(df.head())
```
Output:

```
Total rows: 12
   OrderID Product  Quantity  Price
0     1001  Widget         5  29.99
1     1002  Gadget         3  49.99
2     1003  Widget         7  29.99
3     1004   Gizmo         2  19.99
4     1005  Widget         1  29.99
```
This approach is fine for a small number of files, but becomes cumbersome when you have dozens or hundreds of CSVs.
Method 2: Automatically Discovering and Merging All CSV Files
When you have many CSV files in a directory (or don't know the exact filenames), use the glob and os modules to automatically find matching files and merge them.
How It Works
- `os.path.join()` builds a file path pattern by combining the directory and a wildcard filename.
- `glob.glob()` returns a list of all files matching the pattern.
- `pd.concat()` with `map()` reads and merges them all.
```python
import pandas as pd
import glob
import os

# Define the directory and file pattern
file_pattern = os.path.join("data", "sales_*.csv")

# Find all matching CSV files
csv_files = sorted(glob.glob(file_pattern))
print(f"Found {len(csv_files)} files: {csv_files}")

# Read and concatenate all files
df = pd.concat(map(pd.read_csv, csv_files), ignore_index=True)
print(df)
```
Output (example):

```
Found 3 files: ['data/sales_feb.csv', 'data/sales_jan.csv', 'data/sales_mar.csv']
    OrderID Product  Quantity  Price
0      1005  Widget         1  29.99
1      1006  Gadget         4  49.99
2      1007   Gizmo         3  19.99
3      1008  Widget         6  29.99
4      1001  Widget         5  29.99
5      1002  Gadget         3  49.99
6      1003  Widget         7  29.99
7      1004   Gizmo         2  19.99
8      1009  Widget         2  29.99
9      1010  Gadget         5  49.99
10     1011   Gizmo         1  19.99
11     1012  Widget         3  29.99
```
Use sorted() on the glob.glob() result to ensure files are processed in a consistent, predictable order. Without sorting, the order depends on the operating system and filesystem, which can vary.
This entire operation can also be condensed into a single line:

```python
df = pd.concat(map(pd.read_csv, sorted(glob.glob(os.path.join("data", "sales_*.csv")))), ignore_index=True)
```
Method 3: Adding a Source Column to Track File Origin
When merging many files, it's often useful to know which file each row came from. You can add a source column during the merge:
```python
import pandas as pd
import glob
import os

csv_files = sorted(glob.glob(os.path.join("data", "sales_*.csv")))

dataframes = []
for filepath in csv_files:
    df_temp = pd.read_csv(filepath)
    df_temp['source_file'] = os.path.basename(filepath)
    dataframes.append(df_temp)

df = pd.concat(dataframes, ignore_index=True)
print(df)
```
Output (example):

```
    OrderID Product  Quantity  Price    source_file
0      1005  Widget         1  29.99  sales_feb.csv
1      1006  Gadget         4  49.99  sales_feb.csv
2      1007   Gizmo         3  19.99  sales_feb.csv
3      1008  Widget         6  29.99  sales_feb.csv
4      1001  Widget         5  29.99  sales_jan.csv
5      1002  Gadget         3  49.99  sales_jan.csv
6      1003  Widget         7  29.99  sales_jan.csv
7      1004   Gizmo         2  19.99  sales_jan.csv
8      1009  Widget         2  29.99  sales_mar.csv
9      1010  Gadget         5  49.99  sales_mar.csv
10     1011   Gizmo         1  19.99  sales_mar.csv
11     1012  Widget         3  29.99  sales_mar.csv
```
This makes it easy to filter or group by the original file later in your analysis.
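As a quick sketch (with inline data standing in for the merged result), the `source_file` column works like any other grouping key:

```python
import pandas as pd

# Stand-in for a merged frame that already has a source_file column
df = pd.DataFrame({
    'Quantity': [5, 3, 1, 4],
    'source_file': ['sales_jan.csv', 'sales_jan.csv',
                    'sales_feb.csv', 'sales_feb.csv'],
})

# Total quantity per original file
per_file = df.groupby('source_file')['Quantity'].sum()
print(per_file)
```

The same column also makes debugging easier: if a bad value turns up after the merge, filtering on `source_file` tells you which input file to inspect.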
Handling Common Issues
Mismatched Columns Across Files
If your CSV files have different columns, pd.concat() will still merge them but fill missing columns with NaN:
```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice'], 'Age': [25]})
df2 = pd.DataFrame({'Name': ['Bob'], 'Score': [88]})

result = pd.concat([df1, df2], ignore_index=True)
print(result)
```
Output:

```
    Name   Age  Score
0  Alice  25.0    NaN
1    Bob   NaN   88.0
```
This silent behavior can mask data quality issues. If all your CSV files should have the same columns, validate them before merging:
```python
expected_columns = {'OrderID', 'Product', 'Quantity', 'Price'}

for filepath in csv_files:
    df_temp = pd.read_csv(filepath)
    if set(df_temp.columns) != expected_columns:
        print(f"Warning: {filepath} has unexpected columns: {set(df_temp.columns)}")
```
Duplicate Headers Inside CSV Files
Sometimes CSV files exported from tools contain header rows repeated within the data. After merging, these show up as data rows with column names as values:
```python
import pandas as pd

# ❌ Header rows mixed into data after concat
df = pd.concat(map(pd.read_csv, ['file1.csv', 'file2.csv']), ignore_index=True)
print(df)
```
Output (problematic):

```
   OrderID  Product Quantity  Price
0     1001   Widget        5  29.99
1  OrderID  Product Quantity  Price   ← duplicate header row
2     1002   Gadget        3  49.99
```
Fix: Filter out rows that match the header values:

```python
# ✅ Remove rows where values match column names
df = df[df['OrderID'] != 'OrderID']

# Restore numeric data types after filtering
df['OrderID'] = df['OrderID'].astype(int)
df['Price'] = df['Price'].astype(float)
print(df)
```
Merging Large Files Efficiently
For very large CSV files, reading all of them into memory at once can be problematic. Use chunked reading or specify data types upfront to reduce memory usage:
```python
import pandas as pd
import glob

csv_files = sorted(glob.glob("data/sales_*.csv"))

# Specify data types to reduce memory consumption
dtypes = {'OrderID': 'int32', 'Quantity': 'int16',
          'Price': 'float32', 'Product': 'category'}

dataframes = [pd.read_csv(f, dtype=dtypes) for f in csv_files]
df = pd.concat(dataframes, ignore_index=True)

print(f"Total rows: {len(df)}")
print(f"Memory usage: {df.memory_usage(deep=True).sum() / 1e6:.2f} MB")
```
Output (example, for a larger dataset):

```
Total rows: 12000
Memory usage: 5.00 MB
```
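The chunked-reading approach mentioned above uses the `chunksize` parameter of `pd.read_csv()`, which returns an iterator of DataFrames instead of one large frame. A sketch using an in-memory CSV so the example is self-contained:

```python
import pandas as pd
from io import StringIO

# Small in-memory CSV standing in for a large file on disk
csv_text = "OrderID,Quantity\n" + "\n".join(
    f"{1000 + i},{i % 7}" for i in range(10))

# chunksize=4 yields DataFrames of up to 4 rows each
chunks = pd.read_csv(StringIO(csv_text), chunksize=4)

# Filter each chunk before combining, keeping peak memory low
filtered = [chunk[chunk['Quantity'] > 0] for chunk in chunks]
df = pd.concat(filtered, ignore_index=True)
print(f"Rows kept: {len(df)}")
```

This pattern only helps if you can reduce each chunk (filter, aggregate, downcast) before concatenating; concatenating unmodified chunks ends up using the same memory as a plain `read_csv()`.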
Row-wise vs Column-wise Concatenation
By default, pd.concat() stacks DataFrames vertically (row-wise, axis=0). If your CSV files contain different columns for the same records and you want to merge them horizontally, use axis=1:
```python
import pandas as pd

df_names = pd.read_csv('names.csv')    # Contains: ID, Name
df_scores = pd.read_csv('scores.csv')  # Contains: ID, Score

# Horizontal concatenation (use merge for key-based joining)
result = pd.concat([df_names, df_scores], axis=1)
print(result)
```
Output:

```
   ID     Name  ID  Score
0   1    Alice   1     85
1   2      Bob   2     92
2   3  Charlie   3     78
3   4    David   4     88
```
For horizontal merging based on a shared key column, use pd.merge() instead of pd.concat(axis=1). The concat function aligns by index position, not by logical keys, which can produce incorrect pairings if rows aren't in the same order.
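A sketch of the key-based alternative, using inline frames with the same columns as names.csv and scores.csv (the score rows are deliberately shuffled to show that merge aligns on the key, not on position):

```python
import pandas as pd

df_names = pd.DataFrame({'ID': [1, 2, 3, 4],
                         'Name': ['Alice', 'Bob', 'Charlie', 'David']})
df_scores = pd.DataFrame({'ID': [4, 3, 2, 1],  # out of order on purpose
                          'Score': [88, 78, 92, 85]})

# merge pairs rows by the shared ID key, regardless of row order
result = pd.merge(df_names, df_scores, on='ID')
print(result)
```

Unlike `pd.concat(axis=1)`, the result has a single `ID` column, and Alice still gets her score of 85 even though the score rows arrived in reverse order.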
Summary
| Method | Best For | Key Function |
|---|---|---|
| Explicit file list with `map()` | Small number of known files | `pd.concat(map(pd.read_csv, files))` |
| `glob` pattern matching | Many files or dynamic discovery | `glob.glob(os.path.join(dir, "*.csv"))` |
| Loop with source tracking | Tracing data origin | Manual loop with `source_file` column |
To merge multiple CSV files into a single Pandas DataFrame:

- use `pd.concat()` to stack them vertically
- use `glob.glob()` to automatically discover files
- use `ignore_index=True` to maintain a clean index
Always validate that your files share the same column structure, handle duplicate headers, and consider memory-efficient data types for large datasets.