Python Pandas: How to Read a CSV File with a Custom Delimiter
CSV files don't always use commas as delimiters. In practice, you'll encounter data files separated by tabs, semicolons, pipes, underscores, or even multiple mixed delimiters. Pandas' read_csv() function handles all of these cases through its sep (separator) parameter, making it easy to load any delimited text file into a DataFrame.
In this guide, you'll learn how to read CSV files with various custom delimiters, use regular expressions for complex separators, and avoid common mistakes when specifying delimiters.
Default Behavior: Comma Delimiter
By default, pd.read_csv() uses a comma (,) as the separator, which works for standard CSV files:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Sample data.csv:
Name,Age,City
Alice,30,New York
Bob,25,Chicago
Charlie,35,Boston
Output:
Name Age City
0 Alice 30 New York
1 Bob 25 Chicago
2 Charlie 35 Boston
Using a Custom Single-Character Delimiter
To specify a different delimiter, pass it to the sep parameter.
Semicolon Delimiter (;)
Semicolons are commonly used in European CSV exports where commas serve as decimal separators:
import pandas as pd
df = pd.read_csv('data_semicolon.csv', sep=';')
print(df)
Sample data_semicolon.csv:
Name;Age;City
Alice;30;New York
Bob;25;Chicago
Output:
Name Age City
0 Alice 30 New York
1 Bob 25 Chicago
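Because such exports often use a comma as the decimal mark, the numeric columns may also need the decimal parameter of read_csv(). The following is a minimal sketch; the Price column and its values are made up for illustration, and the file is simulated with io.StringIO.

```python
import io
import pandas as pd

# Hypothetical sample: semicolon-separated fields, comma as the decimal mark
raw = "Name;Price\nAlice;1,5\nBob;2,25\n"

# sep splits the fields; decimal tells pandas that "1,5" means 1.5
df_eu = pd.read_csv(io.StringIO(raw), sep=';', decimal=',')

print(df_eu)
print(df_eu.dtypes)
```

Without decimal=',', the Price column would most likely be loaded as text rather than as floating-point numbers.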
Pipe Delimiter (|)
import pandas as pd
df = pd.read_csv('data_pipe.csv', sep='|')
print(df)
Sample data_pipe.csv:
Name|Age|City
Alice|30|New York
Bob|25|Chicago
Output:
Name Age City
0 Alice 30 New York
1 Bob 25 Chicago
Tab Delimiter (\t)
Tab-separated files (TSV) are among the most common alternatives to CSV:
import pandas as pd
df = pd.read_csv('data.tsv', sep='\t')
print(df)
Sample data.tsv:
Name Age City
Alice 30 New York
Bob 25 Chicago
Output:
Name Age City
0 Alice 30 New York
1 Bob 25 Chicago
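For tab-separated data, pandas also provides pd.read_table(), whose default separator is a tab. The sketch below uses an in-memory string as a stand-in for a real .tsv file and checks that both calls produce the same DataFrame.

```python
import io
import pandas as pd

# In-memory stand-in for a tab-separated file (illustrative data)
raw = "Name\tAge\tCity\nAlice\t30\tNew York\nBob\t25\tChicago\n"

# Explicit tab separator with read_csv
df_csv = pd.read_csv(io.StringIO(raw), sep='\t')

# read_table uses '\t' as its default separator
df_table = pd.read_table(io.StringIO(raw))

print(df_csv.equals(df_table))
```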
Underscore Delimiter (_)
import pandas as pd
df = pd.read_csv('data_underscore.csv', sep='_')
print(df)
Sample data_underscore.csv:
Name_Age_City
Alice_30_New York
Bob_25_Chicago
Output:
Name Age City
0 Alice 30 New York
1 Bob 25 Chicago
When is engine='python' needed?
Pandas ships two main parsing engines. The default engine='c' is faster and handles any single-character delimiter, including the underscore, so passing engine='python' for a one-character separator is optional. The Python engine is needed for:
- Multi-character delimiters (e.g., sep='::')
- Regular expression delimiters (e.g., sep='[,:|]')
If you pass one of these without specifying the engine, Pandas falls back to the Python engine and issues a ParserWarning. The one exception is sep='\s+', which the C engine supports directly.
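As a quick check of how the two engines treat a single-character delimiter, the sketch below reads the same underscore-separated text with each engine. The data is an in-memory copy of the sample above, not a real file.

```python
import io
import pandas as pd

# In-memory copy of the underscore-separated sample
raw = "Name_Age_City\nAlice_30_New York\nBob_25_Chicago\n"

# Same separator, two different parsing engines
df_c = pd.read_csv(io.StringIO(raw), sep='_', engine='c')
df_py = pd.read_csv(io.StringIO(raw), sep='_', engine='python')

print(df_c)
print(df_py)
```

Both calls should yield the same three columns, so the engine choice here affects speed rather than the result.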
Using Regular Expressions as Delimiters
When your file uses multiple or inconsistent delimiters, you can pass a regular expression to the sep parameter to handle all of them at once.
Multiple Mixed Delimiters
Consider a messy file where columns are separated by a mix of commas, colons, and pipes:
Sample messy_data.csv:
Name,Age:City|Score
Alice,30:New York|85
Bob,25:Chicago|92
Charlie,35:Boston|78
Use a character class in a regular expression to match any of these delimiters:
import pandas as pd
df = pd.read_csv('messy_data.csv', sep='[,:|]', engine='python')
print(df)
Output:
Name Age City Score
0 Alice 30 New York 85
1 Bob 25 Chicago 92
2 Charlie 35 Boston 78
The regex [,:|] matches any single character that is a comma, colon, or pipe.
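To see what the character class does to a single line, it can help to try the same pattern with re.split() before handing it to pandas. This is only an illustrative sketch: re.split() is not how pandas is implemented internally, and the data is an in-memory copy of the sample file.

```python
import io
import re
import pandas as pd

pattern = '[,:|]'  # comma, colon, or pipe

# Splitting one line by hand shows which characters act as separators
print(re.split(pattern, 'Alice,30:New York|85'))

# The same pattern passed to read_csv (regex separators use the Python engine)
raw = "Name,Age:City|Score\nAlice,30:New York|85\nBob,25:Chicago|92\n"
df_mixed = pd.read_csv(io.StringIO(raw), sep=pattern, engine='python')
print(df_mixed)
```

Note that every occurrence of these characters is treated as a separator, so this approach only suits files where they never appear inside a value.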
Whitespace Delimiter
For files where columns are separated by one or more spaces or tabs, use \s+:
import pandas as pd
df = pd.read_csv('data_spaces.txt', sep=r'\s+', engine='python')
print(df)
Sample data_spaces.txt:
Name Age City
Alice 30 NewYork
Bob 25 Chicago
Charlie 35 Boston
Output:
Name Age City
0 Alice 30 NewYork
1 Bob 25 Chicago
2 Charlie 35 Boston
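The sketch below shows the whitespace pattern on deliberately irregular spacing. It writes the pattern as a raw string (r'\s+') so that Python does not treat \s as a string escape, and it relies on the default engine, since the pandas documentation describes '\s+' as the one multi-character separator the C engine accepts. The sample text is made up for illustration.

```python
import io
import pandas as pd

# Columns separated by an uneven mix of spaces and tabs (illustrative data)
raw = "Name   Age\tCity\nAlice  30   NewYork\nBob 25\t\tChicago\n"

# One or more whitespace characters count as a single separator
df_ws = pd.read_csv(io.StringIO(raw), sep=r'\s+')

print(df_ws)
```

As in the sample file above, values must not contain spaces themselves (NewYork rather than New York), or they will be split into extra columns.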
Multi-Character Delimiter
Some files use multi-character separators like :: or ->:
import pandas as pd
df = pd.read_csv('data_double_colon.csv', sep='::', engine='python')
print(df)
Sample data_double_colon.csv:
Name::Age::City
Alice::30::New York
Bob::25::Chicago
Output:
Name Age City
0 Alice 30 New York
1 Bob 25 Chicago
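One caveat: a separator longer than one character is interpreted as a regular expression, so a delimiter made of regex metacharacters (for example ||) needs escaping. A minimal sketch, assuming a hypothetical double-pipe file simulated in memory, uses re.escape() to build the literal pattern:

```python
import io
import re
import pandas as pd

# Hypothetical file that uses a double pipe as its separator
raw = "Name||Age||City\nAlice||30||New York\nBob||25||Chicago\n"

# '|' means alternation in a regex, so escape it to match the literal text
literal_sep = re.escape('||')

df_pipes = pd.read_csv(io.StringIO(raw), sep=literal_sep, engine='python')
print(df_pipes)
```

Separators such as :: contain no metacharacters, which is why the example above works without escaping.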
Common Mistake: Forgetting to Set the Correct Delimiter
If you don't specify the correct separator, Pandas will try to split on commas by default, producing a DataFrame with all data crammed into a single column:
import pandas as pd
# ❌ Wrong: file uses semicolons, but default comma separator is used
df = pd.read_csv('data_semicolon.csv')
print(df)
Output:
Name;Age;City
0 Alice;30;New York
1 Bob;25;Chicago
Everything is treated as a single column because no commas were found. Always inspect your file to determine the delimiter before reading:
# ✅ Quick way to check the first few lines of a file
with open('data_semicolon.csv', 'r') as f:
    for i, line in enumerate(f):
        print(line.strip())
        if i >= 2:
            break
Output:
Name;Age;City
Alice;30;New York
Bob;25;Chicago
Now you can see the semicolons and set sep=';' accordingly.
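If you would rather not eyeball the file, Python's standard csv.Sniffer can make a guess at the delimiter from a text sample. The sketch below is one possible approach, not a guaranteed detector: the candidate list passed to delimiters is an assumption, and the sample is held in memory instead of being read from disk.

```python
import csv
import io
import pandas as pd

# In-memory copy of the first lines of the semicolon-separated sample
raw = "Name;Age;City\nAlice;30;New York\nBob;25;Chicago\n"

# Ask the sniffer to choose among a few likely candidates (assumed list)
dialect = csv.Sniffer().sniff(raw, delimiters=';,|\t')
print(repr(dialect.delimiter))

# Feed the detected delimiter to read_csv
df_sniffed = pd.read_csv(io.StringIO(raw), sep=dialect.delimiter)
print(df_sniffed)
```

The pandas documentation also describes passing sep=None together with engine='python', which performs a similar detection using the same sniffer. Either way, treat the result as a guess and confirm it against the data.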
Common Mistake: Parser Warning with Special Delimiters
When you pass a multi-character or regex separator without specifying the engine, Pandas issues a warning:
import pandas as pd
# ⚠️ May produce a ParserWarning
df = pd.read_csv('data.csv', sep='::')
Warning:
ParserWarning: Falling back to the 'python' engine because the 'c' parser does not support
regex separators; you can avoid this warning by specifying engine='python'.
Fix: Always set engine='python' when using regex or multi-character separators:
# ✅ No warning
df = pd.read_csv('data.csv', sep='::', engine='python')
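To confirm the difference yourself, you can record the warnings emitted by each call. The helper below, parser_warnings, is a hypothetical name introduced for this sketch, and the data is an in-memory stand-in for a double-colon file.

```python
import io
import warnings
import pandas as pd
from pandas.errors import ParserWarning

# In-memory stand-in for a file separated by '::'
raw = "Name::Age\nAlice::30\nBob::25\n"

def parser_warnings(**kwargs):
    """Read raw with sep='::' and return any ParserWarning messages."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        pd.read_csv(io.StringIO(raw), sep='::', **kwargs)
    return [str(w.message) for w in caught
            if issubclass(w.category, ParserWarning)]

without_engine = parser_warnings()                # engine left unspecified
with_engine = parser_warnings(engine='python')    # engine set explicitly

print(len(without_engine), len(with_engine))
```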
Useful read_csv() Parameters for Custom Delimiters
Beyond sep, several other parameters are helpful when working with non-standard files:
| Parameter | Description | Example |
|---|---|---|
| sep | Delimiter string or regex | sep=';', sep='\t', sep='\s+' |
| engine | Parser engine ('c' or 'python') | engine='python' for regex |
| header | Row number(s) to use as column names | header=0 (default), header=None |
| names | Custom column names | names=['A', 'B', 'C'] |
| index_col | Column to use as the index | index_col=0 |
| usecols | Subset of columns to read | usecols=['Name', 'Age'] |
| skiprows | Number of rows to skip at the start | skiprows=2 |
| na_values | Values to treat as NaN | na_values=['N/A', 'missing'] |
| encoding | File encoding | encoding='utf-8', encoding='latin-1' |
Example: Combining Multiple Parameters
import pandas as pd
df = pd.read_csv(
'data_semicolon.csv',
sep=';',
header=0,
usecols=['Name', 'City'],
na_values=['N/A', ''],
encoding='utf-8'
)
print(df)
Output:
Name City
0 Alice New York
1 Bob Chicago
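The header, names, and index_col parameters from the table are useful when a delimited file has no header row. The following sketch assumes a hypothetical pipe-separated file without column names, simulated here with io.StringIO.

```python
import io
import pandas as pd

# Hypothetical pipe-separated data with no header row
raw = "Alice|30|New York\nBob|25|Chicago\n"

df_named = pd.read_csv(
    io.StringIO(raw),
    sep='|',
    header=None,                    # the first line is data, not column names
    names=['Name', 'Age', 'City'],  # supply the column names ourselves
    index_col='Name',               # use the Name column as the index
)

print(df_named)
```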
Summary
| Delimiter Type | sep Value | Engine Required |
|---|---|---|
| Comma (default) | ',' | 'c' (default) |
| Semicolon | ';' | 'c' (default) |
| Tab | '\t' | 'c' (default) |
| Pipe | `'\|'` | 'c' (default) |
| Underscore | '_' | 'c' (default) |
| Multiple characters (::) | '::' | 'python' |
| Whitespace | r'\s+' | 'c' (default) |
| Regex pattern | `'[,;\|:]'` | 'python' |
To read a CSV file with a custom delimiter in Pandas, pass the delimiter to the sep parameter of pd.read_csv(). For single-character delimiters, the default C engine works fine.
For multi-character or regex-based separators, always set engine='python'.
When in doubt, inspect the first few lines of your file to identify the delimiter before reading it.