Python Pandas: How to Read a CSV File with a Custom Delimiter

CSV files don't always use commas as delimiters. In practice, you'll encounter data files separated by tabs, semicolons, pipes, underscores, or even multiple mixed delimiters. Pandas' read_csv() function handles all of these cases through its sep (separator) parameter, making it easy to load any delimited text file into a DataFrame.

In this guide, you'll learn how to read CSV files with various custom delimiters, use regular expressions for complex separators, and avoid common mistakes when specifying delimiters.

Default Behavior: Comma Delimiter

By default, pd.read_csv() uses a comma (,) as the separator, which works for standard CSV files:

import pandas as pd

df = pd.read_csv('data.csv')
print(df)

Sample data.csv:

Name,Age,City
Alice,30,New York
Bob,25,Chicago
Charlie,35,Boston

Output:

      Name  Age      City
0    Alice   30  New York
1      Bob   25   Chicago
2  Charlie   35    Boston

Using a Custom Single-Character Delimiter

To specify a different delimiter, pass it to the sep parameter.

Semicolon Delimiter (;)

Semicolons are commonly used in European CSV exports where commas serve as decimal separators:

import pandas as pd

df = pd.read_csv('data_semicolon.csv', sep=';')
print(df)

Sample data_semicolon.csv:

Name;Age;City
Alice;30;New York
Bob;25;Chicago

Output:

    Name  Age      City
0  Alice   30  New York
1    Bob   25   Chicago
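
Since semicolon-delimited exports often use a comma as the decimal mark, a short sketch of that case may help. The sample text and the Height column below are made up for illustration; sep and decimal are existing read_csv parameters.

```python
import io
import pandas as pd

# Hypothetical sample: semicolon-delimited, with commas as decimal marks.
eu_text = "Name;Height;City\nAlice;1,68;Berlin\nBob;1,82;Madrid\n"

# decimal=',' asks read_csv to parse "1,68" as the float 1.68
# instead of leaving the column as text.
df_eu = pd.read_csv(io.StringIO(eu_text), sep=';', decimal=',')

print(df_eu.dtypes)
print(df_eu)
```

Without decimal=',', the Height column would be loaded as strings, because "1,68" is not a valid number in the default format.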

Pipe Delimiter (|)

import pandas as pd

df = pd.read_csv('data_pipe.csv', sep='|')
print(df)

Sample data_pipe.csv:

Name|Age|City
Alice|30|New York
Bob|25|Chicago

Output:

    Name  Age      City
0  Alice   30  New York
1    Bob   25   Chicago

Tab Delimiter (\t)

Tab-separated files (TSV) are among the most common alternatives to CSV:

import pandas as pd

df = pd.read_csv('data.tsv', sep='\t')
print(df)

Sample data.tsv (columns are separated by tab characters):

Name	Age	City
Alice	30	New York
Bob	25	Chicago

Output:

    Name  Age      City
0  Alice   30  New York
1    Bob   25   Chicago
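
As a self-contained check, the sketch below reads tab-separated text from memory rather than from disk. The string tsv_text is a stand-in for the file, and pd.read_table is shown only as an alternative whose separator defaults to a tab.

```python
import io
import pandas as pd

# In-memory stand-in for data.tsv; "\t" is a literal tab character.
tsv_text = "Name\tAge\tCity\nAlice\t30\tNew York\nBob\t25\tChicago\n"

# Explicit tab separator with read_csv.
df_sep = pd.read_csv(io.StringIO(tsv_text), sep='\t')

# read_table uses '\t' as its default separator.
df_table = pd.read_table(io.StringIO(tsv_text))

print(df_sep.equals(df_table))
```

Both calls should produce the same DataFrame, so the choice between them is mostly a matter of preference.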

Underscore Delimiter (_)

import pandas as pd

df = pd.read_csv('data_underscore.csv', sep='_')
print(df)

Sample data_underscore.csv:

Name_Age_City
Alice_30_New York
Bob_25_Chicago

Output:

    Name  Age      City
0  Alice   30  New York
1    Bob   25   Chicago

Why engine='python'?

An underscore is a single character, so the default C engine reads this file without engine='python'. The Python engine becomes necessary with other kinds of separators: if you pass one that the C engine cannot handle and don't name an engine, Pandas falls back to the Python engine and issues a ParserWarning. Setting engine='python' explicitly selects Pandas' Python-based parser, which supports:

  • Multi-character delimiters (e.g., sep='::')
  • Regular expression delimiters (e.g., sep='[,:|]')

The default engine='c' is faster but only supports single-character delimiters, plus the special case sep='\s+' for runs of whitespace.
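
To make the engine rules concrete, here is a minimal sketch that forces the C engine for a single-character separator and for the whitespace pattern. The in-memory strings are hypothetical stand-ins for real files.

```python
import io
import pandas as pd

# Hypothetical in-memory samples standing in for files on disk.
underscore_text = "Name_Age_City\nAlice_30_New York\nBob_25_Chicago\n"
spaces_text = "Name   Age  City\nAlice  30   Boston\nBob    25   Chicago\n"

# A single-character separator works with the C engine.
df_us = pd.read_csv(io.StringIO(underscore_text), sep='_', engine='c')

# r'\s+' is the one multi-character pattern the C engine accepts.
df_ws = pd.read_csv(io.StringIO(spaces_text), sep=r'\s+', engine='c')

print(df_us.columns.tolist())
print(df_ws.columns.tolist())
```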

Using Regular Expressions as Delimiters

When your file uses multiple or inconsistent delimiters, you can pass a regular expression to the sep parameter to handle all of them at once.

Multiple Mixed Delimiters

Consider a messy file where columns are separated by commas, colons, or pipes:

Sample messy_data.csv:

Name,Age:City|Score
Alice,30:New York|85
Bob,25:Chicago|92
Charlie,35:Boston|78

Use a character class in a regular expression to match any of these delimiters:

import pandas as pd

df = pd.read_csv('messy_data.csv', sep='[,:|]', engine='python')
print(df)

Output:

      Name  Age      City  Score
0    Alice   30  New York     85
1      Bob   25   Chicago     92
2  Charlie   35    Boston     78

The regex [,:|] matches any single character that is a comma, colon, or pipe.
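
Before loading a whole file, it can help to try the pattern on a couple of lines with Python's re module. This is only an approximation of how the parser splits each row (it ignores quoting, for example), and the sample lines are copied from the file above.

```python
import re

# Same character class as passed to sep.
pattern = '[,:|]'

# Two lines copied from messy_data.csv.
sample_lines = ["Name,Age:City|Score", "Alice,30:New York|85"]

# Split each line on any comma, colon, or pipe.
fields = [re.split(pattern, line) for line in sample_lines]

for row in fields:
    print(row)
```

If a data value itself contains one of the delimiter characters (a time such as 10:30, say), the pattern will split it as well, so a preview like this is a cheap way to spot trouble.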

Whitespace Delimiter

For files where columns are separated by one or more spaces or tabs, use \s+:

import pandas as pd

# Raw string so the backslash in \s+ is passed to the regex unchanged
df = pd.read_csv('data_spaces.txt', sep=r'\s+', engine='python')
print(df)

Sample data_spaces.txt:

Name      Age    City
Alice     30     NewYork
Bob       25     Chicago
Charlie   35     Boston

Output:

      Name  Age     City
0    Alice   30  NewYork
1      Bob   25  Chicago
2  Charlie   35   Boston

Multi-Character Delimiter

Some files use multi-character separators like :: or ->:

import pandas as pd

df = pd.read_csv('data_double_colon.csv', sep='::', engine='python')
print(df)

Sample data_double_colon.csv:

Name::Age::City
Alice::30::New York
Bob::25::Chicago

Output:

    Name  Age      City
0  Alice   30  New York
1    Bob   25   Chicago
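
Because a multi-character sep is interpreted as a regular expression, a delimiter made of regex metacharacters needs escaping. The sketch below assumes a hypothetical file that uses a double pipe (||) and relies on re.escape to build the pattern.

```python
import io
import re
import pandas as pd

# Hypothetical sample that uses a double pipe as its delimiter.
pipes_text = "Name||Age||City\nAlice||30||New York\nBob||25||Chicago\n"

# re.escape turns "||" into a pattern that matches two literal pipes;
# unescaped, "|" would be read as regex alternation.
sep_pattern = re.escape("||")

df_pipes = pd.read_csv(io.StringIO(pipes_text), sep=sep_pattern, engine='python')
print(df_pipes)
```

A separator such as :: contains no metacharacters, so it can be passed as-is.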

Common Mistake: Forgetting to Set the Correct Delimiter

If you don't specify the correct separator, Pandas will try to split on commas by default, producing a DataFrame with all data crammed into a single column:

import pandas as pd

# ❌ Wrong: file uses semicolons, but default comma separator is used
df = pd.read_csv('data_semicolon.csv')
print(df)

Output:

       Name;Age;City
0  Alice;30;New York
1    Bob;25;Chicago

Everything is treated as a single column because no commas were found. Always inspect your file to determine the delimiter before reading:

# ✅ Quick way to check the first few lines of a file
with open('data_semicolon.csv', 'r') as f:
    for i, line in enumerate(f):
        print(line.strip())
        if i >= 2:
            break

Output:

Name;Age;City
Alice;30;New York
Bob;25;Chicago

Now you can see the semicolons and set sep=';' accordingly.
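
If you would rather not eyeball the file, Python's standard csv.Sniffer can guess the delimiter from a text sample. The sketch below is one possible approach, not a guaranteed detector; the candidate list passed to delimiters is an assumption about which separators are plausible for your data.

```python
import csv
import io
import pandas as pd

# In-memory stand-in for the first lines of data_semicolon.csv.
sample_text = "Name;Age;City\nAlice;30;New York\nBob;25;Chicago\n"

# Restrict the guess to a few plausible delimiters (an assumption).
dialect = csv.Sniffer().sniff(sample_text, delimiters=",;\t|")
print(repr(dialect.delimiter))

# Pass the detected delimiter on to read_csv.
df_sniffed = pd.read_csv(io.StringIO(sample_text), sep=dialect.delimiter)
print(df_sniffed)
```

Pandas documents a related shortcut: with sep=None and engine='python', read_csv tries to detect the separator itself using the same sniffer. In both cases the guess can be wrong on unusual files, so checking the resulting columns is still worthwhile.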

Common Mistake: Parser Warning with Special Delimiters

When you pass a multi-character or regex separator without specifying the engine, Pandas falls back to the Python engine and emits a warning:

import pandas as pd

# ⚠️ May produce a ParserWarning
df = pd.read_csv('data.csv', sep='::')

Warning:

ParserWarning: Falling back to the 'python' engine because the 'c' parser does not support
regex separators; you can avoid this warning by specifying engine='python'.

Fix: Always set engine='python' when using regex or multi-character separators:

# ✅ No warning
df = pd.read_csv('data.csv', sep='::', engine='python')
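
To see the difference without relying on console output, the following sketch records warnings for both calls and counts the ParserWarning entries. The in-memory text is a hypothetical stand-in for data.csv.

```python
import io
import warnings
import pandas as pd

# Hypothetical stand-in for a '::'-delimited file.
colon_text = "Name::Age::City\nAlice::30::New York\n"

def count_parser_warnings(**kwargs):
    """Read colon_text with the given options and count ParserWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        pd.read_csv(io.StringIO(colon_text), sep='::', **kwargs)
    return sum(issubclass(w.category, pd.errors.ParserWarning) for w in caught)

n_without_engine = count_parser_warnings()
n_with_engine = count_parser_warnings(engine='python')

print(n_without_engine, n_with_engine)
```

The first count should be at least one and the second zero, which matches the fix described above.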

Useful read_csv() Parameters for Custom Delimiters

Beyond sep, several other parameters are helpful when working with non-standard files:

Parameter   Description                            Example
sep         Delimiter string or regex              sep=';', sep='\t', sep='\s+'
engine      Parser engine ('c' or 'python')        engine='python' for regex
header      Row number(s) to use as column names   header=0 (default), header=None
names       Custom column names                    names=['A', 'B', 'C']
index_col   Column to use as the index             index_col=0
usecols     Subset of columns to read              usecols=['Name', 'Age']
skiprows    Number of rows to skip at the start    skiprows=2
na_values   Values to treat as NaN                 na_values=['N/A', 'missing']
encoding    File encoding                          encoding='utf-8', encoding='latin-1'

Example: Combining Multiple Parameters

import pandas as pd

df = pd.read_csv(
    'data_semicolon.csv',
    sep=';',
    header=0,
    usecols=['Name', 'City'],
    na_values=['N/A', ''],
    encoding='utf-8'
)

print(df)

Output:

    Name      City
0  Alice  New York
1    Bob   Chicago
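
The remaining parameters from the table matter most when a file has no header row. The sketch below assumes a hypothetical pipe-delimited export with one comment line at the top and the word "missing" marking absent values.

```python
import io
import pandas as pd

# Hypothetical export: one comment line, no header row, pipe-delimited.
export_text = "# exported by some tool\nAlice|30|New York\nBob|25|missing\n"

df_named = pd.read_csv(
    io.StringIO(export_text),
    sep='|',
    skiprows=1,                      # skip the comment line
    header=None,                     # the file has no header row
    names=['Name', 'Age', 'City'],   # supply column names ourselves
    na_values=['missing'],           # treat "missing" as NaN
)

print(df_named)
```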

Summary

Delimiter Type             sep Value   Engine Required
Comma (default)            ','         'c' (default)
Semicolon                  ';'         'c' (default)
Tab                        '\t'        'c' (default)
Pipe                       '|'         'c' (default)
Underscore                 '_'         'c' (default)
Multiple characters (::)   '::'        'python'
Whitespace                 r'\s+'      'c' (default) or 'python'
Regex pattern              '[,;:]'     'python'

To read a CSV file with a custom delimiter in Pandas, pass the delimiter to the sep parameter of pd.read_csv(). For single-character delimiters, the default C engine works fine.

For multi-character or regex-based separators, set engine='python'; the whitespace pattern \s+ is the one exception that the C engine also accepts.

When in doubt, inspect the first few lines of your file to identify the delimiter before reading it.