Python Pandas: How to Read Space-Delimited Files in Pandas
Not all data files use commas or tabs as separators. Many datasets - especially those generated by scientific instruments, log files, command-line tools, or legacy systems - use spaces to separate values. These space-delimited files can be tricky to parse because the number of spaces between columns may vary, making simple string splitting unreliable.
In this guide, you'll learn how to read space-delimited files into Pandas DataFrames using multiple methods, handle files with inconsistent spacing, and avoid common parsing issues.
What Is a Space-Delimited File?
A space-delimited file organizes data into rows and columns where spaces act as the separator between values. Each line represents one record:
Sample employees.txt:
Name Age City
Alice 25 NewYork
Bob 30 LosAngeles
Charlie 28 Chicago
Diana 35 Houston
Unlike CSV files (comma-separated) or TSV files (tab-separated), space-delimited files use one or more space characters to separate fields.
Method 1: Using pd.read_csv() with sep=' '
Despite its name, pd.read_csv() can handle any delimiter, not just commas. Set sep=' ' to specify a single space as the separator:
import pandas as pd
df = pd.read_csv('employees.txt', sep=' ')
print(df)
Output:
Name Age City
0 Alice 25 NewYork
1 Bob 30 LosAngeles
2 Charlie 28 Chicago
3 Diana 35 Houston
This works perfectly when columns are separated by exactly one space.
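You can try this without creating employees.txt first. read_csv() accepts any file-like object, so an io.StringIO wrapper around the sample text can stand in for the file. This is a minimal sketch in which the file contents are inlined as a string:

```python
import io

import pandas as pd

# Same contents as the sample employees.txt, one space between fields
text = """Name Age City
Alice 25 NewYork
Bob 30 LosAngeles
Charlie 28 Chicago
Diana 35 Houston
"""

df = pd.read_csv(io.StringIO(text), sep=' ')
print(df)
print(df.dtypes)  # Age is parsed as an integer column
```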
Method 2: Using pd.read_table() with sep=' '
The pd.read_table() function works identically to pd.read_csv() but defaults to tab separation. You can override it with sep=' ':
import pandas as pd
df = pd.read_table('employees.txt', sep=' ')
print(df)
Output:
Name Age City
0 Alice 25 NewYork
1 Bob 30 LosAngeles
2 Charlie 28 Chicago
3 Diana 35 Houston
Both read_csv() and read_table() produce identical results when given the same sep parameter.
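To check both the equivalence and the tab default, parse the same text three ways. This sketch uses io.StringIO in place of a real file:

```python
import io

import pandas as pd

text = "Name Age City\nAlice 25 NewYork\nBob 30 LosAngeles\n"

via_csv = pd.read_csv(io.StringIO(text), sep=' ')
# Overriding read_table()'s default sep='\t' makes it parse like read_csv()
via_table = pd.read_table(io.StringIO(text), sep=' ')
print(via_csv.equals(via_table))  # True

# Left at its tab default, read_table() finds no tabs and returns one column
one_col = pd.read_table(io.StringIO(text))
print(one_col.columns.tolist())  # ['Name Age City']
```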
Handling Multiple or Irregular Spaces
Real-world space-delimited files often have inconsistent spacing - some columns might be separated by one space, others by two, three, or more. This is common in files generated by print statements, fixed-width formatting, or command-line utilities.
Sample data_irregular.txt:
Name   Age    City
Alice  25     NewYork
Bob    30     LosAngeles
Charlie  28   Chicago
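Using sep=' ' (single space) on this file fails. Each extra space produces an empty field, so the rows end up with different numbers of fields and the parser stops with an error: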
import pandas as pd
# ❌ Single space separator fails with irregular spacing
df = pd.read_csv('data_irregular.txt', sep=' ')
print(df)
Output:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 8 fields in line 3, saw 10
The Fix: Use sep='\s+' (Regex for One or More Whitespace Characters)
The regex pattern \s+ matches one or more whitespace characters (spaces, tabs, etc.), handling any amount of spacing between columns:
import pandas as pd
df = pd.read_csv('data_irregular.txt', sep=r'\s+', engine='python')
print(df)
Output:
Name Age City
0 Alice 25 NewYork
1 Bob 30 LosAngeles
2 Charlie 28 Chicago
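sep=r'\s+' is the recommended approach for space-delimited files. It treats any run of whitespace as a single separator, so single spaces, multiple spaces, and tabs are all handled the same way. Write the pattern as a raw string (r'\s+'). In an ordinary string literal, '\s' is an invalid escape sequence, and Python 3.12 and later report it with a SyntaxWarning.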
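You can reproduce the difference in memory. The sample text in this sketch is hypothetical: it pads columns with uneven runs of spaces and uses a tab on one line. engine is left at its default because Pandas handles r'\s+' with its C engine:

```python
import io

import pandas as pd

# Hypothetical sample: uneven runs of spaces, plus a tab on the last line
text = (
    "Name   Age    City\n"
    "Alice  25     NewYork\n"
    "Bob    30     LosAngeles\n"
    "Charlie\t28 Chicago\n"
)

# A single-space separator turns every extra space into an empty field
failed = False
try:
    pd.read_csv(io.StringIO(text), sep=' ')
except pd.errors.ParserError as exc:
    failed = True
    print("sep=' ' failed:", exc)

# r'\s+' treats each run of spaces and/or tabs as one separator
df = pd.read_csv(io.StringIO(text), sep=r'\s+')
print(df)
```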
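When engine='python' Is Needed
Pandas treats sep=r'\s+' as a special case that its default C engine handles natively. With this pattern, engine='python' is optional: it works, but the Python engine is slower. Any other separator longer than one character, such as r'\s*\|\s*' or r' {2,}', is interpreted as a regular expression, and the C engine does not support those. In that case Pandas falls back to the Python engine and displays a warning:
ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support
regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
Set engine='python' explicitly to suppress this warning. The Python engine is slower than the C engine but fully supports regex separators.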
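To see which separators trigger the fallback, you can record warnings while parsing. This is a sketch: the helper parser_warnings and the pipe-separated sample are made up for illustration.

```python
import io
import warnings

import pandas as pd

spaced = "Name Age City\nAlice 25 NewYork\n"
piped = "Name | Age | City\nAlice | 25 | NewYork\n"


def parser_warnings(text, **kwargs):
    """Parse text with read_csv(**kwargs); return any ParserWarnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        pd.read_csv(io.StringIO(text), **kwargs)
    return [w for w in caught if issubclass(w.category, pd.errors.ParserWarning)]


# r'\s+' is special-cased for the default C engine: no fallback warning
print(bool(parser_warnings(spaced, sep=r'\s+')))  # False
# Any other multi-character separator is a regex, so Pandas falls back
print(bool(parser_warnings(piped, sep=r'\s*\|\s*')))  # True
# Requesting the Python engine explicitly silences the fallback warning
print(bool(parser_warnings(piped, sep=r'\s*\|\s*', engine='python')))  # False
```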
Reading Files Without a Header Row
If your space-delimited file has no header row, use header=None and optionally assign column names with names:
Sample data_no_header.txt:
Alice 25 NewYork
Bob 30 LosAngeles
Charlie 28 Chicago
import pandas as pd
df = pd.read_csv(
'data_no_header.txt',
sep=r'\s+',
engine='python',
header=None,
names=['Name', 'Age', 'City']
)
print(df)
Output:
Name Age City
0 Alice 25 NewYork
1 Bob 30 LosAngeles
2 Charlie 28 Chicago
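If you leave out header=None on a headerless file, Pandas does not raise an error. It promotes the first record to column names instead. This sketch inlines the sample with io.StringIO and shows both outcomes:

```python
import io

import pandas as pd

text = "Alice 25 NewYork\nBob 30 LosAngeles\nCharlie 28 Chicago\n"

# Without header=None, the first record silently becomes the header
promoted = pd.read_csv(io.StringIO(text), sep=r'\s+')
print(promoted.columns.tolist())  # ['Alice', '25', 'NewYork']

# With header=None and names=, all three records are kept as data
named = pd.read_csv(io.StringIO(text), sep=r'\s+', header=None,
                    names=['Name', 'Age', 'City'])
print(named.shape)  # (3, 3)
```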
Handling Values That Contain Spaces
A significant challenge with space-delimited files is when data values themselves contain spaces (e.g., "New York" or "Los Angeles"). The parser can't distinguish between a delimiter space and a space within a value.
Sample cities_with_spaces.txt:
Name      Age City
Alice     25  New York
Bob       30  Los Angeles
import pandas as pd
# ❌ "New York" and "Los Angeles" are each split into two fields
df = pd.read_csv('cities_with_spaces.txt', sep=r'\s+', engine='python')
print(df)
Output:
Name Age City
Alice 25 New York
Bob 30 Los Angeles
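Because "New York" and "Los Angeles" each become two fields, the data rows have four fields while the header has only three. Pandas does not raise an error here. It silently uses the extra leading field as the index, which has three effects: the names become row labels (note the missing 0, 1 index), the ages land under Name, and "New"/"Los" land under Age. Space-delimited files cannot reliably represent values that contain spaces. If your data has multi-word values, consider using a different delimiter (comma, tab, pipe) or a fixed-width format.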
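You can confirm the misalignment programmatically. This sketch parses the sample inline and checks where each piece ended up:

```python
import io

import pandas as pd

text = "Name Age City\nAlice 25 New York\nBob 30 Los Angeles\n"

df = pd.read_csv(io.StringIO(text), sep=r'\s+', engine='python')

# Data rows have 4 fields but the header has 3, so the first field
# of each row is used as the index instead of raising an error
print(df.index.tolist())    # ['Alice', 'Bob']
print(df['Name'].tolist())  # [25, 30]
print(df['Age'].tolist())   # ['New', 'Los']
```

A cheap way to catch this kind of silent shift is to check that df.index is still a plain RangeIndex after parsing.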
Fix 1: Use Fixed-Width Format with read_fwf()
If your file uses fixed-width columns (each column occupies a specific number of characters), use pd.read_fwf():
import pandas as pd
df = pd.read_fwf('cities_with_spaces.txt')
print(df)
Output:
Name Age City
0 Alice 25 New York
1 Bob 30 Los Angeles
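read_fwf() infers column boundaries from the character positions that are blank in every line it samples (the first 100 lines by default, set by infer_nrows), header included. A space inside a value such as "New York" is kept as long as at least one sampled line has a non-blank character at that position. When columns are not consistently aligned, the inference can guess wrong, so check the result or give the boundaries explicitly.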
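Because read_fwf() relies on the column layout, the input must actually be aligned. This sketch assumes a layout where Name occupies the first 10 characters, Age the next 4, and City the rest, which matches colspecs=[(0, 10), (10, 14), (14, 30)]:

```python
import io

import pandas as pd

# Assumed aligned layout: Name in chars 0-9, Age in 10-13, City from 14
text = (
    "Name      Age City\n"
    "Alice     25  New York\n"
    "Bob       30  Los Angeles\n"
)

df = pd.read_fwf(io.StringIO(text))
print(df)
print(df['City'].tolist())  # ['New York', 'Los Angeles']
```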
Fix 2: Specify Column Widths Explicitly
For more control, define the exact width of each column:
import pandas as pd
df = pd.read_fwf(
'cities_with_spaces.txt',
colspecs=[(0, 10), (10, 14), (14, 30)]
)
print(df)
Output:
Name Age City
0 Alice 25 New York
1 Bob 30 Los Angeles
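When the columns sit side by side with no gaps, read_fwf() also accepts widths=, a list of field lengths, as a shorthand for colspecs. This sketch uses the same assumed 10/4/16-character layout:

```python
import io

import pandas as pd

text = (
    "Name      Age City\n"
    "Alice     25  New York\n"
    "Bob       30  Los Angeles\n"
)

by_colspecs = pd.read_fwf(io.StringIO(text), colspecs=[(0, 10), (10, 14), (14, 30)])
# widths=[10, 4, 16] describes the same contiguous intervals: 0-10, 10-14, 14-30
by_widths = pd.read_fwf(io.StringIO(text), widths=[10, 4, 16])
print(by_colspecs.equals(by_widths))  # True
```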
Skipping Comment Lines
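Some space-delimited files include comment lines (starting with # or another character). Use the comment parameter to skip them. Pandas ignores everything from the comment character to the end of the line. A line that starts with # is therefore skipped entirely, and a # in the middle of a line cuts off the rest of that line.
For a file data_with_comments.txt whose comment lines start with #: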
import pandas as pd
df = pd.read_csv(
'data_with_comments.txt',
sep=r'\s+',
engine='python',
comment='#'
)
print(df)
Output:
Name Age City
0 Alice 25 NewYork
1 Bob 30 Chicago
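Here is a self-contained sketch with hypothetical file contents. It covers both a full-line comment and a trailing one:

```python
import io

import pandas as pd

# Hypothetical file contents with full-line and trailing comments
text = (
    "# exported by a legacy tool\n"
    "Name Age City\n"
    "Alice 25 NewYork\n"
    "# the next row was added by hand\n"
    "Bob 30 Chicago  # trailing note\n"
)

df = pd.read_csv(io.StringIO(text), sep=r'\s+', comment='#')
print(df)
```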
Complete Example with Multiple Options
Here's a comprehensive example combining several useful parameters:
import pandas as pd
df = pd.read_csv(
'data.txt',
sep=r'\s+', # Handle any amount of whitespace
engine='python', # Optional for r'\s+'; required for other regex separators
header=0, # First row is the header
comment='#', # Skip comment lines
na_values=['NA', '-'], # Treat these as missing values
dtype={'Age': 'Int64'}, # Nullable integer type, so missing ages don't raise an error
skiprows=[], # Skip specific rows if needed
encoding='utf-8' # File encoding
)
print(df)
print(f"\nShape: {df.shape}")
print(f"Columns: {list(df.columns)}")
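The example above refers to a data.txt that is not shown. The sketch below runs the same kind of call against hypothetical inline data that contains a comment and a missing age. It uses the nullable 'Int64' dtype because a plain int dtype cannot hold missing values: dtype={'Age': int} would raise a ValueError once '-' is converted to NaN. It also leaves engine at its default, since r'\s+' does not need the Python engine:

```python
import io

import pandas as pd

# Hypothetical contents for data.txt: a comment line and a missing age
text = (
    "# staff list\n"
    "Name Age City\n"
    "Alice 25 NewYork\n"
    "Bob - LosAngeles\n"
)

df = pd.read_csv(
    io.StringIO(text),
    sep=r'\s+',               # any run of whitespace separates fields
    header=0,                 # first non-comment line is the header
    comment='#',              # skip lines starting with '#'
    na_values=['NA', '-'],    # '-' becomes a missing value
    dtype={'Age': 'Int64'},   # nullable integer, so the missing age is <NA>
)
print(df)
print(df['Age'].dtype)  # Int64
```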
Summary
| Method | Syntax | Best For |
|---|---|---|
| Single space separator | sep=' ' | Files with exactly one space between columns |
| Regex whitespace | sep='\s+' | Files with variable/multiple spaces (recommended) |
| read_fwf() | pd.read_fwf(file) | Fixed-width files or values containing spaces |
When reading space-delimited files in Pandas:
- Use sep=r'\s+' as the default approach; it handles both single and multiple spaces reliably.
- Set engine='python' to avoid parser warnings when using regex separators other than r'\s+' (Pandas handles r'\s+' with its default C engine).
- Use pd.read_fwf() when your data values contain spaces, because standard space-delimited parsing splits those values and shifts the columns.
- Always inspect the first few lines of your file to understand the spacing pattern before choosing a parsing method.
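One way to do that inspection is to print the repr() of the raw lines, which makes tabs (\t) and runs of spaces visible. In this small sketch, peek_lines is a hypothetical helper and io.StringIO stands in for an open file:

```python
import io
import itertools


def peek_lines(stream, n=5):
    """Return repr() of the first n raw lines so tabs and runs of spaces are visible."""
    return [repr(line) for line in itertools.islice(stream, n)]


# io.StringIO stands in for open('your_file.txt', encoding='utf-8')
sample = io.StringIO("Name\tAge  City\nAlice 25 NewYork\nBob 30 LosAngeles\n")
for shown in peek_lines(sample, n=2):
    print(shown)
# 'Name\tAge  City\n'
# 'Alice 25 NewYork\n'
```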