
Python Pandas: How to Read Space-Delimited Files in Pandas

Not all data files use commas or tabs as separators. Many datasets - especially those generated by scientific instruments, log files, command-line tools, or legacy systems - use spaces to separate values. These space-delimited files can be tricky to parse because the number of spaces between columns may vary, making simple string splitting unreliable.

In this guide, you'll learn how to read space-delimited files into Pandas DataFrames using multiple methods, handle files with inconsistent spacing, and avoid common parsing issues.

What Is a Space-Delimited File?

A space-delimited file organizes data into rows and columns where spaces act as the separator between values. Each line represents one record:

Sample employees.txt:

Name Age City
Alice 25 NewYork
Bob 30 LosAngeles
Charlie 28 Chicago
Diana 35 Houston

Unlike CSV files (comma-separated) or TSV files (tab-separated), space-delimited files use one or more space characters to separate fields.

Method 1: Using pd.read_csv() with sep=' '

Despite its name, pd.read_csv() can handle any delimiter, not just commas. Set sep=' ' to specify a single space as the separator:

import pandas as pd

df = pd.read_csv('employees.txt', sep=' ')
print(df)

Output:

      Name  Age        City
0    Alice   25     NewYork
1      Bob   30  LosAngeles
2  Charlie   28     Chicago
3    Diana   35     Houston

This works only when every pair of columns is separated by exactly one space.

Method 2: Using pd.read_table() with sep=' '

The pd.read_table() function works identically to pd.read_csv() but defaults to tab separation. You can override it with sep=' ':

import pandas as pd

df = pd.read_table('employees.txt', sep=' ')
print(df)

Output:

      Name  Age        City
0    Alice   25     NewYork
1      Bob   30  LosAngeles
2  Charlie   28     Chicago
3    Diana   35     Houston

Both read_csv() and read_table() produce identical results when given the same sep parameter.
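One way to check this claim is to parse the same text with both functions and compare the results. The sketch below uses io.StringIO as an in-memory stand-in for employees.txt (an assumption made so the snippet runs without a file on disk):

```python
import io
import pandas as pd

# In-memory copy of the employees.txt sample (stand-in for the real file).
text = (
    "Name Age City\n"
    "Alice 25 NewYork\n"
    "Bob 30 LosAngeles\n"
    "Charlie 28 Chicago\n"
    "Diana 35 Houston\n"
)

df_csv = pd.read_csv(io.StringIO(text), sep=' ')
df_table = pd.read_table(io.StringIO(text), sep=' ')

# DataFrame.equals compares shape, labels, dtypes, and values.
same = df_csv.equals(df_table)
print(same)
```

Since the only difference between the two functions is the default separator, choosing one over the other is mostly a matter of readability.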

Handling Multiple or Irregular Spaces

Real-world space-delimited files often have inconsistent spacing - some columns might be separated by one space, others by two, three, or more. This is common in files generated by print statements, fixed-width formatting, or command-line utilities.

Sample data_irregular.txt:

Name    Age   City
Alice 25 NewYork
Bob 30 LosAngeles
Charlie 28 Chicago

With sep=' ', every individual space counts as a delimiter, so a run of spaces produces empty fields and lines can end up with different field counts:

import pandas as pd

# ❌ Single space separator fails with irregular spacing
df = pd.read_csv('data_irregular.txt', sep=' ')
print(df)

Output:

pandas.errors.ParserError: Error tokenizing data. C error: Expected 8 fields in line 3, saw 10
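The field counts in that error come from empty fields between consecutive spaces. Plain Python string splitting shows the same effect; it is used here only as an analogy for what the parser does, applied to the header line of data_irregular.txt:

```python
# Header line from data_irregular.txt: four spaces, then three spaces.
line = "Name    Age   City"

single = line.split(' ')  # split at every individual space
any_ws = line.split()     # split at runs of whitespace

print(len(single), single)
print(len(any_ws), any_ws)
```

The first split yields eight fields, five of them empty strings, which is where the "Expected 8 fields" in the error message comes from.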

The Fix: Use sep='\s+' (Regex for One or More Whitespace Characters)

The regex pattern \s+ matches one or more whitespace characters (spaces, tabs, etc.), handling any amount of spacing between columns:

import pandas as pd

# Raw string: '\s' in a normal string literal is an invalid escape sequence
df = pd.read_csv('data_irregular.txt', sep=r'\s+', engine='python')
print(df)

Output:

      Name  Age        City
0    Alice   25     NewYork
1      Bob   30  LosAngeles
2  Charlie   28     Chicago

Tip: sep='\s+' is the recommended approach for space-delimited files because it handles both single and multiple spaces. It's the most robust option and works regardless of how many spaces exist between columns.
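Because \s matches any whitespace character, the same separator should also cope with files that mix tabs and spaces. A small sketch with made-up in-memory data (not one of the sample files above):

```python
import io
import pandas as pd

# Hypothetical data mixing runs of spaces with tab characters.
text = (
    "Name    Age   City\n"
    "Alice\t25  NewYork\n"
    "Bob   30\t\tLosAngeles\n"
)

df = pd.read_csv(io.StringIO(text), sep=r'\s+')
print(df)
```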

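When engine='python' Is Needed

When you pass a general regex pattern as the separator without naming an engine, Pandas falls back to the Python engine and displays a warning:

ParserWarning: Falling back to the 'python' engine because the 'c' parser does not support
regex separators; you can avoid this warning by specifying engine='python'.

Set engine='python' explicitly to suppress this warning. The Python engine is slower than the C engine but fully supports regex separators.

The pattern \s+ is a special case: the C engine treats it as "any run of whitespace" and handles it natively, so sep=r'\s+' works without engine='python' and without the warning. Passing engine='python' together with \s+, as the examples in this guide do, is harmless but selects the slower parser.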
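Which separators actually trigger the fallback can be checked by recording warnings while parsing. In this sketch, parser_warnings is a hypothetical helper and the two-space-delimited text is made up for illustration. It relies on the documented Pandas behavior that \s+ is handled by the C engine while other multi-character separators are treated as regular expressions:

```python
import io
import warnings
import pandas as pd

# Made-up sample in which columns are separated by two spaces.
text = "Name  Age  City\nAlice  25  NewYork\nBob  30  Chicago\n"

def parser_warnings(sep):
    """Return the ParserWarning messages emitted while parsing text with sep."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        pd.read_csv(io.StringIO(text), sep=sep)  # engine left unspecified
    return [str(w.message) for w in caught
            if issubclass(w.category, pd.errors.ParserWarning)]

print(len(parser_warnings(r'\s+')))     # whitespace special case
print(len(parser_warnings(r'\s{2,}')))  # general regex separator
```

If the second count is non-zero on your installation, adding engine='python' to that call is what silences the warning.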

Reading Files Without a Header Row

If your space-delimited file has no header row, use header=None and optionally assign column names with names:

Sample data_no_header.txt:

Alice 25 NewYork
Bob 30 LosAngeles
Charlie 28 Chicago

import pandas as pd

df = pd.read_csv(
    'data_no_header.txt',
    sep=r'\s+',
    engine='python',
    header=None,
    names=['Name', 'Age', 'City']
)
print(df)

Output:

      Name  Age        City
0    Alice   25     NewYork
1      Bob   30  LosAngeles
2  Charlie   28     Chicago

Handling Values That Contain Spaces

A significant challenge with space-delimited files is when data values themselves contain spaces (e.g., "New York" or "Los Angeles"). The parser can't distinguish between a delimiter space and a space within a value.

Sample cities_with_spaces.txt:

Name Age City
Alice 25 New York
Bob 30 Los Angeles

import pandas as pd

# ❌ This splits "New York" into two separate fields
df = pd.read_csv('cities_with_spaces.txt', sep=r'\s+', engine='python')
print(df)

Output:

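       Name  Age     City
Alice    25  New     York
Bob      30  Los  Angeles

Each data row now has four fields against three header names, so Pandas uses the first field as the index and shifts the rest: the Age column holds "New" and "Los", and City holds only "York" and "Angeles".

Caution: Space-delimited files cannot reliably handle values containing spaces. If your data has multi-word values, consider using a different delimiter (comma, tab, pipe) or a fixed-width format.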
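When only the last column can contain spaces, one possible workaround is to split each line a limited number of times before building the DataFrame. This is plain string handling rather than a Pandas parsing option. A minimal sketch, assuming the layout of cities_with_spaces.txt and using an in-memory copy of it:

```python
import pandas as pd

# In-memory copy of cities_with_spaces.txt (stand-in for the real file).
text = "Name Age City\nAlice 25 New York\nBob 30 Los Angeles\n"

# Split each line at most twice, so everything after the second
# gap stays together as the City value.
rows = [line.split(maxsplit=2) for line in text.splitlines() if line.strip()]
header, records = rows[0], rows[1:]

df = pd.DataFrame(records, columns=header)
df['Age'] = df['Age'].astype(int)  # split() returns strings, so convert
print(df)
```

This only helps when the multi-word column is the last one; a multi-word value in the middle of a line remains ambiguous.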

Fix 1: Use Fixed-Width Format with read_fwf()

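If your file uses fixed-width columns (each column starts at the same character position on every line), use pd.read_fwf(). This requires the values to be padded so that the columns line up; a file with single spaces between fields, as printed above, has no consistent boundaries for read_fwf() to detect: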

import pandas as pd

df = pd.read_fwf('cities_with_spaces.txt')
print(df)

Output:

    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles

read_fwf() infers column boundaries from the character positions that are blank on every sampled line (the first 100 lines by default), so multi-word values survive as long as the columns are aligned.

Fix 2: Specify Column Widths Explicitly

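For more control, pass colspecs, a list of (start, end) character ranges where end is exclusive. The ranges below assume Name occupies characters 0-9, Age 10-13, and City 14-29: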

import pandas as pd

df = pd.read_fwf(
    'cities_with_spaces.txt',
    colspecs=[(0, 10), (10, 14), (14, 30)]
)
print(df)

Output:

    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles
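Both read_fwf() variants depend on the columns being aligned. The sketch below builds a hypothetical padded version of the file in memory (Name padded to 10 characters, Age to 4, City taking the rest of the line) and reads it with inferred and with explicit boundaries:

```python
import io
import pandas as pd

# Hypothetical aligned layout: Name = chars 0-9, Age = 10-13, City = 14 onward.
text = (
    "Name      Age City\n"
    "Alice     25  New York\n"
    "Bob       30  Los Angeles\n"
)

# Boundaries inferred from positions that are blank on every line.
inferred = pd.read_fwf(io.StringIO(text))

# Boundaries given explicitly as half-open (start, end) ranges.
explicit = pd.read_fwf(
    io.StringIO(text),
    colspecs=[(0, 10), (10, 14), (14, 30)],
)

print(inferred)
print(explicit)
```

Explicit colspecs are the safer choice when a multi-word value might leave a blank position that lines up across all rows, since inference would then see an extra column boundary there.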

Skipping Comment Lines

Some space-delimited files include comment lines (starting with # or another character). Use the comment parameter to skip them:

Sample data_with_comments.txt:

# This is a comment
Name Age City
Alice 25 NewYork
# Another comment
Bob 30 Chicago

import pandas as pd

df = pd.read_csv(
    'data_with_comments.txt',
    sep=r'\s+',
    engine='python',
    comment='#'
)
print(df)

Output:

    Name  Age     City
0  Alice   25  NewYork
1    Bob   30  Chicago

Complete Example with Multiple Options

Here's a comprehensive example combining several useful parameters:

import pandas as pd

df = pd.read_csv(
    'data.txt',
    sep=r'\s+',              # Handle any amount of whitespace
    engine='python',         # Optional for \s+; needed to silence the warning for other regex separators
    header=0,                # First row is the header
    comment='#',             # Skip comment lines
    na_values=['NA', '-'],   # Treat these as missing values
    dtype={'Age': int},      # Specify data types (use 'Int64' if Age can be missing)
    skiprows=[],             # Skip specific rows if needed
    encoding='utf-8'         # File encoding
)

print(df)
print(f"\nShape: {df.shape}")
print(f"Columns: {list(df.columns)}")
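Because data.txt is not shown, here is a runnable variant of the same idea on made-up in-memory data. It uses the nullable 'Int64' dtype instead of int, on the assumption that the Age column may contain one of the na_values markers; plain int cannot hold missing values:

```python
import io
import pandas as pd

# Hypothetical file contents with a comment line and two missing-value markers.
text = (
    "# exported by a hypothetical tool\n"
    "Name Age City\n"
    "Alice 25 NewYork\n"
    "Bob NA Chicago\n"
    "Cara 41 -\n"
)

df = pd.read_csv(
    io.StringIO(text),
    sep=r'\s+',              # any run of whitespace
    comment='#',             # drop the comment line
    na_values=['NA', '-'],   # markers treated as missing
    dtype={'Age': 'Int64'},  # nullable integer column
)

print(df)
print(df.isna().sum())
```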

Summary

Method                 | Syntax            | Best For
Single space separator | sep=' '           | Files with exactly one space between columns
Regex whitespace       | sep='\s+'         | Files with variable/multiple spaces (recommended)
read_fwf()             | pd.read_fwf(file) | Fixed-width files or values containing spaces

When reading space-delimited files in Pandas:

  • Use sep='\s+' as the default approach - it handles both single and multiple spaces reliably.
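  • Set engine='python' when using other regex separators to avoid the parser warning; \s+ itself is handled by the default C engine.
  • Use pd.read_fwf() when your data values contain spaces and the columns are aligned at fixed positions, as whitespace-based splitting would break those values apart.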
  • Always inspect the first few lines of your file to understand the spacing pattern before choosing a parsing method.
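For the inspection step, printing each line with repr() makes tabs and runs of spaces visible. A minimal sketch; peek is a hypothetical helper name, and the StringIO sample stands in for a real file handle:

```python
import io
from itertools import islice

def peek(handle, n=5):
    """Return repr() of the first n lines so tabs and space runs are visible."""
    return [repr(line) for line in islice(handle, n)]

# With a real file: with open('data.txt', encoding='utf-8') as fh: peek(fh)
sample = io.StringIO("Name    Age\tCity\nAlice 25  NewYork\n")
for shown in peek(sample):
    print(shown)
```

A \t in the output indicates a tab-separated or mixed file, while columns that start at the same character position on every line suggest a fixed-width layout.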