Python Pandas: How to Split Strings into Lists or Columns Using Pandas str.split()

Splitting string data into separate components is a common task in data cleaning and preparation. Names need to be separated into first and last, addresses into street and city, or delimited values into individual fields. The Pandas str.split() method handles all of these scenarios efficiently, letting you split strings across an entire Series or DataFrame column in a single operation.

This guide covers how to use str.split() to produce either lists within a Series or separate DataFrame columns.

Understanding `str.split()` Syntax

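Series.str.split(pat=None, *, n=-1, expand=False, regex=None)

Parameter	Description	Default
`pat`	The delimiter string or regex pattern to split on	Whitespace
`n`	Maximum number of splits per string. `-1` means no limit	`-1` (all splits)
`expand`	If `True`, returns a DataFrame with each split in a separate column. If `False`, returns a Series of lists	`False`
`regex`	Whether `pat` is treated as a regular expression. `None` treats a single-character `pat` as a literal string and a longer `pat` as a regex (pandas 1.4+)	`None`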
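
Because `pat` can be a regular expression, one call can split on several delimiters at once. The following is a minimal sketch, assuming pandas 1.4 or later (where the `regex` keyword is available); the `tags` data is made up for illustration.

```python
import pandas as pd

# Hypothetical data: values separated by either a comma or a semicolon
tags = pd.Series(['red, green;blue', 'cyan;magenta', 'black'])

# Split on a comma or semicolon, swallowing any surrounding whitespace.
# regex=True makes the pattern handling explicit instead of relying on the default.
tag_lists = tags.str.split(r'\s*[,;]\s*', regex=True)
print(tag_lists)
```

A string without any delimiter, such as 'black', comes back as a one-element list.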

Info: Pandas' str.split() is different from Python's built-in str.split(). The Pandas version is accessed through the .str accessor and operates on an entire Series at once, handling NaN values automatically.

Sample DataFrame

The examples below use this DataFrame:

import pandas as pd

df = pd.DataFrame({
    'Name': ['John Smith', 'Alice Johnson', 'Bob Williams', 'Eve Davis'],
    'Team': ['Boston Celtics', 'Portland Trail Blazers', 'Detroit Pistons', 'Atlanta Hawks'],
    'Salary': [50000, 65000, 48000, 72000]
})
print(df)

Output:

            Name                    Team  Salary
0     John Smith          Boston Celtics   50000
1  Alice Johnson  Portland Trail Blazers   65000
2   Bob Williams         Detroit Pistons   48000
3      Eve Davis           Atlanta Hawks   72000

Splitting Strings into a List (Series of Lists)

When expand=False (the default), str.split() returns a Series where each element is a list of the split components:

import pandas as pd

df = pd.DataFrame({
    'Name': ['John Smith', 'Alice Johnson', 'Bob Williams', 'Eve Davis']
})

# Split each name into a list of parts
name_lists = df['Name'].str.split(' ')
print(name_lists)
print("\nType of each element:", type(name_lists[0]))

Output:

0       [John, Smith]
1    [Alice, Johnson]
2     [Bob, Williams]
3        [Eve, Davis]
Name: Name, dtype: object

Type of each element: <class 'list'>

Each cell now contains a Python list. This is useful when you want to keep all parts together in a single column.
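
Once the parts live in lists, the .str accessor and explode() can still work with them. The sketch below is one possible follow-up, not part of the original example: it counts the parts per row with .str.len() and then moves each part onto its own row. The sample names are illustrative.

```python
import pandas as pd

names = pd.Series(['John Smith', 'Alice Marie Johnson'])
name_lists = names.str.split(' ')

# Count how many parts each name produced
part_counts = name_lists.str.len()

# Put each list element on its own row; the original index value is repeated
one_per_row = name_lists.explode()

print(part_counts)
print(one_per_row)
```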

Limiting the Number of Splits

Use the n parameter to control how many splits occur. This is essential when strings have varying numbers of delimiters:

import pandas as pd

df = pd.DataFrame({
    'Team': ['Portland Trail Blazers', 'Boston Celtics', 'Golden State Warriors']
})

# Split at most once, so each result has at most 2 parts
split_result = df['Team'].str.split(' ', n=1)
print(split_result)

Output:

0    [Portland, Trail Blazers]
1            [Boston, Celtics]
2     [Golden, State Warriors]
Name: Team, dtype: object

With n=1, only the first space is used as a split point. Everything after it stays as a single string.
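
If the part you want to isolate is at the end of the string, the related method str.rsplit() applies the same n limit but counts splits from the right. It is not used elsewhere in this guide; the sketch below only illustrates the difference on the same kind of team names.

```python
import pandas as pd

teams = pd.Series(['Portland Trail Blazers', 'Boston Celtics', 'Golden State Warriors'])

# Split at most once, starting from the right-hand side of each string
from_right = teams.str.rsplit(' ', n=1)
print(from_right)
```

Here the last word is separated and everything before it stays together, so 'Portland Trail Blazers' becomes ['Portland Trail', 'Blazers'].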

Splitting Strings into Separate Columns

Setting expand=True returns a DataFrame with each split part in its own column. This is the most common approach for creating new structured columns from a single string column.

Splitting Names into First and Last Name

import pandas as pd

df = pd.DataFrame({
    'Name': ['John Smith', 'Alice Johnson', 'Bob Williams', 'Eve Davis'],
    'Salary': [50000, 65000, 48000, 72000]
})

# Split into two columns
name_parts = df['Name'].str.split(' ', n=1, expand=True)
print("Split result:")
print(name_parts)

# Assign to new columns
df['First_Name'] = name_parts[0]
df['Last_Name'] = name_parts[1]

# Drop the original column
df = df.drop(columns=['Name'])
print("\nFinal DataFrame:")
print(df)

Output:

Split result:
       0         1
0   John     Smith
1  Alice   Johnson
2    Bob  Williams
3    Eve     Davis

Final DataFrame:
   Salary First_Name Last_Name
0   50000       John     Smith
1   65000      Alice   Johnson
2   48000        Bob  Williams
3   72000        Eve     Davis

The split produces columns numbered 0 and 1, which are then assigned to descriptively named columns.
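
The intermediate variable is optional. As a minimal sketch, the split result can be assigned straight to a list of new column names, provided the split yields exactly as many columns as there are names (otherwise pandas raises a ValueError):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John Smith', 'Alice Johnson'],
    'Salary': [50000, 65000]
})

# Split and assign both new columns in one statement
df[['First_Name', 'Last_Name']] = df['Name'].str.split(' ', n=1, expand=True)
print(df)
```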

Splitting with a Custom Delimiter

You can split on any character or string, not just spaces:

import pandas as pd

df = pd.DataFrame({
    'Date': ['2024-01-15', '2024-06-20', '2024-12-31']
})

# Split dates on the hyphen
date_parts = df['Date'].str.split('-', expand=True)
date_parts.columns = ['Year', 'Month', 'Day']

print(date_parts)

Output:

   Year Month Day
0  2024    01  15
1  2024    06  20
2  2024    12  31
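
Note that the split parts are still strings, which is why the month prints as 01 rather than 1. If numeric values are needed, one option is to convert the result with astype(), sketched here under the assumption that every part is a valid integer:

```python
import pandas as pd

dates = pd.Series(['2024-01-15', '2024-06-20', '2024-12-31'])

date_parts = dates.str.split('-', expand=True)
date_parts.columns = ['Year', 'Month', 'Day']

# The parts are strings; convert every column to integers
date_numbers = date_parts.astype(int)
print(date_numbers)
```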

Using `apply()` with `str.split()` for Custom Logic

For more complex splitting scenarios, combine apply() with a custom function:

import pandas as pd

df = pd.DataFrame({
    'Name': ['John Smith', 'Alice Marie Johnson', 'Bob Williams']
})


def split_name(name):
    """Split into first name and everything else as last name."""
    parts = name.split(' ', 1)
    if len(parts) == 2:
        return pd.Series({'First': parts[0], 'Last': parts[1]})
    return pd.Series({'First': parts[0], 'Last': ''})


result = df['Name'].apply(split_name)
print(result)

Output:

   First           Last
0   John          Smith
1  Alice  Marie Johnson
2    Bob       Williams

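This approach gives you full control over how splits are handled, including edge cases such as middle names. Note that the function calls Python's built-in str.split() on each value, so it raises an AttributeError on NaN unless you handle missing values yourself.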
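
A custom function receives each raw value, including missing ones, so it usually needs its own guard. The following is a hedged sketch of one way to do that; `split_name_safe` is a hypothetical helper, and it uses str.partition() instead of str.split() so that single-word names need no length check.

```python
import numpy as np
import pandas as pd


def split_name_safe(name):
    """Split into first name and the rest, passing missing values through."""
    if not isinstance(name, str):
        # NaN or None: keep both parts missing
        return pd.Series({'First': np.nan, 'Last': np.nan})
    first, _, rest = name.partition(' ')
    return pd.Series({'First': first, 'Last': rest})


names = pd.Series(['John Smith', np.nan, 'Cher'])
result = names.apply(split_name_safe)
print(result)
```

A name without a space, such as 'Cher', ends up with an empty string in the Last column.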

Common Mistake: Uneven Splits Without `n` Parameter

When strings have different numbers of delimiters and you use expand=True without setting n, the row with the most parts determines the number of columns, and shorter rows are padded with None:

import pandas as pd

df = pd.DataFrame({
    'Team': ['Portland Trail Blazers', 'Boston Celtics', 'Atlanta Hawks']
})

# PROBLEMATIC: no limit on splits, different rows produce different numbers of parts
result = df['Team'].str.split(' ', expand=True)
print(result)

Output:

          0        1        2
0  Portland    Trail  Blazers
1    Boston  Celtics     None
2   Atlanta    Hawks     None

Column 2 has None for rows that only split into two parts. This can cause issues in downstream processing.

The correct approach:

Use n to ensure a consistent number of columns:

import pandas as pd

df = pd.DataFrame({
    'Team': ['Portland Trail Blazers', 'Boston Celtics', 'Atlanta Hawks']
})

# CORRECT: limit to 1 split, so the result never has more than 2 columns
result = df['Team'].str.split(' ', n=1, expand=True)
result.columns = ['City', 'Mascot']
print(result)

Output:

       City         Mascot
0  Portland  Trail Blazers
1    Boston        Celtics
2   Atlanta          Hawks

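Warning: When splitting strings that have a variable number of delimiters, set the n parameter to cap the number of splits and therefore the number of columns. Without it, expand=True creates as many columns as the longest row needs and pads the other rows with None, which can lead to unexpected behavior in subsequent operations. Keep in mind that n is only an upper bound: a row with fewer than n delimiters is still padded with None.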
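
Setting n caps the column count, but a row with no delimiter at all still leaves a gap. The sketch below, using a made-up 'Lakers' entry, shows one way to detect those rows with isna() and, if an empty string is acceptable for your use case, to fill them with fillna():

```python
import pandas as pd

teams = pd.Series(['Portland Trail Blazers', 'Boston Celtics', 'Lakers'])

parts = teams.str.split(' ', n=1, expand=True)
parts.columns = ['City', 'Mascot']

# Rows where no split happened have a missing value in the second column
missing_mascot = parts['Mascot'].isna()

# Optionally replace the missing values with an empty string
parts_filled = parts.fillna('')

print(missing_mascot)
print(parts_filled)
```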

Handling NaN Values

str.split() handles missing values gracefully. NaN and None inputs stay missing in the output rather than raising an error:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['John Smith', np.nan, 'Bob Williams', None]
})

result = df['Name'].str.split(' ', expand=True)
print(result)

Output:

      0         1
0  John     Smith
1   NaN       NaN
2   Bob  Williams
3  None      None

Tip: You don't need to filter out NaN values before splitting. The .str accessor automatically propagates NaN through string operations, keeping your data aligned.
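
Because the split result keeps the original index, the rows that had no value to split can be traced back afterwards. A minimal sketch, assuming you want to flag those rows rather than drop them:

```python
import numpy as np
import pandas as pd

names = pd.Series(['John Smith', np.nan, 'Bob Williams', None])

parts = names.str.split(' ', expand=True)

# True for rows whose input was missing, aligned with the original index
unsplit_rows = parts[0].isna()
print(names[unsplit_rows].index.tolist())
```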

Splitting and Accessing Specific Parts

If you only need one part of the split (e.g., just the first name), you can use .str[index] on the result:

import pandas as pd

df = pd.DataFrame({
    'Email': ['john@gmail.com', 'alice@yahoo.com', 'bob@outlook.com']
})

# Extract just the username (part before @)
df['Username'] = df['Email'].str.split('@').str[0]

# Extract just the domain
df['Domain'] = df['Email'].str.split('@').str[1]

print(df)

Output:

             Email Username       Domain
0   john@gmail.com     john    gmail.com
1  alice@yahoo.com    alice    yahoo.com
2  bob@outlook.com      bob  outlook.com

This avoids creating intermediate DataFrames when you only need specific parts.
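
Indexing with .str[index] is also forgiving when a part does not exist. In the sketch below, the second value is a made-up string without an @ sign; asking for part 1 yields a missing value for that row instead of raising an IndexError:

```python
import pandas as pd

emails = pd.Series(['john@gmail.com', 'not-an-email'])

# The second string has no '@', so its list has only one element
domains = emails.str.split('@').str[1]
print(domains)
```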

Quick Reference

Goal	Code	Returns
Split into lists	`df['col'].str.split(' ')`	Series of lists
Split into columns	`df['col'].str.split(' ', expand=True)`	DataFrame
Limit splits	`df['col'].str.split(' ', n=1, expand=True)`	DataFrame with at most `n+1` columns
Get first part only	`df['col'].str.split(' ').str[0]`	Series
Split on custom delimiter	`df['col'].str.split('-', expand=True)`	DataFrame
Custom split logic	`df['col'].apply(custom_function)`	Series or DataFrame

The str.split() method is a versatile tool for breaking apart string data in Pandas.

Use expand=False when you want to keep split parts as lists within cells, and expand=True when you need clean, separate columns.

Always set the n parameter when your data has an inconsistent number of delimiters to ensure predictable results.
