
Python Pandas: How to Remove Rows With Special Characters in a Pandas DataFrame

When working with real-world data, you often encounter values containing special characters like @, #, $, %, &, *, /, and others. These characters frequently signal data quality issues such as data entry errors, corrupted records, or improperly formatted fields, and cleaning the affected rows is an important preprocessing step before analysis or modeling.

In this guide, you will learn how to detect and remove rows containing special characters in a Pandas DataFrame using regular expressions with the str.contains() method.

Understanding the Approach

The process involves three steps:

  1. Identify rows where one or more columns contain special characters using regex patterns.
  2. Select those rows by combining conditions across columns.
  3. Drop the selected rows from the DataFrame.

Two common regex patterns are used for detection:

Pattern          Meaning                      Matches
[^0-9a-zA-Z ]    NOT alphanumeric or space    Any special character
[@#&$%+\-*/]     Specific special characters  Only the listed characters
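
To see how the two patterns behave differently, here is a quick sketch on a small, made-up Series (note that a period is flagged by the first pattern but not the second):

```python
import pandas as pd

# Hypothetical sample values for illustration
s = pd.Series(['Alice', 'Eve#', 'Mary Jane', 'A.B'])

# [^0-9a-zA-Z ] flags anything that is NOT a letter, digit, or space
print(s.str.contains(r'[^0-9a-zA-Z ]', na=False).tolist())
# → [False, True, False, True]

# [@#&$%+\-*/] flags only the listed characters, so 'A.B' passes
print(s.str.contains(r'[@#&$%+\-*/]', na=False).tolist())
# → [False, True, False, False]
```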

Example: Detecting and Removing Rows

Creating Sample Data

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Ch@rlie', 'Diana', 'Eve#', 'Frank'],
    'Grade': ['A', 'B+', 'A', '$C', 'B', 'A-'],
    'Score': ['95', '88', '92', '76', '8*4', '90']
})

print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
      Name Grade Score
0    Alice     A    95
1      Bob    B+    88
2  Ch@rlie     A    92
3    Diana    $C    76
4     Eve#     B   8*4
5    Frank    A-    90

Rows 1, 2, 3, 4, and 5 contain special characters in various columns.

Step 1: Detect Rows With Special Characters Per Column

Use str.contains() with a regex pattern to find rows where a specific column has special characters:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Ch@rlie', 'Diana', 'Eve#', 'Frank'],
    'Grade': ['A', 'B+', 'A', '$C', 'B', 'A-'],
    'Score': ['95', '88', '92', '76', '8*4', '90']
})

# Find rows where Name contains special characters
name_special = df[df['Name'].str.contains(r'[^0-9a-zA-Z ]', na=False)]
print("Rows with special characters in Name:")
print(name_special)

Output:

Rows with special characters in Name:
      Name Grade Score
2  Ch@rlie     A    92
4     Eve#     B   8*4
Similarly, check the Grade column:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Ch@rlie', 'Diana', 'Eve#', 'Frank'],
    'Grade': ['A', 'B+', 'A', '$C', 'B', 'A-'],
    'Score': ['95', '88', '92', '76', '8*4', '90']
})

# Find rows where Grade contains special characters
grade_special = df[df['Grade'].str.contains(r'[^0-9a-zA-Z ]', na=False)]
print("Rows with special characters in Grade:")
print(grade_special)

Output:

Rows with special characters in Grade:
    Name Grade Score
1    Bob    B+    88
3  Diana    $C    76
5  Frank    A-    90

Always use na=False with str.contains()

If a column contains NaN values, str.contains() returns NaN for those rows instead of True/False, which causes errors when used as a boolean mask:

# ❌ NaN in the column causes issues
df['Name'].str.contains(r'[^0-9a-zA-Z ]')

# ✅ na=False treats NaN rows as not matching
df['Name'].str.contains(r'[^0-9a-zA-Z ]', na=False)
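
A minimal sketch of the problem, using a hypothetical column with a missing value:

```python
import pandas as pd

# Hypothetical column containing a missing value
s = pd.Series(['Alice', None, 'Eve#'])

# Without na=False, the result is object dtype with NaN in it,
# which cannot be used directly as a boolean mask for indexing
print(s.str.contains(r'[^0-9a-zA-Z ]').tolist())

# With na=False, missing rows simply count as "no match"
mask = s.str.contains(r'[^0-9a-zA-Z ]', na=False)
print(s[mask].tolist())  # → ['Eve#']
```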

Step 2: Combine Conditions Across Multiple Columns

Use the | (OR) operator to find rows where any column contains special characters:

# Find rows where ANY column has special characters
mask = (
    df['Name'].str.contains(r'[^0-9a-zA-Z ]', na=False) |
    df['Grade'].str.contains(r'[^0-9a-zA-Z ]', na=False) |
    df['Score'].str.contains(r'[^0-9a-zA-Z ]', na=False)
)

print("All rows with special characters:")
print(df[mask])

Output:

All rows with special characters:
      Name Grade Score
1      Bob    B+    88
2  Ch@rlie     A    92
3    Diana    $C    76
4     Eve#     B   8*4
5    Frank    A-    90

Step 3: Drop the Rows

Use .drop() with the index of the matched rows, or simply invert the mask:

# Method 1: Using drop()
clean_df = df.drop(df[mask].index)

# Method 2: Using inverted mask (simpler)
clean_df = df[~mask]

print("Cleaned DataFrame:")
print(clean_df)

Output:

Cleaned DataFrame:
    Name Grade Score
0  Alice     A    95

Only row 0 (Alice, A, 95) has no special characters in any column.
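
One detail worth noting: filtering keeps the original index labels, so when the surviving rows are scattered through the frame you may want a fresh 0-based index via reset_index(drop=True). A small sketch with hypothetical names:

```python
import pandas as pd

# Hypothetical data where the clean rows are not at the top
df = pd.DataFrame({'Name': ['Al!ce', 'Bob', 'Ch@rlie', 'Diana']})
mask = df['Name'].str.contains(r'[^0-9a-zA-Z ]', na=False)

# Filtering keeps the original index labels
print(df[~mask].index.tolist())   # → [1, 3]

# reset_index(drop=True) renumbers the cleaned rows from 0
clean_df = df[~mask].reset_index(drop=True)
print(clean_df.index.tolist())    # → [0, 1]
```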

Checking All Columns Dynamically

Instead of listing every column manually, loop through all (or selected) string columns:

import pandas as pd

df = pd.DataFrame({
    'ID': ['101', '10#2', '103', '104'],
    'Name': ['Alice', 'Bob', 'Ch@rlie', 'Diana'],
    'Country': ['USA', 'UK', 'India', 'Fr&nce']
})

# Create a combined mask for all string columns
pattern = r'[^0-9a-zA-Z ]'
mask = pd.Series([False] * len(df), index=df.index)  # align with df's index

for col in df.select_dtypes(include='object').columns:
    mask = mask | df[col].str.contains(pattern, na=False)

print("Rows with special characters:")
print(df[mask])

print("\nCleaned DataFrame:")
print(df[~mask])

Output:

Rows with special characters:
     ID     Name Country
1  10#2      Bob      UK
2   103  Ch@rlie   India
3   104    Diana  Fr&nce

Cleaned DataFrame:
    ID   Name Country
0  101  Alice     USA

Using apply() for a Concise Approach

Apply the regex check across all columns at once using apply():

import pandas as pd

df = pd.DataFrame({
    'ID': ['101', '10#2', '103', '104'],
    'Name': ['Alice', 'Bob', 'Ch@rlie', 'Diana'],
    'Country': ['USA', 'UK', 'India', 'Fr&nce']
})

# Check all columns at once
pattern = r'[^0-9a-zA-Z ]'
mask = df.apply(lambda col: col.str.contains(pattern, na=False)).any(axis=1)

clean_df = df[~mask]
print(clean_df)

Output:

    ID   Name Country
0  101  Alice     USA

This approach is concise and scales automatically to any number of columns.

Customizing Which Characters to Allow

Adjust the regex pattern based on what you consider "valid":

# Allow only letters and numbers (strictest)
pattern = r'[^0-9a-zA-Z]'

# Allow letters, numbers, and spaces
pattern = r'[^0-9a-zA-Z ]'

# Allow letters, numbers, spaces, and periods
pattern = r'[^0-9a-zA-Z .]'

# Only flag specific special characters
pattern = r'[@#$%&*]'

Choosing the right regex pattern:

Pattern             Allows                                    Flags
[^0-9a-zA-Z]        Letters, digits only                      Spaces, punctuation, everything else
[^0-9a-zA-Z ]       Letters, digits, spaces                   Punctuation, symbols
[^0-9a-zA-Z .\-]    Letters, digits, spaces, dots, hyphens    Most special characters
[@#$%&*]            Everything except the listed chars        Only @, #, $, %, &, *

Choose the pattern that matches your data quality requirements. Being too strict may remove valid data (e.g., names with hyphens like "Mary-Jane").
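
A sketch of that trade-off, using hypothetical names: the strict pattern flags legitimate hyphens and apostrophes along with real junk, while a looser whitelist keeps the valid names.

```python
import pandas as pd

# Hypothetical names: two legitimate, one genuinely corrupted
names = pd.Series(["Mary-Jane", "O'Brien", 'Eve#'])

# Strict pattern flags the hyphen and apostrophe as well
strict = names.str.contains(r'[^0-9a-zA-Z ]', na=False)
print(names[strict].tolist())  # → ["Mary-Jane", "O'Brien", 'Eve#']

# Whitelisting hyphens and apostrophes keeps the valid names
loose = names.str.contains(r"[^0-9a-zA-Z '\-]", na=False)
print(names[loose].tolist())   # → ['Eve#']
```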

Handling Numeric Columns

str.contains() only works on string columns. If you have numeric columns that might contain special characters (stored as strings), convert them first or check only object-type columns:

# Only check string (object) columns
string_cols = df.select_dtypes(include='object').columns
mask = df[string_cols].apply(lambda col: col.str.contains(r'[^0-9a-zA-Z ]', na=False)).any(axis=1)

clean_df = df[~mask].reset_index(drop=True)
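
As a sketch of the "convert them first" option, you can cast the whole frame to strings with astype(str), so every column supports .str methods (sample data here is made up):

```python
import pandas as pd

# Hypothetical frame mixing a numeric column with string columns
df = pd.DataFrame({
    'Score': [95, 88, 92],            # int dtype: .str would raise here
    'Name': ['Alice', 'B@b', 'Eve'],
})

# Cast everything to string so every column can be checked.
# Caveat: floats render as e.g. '95.0', so the '.' would be flagged
# unless the pattern allows periods.
mask = df.astype(str).apply(
    lambda col: col.str.contains(r'[^0-9a-zA-Z ]', na=False)
).any(axis=1)

print(df[~mask])
```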

Complete Reusable Function

import pandas as pd

def remove_rows_with_special_chars(df, columns=None, pattern=r'[^0-9a-zA-Z ]'):
    """
    Remove rows containing special characters from specified columns.

    Args:
        df: Input DataFrame
        columns: List of columns to check (default: all object columns)
        pattern: Regex pattern defining special characters

    Returns:
        Cleaned DataFrame with special-character rows removed
    """
    if columns is None:
        columns = df.select_dtypes(include='object').columns.tolist()

    mask = pd.Series([False] * len(df), index=df.index)
    for col in columns:
        mask = mask | df[col].str.contains(pattern, na=False)

    removed = mask.sum()
    print(f"Removed {removed} row(s) with special characters.")
    return df[~mask].reset_index(drop=True)


# Usage
df = pd.DataFrame({
    'Name': ['Alice', 'B@b', 'Charlie'],
    'Grade': ['A', 'B', 'C#'],
    'Score': [95, 88, 92]
})

clean_df = remove_rows_with_special_chars(df, columns=['Name', 'Grade'])
print(clean_df)

Output:

Removed 2 row(s) with special characters.
    Name Grade  Score
0  Alice     A     95

Conclusion

Removing rows with special characters in Pandas involves using str.contains() with a regex pattern to identify problematic values, combining conditions across columns with the | operator, and filtering out the matching rows with an inverted mask (~mask).

For dynamic, scalable solutions, use apply() across all string columns or create a reusable function.

Always include na=False to handle NaN values safely, and choose your regex pattern carefully to avoid removing valid data that happens to contain legitimate punctuation or symbols.