Python Pandas: How to Remove Rows With Special Characters in a Pandas DataFrame
When working with real-world data, you often encounter values containing special characters like @, #, $, %, &, *, /, and others. These usually signal data quality issues such as data entry errors, corrupted records, or improperly formatted fields. Cleaning these rows is an important preprocessing step before analysis or modeling.
In this guide, you will learn how to detect and remove rows containing special characters in a Pandas DataFrame using regular expressions with the str.contains() method.
Understanding the Approach
The process involves three steps:
- Identify rows where one or more columns contain special characters using regex patterns.
- Select those rows by combining conditions across columns.
- Drop the selected rows from the DataFrame.
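The three steps above can be sketched compactly. This is a minimal illustration using a hypothetical one-column DataFrame (the column name 'col' is made up for the example):

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({'col': ['ok', 'b@d', 'fine']})

# 1. Identify rows whose values contain special characters
mask = df['col'].str.contains(r'[^0-9a-zA-Z ]', na=False)

# 2. Select the flagged rows (e.g., to inspect them before deleting)
flagged = df[mask]

# 3. Drop them by keeping only the inverted mask
clean = df[~mask]
```

The rest of this guide walks through each of these steps in detail.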
Two common regex patterns are used for detection:
| Pattern | Meaning | Matches |
|---|---|---|
| [^0-9a-zA-Z ] | NOT alphanumeric or space | Any special character |
| [@#&$%+\-*/] | Specific special characters | Only the listed characters |
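To see how the two patterns differ in practice, here is a quick sketch on a throwaway Series (the sample values are made up; note how the underscore is caught by the negated class but not by the explicit list):

```python
import pandas as pd

s = pd.Series(['Alice', 'B+', 'a_b'])

# Negated class: flags ANY character that is not a letter, digit, or space
broad = s.str.contains(r'[^0-9a-zA-Z ]')

# Explicit list: flags only the characters named in the class
narrow = s.str.contains(r'[@#&$%+\-*/]')

print(broad.tolist())   # 'B+' and 'a_b' both flagged (the '_' counts)
print(narrow.tolist())  # only 'B+' flagged; '_' is not in the list
```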
Example: Detecting and Removing Rows
Creating Sample Data
import pandas as pd
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Ch@rlie', 'Diana', 'Eve#', 'Frank'],
'Grade': ['A', 'B+', 'A', '$C', 'B', 'A-'],
'Score': ['95', '88', '92', '76', '8*4', '90']
})
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
Name Grade Score
0 Alice A 95
1 Bob B+ 88
2 Ch@rlie A 92
3 Diana $C 76
4 Eve# B 8*4
5 Frank A- 90
Rows 1, 2, 3, 4, and 5 contain special characters in various columns.
Step 1: Detect Rows With Special Characters Per Column
Use str.contains() with a regex pattern to find rows where a specific column has special characters:
import pandas as pd
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Ch@rlie', 'Diana', 'Eve#', 'Frank'],
'Grade': ['A', 'B+', 'A', '$C', 'B', 'A-'],
'Score': ['95', '88', '92', '76', '8*4', '90']
})
# Find rows where Name contains special characters
name_special = df[df['Name'].str.contains(r'[^0-9a-zA-Z ]', na=False)]
print("Rows with special characters in Name:")
print(name_special)
Output:
Rows with special characters in Name:
Name Grade Score
2 Ch@rlie A 92
4 Eve# B 8*4
import pandas as pd
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Ch@rlie', 'Diana', 'Eve#', 'Frank'],
'Grade': ['A', 'B+', 'A', '$C', 'B', 'A-'],
'Score': ['95', '88', '92', '76', '8*4', '90']
})
# Find rows where Grade contains special characters
grade_special = df[df['Grade'].str.contains(r'[^0-9a-zA-Z ]', na=False)]
print("Rows with special characters in Grade:")
print(grade_special)
Output:
Rows with special characters in Grade:
Name Grade Score
1 Bob B+ 88
3 Diana $C 76
5 Frank A- 90
na=False with str.contains()
If a column contains NaN values, str.contains() returns NaN for those rows instead of True/False, and a mask containing NaN raises an error when used for boolean indexing:
# ❌ NaN in the column produces a non-boolean mask
df['Name'].str.contains(r'[^0-9a-zA-Z ]')
# ✅ na=False treats NaN rows as not matching
df['Name'].str.contains(r'[^0-9a-zA-Z ]', na=False)
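A small runnable sketch of this behavior, using a throwaway Series with a missing value:

```python
import pandas as pd

s = pd.Series(['Alice', None, 'Eve#'])

# Without na=False the second entry is NaN, not a boolean,
# so the result cannot safely be used as a mask
print(s.str.contains(r'[^0-9a-zA-Z ]'))

# With na=False, missing values are treated as non-matches
mask = s.str.contains(r'[^0-9a-zA-Z ]', na=False)
print(s[mask])  # keeps only the row containing 'Eve#'
```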
Step 2: Combine Conditions Across Multiple Columns
Use the | (OR) operator to find rows where any column contains special characters:
# Find rows where ANY column has special characters
mask = (
    df['Name'].str.contains(r'[^0-9a-zA-Z ]', na=False) |
    df['Grade'].str.contains(r'[^0-9a-zA-Z ]', na=False) |
    df['Score'].str.contains(r'[^0-9a-zA-Z ]', na=False)
)
print("All rows with special characters:")
print(df[mask])
Output:
All rows with special characters:
Name Grade Score
1 Bob B+ 88
2 Ch@rlie A 92
3 Diana $C 76
4 Eve# B 8*4
5 Frank A- 90
Step 3: Drop the Rows
Use .drop() with the index of the matched rows, or simply invert the mask:
# Method 1: Using drop()
clean_df = df.drop(df[mask].index)
# Method 2: Using inverted mask (simpler)
clean_df = df[~mask]
print("Cleaned DataFrame:")
print(clean_df)
Output:
Cleaned DataFrame:
Name Grade Score
0 Alice A 95
Only row 0 (Alice, A, 95) has no special characters in any column.
Checking All Columns Dynamically
Instead of listing every column manually, loop through all (or selected) string columns:
import pandas as pd
df = pd.DataFrame({
'ID': ['101', '10#2', '103', '104'],
'Name': ['Alice', 'Bob', 'Ch@rlie', 'Diana'],
'Country': ['USA', 'UK', 'India', 'Fr&nce']
})
# Create a combined mask for all string columns
pattern = r'[^0-9a-zA-Z ]'
mask = pd.Series([False] * len(df), index=df.index)
for col in df.select_dtypes(include='object').columns:
    mask = mask | df[col].str.contains(pattern, na=False)
print("Rows with special characters:")
print(df[mask])
print("\nCleaned DataFrame:")
print(df[~mask])
Output:
Rows with special characters:
ID Name Country
1 10#2 Bob UK
2 103 Ch@rlie India
3 104 Diana Fr&nce
Cleaned DataFrame:
ID Name Country
0 101 Alice USA
Using apply() for a Concise Approach
Apply the regex check across all columns at once using apply():
import pandas as pd
df = pd.DataFrame({
'ID': ['101', '10#2', '103', '104'],
'Name': ['Alice', 'Bob', 'Ch@rlie', 'Diana'],
'Country': ['USA', 'UK', 'India', 'Fr&nce']
})
# Check all columns at once
pattern = r'[^0-9a-zA-Z ]'
mask = df.apply(lambda col: col.str.contains(pattern, na=False)).any(axis=1)
clean_df = df[~mask]
print(clean_df)
Output:
ID Name Country
0 101 Alice USA
This approach is concise and scales automatically to any number of columns, provided they are all string (object) columns; see Handling Numeric Columns below for mixed dtypes.
Customizing Which Characters to Allow
Adjust the regex pattern based on what you consider "valid":
# Allow only letters and numbers (strictest)
pattern = r'[^0-9a-zA-Z]'
# Allow letters, numbers, and spaces
pattern = r'[^0-9a-zA-Z ]'
# Allow letters, numbers, spaces, and periods
pattern = r'[^0-9a-zA-Z .]'
# Only flag specific special characters
pattern = r'[@#$%&*]'
| Pattern | Allows | Flags |
|---|---|---|
| [^0-9a-zA-Z] | Letters, digits only | Spaces, punctuation, everything else |
| [^0-9a-zA-Z ] | Letters, digits, spaces | Punctuation, symbols |
| [^0-9a-zA-Z .\-] | Letters, digits, spaces, dots, hyphens | Most special characters |
| [@#$%&*] | Everything except listed chars | Only @, #, $, %, &, * |
Choose the pattern that matches your data quality requirements. Being too strict may remove valid data (e.g., names with hyphens like "Mary-Jane").
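The hyphen caution can be demonstrated with a quick sketch on made-up sample names: the strict pattern flags "Mary-Jane" as bad data, while adding the hyphen to the allowed set keeps it and still catches the genuinely corrupted value.

```python
import pandas as pd

names = pd.Series(['Mary-Jane', 'Ch@rlie'])

# Too strict: the hyphen in a legitimate name is flagged
strict = names.str.contains(r'[^0-9a-zA-Z ]', na=False)

# Allowing hyphens keeps 'Mary-Jane' but still flags 'Ch@rlie'
lenient = names.str.contains(r'[^0-9a-zA-Z \-]', na=False)

print(strict.tolist())   # both flagged
print(lenient.tolist())  # only 'Ch@rlie' flagged
```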
Handling Numeric Columns
str.contains() only works on string (object) columns; calling .str on a numeric column raises an AttributeError. Either convert numeric columns with astype(str) first, or restrict the check to object-dtype columns:
# Only check string (object) columns
string_cols = df.select_dtypes(include='object').columns
mask = df[string_cols].apply(lambda col: col.str.contains(r'[^0-9a-zA-Z ]', na=False)).any(axis=1)
clean_df = df[~mask].reset_index(drop=True)
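Here is the same idea as a self-contained sketch, using a hypothetical DataFrame that mixes a string column with a genuinely numeric Score column:

```python
import pandas as pd

# Hypothetical data: Score is a numeric dtype, so .str would fail on it
df = pd.DataFrame({
    'Name': ['Alice', 'B@b', 'Carol'],
    'Score': [95, 88, 92]
})

# Restrict the regex check to the object (string) columns only
string_cols = df.select_dtypes(include='object').columns
mask = df[string_cols].apply(
    lambda col: col.str.contains(r'[^0-9a-zA-Z ]', na=False)
).any(axis=1)

clean_df = df[~mask].reset_index(drop=True)
print(clean_df)  # the 'B@b' row is removed; Score is untouched
```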
Complete Reusable Function
import pandas as pd
def remove_rows_with_special_chars(df, columns=None, pattern=r'[^0-9a-zA-Z ]'):
    """
    Remove rows containing special characters from specified columns.

    Args:
        df: Input DataFrame
        columns: List of columns to check (default: all object columns)
        pattern: Regex pattern defining special characters

    Returns:
        Cleaned DataFrame with special-character rows removed
    """
    if columns is None:
        columns = df.select_dtypes(include='object').columns.tolist()
    mask = pd.Series([False] * len(df), index=df.index)
    for col in columns:
        mask = mask | df[col].str.contains(pattern, na=False)
    removed = mask.sum()
    print(f"Removed {removed} row(s) with special characters.")
    return df[~mask].reset_index(drop=True)
# Usage
df = pd.DataFrame({
'Name': ['Alice', 'B@b', 'Charlie'],
'Grade': ['A', 'B', 'C#'],
'Score': [95, 88, 92]
})
clean_df = remove_rows_with_special_chars(df, columns=['Name', 'Grade'])
print(clean_df)
Output:
Removed 2 row(s) with special characters.
Name Grade Score
0 Alice A 95
Conclusion
Removing rows with special characters in Pandas involves using str.contains() with a regex pattern to identify problematic values, combining conditions across columns with the | operator, and filtering out the matching rows with an inverted mask (~mask).
For dynamic, scalable solutions, use apply() across all string columns or create a reusable function.
Always include na=False to handle NaN values safely, and choose your regex pattern carefully to avoid removing valid data that happens to contain legitimate punctuation or symbols.