How to Delete a CSV Column in Python

Removing columns from CSV files is a common data cleaning task. Whether you are eliminating personally identifiable information, stripping out redundant fields, or discarding noisy data before analysis, Python provides several approaches depending on your environment, file size, and requirements.

In this guide, you will learn how to delete CSV columns using Pandas, the built-in csv module, and several filtering techniques. Each method is explained with clear examples and guidance on when to use it.

Consider the data.csv file as input for the following examples:

data.csv
User_ID,Name,Email,Score,Temp_Col,Debug_Info
101,Alice,a@x.com,85,tmp,dbg
102,Bob,b@x.com,92,tmp,dbg

Deleting Columns with Pandas

Pandas provides the most straightforward and efficient approach for column removal:

import pandas as pd

# Read the CSV
df = pd.read_csv("data.csv")
print("Before:")
print(df.head())

# Drop a single column
df = df.drop(columns=["User_ID"])

# Drop multiple columns
df = df.drop(columns=["Temp_Col", "Debug_Info"])

# Save the result
df.to_csv("clean.csv", index=False)
print("\nAfter:")
print(df.head())

Example output:

Before:
   User_ID   Name    Email  Score Temp_Col Debug_Info
0      101  Alice  a@x.com     85      tmp        dbg
1      102    Bob  b@x.com     92      tmp        dbg

After:
    Name    Email  Score
0  Alice  a@x.com     85
1    Bob  b@x.com     92

The drop() method returns a new DataFrame by default. You can also modify in place without reassignment:

df.drop(columns=["User_ID"], inplace=True)
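To see the difference between the two forms, here is a minimal sketch (using a small inline DataFrame rather than data.csv):

```python
import pandas as pd

df = pd.DataFrame({"A": [1], "B": [2]})

result = df.drop(columns=["B"])       # returns a new DataFrame; df is untouched
print(df.columns.tolist())            # ['A', 'B']
print(result.columns.tolist())        # ['A']

df.drop(columns=["B"], inplace=True)  # modifies df directly and returns None
print(df.columns.tolist())            # ['A']
```

Note that the inplace call returns None, so writing df = df.drop(..., inplace=True) would silently replace your DataFrame with None.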

Filtering Columns During Import

For better memory efficiency, exclude unwanted columns at load time rather than loading everything and dropping afterward:

import pandas as pd

# Only load specific columns
df = pd.read_csv("data.csv", usecols=["Name", "Email", "Score"])

# Or use a function to exclude specific columns
df = pd.read_csv(
    "data.csv",
    usecols=lambda col: col not in ["User_ID", "Internal_Code"]
)

print(df.columns.tolist())

Output:

['Name', 'Email', 'Score', 'Temp_Col', 'Debug_Info']

Tip: Using usecols is more memory-efficient than loading the entire file and dropping columns afterward. This makes a significant difference with large files that have many unwanted columns.
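You can check the difference yourself with DataFrame.memory_usage. A minimal sketch, using an in-memory CSV that mirrors the sample data.csv:

```python
import io
import pandas as pd

# Small in-memory CSV standing in for a file with unwanted columns
csv_text = (
    "User_ID,Name,Email,Score,Temp_Col,Debug_Info\n"
    "101,Alice,a@x.com,85,tmp,dbg\n"
    "102,Bob,b@x.com,92,tmp,dbg\n"
)

full = pd.read_csv(io.StringIO(csv_text))
slim = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Email", "Score"])

# deep=True counts the actual bytes held by string (object) columns
print(full.memory_usage(deep=True).sum())
print(slim.memory_usage(deep=True).sum())
```

The slim load never materializes the unwanted columns, so its footprint is smaller; on a wide file the gap grows with every excluded column.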

Dropping Columns by Pattern

When column names follow a naming convention, you can remove them using string matching or regular expressions:

import pandas as pd

df = pd.read_csv("data.csv")

# Drop columns starting with "Unnamed"
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]

# Drop columns containing "temp" (case-insensitive)
df = df.loc[:, ~df.columns.str.contains("temp", case=False)]

# Keep columns that do NOT start with "debug_"
df = df.filter(regex=r"^(?!debug_)")

# Drop all string columns, keeping only numeric ones
df = df.select_dtypes(exclude=["object"])

The ~ operator negates the boolean mask, so ~df.columns.str.startswith("Unnamed") selects columns that do not start with "Unnamed".
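A small illustration of the mask and its negation (using a hypothetical DataFrame with "Unnamed" columns, as often produced by stray index columns):

```python
import pandas as pd

df = pd.DataFrame(columns=["Name", "Unnamed: 0", "Score", "Unnamed: 1"])

mask = df.columns.str.startswith("Unnamed")
print(mask.tolist())                       # [False, True, False, True]
print((~mask).tolist())                    # [True, False, True, False]

# Passing the negated mask to .loc keeps only the non-"Unnamed" columns
print(df.loc[:, ~mask].columns.tolist())   # ['Name', 'Score']
```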

Dropping Columns by Position

When column names are unknown, unreliable, or auto-generated, you can drop columns by their numeric index:

import pandas as pd

# Drop the first column
df = pd.read_csv("data.csv")
df = df.iloc[:, 1:]
print(f"{df}\n")

# Drop the last column
df = pd.read_csv("data.csv")
df = df.iloc[:, :-1]
print(f"{df}\n")

# Drop columns at specific positions (0-indexed)
df = pd.read_csv("data.csv")
df = df.drop(df.columns[[0, 2, 5]], axis=1)
print(f"{df}\n")

# Keep only columns at positions 1 through 3
df = pd.read_csv("data.csv")
df = df.iloc[:, 1:4]
print(f"{df}\n")

Output:

    Name    Email  Score Temp_Col Debug_Info
0  Alice  a@x.com     85      tmp        dbg
1    Bob  b@x.com     92      tmp        dbg

   User_ID   Name    Email  Score Temp_Col
0      101  Alice  a@x.com     85      tmp
1      102    Bob  b@x.com     92      tmp

    Name  Score Temp_Col
0  Alice     85      tmp
1    Bob     92      tmp

    Name    Email  Score
0  Alice  a@x.com     85
1    Bob  b@x.com     92

Handling Missing Columns Gracefully

If you try to drop a column that does not exist, Pandas raises a KeyError by default:

import pandas as pd

df = pd.DataFrame({"Name": ["Alice"], "Score": [85]})

# This raises KeyError because "User_ID" does not exist
df = df.drop(columns=["User_ID"])

Output:

KeyError: "['User_ID'] not found in axis"

To avoid this, use the errors="ignore" parameter:

import pandas as pd

df = pd.DataFrame({"Name": ["Alice"], "Score": [85]})

df = df.drop(columns=["User_ID", "Maybe_Missing"], errors="ignore")
print(df.columns.tolist())

Output:

['Name', 'Score']

Warning: Using errors="ignore" silently skips missing columns. This is convenient but can hide real issues, such as a column being renamed or the wrong file being loaded. When data integrity matters, check for column existence explicitly:

columns_to_drop = ["User_ID", "Maybe_Missing"]
existing = [col for col in columns_to_drop if col in df.columns]
df = df.drop(columns=existing)

Using the Standard Library csv Module

For environments where Pandas is not available, the built-in csv module handles column removal without any external dependencies:

import csv

columns_to_drop = {"User_ID", "Internal_Notes"}

with open("data.csv", "r", newline="") as infile:
    reader = csv.DictReader(infile)

    fieldnames = [col for col in reader.fieldnames if col not in columns_to_drop]

    with open("output.csv", "w", newline="") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()

        for row in reader:
            filtered_row = {key: row[key] for key in fieldnames}
            writer.writerow(filtered_row)

For positional column removal when headers are absent or unreliable:

import csv

columns_to_keep = [0, 2, 3]  # Keep first, third, and fourth columns

with open("data.csv", "r", newline="") as infile:
    reader = csv.reader(infile)

    with open("output.csv", "w", newline="") as outfile:
        writer = csv.writer(outfile)

        for row in reader:
            filtered_row = [row[i] for i in columns_to_keep]
            writer.writerow(filtered_row)

The csv module processes files row by row, so memory usage remains low even for very large files.

Processing Large Files in Chunks

For files too large to fit in memory, Pandas can process them in chunks:

import pandas as pd

columns_to_keep = ["Name", "Email", "Score"]
chunk_size = 10_000

first_chunk = True
for chunk in pd.read_csv("large_file.csv", usecols=columns_to_keep, chunksize=chunk_size):
    mode = "w" if first_chunk else "a"
    header = first_chunk
    chunk.to_csv("output.csv", mode=mode, header=header, index=False)
    first_chunk = False

print("Processing complete")

The first chunk is written with headers in write mode ("w"), and all subsequent chunks are appended without headers ("a" mode with header=False).
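The same write-then-append pattern can be verified with a small self-contained sketch that streams an in-memory CSV (standing in for large_file.csv) through a StringIO buffer instead of touching disk:

```python
import io
import pandas as pd

# Five-row CSV standing in for a large input file
csv_text = "Name,Email,Score,Extra\n" + "\n".join(
    f"user{i},u{i}@x.com,{i},junk" for i in range(5)
)

out = io.StringIO()
first_chunk = True
for chunk in pd.read_csv(io.StringIO(csv_text),
                         usecols=["Name", "Email", "Score"],
                         chunksize=2):
    # Only the first chunk writes the header line
    out.write(chunk.to_csv(header=first_chunk, index=False))
    first_chunk = False

lines = out.getvalue().splitlines()
print(lines[0])    # Name,Email,Score
print(len(lines))  # 6 -> one header line plus five data rows
```

Despite the input being split into three chunks, the output contains exactly one header row, which is the property the mode/header toggling guarantees.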

Complete Reusable Function

A production-ready function that handles both drop and keep scenarios:

import pandas as pd
from typing import Optional

def remove_csv_columns(
    input_path: str,
    output_path: str,
    columns_to_drop: Optional[list[str]] = None,
    columns_to_keep: Optional[list[str]] = None
) -> int:
    """
    Remove columns from a CSV file.

    Specify either columns_to_drop OR columns_to_keep, not both.
    Returns the number of rows processed.
    """
    if columns_to_drop and columns_to_keep:
        raise ValueError("Specify either columns_to_drop or columns_to_keep, not both")

    if columns_to_keep:
        df = pd.read_csv(input_path, usecols=columns_to_keep)
    else:
        df = pd.read_csv(input_path)
        if columns_to_drop:
            df = df.drop(columns=columns_to_drop, errors="ignore")

    df.to_csv(output_path, index=False)
    return len(df)

# Usage
rows = remove_csv_columns(
    "data.csv",
    "clean.csv",
    columns_to_drop=["User_ID", "SSN", "Internal_Code"]
)
print(f"Processed {rows} rows")

Example output:

Processed 1500 rows

Method Comparison

Method                      Speed     Memory Usage              Dependencies
pd.read_csv(usecols=...)    Fast      Low                       Pandas
df.drop(columns=...)        Fast      Higher (full load first)  Pandas
Pandas chunked processing   Moderate  Low                       Pandas
csv module                  Slower    Lowest                    None (stdlib)

Conclusion

  • Use usecols when you know which columns to keep upfront, as it is the most memory-efficient Pandas approach.
  • Use df.drop(columns=...) when exploring data interactively or when the columns to remove are determined at runtime.
  • For pattern-based removal, combine df.columns.str methods with boolean indexing to filter by naming conventions.
  • Use the csv module in constrained environments without Pandas or when you need minimal memory overhead.
  • For very large files, process in chunks to avoid loading the entire dataset into memory at once.