How to Delete a CSV Column in Python
Removing columns from CSV files is a common data cleaning task. Whether you are eliminating personally identifiable information, stripping out redundant fields, or discarding noisy data before analysis, Python provides several approaches depending on your environment, file size, and requirements.
In this guide, you will learn how to delete CSV columns using Pandas, the built-in csv module, and several filtering techniques. Each method is explained with clear examples and guidance on when to use it.
Consider the data.csv file as input for the following examples:
User_ID,Name,Email,Score,Temp_Col,Debug_Info
101,Alice,a@x.com,85,tmp,dbg
102,Bob,b@x.com,92,tmp,dbg
Using Pandas (Recommended)
Pandas provides the most straightforward and efficient approach for column removal:
import pandas as pd
# Read the CSV
df = pd.read_csv("data.csv")
print("Before:")
print(df.head())
# Drop a single column
df = df.drop(columns=["User_ID"])
# Drop multiple columns
df = df.drop(columns=["Temp_Col", "Debug_Info"])
# Save the result
df.to_csv("clean.csv", index=False)
print("\nAfter:")
print(df.head())
Example output:
Before:
User_ID Name Email Score Temp_Col Debug_Info
0 101 Alice a@x.com 85 tmp dbg
1 102 Bob b@x.com 92 tmp dbg
After:
Name Email Score
0 Alice a@x.com 85
1 Bob b@x.com 92
The drop() method returns a new DataFrame by default. You can also modify in place without reassignment:
df.drop(columns=["User_ID"], inplace=True)
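A common pitfall here: with inplace=True the method mutates the DataFrame and returns None, so reassigning its result discards your data. A quick sketch with a small stand-in DataFrame:

```python
import pandas as pd

# Illustrative DataFrame standing in for data.csv
df = pd.DataFrame({"User_ID": [101, 102], "Name": ["Alice", "Bob"]})

# drop() returns a new DataFrame; the original is unchanged
dropped = df.drop(columns=["User_ID"])
print("User_ID" in df.columns)   # True: original untouched

# inplace=True mutates df and returns None -- do NOT reassign the result
result = df.drop(columns=["User_ID"], inplace=True)
print(result is None)            # True
print("User_ID" in df.columns)   # False: df was modified in place
```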
Filtering Columns During Import
For better memory efficiency, exclude unwanted columns at load time rather than loading everything and dropping afterward:
import pandas as pd
# Only load specific columns
df = pd.read_csv("data.csv", usecols=["Name", "Email", "Score"])
# Or use a function to exclude specific columns
df = pd.read_csv(
    "data.csv",
    usecols=lambda col: col not in ["User_ID", "Internal_Code"]
)
print(df.columns.tolist())
Output:
['Name', 'Email', 'Score', 'Temp_Col', 'Debug_Info']
Because usecols filters while parsing, memory for the excluded columns is never allocated at all. This makes a significant difference with wide files that contain many unwanted columns.
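As a rough sketch of the equivalence between the two approaches (writing a hypothetical sample.csv on the fly; the filename is illustrative), both end with the same data, but only the usecols version avoids ever materializing the unwanted columns:

```python
import pandas as pd

# Build a sample file matching the data.csv layout used above
pd.DataFrame({
    "User_ID": range(1000),
    "Name": ["Alice"] * 1000,
    "Email": ["a@x.com"] * 1000,
    "Score": [85] * 1000,
    "Temp_Col": ["tmp"] * 1000,
    "Debug_Info": ["dbg"] * 1000,
}).to_csv("sample.csv", index=False)

# Full load, then drop: all six columns briefly live in memory
full = pd.read_csv("sample.csv").drop(columns=["Temp_Col", "Debug_Info"])

# usecols: Temp_Col and Debug_Info are skipped during parsing
slim = pd.read_csv("sample.csv", usecols=["User_ID", "Name", "Email", "Score"])

print(full.equals(slim))  # True: same final data either way
```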
Dropping Columns by Pattern
When column names follow a naming convention, you can remove them using string matching or regular expressions:
import pandas as pd
df = pd.read_csv("data.csv")
# Drop columns starting with "Unnamed"
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
# Drop columns containing "temp" (case-insensitive)
df = df.loc[:, ~df.columns.str.contains("temp", case=False)]
# Keep columns that do NOT start with "Debug_"
df = df.filter(regex=r"^(?!Debug_)")
# Drop all string columns, keeping only numeric ones
df = df.select_dtypes(exclude=["object"])
The ~ operator negates the boolean mask, so ~df.columns.str.startswith("Unnamed") selects columns that do not start with "Unnamed".
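To see what that mask looks like, here is a small sketch using hypothetical "Unnamed" columns of the kind Pandas generates for a stray index column:

```python
import pandas as pd

# Empty DataFrame with illustrative column names
df = pd.DataFrame(columns=["Name", "Unnamed: 0", "Score", "Unnamed: 5"])

# Boolean mask: True where the column name starts with "Unnamed"
mask = df.columns.str.startswith("Unnamed")
print(mask.tolist())     # [False, True, False, True]
print((~mask).tolist())  # [True, False, True, False]

# df.loc[:, ~mask] keeps only the columns where the negated mask is True
print(df.loc[:, ~mask].columns.tolist())  # ['Name', 'Score']
```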
Dropping Columns by Position
When column names are unknown, unreliable, or auto-generated, you can drop columns by their numeric index:
import pandas as pd
# Drop the first column
df = pd.read_csv("data.csv")
df = df.iloc[:, 1:]
print(f"{df}\n")
# Drop the last column
df = pd.read_csv("data.csv")
df = df.iloc[:, :-1]
print(f"{df}\n")
# Drop columns at specific positions (0-indexed)
df = pd.read_csv("data.csv")
df = df.drop(df.columns[[0, 2, 5]], axis=1)
print(f"{df}\n")
# Keep only columns at positions 1 through 3
df = pd.read_csv("data.csv")
df = df.iloc[:, 1:4]
print(f"{df}\n")
Output:
Name Email Score Temp_Col Debug_Info
0 Alice a@x.com 85 tmp dbg
1 Bob b@x.com 92 tmp dbg

User_ID Name Email Score Temp_Col
0 101 Alice a@x.com 85 tmp
1 102 Bob b@x.com 92 tmp

Name Score Temp_Col
0 Alice 85 tmp
1 Bob 92 tmp

Name Email Score
0 Alice a@x.com 85
1 Bob b@x.com 92
Handling Missing Columns Gracefully
If you try to drop a column that does not exist, Pandas raises a KeyError by default:
import pandas as pd
df = pd.DataFrame({"Name": ["Alice"], "Score": [85]})
# This raises KeyError because "User_ID" does not exist
df = df.drop(columns=["User_ID"])
Output:
KeyError: "['User_ID'] not found in axis"
To avoid this, use the errors="ignore" parameter:
import pandas as pd
df = pd.DataFrame({"Name": ["Alice"], "Score": [85]})
df = df.drop(columns=["User_ID", "Maybe_Missing"], errors="ignore")
print(df.columns.tolist())
Output:
['Name', 'Score']
Using errors="ignore" silently skips missing columns. This is convenient but can hide real issues, such as a column being renamed or the wrong file being loaded. When data integrity matters, check for column existence explicitly:
columns_to_drop = ["User_ID", "Maybe_Missing"]
existing = [col for col in columns_to_drop if col in df.columns]
df = df.drop(columns=existing)
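If you would rather fail loudly than skip silently, the same check can be wrapped into a small helper that raises a descriptive error. This is a hypothetical helper, not part of the Pandas API:

```python
import pandas as pd

def drop_strict(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Drop columns, raising a descriptive error if any are missing."""
    missing = [col for col in columns if col not in df.columns]
    if missing:
        raise ValueError(f"Cannot drop missing columns: {missing}")
    return df.drop(columns=columns)

df = pd.DataFrame({"Name": ["Alice"], "Score": [85]})
trimmed = drop_strict(df, ["Score"])
print(trimmed.columns.tolist())  # ['Name']
```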
Using the Standard Library csv Module
For environments where Pandas is not available, the built-in csv module handles column removal without any external dependencies:
import csv
columns_to_drop = {"User_ID", "Internal_Notes"}
with open("data.csv", "r", newline="") as infile:
    reader = csv.DictReader(infile)
    fieldnames = [col for col in reader.fieldnames if col not in columns_to_drop]
    with open("output.csv", "w", newline="") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            filtered_row = {key: row[key] for key in fieldnames}
            writer.writerow(filtered_row)
For positional column removal when headers are absent or unreliable:
import csv
columns_to_keep = [0, 2, 3] # Keep first, third, and fourth columns
with open("data.csv", "r", newline="") as infile:
    reader = csv.reader(infile)
    with open("output.csv", "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for row in reader:
            filtered_row = [row[i] for i in columns_to_keep]
            writer.writerow(filtered_row)
The csv module processes files row by row, so memory usage remains low even for very large files.
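If you want name-based removal but prefer the plain csv.reader, you can derive the positional indices from the header row yourself. A sketch using an in-memory string in place of a file:

```python
import csv
import io

columns_to_drop = {"User_ID", "Debug_Info"}

# In-memory stand-in for data.csv
raw = "User_ID,Name,Email,Score,Temp_Col,Debug_Info\n101,Alice,a@x.com,85,tmp,dbg\n"
reader = csv.reader(io.StringIO(raw))

# Read the header once and compute which indices to keep
header = next(reader)
keep = [i for i, name in enumerate(header) if name not in columns_to_drop]

out = io.StringIO()
writer = csv.writer(out)
writer.writerow([header[i] for i in keep])
for row in reader:
    writer.writerow([row[i] for i in keep])

print(out.getvalue())
```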
Processing Large Files in Chunks
For files too large to fit in memory, Pandas can process them in chunks:
import pandas as pd
columns_to_keep = ["Name", "Email", "Score"]
chunk_size = 10_000
first_chunk = True
for chunk in pd.read_csv("large_file.csv", usecols=columns_to_keep, chunksize=chunk_size):
    mode = "w" if first_chunk else "a"
    header = first_chunk
    chunk.to_csv("output.csv", mode=mode, header=header, index=False)
    first_chunk = False
print("Processing complete")
The first chunk is written with headers in write mode ("w"), and all subsequent chunks are appended without headers ("a" mode with header=False).
Complete Reusable Function
A reusable function that handles both drop and keep scenarios (note that it passes errors="ignore", so missing drop targets are skipped silently):
import pandas as pd
from typing import Optional
def remove_csv_columns(
    input_path: str,
    output_path: str,
    columns_to_drop: Optional[list[str]] = None,
    columns_to_keep: Optional[list[str]] = None,
) -> int:
    """
    Remove columns from a CSV file.

    Specify either columns_to_drop OR columns_to_keep, not both.
    Returns the number of rows processed.
    """
    if columns_to_drop and columns_to_keep:
        raise ValueError("Specify either columns_to_drop or columns_to_keep, not both")
    if columns_to_keep:
        df = pd.read_csv(input_path, usecols=columns_to_keep)
    else:
        df = pd.read_csv(input_path)
    if columns_to_drop:
        df = df.drop(columns=columns_to_drop, errors="ignore")
    df.to_csv(output_path, index=False)
    return len(df)
# Usage
rows = remove_csv_columns(
    "data.csv",
    "clean.csv",
    columns_to_drop=["User_ID", "SSN", "Internal_Code"]
)
print(f"Processed {rows} rows")
Example output:
Processed 2 rows
Method Comparison
| Method | Speed | Memory Usage | Dependencies |
|---|---|---|---|
| pd.read_csv(usecols=...) | Fast | Low | Pandas |
| df.drop(columns=...) | Fast | Higher (full load first) | Pandas |
| Pandas chunked processing | Moderate | Low | Pandas |
| csv module | Slower | Lowest | None (stdlib) |
Conclusion
- Use usecols when you know which columns to keep upfront, as it is the most memory-efficient Pandas approach.
- Use df.drop(columns=...) when exploring data interactively or when the columns to remove are determined at runtime.
- For pattern-based removal, combine df.columns.str methods with boolean indexing to filter by naming conventions.
- Use the csv module in constrained environments without Pandas or when you need minimal memory overhead.
- For very large files, process in chunks to avoid loading the entire dataset into memory at once.