How to Convert a TSV File to CSV in Python
TSV (Tab-Separated Values) and CSV (Comma-Separated Values) are two of the most common plain-text formats for storing tabular data. They are structurally similar: each row sits on a separate line, with columns separated by a delimiter. However, many tools, databases, and APIs expect CSV format specifically, so converting between the two is a frequent task in data processing pipelines.
In this guide, you will learn multiple methods to convert TSV files to CSV in Python, including approaches using the built-in csv module, regular expressions, and Pandas, along with proper handling of edge cases like commas within data fields.
Method 1: Using the csv Module (Recommended)
Python's built-in csv module is purpose-built for reading and writing delimiter-separated files. It properly handles edge cases like quoted fields, embedded commas, and newlines within data, making it the most reliable approach.
import csv
input_file = "data.tsv"
output_file = "data.csv"
with open(input_file, "r", newline="") as tsv_in:
    with open(output_file, "w", newline="") as csv_out:
        reader = csv.reader(tsv_in, delimiter="\t")
        writer = csv.writer(csv_out, delimiter=",")
        for row in reader:
            writer.writerow(row)

print(f"Successfully converted '{input_file}' to '{output_file}'")
Output:
Successfully converted 'data.tsv' to 'data.csv'
How it works:
- csv.reader() reads the TSV file using tab (\t) as the delimiter.
- csv.writer() writes each row to the CSV file using comma (,) as the delimiter.
- The csv module automatically handles quoting fields that contain commas, newlines, or quote characters.
The csv module is the recommended approach because it properly handles special characters. If a data field contains a comma (e.g., "New York, NY"), the module automatically wraps it in quotes in the CSV output, preventing data corruption.
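You can observe this quoting behavior in isolation with an in-memory buffer. This is a small illustrative sketch using io.StringIO, separate from the conversion script:

```python
import csv
import io

# Write a row whose second field contains an embedded comma
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Alice", "New York, NY", "95"])

# The field with the comma is wrapped in quotes automatically
print(buffer.getvalue().strip())  # Alice,"New York, NY",95
```

The default quoting mode (csv.QUOTE_MINIMAL) quotes only the fields that need it, leaving the rest untouched.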
Why newline="" Matters
The newline="" parameter prevents Python from adding extra blank lines on Windows:
# Without newline="" on Windows, you may get double-spaced output
with open("data.csv", "w") as f:  # Missing newline=""
    writer = csv.writer(f)
    writer.writerow(["a", "b"])
    writer.writerow(["c", "d"])
# Result may have blank lines between rows on Windows
Always use newline="" when opening files for csv.reader() or csv.writer(). This ensures consistent line ending behavior across operating systems.
Method 2: Using Pandas
Pandas provides the simplest approach: just two lines of code. It reads the TSV file into a DataFrame and writes it out as CSV:
import pandas as pd
# Read the TSV file
df = pd.read_csv("data.tsv", sep="\t")
# Write as CSV
df.to_csv("data.csv", index=False)
print("Successfully converted TSV to CSV")
Output:
Successfully converted TSV to CSV
How it works:
- pd.read_csv() with sep="\t" reads the tab-separated file into a DataFrame.
- to_csv() writes the DataFrame as a comma-separated file.
- index=False prevents Pandas from writing the row index as an extra column.
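Because the data passes through a DataFrame, you can also clean or transform it during conversion. A small sketch (the sample rows and column names here are made up for illustration; io.StringIO stands in for a real file):

```python
import io

import pandas as pd

# Sample TSV content standing in for a file on disk (made-up data)
tsv_data = "name\tscore\nAlice\t95\nBob\t87\n"

df = pd.read_csv(io.StringIO(tsv_data), sep="\t")

# Transform while converting: normalize headers and derive a column
df.columns = [c.title() for c in df.columns]
df["Passed"] = df["Score"] >= 90

print(df.to_csv(index=False))
```

This prints Name,Score,Passed as the header, with True/False in the derived column, which a plain tab-to-comma substitution could never do.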
Handling Large Files with Pandas
For very large TSV files that don't fit in memory, use chunked reading:
import pandas as pd
chunk_size = 10000 # Process 10,000 rows at a time
first_chunk = True
for chunk in pd.read_csv("large_data.tsv", sep="\t", chunksize=chunk_size):
    if first_chunk:
        chunk.to_csv("large_data.csv", index=False, mode="w")
        first_chunk = False
    else:
        chunk.to_csv("large_data.csv", index=False, mode="a", header=False)

print("Successfully converted large TSV to CSV")
print("Successfully converted large TSV to CSV")
Pandas loads the entire file into memory by default. For files larger than available RAM, use either the chunked approach shown above or the csv module (Method 1), which processes one row at a time.
Method 3: Using Regular Expressions
A quick approach using re.sub() replaces all tab characters with commas:
import re
with open("data.tsv", "r") as tsv_file:
    with open("data.csv", "w") as csv_file:
        for line in tsv_file:
            csv_file.write(re.sub("\t", ",", line))
print("Successfully converted TSV to CSV")
Output:
Successfully converted TSV to CSV
Common Mistake: Data Containing Commas
The regex approach has a critical flaw: it does not handle data fields that already contain commas.
Problem:
Name City Score
Alice New York, NY 95
Bob Los Angeles, CA 87
After simple tab-to-comma replacement:
Name,City,Score
Alice,New York, NY,95
Bob,Los Angeles, CA,87
The comma inside "New York, NY" creates an extra column, corrupting the data. The CSV reader would interpret this as 4 columns instead of 3.
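You can confirm the corruption by parsing the naive output with csv.reader. This standalone check uses the broken line from the example above:

```python
import csv
import io

# One line produced by naive tab-to-comma replacement
broken_line = "Alice,New York, NY,95\n"

row = next(csv.reader(io.StringIO(broken_line)))
print(len(row))  # 4
print(row)       # ['Alice', 'New York', ' NY', '95']
```

The city has been split across two columns, and nothing in the conversion step warned about it.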
Fix: use the csv module instead:
import csv
with open("data.tsv", "r", newline="") as tsv_in:
    with open("data.csv", "w", newline="") as csv_out:
        reader = csv.reader(tsv_in, delimiter="\t")
        writer = csv.writer(csv_out)
        for row in reader:
            writer.writerow(row)
The csv module correctly outputs:
Name,City,Score
Alice,"New York, NY",95
Bob,"Los Angeles, CA",87
Fields containing commas are automatically quoted.
Never use simple string replacement (str.replace() or re.sub()) for production TSV-to-CSV conversion. It fails silently when data contains commas, quotes, or newlines within fields. Always use the csv module or Pandas for reliable conversion.
Method 4: Using str.replace() (Simple Cases Only)
For quick-and-dirty conversions where you are certain the data contains no commas:
with open("data.tsv", "r") as tsv_file:
    with open("data.csv", "w") as csv_file:
        for line in tsv_file:
            csv_file.write(line.replace("\t", ","))
print("Successfully converted TSV to CSV")
This is the fastest approach but carries the same risks as the regex method regarding embedded commas.
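If you do reach for plain replacement, one defensive option is to scan each line for embedded commas first and fail loudly instead of corrupting the output silently. A sketch, reusing the same filenames as above:

```python
def replace_tabs_safely(tsv_path, csv_path):
    """Convert by tab-to-comma replacement, refusing to proceed if any
    line already contains a comma (which would corrupt the output)."""
    with open(tsv_path, "r") as tsv_file, open(csv_path, "w") as csv_file:
        for line_no, line in enumerate(tsv_file, start=1):
            if "," in line:
                raise ValueError(
                    f"Line {line_no} contains a comma; use the csv module instead"
                )
            csv_file.write(line.replace("\t", ","))
```

This keeps the speed of the replacement approach for clean data while converting its silent failure mode into an immediate, visible error.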
Complete Example with Verification
Here's a robust script that converts a TSV file to CSV and verifies the result:
import csv
def convert_tsv_to_csv(tsv_path, csv_path):
    """Convert a TSV file to CSV with proper handling of special characters."""
    row_count = 0
    with open(tsv_path, "r", newline="", encoding="utf-8") as tsv_in:
        with open(csv_path, "w", newline="", encoding="utf-8") as csv_out:
            reader = csv.reader(tsv_in, delimiter="\t")
            writer = csv.writer(csv_out, quoting=csv.QUOTE_MINIMAL)
            for row in reader:
                writer.writerow(row)
                row_count += 1
    return row_count

# Convert the file
tsv_file = "data.tsv"
csv_file = "data.csv"
total_rows = convert_tsv_to_csv(tsv_file, csv_file)
print(f"Converted {total_rows} rows from '{tsv_file}' to '{csv_file}'")

# Verify by reading the CSV back
with open(csv_file, "r", newline="") as f:
    reader = csv.reader(f)
    for i, row in enumerate(reader):
        if i < 3:  # Show first 3 rows
            print(row)
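To try the script end to end without an existing data file, you can generate a small sample TSV first and run the same row-by-row conversion on it. A throwaway snippet; the filename sample.tsv and the rows are made up:

```python
import csv

# Create a small sample TSV with a tricky field (made-up data)
with open("sample.tsv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f, delimiter="\t").writerows(
        [["Name", "City", "Score"],
         ["Alice", "New York, NY", "95"]]
    )

# Convert it row by row, the same way the function above does
with open("sample.tsv", newline="", encoding="utf-8") as tsv_in, \
     open("sample.csv", "w", newline="", encoding="utf-8") as csv_out:
    csv.writer(csv_out).writerows(csv.reader(tsv_in, delimiter="\t"))

# The embedded comma survives the round trip, quoted
with open("sample.csv", encoding="utf-8") as f:
    print(f.read())
```

Reading the result back with csv.reader confirms that every row still has exactly three columns.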
Comparison of Methods
| Method | Handles Commas in Data | Memory Efficient | External Dependencies | Best For |
|---|---|---|---|---|
| csv module | ✅ Yes (auto-quoting) | ✅ Yes (row by row) | None | Production use (recommended) |
| Pandas | ✅ Yes | ❌ Loads entire file | pandas | Data analysis workflows |
| Pandas (chunked) | ✅ Yes | ✅ Yes | pandas | Very large files in Pandas workflows |
| re.sub() / replace() | ❌ No | ✅ Yes | None | Quick scripts, guaranteed clean data |
Summary
Converting TSV files to CSV in Python is straightforward, but choosing the right method matters for data integrity:
- Use the csv module for the most reliable conversion; it handles commas, quotes, and newlines within data fields automatically. This is the recommended approach for production code.
- Use Pandas when you're already working in a data analysis pipeline or need to transform the data during conversion.
- Avoid simple string replacement (str.replace() or re.sub()) unless you are absolutely certain your data contains no commas, quotes, or newlines within fields.
Key reminders:
- Always use newline="" when opening files for the csv module.
- Use index=False with Pandas to_csv() to avoid writing an unwanted index column.
- Use encoding="utf-8" when working with files that may contain non-ASCII characters.