
How to Convert a TSV File to CSV in Python

TSV (Tab-Separated Values) and CSV (Comma-Separated Values) are two of the most common plain-text formats for storing tabular data. While they are structurally similar (each row on a separate line, with columns separated by a delimiter), many tools, databases, and APIs expect CSV format specifically. Converting between the two is a frequent task in data processing pipelines.

In this guide, you will learn multiple methods to convert TSV files to CSV in Python, including approaches using the built-in csv module, regular expressions, and Pandas, along with proper handling of edge cases like commas within data fields.

Method 1: Using the csv Module

Python's built-in csv module is purpose-built for reading and writing delimiter-separated files. It properly handles edge cases like quoted fields, embedded commas, and newlines within data, making it the most reliable approach.

import csv

input_file = "data.tsv"
output_file = "data.csv"

with open(input_file, "r", newline="") as tsv_in:
    with open(output_file, "w", newline="") as csv_out:
        reader = csv.reader(tsv_in, delimiter="\t")
        writer = csv.writer(csv_out, delimiter=",")

        for row in reader:
            writer.writerow(row)

print(f"Successfully converted '{input_file}' to '{output_file}'")

Output:

Successfully converted 'data.tsv' to 'data.csv'

How it works:

  1. csv.reader() reads the TSV file using tab (\t) as the delimiter.
  2. csv.writer() writes each row to the CSV file using comma (,) as the delimiter.
  3. The csv module automatically handles quoting fields that contain commas, newlines, or quote characters.
tip

The csv module is the recommended approach because it properly handles special characters. If a data field contains a comma (e.g., "New York, NY"), the module automatically wraps it in quotes in the CSV output, preventing data corruption.
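
You can see this quoting behavior directly by writing a row to an in-memory buffer (a small standalone sketch using io.StringIO instead of a file):

```python
import csv
import io

# Write one row containing an embedded comma to an in-memory buffer
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Alice", "New York, NY", "95"])

# The field with the comma is wrapped in quotes automatically
print(buf.getvalue())  # Alice,"New York, NY",95
```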

Why newline="" Matters

The newline="" parameter prevents Python from adding extra blank lines on Windows:

# Without newline="" on Windows, you may get double-spaced output
with open("data.csv", "w") as f:  # Missing newline=""
    writer = csv.writer(f)
    writer.writerow(["a", "b"])
    writer.writerow(["c", "d"])
# Result may have blank lines between rows on Windows
info

Always use newline="" when opening files for csv.reader() or csv.writer(). This ensures consistent line ending behavior across operating systems.

Method 2: Using Pandas

Pandas provides the simplest approach: just two lines of code. It reads the TSV file into a DataFrame and writes it out as CSV:

import pandas as pd

# Read the TSV file
df = pd.read_csv("data.tsv", sep="\t")

# Write as CSV
df.to_csv("data.csv", index=False)

print("Successfully converted TSV to CSV")

Output:

Successfully converted TSV to CSV

How it works:

  1. pd.read_csv() with sep="\t" reads the tab-separated file into a DataFrame.
  2. to_csv() writes the DataFrame as a comma-separated file.
  3. index=False prevents Pandas from writing the row index as an extra column.
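
Because the data passes through a DataFrame, you can also transform it on the way. A minimal sketch using an in-memory string instead of a file (the column names here are made up for illustration):

```python
import io

import pandas as pd

# Sample TSV content; in practice this would come from a file
tsv_data = "name\tcity\nAlice\tNew York, NY\n"

df = pd.read_csv(io.StringIO(tsv_data), sep="\t")
df = df.rename(columns={"name": "full_name"})  # example transform step

# to_csv with no path returns the CSV text; embedded commas are quoted
csv_text = df.to_csv(index=False)
print(csv_text)
```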

Handling Large Files with Pandas

For very large TSV files that don't fit in memory, use chunked reading:

import pandas as pd

chunk_size = 10000  # Process 10,000 rows at a time
first_chunk = True

for chunk in pd.read_csv("large_data.tsv", sep="\t", chunksize=chunk_size):
    if first_chunk:
        chunk.to_csv("large_data.csv", index=False, mode="w")
        first_chunk = False
    else:
        chunk.to_csv("large_data.csv", index=False, mode="a", header=False)

print("Successfully converted large TSV to CSV")
caution

Pandas loads the entire file into memory by default. For files larger than available RAM, use either the chunked approach shown above or the csv module (Method 1), which processes one row at a time.

Method 3: Using Regular Expressions

A quick approach using re.sub() replaces all tab characters with commas:

import re

with open("data.tsv", "r") as tsv_file:
    with open("data.csv", "w") as csv_file:
        for line in tsv_file:
            csv_file.write(re.sub("\t", ",", line))

print("Successfully converted TSV to CSV")

Output:

Successfully converted TSV to CSV

Common Mistake: Data Containing Commas

The regex approach has a critical flaw: it does not handle data fields that already contain commas.

Problem:

Name	City	Score
Alice	New York, NY	95
Bob	Los Angeles, CA	87

After simple tab-to-comma replacement:

Name,City,Score
Alice,New York, NY,95
Bob,Los Angeles, CA,87

The comma inside "New York, NY" creates an extra column, corrupting the data. The CSV reader would interpret this as 4 columns instead of 3.
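
You can confirm the corruption by parsing the broken line back with csv.reader (a quick standalone check):

```python
import csv
import io

# The corrupted line produced by naive tab-to-comma replacement
broken_line = "Alice,New York, NY,95\n"

row = next(csv.reader(io.StringIO(broken_line)))
print(len(row), row)  # 4 ['Alice', 'New York', ' NY', '95']
```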

Fix: use the csv module instead:

import csv

with open("data.tsv", "r", newline="") as tsv_in:
    with open("data.csv", "w", newline="") as csv_out:
        reader = csv.reader(tsv_in, delimiter="\t")
        writer = csv.writer(csv_out)
        for row in reader:
            writer.writerow(row)

The csv module correctly outputs:

Name,City,Score
Alice,"New York, NY",95
Bob,"Los Angeles, CA",87

Fields containing commas are automatically quoted.
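
The quoting style itself is configurable through the module's quoting constants; a brief sketch comparing the default QUOTE_MINIMAL with QUOTE_ALL:

```python
import csv
import io

row = ["Alice", "New York, NY", "95"]

# QUOTE_MINIMAL (the default): quote only fields that need it
minimal = io.StringIO()
csv.writer(minimal, quoting=csv.QUOTE_MINIMAL).writerow(row)
print(minimal.getvalue())  # Alice,"New York, NY",95

# QUOTE_ALL: quote every field, needed or not
quoted = io.StringIO()
csv.writer(quoted, quoting=csv.QUOTE_ALL).writerow(row)
print(quoted.getvalue())  # "Alice","New York, NY","95"
```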

caution

Never use simple string replacement (str.replace() or re.sub()) for production TSV-to-CSV conversion. It fails silently when data contains commas, quotes, or newlines within fields. Always use the csv module or Pandas for reliable conversion.

Method 4: Using str.replace() (Simple Cases Only)

For quick-and-dirty conversions where you are certain the data contains no commas:

with open("data.tsv", "r") as tsv_file:
    with open("data.csv", "w") as csv_file:
        for line in tsv_file:
            csv_file.write(line.replace("\t", ","))

print("Successfully converted TSV to CSV")

This is the fastest approach but carries the same risks as the regex method regarding embedded commas.
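
If you do take this shortcut, consider guarding against embedded commas so the failure is loud rather than silent. This is a hypothetical helper, not part of the original article:

```python
def naive_tsv_to_csv(tsv_text):
    """Tab-to-comma conversion that refuses data already containing commas."""
    out_lines = []
    for line in tsv_text.splitlines():
        if "," in line:
            # Simple replacement would corrupt this line; fail instead
            raise ValueError(f"Embedded comma found; use the csv module: {line!r}")
        out_lines.append(line.replace("\t", ","))
    return "\n".join(out_lines)


print(naive_tsv_to_csv("a\tb\nc\td"))  # a,b
                                       # c,d
```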

Complete Example with Verification

Here's a robust script that converts a TSV file to CSV and verifies the result:

import csv

def convert_tsv_to_csv(tsv_path, csv_path):
    """Convert a TSV file to CSV with proper handling of special characters."""
    row_count = 0

    with open(tsv_path, "r", newline="", encoding="utf-8") as tsv_in:
        with open(csv_path, "w", newline="", encoding="utf-8") as csv_out:
            reader = csv.reader(tsv_in, delimiter="\t")
            writer = csv.writer(csv_out, quoting=csv.QUOTE_MINIMAL)

            for row in reader:
                writer.writerow(row)
                row_count += 1

    return row_count


# Convert the file
tsv_file = "data.tsv"
csv_file = "data.csv"

total_rows = convert_tsv_to_csv(tsv_file, csv_file)
print(f"Converted {total_rows} rows from '{tsv_file}' to '{csv_file}'")

# Verify by reading the CSV back
with open(csv_file, "r", newline="") as f:
    reader = csv.reader(f)
    for i, row in enumerate(reader):
        if i < 3:  # Show first 3 rows
            print(row)

Comparison of Methods

Method               | Handles Commas in Data | Memory Efficient     | External Dependencies | Best For
csv module           | ✅ Yes (auto-quoting)  | ✅ Yes (row by row)  | None                  | Production use (recommended)
Pandas               | ✅ Yes                 | ❌ Loads entire file | pandas                | Data analysis workflows
Pandas (chunked)     | ✅ Yes                 | ✅ Yes               | pandas                | Very large files in Pandas workflows
re.sub() / replace() | ❌ No                  | ✅ Yes               | None                  | Quick scripts, guaranteed clean data

Summary

Converting TSV files to CSV in Python is straightforward, but choosing the right method matters for data integrity:

  • Use the csv module for the most reliable conversion: it handles commas, quotes, and newlines within data fields automatically. This is the recommended approach for production code.
  • Use Pandas when you're already working in a data analysis pipeline or need to transform the data during conversion.
  • Avoid simple string replacement (str.replace() or re.sub()) unless you are absolutely certain your data contains no commas, quotes, or newlines within fields.

Key reminders:

  • Always use newline="" when opening files for the csv module.
  • Use index=False with Pandas to_csv() to avoid writing an unwanted index column.
  • Use encoding="utf-8" when working with files that may contain non-ASCII characters.