How to Count the Number of Lines in a CSV File in Python
Counting the number of lines or rows in a CSV file is a common task when working with data, whether you're validating file imports, monitoring data pipelines, estimating processing time, or performing quick data profiling. Python provides several approaches to accomplish this, ranging from lightweight pure-Python methods to pandas-based solutions.
In this guide, you'll learn multiple methods to count lines in a CSV file, understand the difference between counting total lines (including the header) and data rows (excluding the header), and choose the right approach based on your file size and requirements.
Sample CSV File
For the examples in this guide, we'll use a CSV file called employees.csv:
```csv
Name,Department,Salary
Alice,Engineering,90000
Bob,Marketing,75000
Charlie,Sales,68000
Diana,Engineering,92000
Eve,Marketing,71000
```
This file has 6 total lines (1 header + 5 data rows).
Using sum() with File Iteration (Fastest for Large Files)
The most memory-efficient and fastest pure-Python approach reads the file line by line using a generator expression:
```python
with open('employees.csv', 'r') as f:
    total_lines = sum(1 for line in f)

print(f"Total lines (including header): {total_lines}")
print(f"Data rows (excluding header): {total_lines - 1}")
```

Output:

```text
Total lines (including header): 6
Data rows (excluding header): 5
```
How it works:
- The generator expression `(1 for line in f)` yields `1` for each line in the file, and `sum()` adds them all up, giving the total line count.
- The file is read line by line, so memory usage stays constant regardless of file size.
This method is the best choice for large files (hundreds of MB or larger) because it never loads the entire file into memory. It processes one line at a time and discards it immediately.
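If you need to squeeze out even more speed on very large files, a common trick (the same one tools like `wc` use) is to read the file as raw bytes in fixed-size chunks and count newline characters. A minimal sketch, with a helper name of our choosing:

```python
def count_lines_fast(filepath, chunk_size=1 << 20):
    """Count lines by reading the file in 1 MB binary chunks
    and counting newline bytes. Avoids per-line decoding overhead."""
    count = 0
    with open(filepath, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b'\n')
    return count
```

Note that this counts newline bytes, so a file whose last line lacks a trailing newline will report one line fewer than text-mode iteration.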
Using the csv Module
Python's built-in csv module properly handles CSV parsing, including quoted fields that span multiple lines:
```python
import csv

# newline='' is recommended by the csv docs when passing a file to csv.reader
with open('employees.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    row_count = sum(1 for row in reader)

print(f"Total rows (including header): {row_count}")
print(f"Data rows: {row_count - 1}")
```

Output:

```text
Total rows (including header): 6
Data rows: 5
```
**csv.reader vs. simple line counting:** Simple line counting (`sum(1 for line in f)`) counts text lines. If your CSV has values that contain newline characters within quoted fields, a single row might span multiple text lines. The `csv.reader` approach counts logical rows, handling multi-line fields correctly:
```csv
Name,Bio
Alice,"Software engineer
from London"
Bob,"Data scientist"
```
- Simple line count: 4 lines
- `csv.reader` count: 3 rows (the correct CSV row count)
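To see the difference concretely, this snippet builds the file above and runs both counts (the filename `bios.csv` is just for illustration):

```python
import csv

# Create a CSV where one quoted field contains a newline
with open('bios.csv', 'w', newline='') as f:
    f.write('Name,Bio\nAlice,"Software engineer\nfrom London"\nBob,"Data scientist"\n')

# Simple line count: counts text lines
with open('bios.csv') as f:
    line_count = sum(1 for line in f)

# csv.reader: counts logical CSV rows
with open('bios.csv', newline='') as f:
    row_count = sum(1 for row in csv.reader(f))

print(line_count)  # 4 text lines
print(row_count)   # 3 logical rows (header + 2 records)
```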
Using len() with readlines()
A simple approach that reads all lines into a list:
```python
with open('employees.csv', 'r') as f:
    lines = f.readlines()

total = len(lines)
print(f"Total lines: {total}")
```

Output:

```text
Total lines: 6
```
`readlines()` loads the entire file into memory as a list of strings. For very large files, this can consume significant memory. Prefer the `sum(1 for line in f)` approach for files larger than a few hundred MB.
Using Pandas
Quick Count with len()
The simplest pandas approach loads the entire file and counts rows:
```python
import pandas as pd

df = pd.read_csv('employees.csv')
print(f"Data rows: {len(df)}")
print(f"Total lines (with header): {len(df) + 1}")
```

Output:

```text
Data rows: 5
Total lines (with header): 6
```
`len(df)` returns the number of data rows (excluding the header). Add 1 to include the header row in the total.
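One caveat: `pd.read_csv` treats the first line as a header by default. If your file has no header row, pass `header=None` so the first line is counted as data. A small sketch (the filename `scores.csv` is illustrative):

```python
import pandas as pd

# A small file with no header row (three data lines)
with open('scores.csv', 'w') as f:
    f.write("Alice,90\nBob,75\nCharlie,68\n")

# header=None tells pandas every line is data
df = pd.read_csv('scores.csv', header=None)
print(len(df))  # 3: all lines counted as data rows
```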
Memory-Efficient Count with chunksize
For large files that don't fit in memory, use pandas chunked reading:
```python
import pandas as pd

data_rows = sum(len(chunk) for chunk in pd.read_csv('employees.csv', chunksize=10000))
print(f"Data rows: {data_rows}")
print(f"Total lines (with header): {data_rows + 1}")
```

Output:

```text
Data rows: 5
Total lines (with header): 6
```
This reads the file in batches of 10,000 rows, counts each batch, and sums the results, keeping memory usage bounded.
Count Without Loading Data
If you only need the count and want to minimize overhead, read just a single column:
```python
import pandas as pd

row_count = pd.read_csv('employees.csv', usecols=[0]).shape[0]
print(f"Data rows: {row_count}")
```

Output:

```text
Data rows: 5
```
Using `usecols=[0]` reads only the first column, reducing memory usage and parsing time.
Using Shell Commands with subprocess (Unix/Linux/macOS)
On Unix-based systems, the wc -l command counts lines extremely fast:
```python
import subprocess

result = subprocess.check_output(['wc', '-l', 'employees.csv'])
line_count = int(result.split()[0])
print(f"Total lines: {line_count}")
```

Output:

```text
Total lines: 6
```
`wc -l` counts newline characters, so a file whose last line lacks a trailing newline reports one line fewer than the text-mode approaches above. It is also a Unix command and is not available on Windows; for cross-platform code, use the pure-Python approaches instead.
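If you still want the speed of `wc` where it exists, a portable wrapper can check for it at runtime and fall back to pure Python. A sketch (the helper name `count_lines` is ours):

```python
import shutil
import subprocess

def count_lines(filepath):
    """Use wc -l when available; fall back to pure Python otherwise."""
    if shutil.which('wc'):
        output = subprocess.check_output(['wc', '-l', filepath])
        return int(output.split()[0])
    with open(filepath, 'r') as f:
        return sum(1 for _ in f)
```

Keep in mind the two branches can differ by one on files without a trailing newline, since `wc -l` counts newline characters.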
Building a Reusable Function
Here's a comprehensive function that works efficiently for any file size:
```python
import csv

def count_csv_rows(filepath, include_header=False):
    """
    Count the number of rows in a CSV file.

    Args:
        filepath: Path to the CSV file.
        include_header: If True, includes the header row in the count.

    Returns:
        Number of rows.
    """
    with open(filepath, 'r', encoding='utf-8', newline='') as f:
        reader = csv.reader(f)
        count = sum(1 for _ in reader)
    if not include_header:
        count = max(count - 1, 0)  # avoid returning -1 for an empty file
    return count

# Usage
total_lines = count_csv_rows('employees.csv', include_header=True)
data_rows = count_csv_rows('employees.csv', include_header=False)
print(f"Total lines: {total_lines}")
print(f"Data rows: {data_rows}")
```

Output:

```text
Total lines: 6
Data rows: 5
```
Handling Edge Cases
Empty Files
```python
with open('empty.csv', 'w') as f:
    pass  # Create an empty file

with open('empty.csv', 'r') as f:
    count = sum(1 for line in f)

print(f"Lines in empty file: {count}")
```

Output:

```text
Lines in empty file: 0
```
Header-Only Files
```python
with open('header_only.csv', 'w') as f:
    f.write("Name,Age,City\n")

with open('header_only.csv', 'r') as f:
    count = sum(1 for line in f)

print(f"Total lines: {count}")
print(f"Data rows: {count - 1}")
```

Output:

```text
Total lines: 1
Data rows: 0
```
Quick Comparison of Methods
| Method | Memory Efficient | Handles Multi-line Fields | Speed | Cross-Platform |
|---|---|---|---|---|
| `sum(1 for line in f)` | Yes | No | Fast | Yes |
| `csv.reader` + `sum()` | Yes | Yes | Fast | Yes |
| `readlines()` + `len()` | No (loads entire file) | No | Fast | Yes |
| `len(pd.read_csv())` | No (loads entire file) | Yes | Moderate | Yes |
| `pd.read_csv(chunksize=...)` | Yes | Yes | Moderate | Yes |
| `wc -l` via `subprocess` | Yes | No | Fastest | No (Unix only) |
Conclusion
Python offers several methods to count lines in a CSV file, each suited to different situations:
- `sum(1 for line in f)` is the fastest pure-Python method and handles files of any size with minimal memory. Ideal for most use cases.
- `csv.reader` is the safest choice when your CSV may contain multi-line quoted fields, as it counts logical rows correctly.
- `len(pd.read_csv())` is the simplest option when you're already working with pandas, but it loads the entire file into memory.
- `pd.read_csv(chunksize=...)` balances pandas convenience with memory efficiency for large files.
- `wc -l` via `subprocess` is the fastest option on Unix systems but isn't portable.
For most applications, the simple generator-based approach (`sum(1 for line in f)`) offers the best combination of speed, memory efficiency, and simplicity.