How to Count the Number of Lines in a CSV File in Python

Counting the number of lines or rows in a CSV file is a common task when working with data, whether you're validating file imports, monitoring data pipelines, estimating processing time, or performing quick data profiling. Python provides several approaches to accomplish this, ranging from lightweight pure-Python methods to pandas-based solutions.

In this guide, you'll learn multiple methods to count lines in a CSV file, understand the difference between counting total lines (including the header) and data rows (excluding the header), and choose the right approach based on your file size and requirements.

Sample CSV File

For the examples in this guide, we'll use a CSV file called employees.csv:

Name,Department,Salary
Alice,Engineering,90000
Bob,Marketing,75000
Charlie,Sales,68000
Diana,Engineering,92000
Eve,Marketing,71000

This file has 6 total lines (1 header + 5 data rows).
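If you want to follow along, you can create this file yourself. One way (a quick sketch writing the contents from a string literal; any editor works just as well):

```python
# Create the sample employees.csv used throughout this guide.
sample = """\
Name,Department,Salary
Alice,Engineering,90000
Bob,Marketing,75000
Charlie,Sales,68000
Diana,Engineering,92000
Eve,Marketing,71000
"""

with open('employees.csv', 'w') as f:
    f.write(sample)
```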

Using sum() with File Iteration (Fastest for Large Files)

The most memory-efficient and fastest pure-Python approach reads the file line by line using a generator expression:

with open('employees.csv', 'r') as f:
    total_lines = sum(1 for line in f)

print(f"Total lines (including header): {total_lines}")
print(f"Data rows (excluding header): {total_lines - 1}")

Output:

Total lines (including header): 6
Data rows (excluding header): 5

How it works:

  • The generator (1 for line in f) yields 1 for each line in the file.
  • sum() adds them all up, giving the total line count.
  • The file is read line by line, so memory usage stays constant regardless of file size.
tip

This method is the best choice for large files (hundreds of MB or larger) because it never loads the entire file into memory. It processes one line at a time and discards it immediately.
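A related trick for when raw speed matters is to count newline bytes in fixed-size binary chunks instead of iterating over decoded lines. This is a sketch, not one of the standard approaches above; note that it counts newline characters, so a file without a trailing newline reports one line fewer than line iteration would:

```python
def count_lines_binary(filepath, chunk_size=1 << 20):
    """Count lines by counting newline bytes in 1 MiB binary chunks."""
    count = 0
    with open(filepath, 'rb') as f:
        # Read fixed-size chunks until EOF; memory stays bounded by chunk_size.
        while chunk := f.read(chunk_size):
            count += chunk.count(b'\n')
    return count
```

Skipping text decoding and line splitting is what makes this faster for very large files.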

Using the csv Module

Python's built-in csv module properly handles CSV parsing, including quoted fields that span multiple lines. The csv documentation recommends opening files with newline='' so the reader can handle embedded line endings itself:

import csv

with open('employees.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    row_count = sum(1 for row in reader)

print(f"Total rows (including header): {row_count}")
print(f"Data rows: {row_count - 1}")

Output:

Total rows (including header): 6
Data rows: 5
When to Use csv.reader vs Simple Line Counting

Simple line counting (sum(1 for line in f)) counts text lines. If your CSV has values that contain newline characters within quoted fields, a single row might span multiple text lines. The csv.reader approach counts logical rows, handling multi-line fields correctly:

Name,Bio
Alice,"Software engineer
from London"
Bob,"Data scientist"
  • Simple line count: 4 lines
  • csv.reader count: 3 rows (which is the correct CSV row count)
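To see the difference in action, the following sketch writes that example to a hypothetical bios.csv and counts it both ways:

```python
import csv

# Write the example above to a file; one quoted field spans two text lines.
with open('bios.csv', 'w', newline='') as f:
    f.write('Name,Bio\nAlice,"Software engineer\nfrom London"\nBob,"Data scientist"\n')

# Simple line count: counts physical text lines.
with open('bios.csv', 'r') as f:
    text_lines = sum(1 for line in f)

# csv.reader: counts logical rows, joining the multi-line quoted field.
with open('bios.csv', 'r', newline='') as f:
    csv_rows = sum(1 for row in csv.reader(f))

print(text_lines)  # 4
print(csv_rows)    # 3
```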

Using len() with readlines()

A simple approach that reads all lines into a list:

with open('employees.csv', 'r') as f:
    lines = f.readlines()

total = len(lines)

print(f"Total lines: {total}")

Output:

Total lines: 6
Memory Usage

readlines() loads the entire file into memory as a list of strings. For very large files, this can consume significant memory. Prefer the sum(1 for line in f) approach for files larger than a few hundred MB.

Using Pandas

Quick Count with len()

The simplest pandas approach loads the entire file and counts rows:

import pandas as pd

df = pd.read_csv('employees.csv')

print(f"Data rows: {len(df)}")
print(f"Total lines (with header): {len(df) + 1}")

Output:

Data rows: 5
Total lines (with header): 6
note

len(df) returns the number of data rows (excluding the header). Add 1 to include the header row in the total.

Memory-Efficient Count with chunksize

For large files that don't fit in memory, use pandas chunked reading:

import pandas as pd

data_rows = sum(len(chunk) for chunk in pd.read_csv('employees.csv', chunksize=10000))

print(f"Data rows: {data_rows}")
print(f"Total lines (with header): {data_rows + 1}")

Output:

Data rows: 5
Total lines (with header): 6

This reads the file in batches of 10,000 rows, counts each batch, and sums the results, keeping memory usage bounded.

Count by Reading a Single Column

If you only need the count and want to minimize overhead, read just the first column instead of the full file:

import pandas as pd

row_count = pd.read_csv('employees.csv', usecols=[0]).shape[0]
print(f"Data rows: {row_count}")

Output:

Data rows: 5

Using usecols=[0] reads only the first column, reducing memory and parsing time.

Using Shell Commands with subprocess (Unix/Linux/macOS)

On Unix-based systems, the wc -l command counts lines extremely fast:

import subprocess

result = subprocess.check_output(['wc', '-l', 'employees.csv'])
line_count = int(result.split()[0])

print(f"Total lines: {line_count}")

Output:

Total lines: 6
note

wc -l counts newline characters, so a file that does not end with a trailing newline reports one line fewer than the Python approaches. It is also a Unix command and is not available on Windows. For cross-platform code, use the pure-Python approaches instead.
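If you want the speed of wc -l where it exists without giving up portability, one possible pattern (a sketch, not a standard-library API) is to try the shell command and fall back to pure Python:

```python
import subprocess
import sys

def count_lines_portable(filepath):
    """Count text lines, preferring wc -l where available."""
    if sys.platform != 'win32':
        try:
            out = subprocess.check_output(['wc', '-l', filepath])
            return int(out.split()[0])
        except (OSError, subprocess.CalledProcessError):
            pass  # wc missing or failed; fall through to pure Python
    # Cross-platform fallback: constant-memory line iteration.
    with open(filepath, 'r') as f:
        return sum(1 for line in f)
```

Keep in mind that the two paths can disagree by one for a file without a trailing newline, since wc -l counts newline characters.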

Building a Reusable Function

Here's a comprehensive function that works efficiently for any file size:

import csv

def count_csv_rows(filepath, include_header=False):
    """
    Count the number of rows in a CSV file.

    Args:
        filepath: Path to the CSV file.
        include_header: If True, includes the header row in the count.

    Returns:
        Number of rows.
    """
    with open(filepath, 'r', encoding='utf-8', newline='') as f:
        reader = csv.reader(f)
        count = sum(1 for _ in reader)

    if not include_header:
        count -= 1

    return count


# Usage
total_lines = count_csv_rows('employees.csv', include_header=True)
data_rows = count_csv_rows('employees.csv', include_header=False)

print(f"Total lines: {total_lines}")
print(f"Data rows: {data_rows}")

Output:

Total lines: 6
Data rows: 5

Handling Edge Cases

Empty Files

with open('empty.csv', 'w') as f:
    pass  # Create an empty file

with open('empty.csv', 'r') as f:
    count = sum(1 for line in f)

print(f"Lines in empty file: {count}")

Output:

Lines in empty file: 0

Header-Only Files

with open('header_only.csv', 'w') as f:
    f.write("Name,Age,City\n")

with open('header_only.csv', 'r') as f:
    count = sum(1 for line in f)

print(f"Total lines: {count}")
print(f"Data rows: {count - 1}")

Output:

Total lines: 1
Data rows: 0
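One caveat when subtracting the header: on a truly empty file, count - 1 would report -1 data rows. If your input might be empty, a small guard (a sketch built on the simple line-count approach) keeps the result sensible:

```python
def count_data_rows(filepath):
    """Count data rows, clamping at zero so empty files report 0, not -1."""
    with open(filepath, 'r') as f:
        total = sum(1 for line in f)
    # max() guards the empty-file case, where there is no header to subtract.
    return max(total - 1, 0)
```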

Quick Comparison of Methods

| Method | Memory Efficient | Handles Multi-line Fields | Speed | Cross-Platform |
|---|---|---|---|---|
| sum(1 for line in f) | ✅ Excellent | ❌ | ⚡ Fast | ✅ |
| csv.reader + sum() | ✅ Excellent | ✅ | ⚡ Fast | ✅ |
| readlines() + len() | ❌ Loads entire file | ❌ | Fast | ✅ |
| len(pd.read_csv()) | ❌ Loads entire file | ✅ | Moderate | ✅ |
| pd.read_csv(chunksize=...) | ✅ Good | ✅ | Moderate | ✅ |
| wc -l via subprocess | ✅ Excellent | ❌ | ⚡⚡ Fastest | ❌ Unix only |

Conclusion

Python offers several methods to count lines in a CSV file, each suited to different situations:

  • sum(1 for line in f) is the fastest pure-Python method and handles files of any size with minimal memory. Ideal for most use cases.
  • csv.reader is the safest choice when your CSV may contain multi-line quoted fields, as it counts logical rows correctly.
  • len(pd.read_csv()) is the simplest option when you're already working with pandas, but loads the entire file into memory.
  • pd.read_csv(chunksize=...) balances pandas convenience with memory efficiency for large files.
  • wc -l via subprocess is the fastest option on Unix systems but isn't portable.

For most applications, the simple generator-based approach (sum(1 for line in f)) offers the best combination of speed, memory efficiency, and simplicity.