Python Pandas: How to Skip Rows While Reading a CSV File Using Pandas
When reading CSV files, you often encounter rows that need to be excluded: header comments, metadata lines, blank rows, or data that doesn't meet certain criteria. The Pandas read_csv() function provides flexible parameters to skip rows during file loading, so you don't have to remove them afterward. This guide covers the common row-skipping techniques with practical examples and outputs.
Key Parameters for Skipping Rows
The read_csv() function has two primary parameters for skipping rows:
| Parameter | Description | Accepts |
|---|---|---|
| skiprows | Rows to skip from the top of the file | Integer, list of integers, or callable function |
| skipfooter | Number of rows to skip from the bottom of the file | Integer (requires engine='python') |
Sample CSV File
All examples reference a file called students.csv with the following content:
Name,Age,City,Score
Alice,20,New York,88
Bob,22,Chicago,92
Charlie,21,Boston,78
Diana,23,Seattle,95
Eve,20,Austin,81
Frank,24,Denver,89
Grace,22,Miami,93
Loading it without skipping any rows:
import pandas as pd
df = pd.read_csv('students.csv')
print(df)
Output:
Name Age City Score
0 Alice 20 New York 88
1 Bob 22 Chicago 92
2 Charlie 21 Boston 78
3 Diana 23 Seattle 95
4 Eve 20 Austin 81
5 Frank 24 Denver 89
6 Grace 22 Miami 93
Method 1: Skip N Rows from the Top
Pass an integer to skiprows to skip that many rows from the beginning of the file. Note that this counts from row 0 (the header row), so skipping rows also removes the header:
import pandas as pd
# Skip the first 3 rows (including the header)
df = pd.read_csv('students.csv', skiprows=3)
print(df)
Output:
Charlie 21 Boston 78
0 Diana 23 Seattle 95
1 Eve 20 Austin 81
2 Frank 24 Denver 89
3 Grace 22 Miami 93
The original header (Name, Age, City, Score) and the first two data rows are skipped. The third data row (Charlie) is incorrectly used as the header.
When using skiprows with an integer, the header row (row 0) is also counted. If you want to keep the header and skip only data rows, use a list or range that excludes row 0 (see Method 3 below).
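An integer skiprows is the natural fit when the lines above the header are not data at all. The sketch below assumes a hypothetical export in which two metadata lines precede the real header; the file content is inlined with io.StringIO so the example runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical export: two metadata lines come before the real header.
raw = (
    "Exported by registrar tool\n"
    "Contains student scores\n"
    "Name,Age,City,Score\n"
    "Alice,20,New York,88\n"
    "Bob,22,Chicago,92\n"
)

# skiprows=2 drops the two metadata lines, so the third line becomes the header.
df_meta = pd.read_csv(io.StringIO(raw), skiprows=2)
print(df_meta)
```

Here the integer form does what you want, because the first line left after skipping is the header itself.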
Method 2: Skip Rows at Specific Positions
Pass a list of row indices to skip specific rows. Row 0 is the header:
import pandas as pd
# Skip rows 2 and 5; the header (row 0) is not in the list, so it is kept
df = pd.read_csv('students.csv', skiprows=[2, 5])
print(df)
Output:
Name Age City Score
0 Alice 20 New York 88
1 Charlie 21 Boston 78
2 Diana 23 Seattle 95
3 Frank 24 Denver 89
4 Grace 22 Miami 93
Rows at positions 2 (Bob) and 5 (Eve) in the file are skipped. The header (row 0) is preserved because it is not in the skip list.
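Before building a skip list, it can help to number the raw lines yourself and confirm which index belongs to which record. This is a minimal sketch, assuming an inline copy of students.csv and that no quoted field spans more than one line.

```python
import io
import pandas as pd

# Inline copy of students.csv so the sketch is self-contained.
raw = """Name,Age,City,Score
Alice,20,New York,88
Bob,22,Chicago,92
Charlie,21,Boston,78
Diana,23,Seattle,95
Eve,20,Austin,81
Frank,24,Denver,89
Grace,22,Miami,93
"""

# Pair every line with the 0-based index that skiprows uses.
numbered = list(enumerate(raw.splitlines()))
for idx, line in numbered:
    print(idx, line)

# Indices 2 and 5 correspond to Bob and Eve, so those records are dropped.
df_checked = pd.read_csv(io.StringIO(raw), skiprows=[2, 5])
print(df_checked)
```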
Method 3: Skip N Data Rows While Keeping the Header
To skip data rows but preserve the column names, create a list of row indices that starts at 1 (the first data row) instead of 0 (the header):
import pandas as pd
# Skip the first 2 data rows (rows 1 and 2), keep the header (row 0)
df = pd.read_csv('students.csv', skiprows=[1, 2])
print(df)
Output:
Name Age City Score
0 Charlie 21 Boston 78
1 Diana 23 Seattle 95
2 Eve 20 Austin 81
3 Frank 24 Denver 89
4 Grace 22 Miami 93
Using a range for more rows:
import pandas as pd
# Skip data rows 1 through 3, keep header
df = pd.read_csv('students.csv', skiprows=range(1, 4))
print(df)
Output:
Name Age City Score
0 Diana 23 Seattle 95
1 Eve 20 Austin 81
2 Frank 24 Denver 89
3 Grace 22 Miami 93
Use skiprows=range(1, N+1) to skip the first N data rows while keeping the header intact. This pattern is handy whenever the rows to drop sit directly below the header.
Method 4: Skip Rows Based on a Condition (Callable)
The skiprows parameter also accepts a callable function that receives the row index and returns True to skip or False to keep. This enables conditional row skipping:
Skip Every 3rd Row
import pandas as pd
# Skip every row whose index is divisible by 3, except the header (row 0)
df = pd.read_csv('students.csv', skiprows=lambda x: x % 3 == 0 and x != 0)
print(df)
Output:
Name Age City Score
0 Alice 20 New York 88
1 Bob 22 Chicago 92
2 Diana 23 Seattle 95
3 Eve 20 Austin 81
4 Grace 22 Miami 93
Rows at positions 3 (Charlie) and 6 (Frank) are skipped. The condition x != 0 preserves the header.
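The same idea can downsample a file by position. The sketch below keeps the header and every other data row; the inline data and the variable name df_half are assumptions for illustration.

```python
import io
import pandas as pd

# Inline copy of students.csv so the sketch is self-contained.
raw = """Name,Age,City,Score
Alice,20,New York,88
Bob,22,Chicago,92
Charlie,21,Boston,78
Diana,23,Seattle,95
Eve,20,Austin,81
Frank,24,Denver,89
Grace,22,Miami,93
"""

# Skip even-numbered rows above 0: the header (0) and odd rows (1, 3, 5, 7) stay.
df_half = pd.read_csv(io.StringIO(raw), skiprows=lambda i: i > 0 and i % 2 == 0)
print(df_half)
```

Because the callable is evaluated per row index, it never sees the row values; it can only express rules about position.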
Skip Rows Based on Content (Two-Pass Approach)
The callable only receives the row index, not the row content. To skip rows based on their values, you need a two-pass approach:
import pandas as pd
# First pass: load everything
df = pd.read_csv('students.csv')
# Second pass: filter out rows where Score < 85
df = df[df['Score'] >= 85]
print(df.reset_index(drop=True))
Output:
Name Age City Score
0 Alice 20 New York 88
1 Bob 22 Chicago 92
2 Diana 23 Seattle 95
3 Frank 24 Denver 89
4 Grace 22 Miami 93
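For files too large to load in full before filtering, one option is to read in chunks and filter each chunk as it arrives. This is a sketch under assumptions: the chunk size of 3 is arbitrary, and the data is inlined so the example runs on its own.

```python
import io
import pandas as pd

# Inline copy of students.csv so the sketch is self-contained.
raw = """Name,Age,City,Score
Alice,20,New York,88
Bob,22,Chicago,92
Charlie,21,Boston,78
Diana,23,Seattle,95
Eve,20,Austin,81
Frank,24,Denver,89
Grace,22,Miami,93
"""

# chunksize makes read_csv return an iterator of DataFrames (3 rows each here).
chunks = pd.read_csv(io.StringIO(raw), chunksize=3)

# Keep only rows with Score >= 85 from each chunk, then stitch the pieces together.
kept = [chunk[chunk["Score"] >= 85] for chunk in chunks]
df_high = pd.concat(kept, ignore_index=True)
print(df_high)
```

Only one chunk plus the rows already kept are held in memory at a time, at the cost of a little more code than the load-then-filter version.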
Method 5: Skip Rows from the End of the File
Use skipfooter to skip a specified number of rows from the bottom of the file. This requires setting engine='python':
import pandas as pd
# Skip the last 3 rows
df = pd.read_csv('students.csv', skipfooter=3, engine='python')
print(df)
Output:
Name Age City Score
0 Alice 20 New York 88
1 Bob 22 Chicago 92
2 Charlie 21 Boston 78
3 Diana 23 Seattle 95
The last three rows (Eve, Frank, Grace) are excluded.
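skipfooter is supported only by the Python parser engine, not by the default C engine. If you leave engine unset, pandas falls back to the Python engine and emits a ParserWarning; passing engine='python' explicitly avoids the warning. The Python engine is slower for large files, so use skipfooter only when necessary.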
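A typical reason to reach for skipfooter is a summary line appended after the data. The sketch below assumes a hypothetical file that ends with a one-line total; the content is inlined so it runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical export: a summary line follows the last data row.
raw = (
    "Name,Age,City,Score\n"
    "Alice,20,New York,88\n"
    "Bob,22,Chicago,92\n"
    "Total students: 2\n"
)

# skipfooter=1 drops the trailing summary line; engine='python' is set explicitly.
df_body = pd.read_csv(io.StringIO(raw), skipfooter=1, engine="python")
print(df_body)
```

Without skipfooter, the summary line would be parsed as a malformed data row and would pollute the Name column.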
Combining skiprows and nrows
You can combine skiprows with nrows to read a specific window of rows from the file:
import pandas as pd
# Skip the first 2 data rows, then read only the next 3 rows
df = pd.read_csv('students.csv', skiprows=range(1, 3), nrows=3)
print(df)
Output:
Name Age City Score
0 Charlie 21 Boston 78
1 Diana 23 Seattle 95
2 Eve 20 Austin 81
This is useful for reading specific sections of very large files without loading everything into memory.
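The window pattern can be wrapped in a small helper. read_window below is a hypothetical function written for this guide, not part of pandas; it takes a 0-based data-row offset and a row count.

```python
import io
import pandas as pd

# Inline copy of students.csv so the sketch is self-contained.
raw = """Name,Age,City,Score
Alice,20,New York,88
Bob,22,Chicago,92
Charlie,21,Boston,78
Diana,23,Seattle,95
Eve,20,Austin,81
Frank,24,Denver,89
Grace,22,Miami,93
"""

def read_window(source, start, count):
    """Hypothetical helper: read `count` data rows after skipping `start` data rows.

    File rows 1..start are skipped, so the header (row 0) is always kept.
    """
    return pd.read_csv(source, skiprows=range(1, start + 1), nrows=count)

# Skip 2 data rows, then read the next 3.
page_a = read_window(io.StringIO(raw), start=2, count=3)
# Asking for more rows than remain simply returns what is left.
page_b = read_window(io.StringIO(raw), start=5, count=3)
print(page_a)
print(page_b)
```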
Common Mistake: Accidentally Skipping the Header
A frequent error is using skiprows=1 intending to skip the first data row, but actually skipping the header:
import pandas as pd
# WRONG: skiprows=1 skips the header row (row 0)
df = pd.read_csv('students.csv', skiprows=1)
print(df)
Output:
Alice 20 New York 88
0 Bob 22 Chicago 92
1 Charlie 21 Boston 78
2 Diana 23 Seattle 95
3 Eve 20 Austin 81
4 Frank 24 Denver 89
5 Grace 22 Miami 93
The first data row (Alice) is now incorrectly used as the header.
The correct approach to skip the first data row while keeping the header:
import pandas as pd
# CORRECT: skip row 1 (first data row), keep row 0 (header)
df = pd.read_csv('students.csv', skiprows=[1])
print(df)
Output:
Name Age City Score
0 Bob 22 Chicago 92
1 Charlie 21 Boston 78
2 Diana 23 Seattle 95
3 Eve 20 Austin 81
4 Frank 24 Denver 89
5 Grace 22 Miami 93
When skiprows is an integer (e.g., skiprows=2), it skips the first 2 rows starting from row 0, which includes the header. When skiprows is a list (e.g., skiprows=[2]), it skips only the specific row at that position. Always use a list when you want to preserve the header.
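Sometimes dropping the header is intentional, for example when you want to replace the column names. In that case, a sketch like the following pairs skiprows=1 with header=None and names; the replacement names are hypothetical.

```python
import io
import pandas as pd

# Inline copy of students.csv so the sketch is self-contained.
raw = """Name,Age,City,Score
Alice,20,New York,88
Bob,22,Chicago,92
Charlie,21,Boston,78
Diana,23,Seattle,95
Eve,20,Austin,81
Frank,24,Denver,89
Grace,22,Miami,93
"""

# Hypothetical replacement column names.
columns = ["student", "age", "city", "score"]

# skiprows=1 discards the original header; header=None stops pandas from
# promoting Alice's row to a header; names supplies the new labels.
df_renamed = pd.read_csv(io.StringIO(raw), skiprows=1, header=None, names=columns)
print(df_renamed)
```

All seven data rows survive, because the only line skipped is the original header.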
Quick Reference
| Goal | Code |
|---|---|
| Skip first N rows (including header) | pd.read_csv('file.csv', skiprows=N) |
| Skip first N data rows (keep header) | pd.read_csv('file.csv', skiprows=range(1, N+1)) |
| Skip specific rows by position | pd.read_csv('file.csv', skiprows=[2, 5, 8]) |
| Skip rows based on a condition | pd.read_csv('file.csv', skiprows=lambda x: condition) |
| Skip last N rows | pd.read_csv('file.csv', skipfooter=N, engine='python') |
| Read a specific window of rows | pd.read_csv('file.csv', skiprows=range(1, 4), nrows=5) |
The skiprows and skipfooter parameters give you precise control over which rows are loaded from a CSV file. Whether you need to skip metadata headers, remove specific records, or apply position-based conditions during loading, these parameters help you load only the rows you need. Filtering on row values still requires loading the data first and filtering the DataFrame afterward.