Python Pandas: How to Shuffle DataFrame Rows
Shuffling the rows of a DataFrame is a common operation in data science and machine learning workflows - especially when preparing data for model training, randomizing survey results, or breaking any inherent ordering in a dataset. Pandas and NumPy provide several ways to randomly reorder rows, each with different trade-offs in simplicity, performance, and flexibility.
This guide covers four methods with clear examples, outputs, and explanations.
Sample DataFrame
All examples in this guide use the following DataFrame:
import pandas as pd
data = {'A': range(1, 11), 'B': list('abcdefghij')}
df = pd.DataFrame(data)
print(df)
Output:
A B
0 1 a
1 2 b
2 3 c
3 4 d
4 5 e
5 6 f
6 7 g
7 8 h
8 9 i
9 10 j
Using sample(frac=1) - The Recommended Approach
The simplest and most idiomatic way to shuffle rows in Pandas is the sample() method. Setting frac=1 tells Pandas to return 100% of the rows in a random order:
import pandas as pd
data = {'A': range(1, 11), 'B': list('abcdefghij')}
df = pd.DataFrame(data)
shuffled = df.sample(frac=1).reset_index(drop=True)
print(shuffled)
Output:
A B
0 3 c
1 6 f
2 1 a
3 8 h
4 5 e
5 7 g
6 4 d
7 9 i
8 10 j
9 2 b
sample(frac=1) randomly selects all rows without replacement, effectively shuffling them.
reset_index(drop=True) replaces the old index with a clean 0-based sequence. Without it, the original index values are preserved (e.g., the row with A = 8 would still carry its original index 7).
For reproducible shuffles (useful in testing and machine learning), pass a random_state parameter:
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
Using the same random_state value always produces the same shuffled order.
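The reproducibility claim can be checked directly. The following is a minimal sketch that reuses the sample DataFrame from this guide; the variable names first, second, and unseeded are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 11), 'B': list('abcdefghij')})

# Two independent calls with the same seed yield the same row order.
first = df.sample(frac=1, random_state=42).reset_index(drop=True)
second = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(first.equals(second))  # True

# Without a seed the order usually differs from call to call,
# but the set of rows is always the same.
unseeded = df.sample(frac=1).reset_index(drop=True)
print(sorted(unseeded['A']) == list(df['A']))  # True
```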
Why Reset the Index?
If you skip reset_index(drop=True), the shuffled DataFrame retains the original index values:
import pandas as pd
data = {'A': range(1, 11), 'B': list('abcdefghij')}
df = pd.DataFrame(data)
shuffled_no_reset = df.sample(frac=1)
print(shuffled_no_reset)
Output:
A B
6 7 g
4 5 e
5 6 f
0 1 a
8 9 i
3 4 d
9 10 j
2 3 c
1 2 b
7 8 h
Notice the index column is out of order (6, 4, 5, 0, ...). This can cause confusion later, because label-based access such as shuffled_no_reset.loc[0] no longer returns the first row of the shuffled frame. Reset the index after shuffling unless you specifically need to track the original row positions.
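When the original positions do matter, keeping the index gives you a way back. The sketch below assumes the default 0-based index of the sample DataFrame and uses sort_index() to undo the shuffle; the seed value is arbitrary:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 11), 'B': list('abcdefghij')})

# Shuffle but keep the original index labels.
shuffled = df.sample(frac=1, random_state=0)

# Each label still identifies its original row, wherever it ended up.
row = shuffled.loc[7]
print(row['A'], row['B'])  # 8 h

# Sorting by the preserved index restores the original order.
restored = shuffled.sort_index()
print(restored['A'].tolist() == df['A'].tolist())  # True
```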
Using numpy.random.permutation
This method generates a permuted array of row indices using NumPy and then reorders the DataFrame with iloc:
import numpy as np
import pandas as pd
data = {'A': range(1, 11), 'B': list('abcdefghij')}
df = pd.DataFrame(data)
permuted_indices = np.random.permutation(len(df))
shuffled = df.iloc[permuted_indices].reset_index(drop=True)
print(shuffled)
Output:
A B
0 10 j
1 4 d
2 8 h
3 9 i
4 7 g
5 1 a
6 6 f
7 5 e
8 3 c
9 2 b
np.random.permutation(len(df)) returns an array like [0, 8, 6, 2, 9, 3, 7, 5, 4, 1] (a random reordering of indices 0–9). The iloc accessor then selects rows in that order.
This approach is useful when you are already working with NumPy or need the permuted indices for other purposes (e.g., shuffling a separate array in the same order).
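As an illustration of reusing the permuted indices, the sketch below shuffles a DataFrame and a separate array in the same order. The labels array is hypothetical, and the example swaps in NumPy's seeded Generator (np.random.default_rng) in place of the np.random.permutation call used above so that the result is reproducible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(1, 11), 'B': list('abcdefghij')})

# Hypothetical array aligned row by row with df (label = A + 99).
labels = np.arange(100, 110)

rng = np.random.default_rng(42)   # seeded generator for reproducibility
perm = rng.permutation(len(df))   # one permutation, reused for both objects

shuffled_df = df.iloc[perm].reset_index(drop=True)
shuffled_labels = labels[perm]

# Rows and labels stay aligned after the shuffle.
print((shuffled_labels == shuffled_df['A'].to_numpy() + 99).all())  # True
```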
Using numpy.random.shuffle with Index
This method extracts the DataFrame's index as a list, shuffles it in place, and then uses loc to reorder the rows:
import numpy as np
import pandas as pd
data = {'A': range(1, 11), 'B': list('abcdefghij')}
df = pd.DataFrame(data)
idx = df.index.to_list()
np.random.shuffle(idx)
shuffled = df.loc[idx].reset_index(drop=True)
print(shuffled)
Output:
A B
0 8 h
1 5 e
2 10 j
3 7 g
4 9 i
5 4 d
6 2 b
7 1 a
8 6 f
9 3 c
The key difference from the previous method is that np.random.shuffle() modifies the list in place and returns None, so you cannot chain it into a single expression.
A common mistake is trying to use the return value of np.random.shuffle():
# WRONG: np.random.shuffle returns None, not the shuffled list
shuffled = df.loc[np.random.shuffle(df.index.to_list())]
# Fails because loc receives None (pandas typically raises KeyError: None)
The correct approach is to shuffle the list first, then pass it to loc:
import numpy as np
import pandas as pd
data = {'A': range(1, 11), 'B': list('abcdefghij')}
df = pd.DataFrame(data)
idx = df.index.to_list()
np.random.shuffle(idx) # Modifies idx in place
shuffled = df.loc[idx].reset_index(drop=True)
print(shuffled)
Output:
A B
0 10 j
1 1 a
2 6 f
3 4 d
4 3 c
5 8 h
6 5 e
7 2 b
8 9 i
9 7 g
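The in-place behavior described in this section can be verified in a few lines. This sketch also shows np.random.permutation as a chainable alternative, since it returns a shuffled copy instead of None; passing df.index.to_numpy() is one way to hand it the index values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(1, 11), 'B': list('abcdefghij')})

idx = df.index.to_list()
result = np.random.shuffle(idx)   # shuffles idx in place

print(result)                             # None
print(sorted(idx) == df.index.to_list())  # True: same labels, new order

# permutation() returns a shuffled copy, so it can be used inline.
shuffled = df.loc[np.random.permutation(df.index.to_numpy())].reset_index(drop=True)
print(len(shuffled))                      # 10
```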
Using sort_values with a Random Column
This approach assigns a column of random numbers to the DataFrame, sorts by that column, and then drops it. While less efficient due to the sorting step, it can be useful when you want to apply weighted or semi-random ordering:
import numpy as np
import pandas as pd
data = {'A': range(1, 11), 'B': list('abcdefghij')}
df = pd.DataFrame(data)
shuffled = (
    df.assign(rand_key=np.random.rand(len(df)))
    .sort_values('rand_key')
    .drop('rand_key', axis=1)
    .reset_index(drop=True)
)
print(shuffled)
Output:
A B
0 5 e
1 7 g
2 4 d
3 1 a
4 8 h
5 6 f
6 9 i
7 3 c
8 2 b
9 10 j
assign(rand_key=np.random.rand(len(df))) adds a temporary column of random floats between 0 and 1.
sort_values('rand_key') sorts the DataFrame by those random values, effectively shuffling it.
drop('rand_key', axis=1) removes the temporary column.
This method has O(n log n) complexity due to sorting, compared to O(n) for the sample() approach. Use it only when you have a specific reason to sort rather than sample.
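One possible reading of "weighted ordering" is sketched below. It is not part of the method above: the weights array is hypothetical, and the key formula u ** (1 / w) is the Efraimidis-Spirakis weighted-sampling technique, swapped in for the plain uniform key. Sorting the keys in descending order makes rows with larger weights more likely to appear near the top:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(1, 11), 'B': list('abcdefghij')})

# Hypothetical per-row weights: later rows are favored.
weights = np.linspace(1.0, 10.0, len(df))

rng = np.random.default_rng(0)
# Weighted random key: uniform draw raised to 1/weight.
keys = rng.random(len(df)) ** (1.0 / weights)

weighted = (
    df.assign(rand_key=keys)
    .sort_values('rand_key', ascending=False)  # largest key first
    .drop(columns='rand_key')
    .reset_index(drop=True)
)
print(weighted)
```

The result is still a full permutation of the rows; only the probability of each ordering changes.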
Method Comparison
| Method | Complexity | Simplicity | Reproducible | Best For |
|---|---|---|---|---|
| df.sample(frac=1) | O(n) | ★★★★★ | Yes (random_state) | General use - recommended |
| np.random.permutation | O(n) | ★★★★ | Yes (np.random.seed) | When you need the index array |
| np.random.shuffle | O(n) | ★★★ | Yes (np.random.seed) | In-place index shuffling |
| sort_values with random column | O(n log n) | ★★★ | Yes (np.random.seed) | Weighted/semi-random ordering |
For most use cases, df.sample(frac=1) is the best choice. It is concise, fast, built into Pandas, and supports reproducibility through the random_state parameter. Reserve the other methods for scenarios where you need more control over the shuffling mechanism.
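The table lists np.random.seed as the route to reproducibility for the NumPy-based methods. A minimal sketch of what that looks like with np.random.permutation follows; the seed value 0 is arbitrary, and note that np.random.seed sets global state, so it affects every later np.random call in the program:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(1, 11), 'B': list('abcdefghij')})

# Seeding the global NumPy generator before each call repeats the permutation.
np.random.seed(0)
first_perm = np.random.permutation(len(df))

np.random.seed(0)
second_perm = np.random.permutation(len(df))

print((first_perm == second_perm).all())  # True

shuffled = df.iloc[first_perm].reset_index(drop=True)
```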