How to Resolve "ValueError: All arrays must be of the same length" in Python
When creating a Pandas DataFrame from lists or arrays, you may encounter the error ValueError: All arrays must be of the same length. This happens when the columns you're trying to combine have different numbers of elements. Pandas requires all columns to have the same length because each column represents a field in a table, and every row must have a value (or NaN) for each field.
In this guide, we'll explain why this error occurs and show multiple ways to fix it depending on your data and use case.
Why Does This Error Occur?
A DataFrame is a tabular structure where every column must have the same number of rows. When you pass lists of unequal lengths as columns, Pandas can't construct the table and raises an error.
❌ Example that triggers the error:
import pandas as pd
names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78] # Only 3 elements: doesn't match 4 names
df = pd.DataFrame({
    "name": names,
    "score": scores
})
Output:
ValueError: All arrays must be of the same length
The names list has 4 elements but scores has only 3. Pandas doesn't know what value to assign for Dave's score, so it raises an error.
Solutions
Solution 1: Fix the Source Data
The best fix is to ensure your data is complete. Check why the lists have different lengths and add the missing values:
import pandas as pd
names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78, 88] # Added Dave's score
df = pd.DataFrame({
    "name": names,
    "score": scores
})
print(df)
Output:
name score
0 Alice 85
1 Bob 92
2 Carol 78
3 Dave 88
To diagnose the mismatch, check the lengths before creating the DataFrame:
names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]
columns = {"name": names, "score": scores}
for col_name, col_data in columns.items():
    print(f"{col_name}: {len(col_data)} elements")
Output:
name: 4 elements
score: 3 elements
Solution 2: Pad Shorter Lists with NaN
If missing values are expected (incomplete data), pad the shorter lists with NaN to match the longest list:
import pandas as pd
import numpy as np
names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
scores = [85, 92, 78]
grades = ["A", "A+"]
# Find the maximum length
data = {"name": names, "score": scores, "grade": grades}
max_len = max(len(v) for v in data.values())
# Pad each list with NaN to match the longest
for key in data:
    data[key] += [np.nan] * (max_len - len(data[key]))
df = pd.DataFrame(data)
print(df)
Output:
name score grade
0 Alice 85.0 A
1 Bob 92.0 A+
2 Carol 78.0 NaN
3 Dave NaN NaN
4 Eve NaN NaN
You can wrap this into a reusable function:
import pandas as pd
import numpy as np
def create_df_from_uneven_lists(data_dict):
    """Create a DataFrame from lists of different lengths, padding with NaN."""
    max_len = max(len(v) for v in data_dict.values())
    padded = {
        key: values + [np.nan] * (max_len - len(values))
        for key, values in data_dict.items()
    }
    return pd.DataFrame(padded)
# Usage
df = create_df_from_uneven_lists({
    "city": ["NYC", "London", "Tokyo", "Paris"],
    "population_m": [8.3, 8.9],
    "country": ["USA", "UK", "Japan"]
})
print(df)
Output:
city population_m country
0 NYC 8.3 USA
1 London 8.9 UK
2 Tokyo NaN Japan
3 Paris NaN NaN
Solution 3: Truncate Longer Lists
If the extra elements in longer lists aren't needed, truncate all lists to the length of the shortest one:
import pandas as pd
names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
scores = [85, 92, 78]
# Truncate to the shortest length
min_len = min(len(names), len(scores))
df = pd.DataFrame({
    "name": names[:min_len],
    "score": scores[:min_len]
})
print(df)
Output:
name score
0 Alice 85
1 Bob 92
2 Carol 78
Truncating discards data. Only use this approach when you're certain the extra elements aren't important.
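The same idea extends to any number of columns. The following is a minimal sketch, assuming the columns are held in a dict of lists; `truncate_to_shortest` is a hypothetical helper name used for illustration, not a pandas function:

```python
import pandas as pd


def truncate_to_shortest(data_dict):
    """Hypothetical helper: cut every list to the length of the shortest one."""
    min_len = min(len(values) for values in data_dict.values())
    # Slicing builds new lists, so the caller's lists are left unchanged
    return pd.DataFrame({key: values[:min_len] for key, values in data_dict.items()})


df = truncate_to_shortest({
    "name": ["Alice", "Bob", "Carol", "Dave", "Eve"],
    "score": [85, 92, 78],
    "grade": ["A", "A+", "B", "B+"],
})
print(df)
```

Every column is cut to three rows here because score is the shortest list.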
Solution 4: Use pd.Series for Each Column
Creating a DataFrame from individual pd.Series objects automatically handles length mismatches by aligning on the index and filling gaps with NaN:
import pandas as pd
df = pd.DataFrame({
    "name": pd.Series(["Alice", "Bob", "Carol", "Dave"]),
    "score": pd.Series([85, 92, 78]),
    "grade": pd.Series(["A", "A+"])
})
print(df)
Output:
name score grade
0 Alice 85.0 A
1 Bob 92.0 A+
2 Carol 78.0 NaN
3 Dave NaN NaN
This is the simplest one-line fix: just wrap each list in pd.Series(). Pandas handles the alignment automatically.
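If the lists already live in a dictionary, the wrapping can be done in one pass with a dict comprehension. This is a small sketch of that pattern; the variable name `data` is just an illustration:

```python
import pandas as pd

data = {
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "score": [85, 92, 78],
    "grade": ["A", "A+"],
}

# Wrap every list in a Series; pandas aligns them on the default 0..n-1 index
df = pd.DataFrame({key: pd.Series(values) for key, values in data.items()})

# Count the gaps that were filled in each column
print(df.isna().sum())
```

Because alignment is by index label, this relies on every Series having the default integer index. Series with custom indexes would be matched by label rather than by position.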
Solution 5: Use itertools.zip_longest
Python's itertools.zip_longest pairs elements from multiple iterables and fills missing values with a specified default:
import pandas as pd
from itertools import zip_longest
names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]
grades = ["A", "A+"]
# Combine into rows; gaps are filled with None
rows = list(zip_longest(names, scores, grades, fillvalue=None))
df = pd.DataFrame(rows, columns=["name", "score", "grade"])
print(df)
Output:
name score grade
0 Alice 85.0 A
1 Bob 92.0 A+
2 Carol 78.0 None
3 Dave NaN None
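In the output above, the gaps in the text column appear as None while the numeric column shows NaN. If you would rather have a single missing-value marker in the raw rows, one option is to pass NaN as the fill value. A minimal sketch using the same lists:

```python
from itertools import zip_longest

import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]
grades = ["A", "A+"]

# float("nan") as the fill value puts NaN, not None, in every gap
rows = list(zip_longest(names, scores, grades, fillvalue=float("nan")))
df = pd.DataFrame(rows, columns=["name", "score", "grade"])

# isna() treats both None and NaN as missing, so counts are easy to check
print(df.isna().sum())
```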
Solution 6: Fill Missing Values with a Default
Instead of NaN, you might want to fill missing values with a meaningful default like 0, the column mean, or a placeholder string:
import pandas as pd
names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]
# Pad with the mean of existing scores
mean_score = sum(scores) / len(scores)
scores += [mean_score] * (len(names) - len(scores))
df = pd.DataFrame({
    "name": names,
    "score": scores
})
print(df)
Output:
name score
0 Alice 85.0
1 Bob 92.0
2 Carol 78.0
3 Dave 85.0
Or fill with zero:
import pandas as pd
names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]
# Pad with zero
scores += [0] * (len(names) - len(scores))
df = pd.DataFrame({
    "name": names,
    "score": scores
})
print(df)
Output:
name score
0 Alice 85
1 Bob 92
2 Carol 78
3 Dave 0
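For text columns, a placeholder string works the same way. The sketch below assumes "Unknown" is an acceptable label for your data, and it builds a new list instead of extending the original with +=, so the source list stays intact:

```python
import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave"]
grades = ["A", "A+"]

# Assumed placeholder label; pick whatever suits your data
placeholder = "Unknown"

# Build a new padded list rather than extending `grades` in place
padded_grades = grades + [placeholder] * (len(names) - len(grades))

df = pd.DataFrame({"name": names, "grade": padded_grades})
print(df)
```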
Choosing the Right Approach
| Situation | Best Solution |
|---|---|
| Data is incomplete due to a bug | Fix the source data: add missing values |
| Missing values are expected | Pad with NaN or use pd.Series() |
| Extra values in some columns aren't needed | Truncate to the shortest length |
| Want a quick, clean one-liner | Wrap each list in pd.Series() |
| Need a specific default value | Pad with 0, mean, or a placeholder |
| Building rows from multiple lists | Use zip_longest |
Prevention: Validate Before Creating
Add a quick validation step before creating DataFrames to catch mismatches early:
import numpy as np
import pandas as pd
def safe_create_dataframe(data_dict):
    """Create a DataFrame with validation for column lengths."""
    lengths = {col: len(values) for col, values in data_dict.items()}
    unique_lengths = set(lengths.values())
    if len(unique_lengths) > 1:
        print("⚠️ Column length mismatch detected:")
        for col, length in lengths.items():
            print(f" {col}: {length} elements")
        # Auto-fix by padding with NaN (+= extends the caller's lists in place)
        max_len = max(lengths.values())
        for col in data_dict:
            data_dict[col] += [np.nan] * (max_len - len(data_dict[col]))
        print(f" → Padded all columns to {max_len} elements.\n")
    return pd.DataFrame(data_dict)
# Usage
df = safe_create_dataframe({
    "name": ["Alice", "Bob", "Carol"],
    "score": [85, 92],
    "grade": ["A"]
})
print(df)
Output:
⚠️ Column length mismatch detected:
name: 3 elements
score: 2 elements
grade: 1 elements
→ Padded all columns to 3 elements.
name score grade
0 Alice 85.0 A
1 Bob 92.0 NaN
2 Carol NaN NaN
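Auto-padding can hide a real data problem. If you would rather stop and investigate, a stricter variant can raise an error that names each column and its length. This is a sketch only; `strict_create_dataframe` is a hypothetical helper, not part of pandas:

```python
import pandas as pd


def strict_create_dataframe(data_dict):
    """Hypothetical helper: raise a descriptive error instead of padding."""
    lengths = {col: len(values) for col, values in data_dict.items()}
    if len(set(lengths.values())) > 1:
        details = ", ".join(f"{col}={n}" for col, n in lengths.items())
        raise ValueError(f"Column length mismatch: {details}")
    return pd.DataFrame(data_dict)


try:
    strict_create_dataframe({"name": ["Alice", "Bob", "Carol"], "score": [85, 92]})
except ValueError as exc:
    message = str(exc)
    print(message)
```

The message lists every column with its length, which points directly at the list that needs fixing.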
Conclusion
The ValueError: All arrays must be of the same length error occurs because Pandas requires all columns in a DataFrame to have the same number of rows.
- The simplest fix is to wrap each list in pd.Series(), which automatically handles length mismatches by filling gaps with NaN.
- For more control, you can pad shorter lists with NaN or a default value, truncate longer lists to match the shortest, or use itertools.zip_longest to combine rows.

In all cases, start by checking the lengths of your data to identify which columns are mismatched.