How to Resolve "ValueError: All arrays must be of the same length" in Python

When creating a Pandas DataFrame from lists or arrays, you may encounter the error ValueError: All arrays must be of the same length. This happens when the columns you're trying to combine have different numbers of elements. Pandas requires all columns to have the same length because each column represents a field in a table, and every row must have a value (or NaN) for each field.

In this guide, we'll explain why this error occurs and show multiple ways to fix it depending on your data and use case.

Why Does This Error Occur?

A DataFrame is a tabular structure where every column must have the same number of rows. When you pass lists of unequal lengths as columns, Pandas can't construct the table and raises an error.

❌ Example that triggers the error:

import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78] # Only 3 elements: doesn't match 4 names

df = pd.DataFrame({
"name": names,
"score": scores
})

Output:

ValueError: All arrays must be of the same length

The names list has 4 elements but scores has only 3. Pandas doesn't know what value to assign for Dave's score, so it raises an error.
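
If the lists come from code you don't control, one option is to catch the exception and report the lengths alongside it. The snippet below is a minimal sketch of that idea; the `lengths` dictionary and the printed messages are illustrative, not part of Pandas:

```python
import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]

try:
    pd.DataFrame({"name": names, "score": scores})
except ValueError as exc:
    # Keep the original message and add the lengths that caused it
    error_message = str(exc)
    lengths = {"name": len(names), "score": len(scores)}
    print(f"DataFrame construction failed: {error_message}")
    print(f"Column lengths: {lengths}")
```

The exact wording of the message can vary between Pandas versions, so it is safer to catch the exception type than to match the text.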

Solutions

Solution 1: Fix the Source Data

The best fix is to ensure your data is complete. Check why the lists have different lengths and add the missing values:

import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78, 88] # Added Dave's score

df = pd.DataFrame({
"name": names,
"score": scores
})
print(df)

Output:

    name  score
0 Alice 85
1 Bob 92
2 Carol 78
3 Dave 88

To diagnose the mismatch, check the lengths before creating the DataFrame:

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]

columns = {"name": names, "score": scores}
for col_name, col_data in columns.items():
    print(f"{col_name}: {len(col_data)} elements")

Output:

name: 4 elements
score: 3 elements
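
With many columns, it can help to report only the ones that deviate. The sketch below assumes that the most common length is the intended one, which may not hold for your data; `find_length_mismatches` is a hypothetical helper name, and it expects a non-empty dictionary:

```python
from collections import Counter

def find_length_mismatches(columns):
    """Return the most common length and the columns that differ from it."""
    lengths = {name: len(values) for name, values in columns.items()}
    # Assumption: the length shared by most columns is the expected one
    expected, _ = Counter(lengths.values()).most_common(1)[0]
    mismatched = {name: n for name, n in lengths.items() if n != expected}
    return expected, mismatched

columns = {
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "city": ["Oslo", "Rome", "Lima", "Cairo"],
    "score": [85, 92, 78],
}
expected, mismatched = find_length_mismatches(columns)
print(f"Expected length: {expected}")
print(f"Mismatched columns: {mismatched}")
```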

Solution 2: Pad Shorter Lists with NaN

If missing values are expected (incomplete data), pad the shorter lists with NaN to match the longest list:

import pandas as pd
import numpy as np

names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
scores = [85, 92, 78]
grades = ["A", "A+"]

# Find the maximum length
data = {"name": names, "score": scores, "grade": grades}
max_len = max(len(v) for v in data.values())

# Pad each list with NaN to match the longest.
# Building a new list (instead of +=) leaves names, scores, and grades unchanged.
for key in data:
    data[key] = data[key] + [np.nan] * (max_len - len(data[key]))

df = pd.DataFrame(data)
print(df)

Output:

    name  score grade
0 Alice 85.0 A
1 Bob 92.0 A+
2 Carol 78.0 NaN
3 Dave NaN NaN
4 Eve NaN NaN

You can wrap this into a reusable function:

import pandas as pd
import numpy as np

def create_df_from_uneven_lists(data_dict):
    """Create a DataFrame from lists of different lengths, padding with NaN."""
    max_len = max(len(v) for v in data_dict.values())
    padded = {
        key: values + [np.nan] * (max_len - len(values))
        for key, values in data_dict.items()
    }
    return pd.DataFrame(padded)

# Usage
df = create_df_from_uneven_lists({
"city": ["NYC", "London", "Tokyo", "Paris"],
"population_m": [8.3, 8.9],
"country": ["USA", "UK", "Japan"]
})
print(df)

Output:

     city  population_m country
0 NYC 8.3 USA
1 London 8.9 UK
2 Tokyo NaN Japan
3 Paris NaN NaN
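
The function above concatenates with `+`, so it expects plain lists. If some columns arrive as tuples or NumPy arrays, a variant that converts each column to a list first may be more forgiving. This is a sketch under that assumption; `pad_columns` and its `fill_value` parameter are hypothetical names:

```python
import numpy as np
import pandas as pd

def pad_columns(data_dict, fill_value=np.nan):
    """Pad any sized sequences to the longest length without mutating the input."""
    max_len = max(len(values) for values in data_dict.values())
    padded = {}
    for key, values in data_dict.items():
        items = list(values)  # Works for lists, tuples, and 1-D arrays
        padded[key] = items + [fill_value] * (max_len - len(items))
    return pd.DataFrame(padded)

df = pad_columns({
    "city": ("NYC", "London", "Tokyo"),
    "population_m": np.array([8.3, 8.9]),
})
print(df)
```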

Solution 3: Truncate Longer Lists

If the extra elements in longer lists aren't needed, truncate all lists to the length of the shortest one:

import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
scores = [85, 92, 78]

# Truncate to the shortest length
min_len = min(len(names), len(scores))

df = pd.DataFrame({
"name": names[:min_len],
"score": scores[:min_len]
})
print(df)

Output:

    name  score
0 Alice 85
1 Bob 92
2 Carol 78

Caution: Truncating discards data. Only use this approach when you're certain the extra elements aren't important.
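
For more than two columns, the built-in `zip` gives the same truncation, because it stops at the shortest input. The sketch below also counts how many values each column loses, so the discarded data is at least visible; the `dropped` dictionary is an illustrative addition:

```python
import pandas as pd

data = {
    "name": ["Alice", "Bob", "Carol", "Dave", "Eve"],
    "score": [85, 92, 78],
    "grade": ["A", "A+", "B", "B+"],
}

# zip stops at the shortest column, so every row is complete
rows = list(zip(*data.values()))
df = pd.DataFrame(rows, columns=list(data.keys()))

# Count the values dropped from each column
dropped = {key: len(values) - len(rows) for key, values in data.items()}
print(df)
print(f"Dropped values per column: {dropped}")
```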

Solution 4: Use pd.Series for Each Column

Creating a DataFrame from individual pd.Series objects automatically handles length mismatches by aligning on the index and filling gaps with NaN:

import pandas as pd

df = pd.DataFrame({
"name": pd.Series(["Alice", "Bob", "Carol", "Dave"]),
"score": pd.Series([85, 92, 78]),
"grade": pd.Series(["A", "A+"])
})
print(df)

Output:

    name  score grade
0 Alice 85.0 A
1 Bob 92.0 A+
2 Carol 78.0 NaN
3 Dave NaN NaN

Tip: This is the simplest one-line fix: just wrap each list in pd.Series(). Pandas handles the alignment automatically.
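
Alignment happens on index labels, not on position. That matters when the Series carry their own indexes. In this small sketch, the index values are made up for illustration, and the two scores land on rows 1 and 2 rather than on the first two rows:

```python
import pandas as pd

names = pd.Series(["Alice", "Bob", "Carol"], index=[0, 1, 2])
scores = pd.Series([92, 78], index=[1, 2])  # No score for label 0

df = pd.DataFrame({"name": names, "score": scores})
print(df)
```

If the values should line up by position instead, build each Series with the default index, as in the example above.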

Solution 5: Use itertools.zip_longest

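Python's itertools.zip_longest pairs elements from multiple iterables and fills missing values with a specified default. The example uses fillvalue=None: Pandas shows those gaps as NaN in the numeric score column, while a text column stored with object dtype keeps None: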

import pandas as pd
from itertools import zip_longest

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]
grades = ["A", "A+"]

# Combine row by row; missing entries are filled with None
rows = list(zip_longest(names, scores, grades, fillvalue=None))
df = pd.DataFrame(rows, columns=["name", "score", "grade"])
print(df)

Output:

    name  score grade
0 Alice 85.0 A
1 Bob 92.0 A+
2 Carol 78.0 None
3 Dave NaN None
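
If you prefer a consistent missing-value marker in every column, one option is to pass `fillvalue=np.nan`. The sketch below also unpacks a dictionary of columns so the column names stay next to their data; the `data` dictionary is an illustrative arrangement of the same lists:

```python
from itertools import zip_longest

import numpy as np
import pandas as pd

data = {
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "score": [85, 92, 78],
    "grade": ["A", "A+"],
}

# Each tuple is one row; short columns are filled with NaN
rows = list(zip_longest(*data.values(), fillvalue=np.nan))
df = pd.DataFrame(rows, columns=list(data.keys()))
print(df)
```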

Solution 6: Fill Missing Values with a Default

Instead of NaN, you might want to fill missing values with a meaningful default like 0, the column mean, or a placeholder string:

import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]

# Pad with the mean of existing scores
mean_score = sum(scores) / len(scores)
scores += [mean_score] * (len(names) - len(scores))

df = pd.DataFrame({
"name": names,
"score": scores
})
print(df)

Output:

    name  score
0 Alice 85.0
1 Bob 92.0
2 Carol 78.0
3 Dave 85.0

Or fill with zero:

import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]

# Pad with zero
scores += [0] * (len(names) - len(scores))

df = pd.DataFrame({
"name": names,
"score": scores
})
print(df)

Output:

    name  score
0 Alice 85
1 Bob 92
2 Carol 78
3 Dave 0
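
Another way to apply defaults is to build the DataFrame first and fill the gaps afterwards with `DataFrame.fillna`, which accepts a dictionary of per-column values. This is a sketch of that alternative; the `"unknown"` placeholder is an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame({
    "name": pd.Series(["Alice", "Bob", "Carol", "Dave"]),
    "score": pd.Series([85, 92, 78]),
    "grade": pd.Series(["A", "A+"]),
})

# Fill each column with its own default: the mean score and a placeholder grade
df = df.fillna({"score": df["score"].mean(), "grade": "unknown"})
print(df)
```

Filling after construction keeps the original lists untouched and lets each column use a different default.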

Choosing the Right Approach

| Situation | Best Solution |
| --- | --- |
| Data is incomplete due to a bug | Fix the source data: add missing values |
| Missing values are expected | Pad with NaN or use pd.Series() |
| Extra values in some columns aren't needed | Truncate to the shortest length |
| Want a quick, clean one-liner | Wrap each list in pd.Series() |
| Need a specific default value | Pad with 0, mean, or a placeholder |
| Building rows from multiple lists | Use zip_longest |

Prevention: Validate Before Creating

Add a quick validation step before creating DataFrames to catch mismatches early:

import numpy as np
import pandas as pd

def safe_create_dataframe(data_dict):
    """Create a DataFrame with validation for column lengths."""
    lengths = {col: len(values) for col, values in data_dict.items()}
    unique_lengths = set(lengths.values())

    if len(unique_lengths) > 1:
        print("⚠️ Column length mismatch detected:")
        for col, length in lengths.items():
            print(f" {col}: {length} elements")

        # Auto-fix by padding with NaN (builds new lists; the caller's data is not modified)
        max_len = max(lengths.values())
        data_dict = {
            col: list(values) + [np.nan] * (max_len - len(values))
            for col, values in data_dict.items()
        }
        print(f" → Padded all columns to {max_len} elements.\n")

    return pd.DataFrame(data_dict)


# Usage
df = safe_create_dataframe({
"name": ["Alice", "Bob", "Carol"],
"score": [85, 92],
"grade": ["A"]
})
print(df)

Output:

⚠️ Column length mismatch detected:
name: 3 elements
score: 2 elements
grade: 1 elements
→ Padded all columns to 3 elements.

name score grade
0 Alice 85.0 A
1 Bob 92.0 NaN
2 Carol NaN NaN
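
Automatic padding can hide a real bug upstream. Where a mismatch should stop the program instead, a stricter variant can raise its own error that names every column and its length. This is a minimal sketch; `strict_create_dataframe` and the message format are hypothetical:

```python
import pandas as pd

def strict_create_dataframe(data_dict):
    """Create a DataFrame, raising a descriptive error on length mismatch."""
    lengths = {col: len(values) for col, values in data_dict.items()}
    if len(set(lengths.values())) > 1:
        details = ", ".join(f"{col}={n}" for col, n in lengths.items())
        raise ValueError(f"Column lengths differ: {details}")
    return pd.DataFrame(data_dict)

try:
    strict_create_dataframe({
        "name": ["Alice", "Bob", "Carol"],
        "score": [85, 92],
    })
except ValueError as exc:
    message = str(exc)
    print(message)
```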

Conclusion

The ValueError: All arrays must be of the same length error occurs because Pandas requires all columns in a DataFrame to have the same number of rows.

  • The simplest fix is to wrap each list in pd.Series(), which automatically handles length mismatches by filling gaps with NaN.
  • For more control, you can pad shorter lists with NaN or a default value, truncate longer lists to match the shortest, or use itertools.zip_longest to combine rows. In all cases, start by checking the lengths of your data to identify which columns are mismatched.