How to Resolve "ValueError: All arrays must be of the same length" in Python

When creating a Pandas DataFrame from lists or arrays, you may encounter the error ValueError: All arrays must be of the same length. This happens when the columns you're trying to combine have different numbers of elements. Pandas requires all columns to have the same length because each column represents a field in a table, and every row must have a value (or NaN) for each field.

In this guide, we'll explain why this error occurs and show multiple ways to fix it depending on your data and use case.

Why Does This Error Occur?

A DataFrame is a tabular structure where every column must have the same number of rows. When you pass lists of unequal lengths as columns, Pandas can't construct the table and raises an error.

❌ Example that triggers the error:

import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]  # Only 3 elements: doesn't match 4 names

df = pd.DataFrame({
    "name": names,
    "score": scores
})

Output:

ValueError: All arrays must be of the same length

The names list has 4 elements but scores has only 3. Pandas doesn't know what value to assign for Dave's score, so it raises an error.
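If you want to see the exact message without stopping your script, you can catch the exception. This is a minimal sketch; the `error_message` variable is only an illustrative name, and the exact wording of the message can differ between pandas versions.

```python
import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]  # one element short

try:
    df = pd.DataFrame({"name": names, "score": scores})
except ValueError as exc:
    # Keep the message so it can be logged or inspected later
    error_message = str(exc)
    print(f"Caught: {error_message}")
```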

Solutions

Solution 1: Fix the Source Data

The best fix is to ensure your data is complete. Check why the lists have different lengths and add the missing values:

import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78, 88]  # Added Dave's score

df = pd.DataFrame({
    "name": names,
    "score": scores
})
print(df)

Output:

    name  score
0  Alice     85
1    Bob     92
2  Carol     78
3   Dave     88

To diagnose the mismatch, check the lengths before creating the DataFrame:

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]

columns = {"name": names, "score": scores}
for col_name, col_data in columns.items():
    print(f"{col_name}: {len(col_data)} elements")

Output:

name: 4 elements
score: 3 elements
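With many columns, scanning a printed list gets tedious. The sketch below uses a hypothetical helper, `find_mismatched_columns`, that assumes the most common length is the intended one and returns only the columns that deviate from it. The `email` column is made-up sample data.

```python
from collections import Counter

def find_mismatched_columns(columns):
    """Return {column: length} for columns that differ from the most common length."""
    lengths = {name: len(values) for name, values in columns.items()}
    # Assumption: the length shared by most columns is the correct one
    expected, _ = Counter(lengths.values()).most_common(1)[0]
    return {name: n for name, n in lengths.items() if n != expected}

columns = {
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "email": ["a@x.org", "b@x.org", "c@x.org", "d@x.org"],
    "score": [85, 92, 78],
}
mismatched = find_mismatched_columns(columns)
print(mismatched)
```

If two lengths are equally common, the helper treats the first one it encounters as the expected length, so check the result by hand in that case.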

Solution 2: Pad Shorter Lists with `NaN`

If missing values are expected (incomplete data), pad the shorter lists with NaN to match the longest list:

import pandas as pd
import numpy as np

names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
scores = [85, 92, 78]
grades = ["A", "A+"]

# Find the maximum length
data = {"name": names, "score": scores, "grade": grades}
max_len = max(len(v) for v in data.values())

# Pad each list with NaN to match the longest
# (build new lists so the original names/scores/grades are not modified)
for key in data:
    data[key] = data[key] + [np.nan] * (max_len - len(data[key]))

df = pd.DataFrame(data)
print(df)

Output:

    name  score grade
0  Alice   85.0     A
1    Bob   92.0    A+
2  Carol   78.0   NaN
3   Dave    NaN   NaN
4    Eve    NaN   NaN

You can wrap this into a reusable function:

import pandas as pd
import numpy as np

def create_df_from_uneven_lists(data_dict):
    """Create a DataFrame from lists of different lengths, padding with NaN."""
    max_len = max(len(v) for v in data_dict.values())
    padded = {
        key: values + [np.nan] * (max_len - len(values))
        for key, values in data_dict.items()
    }
    return pd.DataFrame(padded)

# Usage
df = create_df_from_uneven_lists({
    "city": ["NYC", "London", "Tokyo", "Paris"],
    "population_m": [8.3, 8.9],
    "country": ["USA", "UK", "Japan"]
})
print(df)

Output:

     city  population_m country
0     NYC           8.3     USA
1  London           8.9      UK
2   Tokyo           NaN   Japan
3   Paris           NaN     NaN
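The padding helper relies on list concatenation, so it would not behave the same way if a column arrived as a tuple or a NumPy array. A possible variant, sketched here under the hypothetical name `pad_to_longest`, copies each column into a new list first. That also leaves the caller's data unchanged.

```python
import numpy as np
import pandas as pd

def pad_to_longest(data_dict, fill_value=np.nan):
    """Return a new dict whose columns are lists padded to the longest length."""
    max_len = max(len(values) for values in data_dict.values())
    return {
        # list(values) copies the input, so tuples and arrays work too
        key: list(values) + [fill_value] * (max_len - len(values))
        for key, values in data_dict.items()
    }

data = {
    "city": ("NYC", "London", "Tokyo"),       # tuple
    "population_m": np.array([8.3, 8.9]),     # NumPy array
}
df = pd.DataFrame(pad_to_longest(data))
print(df)
```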

Solution 3: Truncate Longer Lists

If the extra elements in longer lists aren't needed, truncate all lists to the length of the shortest one:

import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
scores = [85, 92, 78]

# Truncate to the shortest length
min_len = min(len(names), len(scores))

df = pd.DataFrame({
    "name": names[:min_len],
    "score": scores[:min_len]
})
print(df)

Output:

    name  score
0  Alice     85
1    Bob     92
2  Carol     78

Warning: Truncating discards data. Only use this approach when you're certain the extra elements aren't important.
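For more than two lists, Python's built-in `zip` gives the same result because it stops at the shortest input. The sketch below also counts how many rows were dropped so the loss is visible; the `grades` list and the `dropped` variable are illustrative additions.

```python
import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
scores = [85, 92, 78]
grades = ["A", "A+", "B", "B+"]

# zip() stops as soon as the shortest list is exhausted
rows = list(zip(names, scores, grades))
df = pd.DataFrame(rows, columns=["name", "score", "grade"])

# Report how much of the longest list was discarded
dropped = max(len(names), len(scores), len(grades)) - len(rows)
print(df)
print(f"Rows dropped from the longest list: {dropped}")
```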

Solution 4: Use `pd.Series` for Each Column

Creating a DataFrame from individual pd.Series objects automatically handles length mismatches by aligning on the index and filling gaps with NaN:

import pandas as pd

df = pd.DataFrame({
    "name": pd.Series(["Alice", "Bob", "Carol", "Dave"]),
    "score": pd.Series([85, 92, 78]),
    "grade": pd.Series(["A", "A+"])
})
print(df)

Output:

    name  score grade
0  Alice   85.0     A
1    Bob   92.0    A+
2  Carol   78.0   NaN
3   Dave    NaN   NaN

Tip: This is the simplest one-line fix: just wrap each list in pd.Series(). Pandas handles the alignment automatically.
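When the lists already live in a dictionary, a dict comprehension can do the wrapping. Note that an integer column with gaps becomes `float64`, which is why the scores print as `85.0`. The second DataFrame is an optional sketch that assumes you would rather keep whole numbers, using pandas' nullable `Int64` dtype.

```python
import pandas as pd

data = {
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "score": [85, 92, 78],
}

# Wrap every list in a Series so pandas aligns them on the default index
df = pd.DataFrame({key: pd.Series(values) for key, values in data.items()})

# Optional: the nullable "Int64" dtype keeps integers and marks the gap as <NA>
df_nullable = pd.DataFrame({
    "name": pd.Series(data["name"]),
    "score": pd.Series(data["score"], dtype="Int64"),
})

print(df.dtypes)
print(df_nullable)
```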

Solution 5: Use `itertools.zip_longest`

Python's itertools.zip_longest pairs elements from multiple iterables and fills missing values with a specified default:

import pandas as pd
from itertools import zip_longest

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]
grades = ["A", "A+"]

# Combine into rows, filling missing values with None
# (pandas shows None as NaN in the numeric score column)
rows = list(zip_longest(names, scores, grades, fillvalue=None))
df = pd.DataFrame(rows, columns=["name", "score", "grade"])
print(df)

Output:

    name  score grade
0  Alice   85.0     A
1    Bob   92.0    A+
2  Carol   78.0  None
3   Dave    NaN  None
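If you would rather have the same missing-value marker in every column, one option is to pass `np.nan` as the fill value. The sketch below also unpacks a dictionary into `zip_longest`, so the column names and the data stay together. The `data` dictionary is an illustrative arrangement of the same sample lists.

```python
import numpy as np
import pandas as pd
from itertools import zip_longest

data = {
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "score": [85, 92, 78],
    "grade": ["A", "A+"],
}

# Each tuple is one row; shorter lists are filled with NaN
rows = list(zip_longest(*data.values(), fillvalue=np.nan))
df = pd.DataFrame(rows, columns=list(data.keys()))
print(df)
```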

Solution 6: Fill Missing Values with a Default

Instead of NaN, you might want to fill missing values with a meaningful default like 0, the column mean, or a placeholder string:

import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]

# Pad with the mean of existing scores
mean_score = sum(scores) / len(scores)
scores += [mean_score] * (len(names) - len(scores))

df = pd.DataFrame({
    "name": names,
    "score": scores
})
print(df)

Output:

    name  score
0  Alice   85.0
1    Bob   92.0
2  Carol   78.0
3   Dave   85.0

Or fill with zero:

import pandas as pd

names = ["Alice", "Bob", "Carol", "Dave"]
scores = [85, 92, 78]

# Pad with zero
scores += [0] * (len(names) - len(scores))

df = pd.DataFrame({
    "name": names,
    "score": scores
})
print(df)

Output:

    name  score
0  Alice     85
1    Bob     92
2  Carol     78
3   Dave      0
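Another way to apply defaults is to build the DataFrame first and fill the gaps afterwards with `DataFrame.fillna`, which accepts a dictionary of per-column values. This is a sketch that assumes the mean is a sensible default for scores and uses `"N/A"` as an arbitrary placeholder for grades.

```python
import pandas as pd

df = pd.DataFrame({
    "name": pd.Series(["Alice", "Bob", "Carol", "Dave"]),
    "score": pd.Series([85, 92, 78]),
    "grade": pd.Series(["A", "A+"]),
})

# Fill each column with its own default; mean() ignores the missing values
df = df.fillna({"score": df["score"].mean(), "grade": "N/A"})
print(df)
```

This keeps the padding logic out of your own code and lets each column have a different default.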

Choosing the Right Approach

Situation	Best Solution
Data is incomplete due to a bug	Fix the source data: add missing values
Missing values are expected	Pad with `NaN` or use `pd.Series()`
Extra values in some columns aren't needed	Truncate to the shortest length
Want a quick, clean one-liner	Wrap each list in `pd.Series()`
Need a specific default value	Pad with `0`, mean, or a placeholder
Building rows from multiple lists	Use `zip_longest`

Prevention: Validate Before Creating

Add a quick validation step before creating DataFrames to catch mismatches early:

import numpy as np
import pandas as pd

def safe_create_dataframe(data_dict):
    """Create a DataFrame with validation for column lengths."""
    lengths = {col: len(values) for col, values in data_dict.items()}

    if len(set(lengths.values())) > 1:
        print("⚠️ Column length mismatch detected:")
        for col, length in lengths.items():
            print(f"  {col}: {length} elements")

        # Auto-fix by padding copies with NaN, leaving the caller's lists untouched
        max_len = max(lengths.values())
        data_dict = {
            col: list(values) + [np.nan] * (max_len - len(values))
            for col, values in data_dict.items()
        }
        print(f"  → Padded all columns to {max_len} elements.\n")

    return pd.DataFrame(data_dict)


# Usage
df = safe_create_dataframe({
    "name": ["Alice", "Bob", "Carol"],
    "score": [85, 92],
    "grade": ["A"]
})
print(df)

Output:

⚠️ Column length mismatch detected:
  name: 3 elements
  score: 2 elements
  grade: 1 elements
  → Padded all columns to 3 elements.

    name  score grade
0  Alice   85.0     A
1    Bob   92.0   NaN
2  Carol    NaN   NaN
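Padding automatically can hide a real bug in the source data. If you would rather stop early, a stricter variant can raise its own error that names every column and its length. The function name `strict_create_dataframe` and the message format are hypothetical choices for this sketch.

```python
import pandas as pd

def strict_create_dataframe(data_dict):
    """Create a DataFrame, raising a descriptive error if column lengths differ."""
    lengths = {col: len(values) for col, values in data_dict.items()}
    if len(set(lengths.values())) > 1:
        details = ", ".join(f"{col}={n}" for col, n in lengths.items())
        raise ValueError(f"Column lengths differ: {details}")
    return pd.DataFrame(data_dict)

try:
    strict_create_dataframe({"name": ["Alice", "Bob", "Carol"], "score": [85, 92]})
except ValueError as exc:
    message = str(exc)
    print(message)
```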

Conclusion

The ValueError: All arrays must be of the same length error occurs because Pandas requires all columns in a DataFrame to have the same number of rows.

The simplest fix is to wrap each list in pd.Series(), which automatically handles length mismatches by filling gaps with NaN.
For more control, you can pad shorter lists with NaN or a default value, truncate longer lists to match the shortest, or use itertools.zip_longest to combine rows. In all cases, start by checking the lengths of your data to identify which columns are mismatched.
