How to Resolve "ValueError: Columns must be same length as key" in Pandas
The ValueError: Columns must be same length as key error in Pandas occurs when you assign to a list of columns in a DataFrame and the value on the right-hand side does not supply a matching number of columns. Typical triggers are assigning a single Series to several column names, or using str.split() when the resulting number of columns does not match the number of columns being assigned to.
This guide explores common causes and solutions for this error, ensuring your data assignments are accurate and error-free.
Understanding the Error
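The error arises when you create new columns by assigning to a list of column names, for example df[['a', 'b']] = value. In recent pandas versions, the number of names in that list is compared with what the right-hand side provides: the number of columns for a DataFrame or two-dimensional array, or the number of elements for a one-dimensional list or Series. If the counts differ, the ValueError: Columns must be same length as key is raised. The following example reproduces it: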
import pandas as pd
df1 = pd.DataFrame({
    'column1': ['Anna', 'Bob', 'Carl', 'Dan'],
    'column2': [29, 30, 31, 32]
})
df2 = pd.DataFrame({
    'column1': [100, 200, 300]
})
try:
    # Two column names on the left, one 3-element Series on the right
    df1[['column3', 'column4']] = df2['column1']
except ValueError as e:
    # Typically prints: Columns must be same length as key
    # The wording can vary by pandas version, for example:
    # Must have equal len keys and value when setting with an iterable
    print(e)
The assignment df1[['column3', 'column4']] = df2['column1'] raises an exception because the left-hand side names 2 columns, while the right-hand side is a single Series with 3 elements, which pandas cannot map onto those 2 columns. The Series is also one element short of the 4 rows in df1.
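The str.split() case mentioned earlier fails in the same way. The sketch below is only an illustration: the people DataFrame, its full_name column, and the first/last column names are made-up sample data, not something from a real dataset. Because one name has three parts, splitting with expand=True yields three columns, which cannot be assigned to two names.

```python
import pandas as pd

# Hypothetical sample data: the second name has three parts.
people = pd.DataFrame({'full_name': ['Anna Smith', 'Bob Lee Jones', 'Carl Ray']})

# expand=True returns a DataFrame with one column per piece.
parts = people['full_name'].str.split(' ', expand=True)
print(parts.shape)  # 3 rows, 3 columns

message = None
try:
    # Two names on the left, three columns on the right.
    people[['first', 'last']] = parts
except ValueError as e:
    message = str(e)
    print(message)
```

Printing parts.shape before assigning is a quick way to see how many columns the split actually produced.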
Solutions
To resolve the error, make sure the number of column names on the left-hand side equals the number of columns supplied on the right-hand side, and that the values provide one entry for each row of the DataFrame.
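One way to see which count is off before assigning is a small check like the one below. It is only a sketch: find_mismatches is a hypothetical helper, not part of pandas, and it assumes the value is a Series or DataFrame and treats a Series as a single column.

```python
import pandas as pd

def find_mismatches(df, names, value):
    """Hypothetical helper (not a pandas API): list count mismatches
    between the target df[names] and a Series or DataFrame value."""
    # Assumption: a Series counts as one column, a DataFrame as shape[1] columns.
    n_cols = value.shape[1] if value.ndim == 2 else 1
    problems = []
    if n_cols != len(names):
        problems.append(f"{len(names)} column names but {n_cols} column(s) of values")
    if len(value) != len(df):
        problems.append(f"{len(df)} rows in the DataFrame but {len(value)} rows of values")
    return problems

df1 = pd.DataFrame({
    'column1': ['Anna', 'Bob', 'Carl', 'Dan'],
    'column2': [29, 30, 31, 32]
})
df2 = pd.DataFrame({'column1': [100, 200, 300]})

# Report every mismatch for the failing assignment from the example.
problems = find_mismatches(df1, ['column3', 'column4'], df2['column1'])
for problem in problems:
    print(problem)
```

For the failing example this reports both a column-count and a row-count mismatch. An empty list means the two sides line up.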
Ensure Value Length Matches Row Length
The most common fix is to supply one value for every row and to assign to exactly as many columns as the right-hand side provides. Here that means extending the Series to 4 values and assigning it to a single column:
import pandas as pd
df1 = pd.DataFrame({
    'column1': ['Anna', 'Bob', 'Carl', 'Dan'],
    'column2': [29, 30, 31, 32]
})
df2 = pd.DataFrame({
    'column1': [100, 200, 300, 400]  # Add a value for each row
})
# One Series on the right, so assign to one column name (a plain string key)
df1['column3'] = df2['column1']
print(df1)
Output:
  column1  column2  column3
0    Anna       29      100
1     Bob       30      200
2    Carl       31      300
3     Dan       32      400
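If two new columns are really needed, the right-hand side has to provide two columns. The sketch below shows two ways this might be done. The column names a and b, the people DataFrame, and the first/rest names are illustrative assumptions. The second part uses the n parameter of str.split() to cap the number of splits, so that at most two columns come back.

```python
import pandas as pd

df1 = pd.DataFrame({
    'column1': ['Anna', 'Bob', 'Carl', 'Dan'],
    'column2': [29, 30, 31, 32]
})

# Hypothetical source frame: two columns, one value per row of df1.
df2 = pd.DataFrame({
    'a': [100, 200, 300, 400],
    'b': [1.5, 2.5, 3.5, 4.5]
})

# Two names on the left, two columns on the right.
df1[['column3', 'column4']] = df2[['a', 'b']]
print(df1)

# Hypothetical sample data for the str.split() case.
people = pd.DataFrame({'full_name': ['Anna Smith', 'Bob Lee Jones', 'Carl Ray']})

# n=1 performs at most one split per value, so there are at most two pieces.
people[['first', 'rest']] = people['full_name'].str.split(' ', n=1, expand=True)
print(people)
```

Note that n=1 only caps the number of columns. If no value contains the separator at all, the split yields a single column and the assignment to two names fails again.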