How to Convert a NumPy Array to a Pandas DataFrame in Python
Converting NumPy arrays to Pandas DataFrames adds labeled columns and indices, transforming raw numerical data into a structured format ideal for analysis, visualization, and export. This conversion is one of the most common operations when bridging numerical computation in NumPy with the data analysis capabilities of Pandas.
In this guide, you will learn how to perform this conversion for different array shapes, add meaningful labels, handle data types, and apply the technique to practical scenarios.
Basic Conversion
Pass the array directly to pd.DataFrame() along with column names:
import numpy as np
import pandas as pd
arr = np.array([[10, 20], [30, 40], [50, 60]])
df = pd.DataFrame(arr, columns=['A', 'B'])
print(df)
Output:
    A   B
0  10  20
1  30  40
2  50  60
Each column in the array maps to the corresponding name in the columns list, and Pandas automatically assigns integer indices starting from 0.
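For instance, once the columns are labeled, you can select them by name instead of by position — a minimal sketch building on the example above:

```python
import numpy as np
import pandas as pd

arr = np.array([[10, 20], [30, 40], [50, 60]])
df = pd.DataFrame(arr, columns=['A', 'B'])

# Name-based access replaces positional slicing like arr[:, 0]
print(df['A'].tolist())  # [10, 30, 50]
print(df['B'].sum())     # 120
```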
Adding Custom Row Labels
Use the index parameter to assign meaningful row identifiers instead of default integers:
import numpy as np
import pandas as pd
arr = np.array([[10, 20], [30, 40], [50, 60]])
df = pd.DataFrame(
    arr,
    columns=['Sales', 'Profit'],
    index=['Q1', 'Q2', 'Q3']
)
print(df)
Output:
    Sales  Profit
Q1     10      20
Q2     30      40
Q3     50      60
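With custom labels in place, rows can be retrieved by name through .loc rather than by integer position — a short sketch using the same data:

```python
import numpy as np
import pandas as pd

arr = np.array([[10, 20], [30, 40], [50, 60]])
df = pd.DataFrame(arr, columns=['Sales', 'Profit'],
                  index=['Q1', 'Q2', 'Q3'])

# .loc selects by label instead of integer position
print(df.loc['Q2'])           # the whole Q2 row
print(df.loc['Q1', 'Sales'])  # a single cell: 10
```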
Converting 1D Arrays
A 1D array creates a single-column DataFrame. You must still pass the column name as a list:
import numpy as np
import pandas as pd
arr_1d = np.array([100, 200, 300, 400])
df = pd.DataFrame(arr_1d, columns=['Value'])
print(df)
Output:
   Value
0    100
1    200
2    300
3    400
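As a side note, pandas also provides Series — its 1D structure — which can hold the same data and convert to a single-column DataFrame on demand:

```python
import numpy as np
import pandas as pd

arr_1d = np.array([100, 200, 300, 400])

# A Series is the 1D counterpart of a DataFrame
s = pd.Series(arr_1d, name='Value')

# to_frame() turns it into a single-column DataFrame when needed
df = s.to_frame()
print(df.shape)  # (4, 1)
```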
Combining Multiple 1D Arrays as Columns
When you have several separate 1D arrays that represent different features, pass them as a dictionary to create a multi-column DataFrame:
import numpy as np
import pandas as pd
names = np.array(['Alice', 'Bob', 'Charlie'])
ages = np.array([25, 30, 35])
scores = np.array([85.5, 92.0, 78.5])
df = pd.DataFrame({
    'Name': names,
    'Age': ages,
    'Score': scores
})
print(df)
Output:
      Name  Age  Score
0    Alice   25   85.5
1      Bob   30   92.0
2  Charlie   35   78.5
This dictionary approach is especially useful when each array comes from a different source or computation.
Common Mistake: Column Count Mismatch
The number of column names must exactly match the number of columns in the array:
import numpy as np
import pandas as pd
arr = np.random.rand(5, 2) # 5 rows, 2 columns
# Correct: 2 columns, 2 names
df = pd.DataFrame(arr, columns=['X', 'Y'])
print(f"Shape: {df.shape}")
# Wrong: 3 names for 2 columns
try:
    df = pd.DataFrame(arr, columns=['X', 'Y', 'Z'])
except ValueError as e:
    print(f"Error: {e}")
Output:
Shape: (5, 2)
Error: Shape of passed values is (5, 2), indices imply (5, 3)
Always verify your array's shape with arr.shape before specifying column names. This is one of the most frequent errors when converting arrays to DataFrames, especially when the array is generated dynamically.
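When the column count is only known at runtime, generating names from the shape avoids the mismatch entirely; the col prefix below is an arbitrary choice:

```python
import numpy as np
import pandas as pd

arr = np.random.rand(5, 7)  # column count may vary at runtime

# Derive exactly one name per column from the array's shape
cols = [f'col{i}' for i in range(arr.shape[1])]
df = pd.DataFrame(arr, columns=cols)
print(df.columns.tolist())
```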
Specifying Data Types
Control column data types using the dtype parameter or the astype() method:
import numpy as np
import pandas as pd
arr = np.array([[1, 2], [3, 4], [5, 6]])
# Force all columns to float during creation
df_float = pd.DataFrame(arr, columns=['A', 'B'], dtype=float)
print(df_float.dtypes)
print()
# Or set types per column after creation
df = pd.DataFrame(arr, columns=['A', 'B'])
df = df.astype({'A': 'int32', 'B': 'float64'})
print(df.dtypes)
Output:
A    float64
B    float64
dtype: object

A      int32
B    float64
dtype: object
The dtype parameter applies the same type to all columns, while astype() gives you per-column control.
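Note that when you omit dtype entirely, each column simply inherits whatever dtype the array already has — a quick check:

```python
import numpy as np
import pandas as pd

arr_int16 = np.array([[1, 2], [3, 4]], dtype=np.int16)

# Without dtype, the columns keep the array's own type
df = pd.DataFrame(arr_int16, columns=['A', 'B'])
print(df.dtypes)  # both columns are int16
```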
Handling Different Array Shapes
2D Arrays (Most Common)
import numpy as np
import pandas as pd
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
df = pd.DataFrame(matrix, columns=['Col1', 'Col2', 'Col3'])
print(df)
Output:
   Col1  Col2  Col3
0     1     2     3
1     4     5     6
2     7     8     9
3D Arrays (Requires Reshaping)
DataFrames are inherently 2D, so 3D arrays must be reshaped before conversion:
import numpy as np
import pandas as pd
# 3D array: 2 layers, 3 rows, 4 columns
np.random.seed(42)
tensor = np.random.randint(0, 100, size=(2, 3, 4))
# Option 1: Flatten all layers into rows
flat = tensor.reshape(-1, tensor.shape[-1]) # (6, 4)
df = pd.DataFrame(flat, columns=['A', 'B', 'C', 'D'])
print(f"Flattened shape: {df.shape}")
print(df)
print()
# Option 2: Select a single layer
df_layer0 = pd.DataFrame(tensor[0], columns=['A', 'B', 'C', 'D'])
print(f"Single layer shape: {df_layer0.shape}")
print(df_layer0)
Output:
Flattened shape: (6, 4)
    A   B   C   D
0  51  92  14  71
1  60  20  82  86
2  74  74  87  99
3  23   2  21  52
4   1  87  29  37
5   1  63  59  20

Single layer shape: (3, 4)
    A   B   C   D
0  51  92  14  71
1  60  20  82  86
2  74  74  87  99
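A third option, sketched here with illustrative level names, keeps the layer structure visible by labeling each flattened row with a MultiIndex instead of discarding its origin:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
tensor = np.random.randint(0, 100, size=(2, 3, 4))

# Label each flattened row with its (layer, row) origin
idx = pd.MultiIndex.from_product(
    [range(tensor.shape[0]), range(tensor.shape[1])],
    names=['layer', 'row']
)
df = pd.DataFrame(tensor.reshape(-1, tensor.shape[-1]),
                  index=idx, columns=['A', 'B', 'C', 'D'])
print(df)
print(df.loc[1])  # only the rows that came from layer 1
```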
Automatic Column Names
If you omit the columns parameter, Pandas assigns integer column indices. You can rename them later:
import numpy as np
import pandas as pd
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Default integer column names
df = pd.DataFrame(arr)
print(df)
print()
# Rename columns after creation
df.columns = ['A', 'B', 'C']
print(df)
Output:
   0  1  2
0  1  2  3
1  4  5  6

   A  B  C
0  1  2  3
1  4  5  6
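As an alternative to assigning df.columns by hand, the add_prefix method turns the default integer labels into readable names in a single call:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6]])

# add_prefix renames columns 0, 1, 2 to feature_0, feature_1, feature_2
df = pd.DataFrame(arr).add_prefix('feature_')
print(df.columns.tolist())
```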
Practical Examples
Creating Test Data with Random Arrays
import numpy as np
import pandas as pd
np.random.seed(42)
data = np.random.randn(100, 4)
df = pd.DataFrame(
    data,
    columns=['Feature1', 'Feature2', 'Feature3', 'Target']
)
print(df.head())
print(f"\nShape: {df.shape}")
print(f"\nSummary Statistics:\n{df.describe().round(2)}")
Output:
   Feature1  Feature2  Feature3    Target
0  0.496714 -0.138264  0.647689  1.523030
1 -0.234153 -0.234137  1.579213  0.767435
2 -0.469474  0.542560 -0.463418 -0.465730
3  0.241962 -1.913280 -1.724918 -0.562288
4 -1.012831  0.314247 -0.908024 -1.412304

Shape: (100, 4)

Summary Statistics:
       Feature1  Feature2  Feature3  Target
count    100.00    100.00    100.00  100.00
mean      -0.01      0.03      0.02    0.04
std        0.87      0.95      1.04    0.98
min       -2.03     -1.96     -3.24   -1.99
25%       -0.72     -0.56     -0.62   -0.73
50%       -0.00     -0.02      0.07    0.08
75%        0.53      0.55      0.70    0.78
max        2.31      3.85      2.19    2.72
Converting Machine Learning Predictions
import numpy as np
import pandas as pd
sample_ids = np.arange(1, 6)
predictions = np.array([0.92, 0.15, 0.78, 0.34, 0.88])
actual = np.array([1, 0, 1, 0, 1])
results = pd.DataFrame({
    'SampleID': sample_ids,
    'Prediction': predictions,
    'Actual': actual,
    'Correct': (predictions > 0.5).astype(int) == actual
})
print(results)
print(f"\nAccuracy: {results['Correct'].mean():.0%}")
Output:
   SampleID  Prediction  Actual  Correct
0         1        0.92       1     True
1         2        0.15       0     True
2         3        0.78       1     True
3         4        0.34       0     True
4         5        0.88       1     True

Accuracy: 100%
Time Series Data with Date Index
import numpy as np
import pandas as pd
dates = pd.date_range('2024-01-01', periods=5, freq='D')
values = np.array([[100, 50], [105, 52], [103, 48], [110, 55], [108, 53]])
df = pd.DataFrame(
    values,
    columns=['Price', 'Volume'],
    index=dates
)
print(df)
Output:
            Price  Volume
2024-01-01    100      50
2024-01-02    105      52
2024-01-03    103      48
2024-01-04    110      55
2024-01-05    108      53
Using a DatetimeIndex enables Pandas time series features like resampling, rolling windows, and date-based slicing.
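For example, the same DataFrame supports downsampling and rolling statistics out of the box — a brief sketch:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2024-01-01', periods=5, freq='D')
values = np.array([[100, 50], [105, 52], [103, 48], [110, 55], [108, 53]])
df = pd.DataFrame(values, columns=['Price', 'Volume'], index=dates)

# Downsample to 2-day averages
print(df.resample('2D').mean())

# 3-day rolling mean of the price
print(df['Price'].rolling(3).mean())
```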
Parameter Reference
| Parameter | Purpose | Example |
|---|---|---|
| data | The NumPy array to convert | np.array([[1, 2], [3, 4]]) |
| columns | Column header names | ['X', 'Y'] |
| index | Row labels | ['Row1', 'Row2'] |
| dtype | Force a single data type for all columns | float, 'int32' |
| copy | Whether to copy the input data (default depends on input type and pandas version) | copy=True for independent data |
The copy Parameter
Setting copy=False can save memory by allowing the DataFrame to share the underlying data with the original NumPy array. However, modifying one will then affect the other. Note that for 2D array input, older pandas versions treat the default as copy=False rather than True, so the DataFrame may share memory unless you ask otherwise. Pass copy=True explicitly when you need independent data, and use copy=False only when you understand the implications and want to avoid duplicating large arrays.
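A small demonstration of the copy=True case — the DataFrame stays unchanged when the source array is mutated afterward:

```python
import numpy as np
import pandas as pd

arr = np.array([[1.0, 2.0], [3.0, 4.0]])

# copy=True gives the DataFrame its own independent data
df = pd.DataFrame(arr, columns=['A', 'B'], copy=True)

arr[0, 0] = 99.0
print(df.iloc[0, 0])  # still 1.0; the DataFrame was unaffected
```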
Conclusion
Converting NumPy arrays to Pandas DataFrames is straightforward with pd.DataFrame(array, columns=[...]). Add the index parameter for custom row labels and dtype to control data types. A 1D array produces a single-column DataFrame, and multiple 1D arrays can be combined through a dictionary. For 3D arrays, reshape to 2D first, since DataFrames are inherently two-dimensional structures.
This conversion bridges the gap between NumPy's efficient numerical computation and Pandas' rich data analysis capabilities, enabling labeled access, grouping, filtering, visualization, and export operations on your array data.