How to Convert a NumPy Array to a Pandas DataFrame in Python
Converting NumPy arrays to Pandas DataFrames adds labeled columns and indices, transforming raw numerical data into a structured format ideal for analysis, visualization, and export.
Basic Conversion
Pass the array directly to pd.DataFrame() with column names:
import numpy as np
import pandas as pd
arr = np.array([[10, 20], [30, 40], [50, 60]])
df = pd.DataFrame(arr, columns=['A', 'B'])
print(df)
Output:
A B
0 10 20
1 30 40
2 50 60
Adding Custom Row Labels
Use the index parameter to assign meaningful row identifiers:
import numpy as np
import pandas as pd
arr = np.array([[10, 20], [30, 40], [50, 60]])
df = pd.DataFrame(
    arr,
    columns=['Sales', 'Profit'],
    index=['Q1', 'Q2', 'Q3']
)
print(df)
Output:
Sales Profit
Q1 10 20
Q2 30 40
Q3 50 60
Converting 1D Arrays
A 1D array creates a single-column DataFrame:
import numpy as np
import pandas as pd
arr_1d = np.array([100, 200, 300, 400])
# Creates a single column
df = pd.DataFrame(arr_1d, columns=['Value'])
print(df)
Output:
Value
0 100
1 200
2 300
3 400
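If the 1D array should become a single row rather than a single column, one option is to reshape it to 2D first. The sketch below assumes that layout is what you want; the column names W, X, Y, and Z are placeholders chosen for illustration.

```python
import numpy as np
import pandas as pd

arr_1d = np.array([100, 200, 300, 400])

# reshape(1, -1) turns the 4-element vector into a 1 x 4 matrix,
# so each element becomes its own column instead of its own row.
row_df = pd.DataFrame(arr_1d.reshape(1, -1), columns=['W', 'X', 'Y', 'Z'])

print(row_df.shape)
```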
Multiple 1D Arrays as Columns
Combine several 1D arrays into separate columns:
import numpy as np
import pandas as pd
names = np.array(['Alice', 'Bob', 'Charlie'])
ages = np.array([25, 30, 35])
scores = np.array([85.5, 92.0, 78.5])
df = pd.DataFrame({
    'Name': names,
    'Age': ages,
    'Score': scores
})
print(df)
Output:
Name Age Score
0 Alice 25 85.5
1 Bob 30 92.0
2 Charlie 35 78.5
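A dictionary of arrays lets each column keep its own data type. Stacking the arrays into one 2D array first does not, because a NumPy array holds a single dtype. The following sketch compares the two routes using only the numeric arrays from the example above; np.column_stack is used here as one possible way to stack, not the only one.

```python
import numpy as np
import pandas as pd

ages = np.array([25, 30, 35])
scores = np.array([85.5, 92.0, 78.5])

# Dict of arrays: each column keeps its own dtype.
df_dict = pd.DataFrame({'Age': ages, 'Score': scores})

# column_stack builds one 2D array first, so NumPy upcasts the
# integers to float to fit a single dtype.
stacked = np.column_stack((ages, scores))
df_stacked = pd.DataFrame(stacked, columns=['Age', 'Score'])

print(df_dict.dtypes)
print(df_stacked.dtypes)
```

In the stacked version the Age column comes out as float, so the dictionary form is usually the safer choice when the arrays have different types.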
Column Count Must Match
The number of column names must equal the number of array columns:
import numpy as np
import pandas as pd
arr = np.random.rand(5, 2) # 5 rows, 2 columns
# ✅ Correct: 2 columns, 2 names
df = pd.DataFrame(arr, columns=['X', 'Y'])
print(df.shape) # (5, 2)
# ❌ Error: 3 names for 2 columns
try:
    df = pd.DataFrame(arr, columns=['X', 'Y', 'Z'])
except ValueError as e:
    print(f"Error: {e}")
# Error: Shape of passed values is (5, 2), indices imply (5, 3)
Always verify your array's shape with arr.shape before specifying column names to avoid dimension mismatch errors.
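One way to make that check routine is to wrap the conversion in a small helper that compares the names against arr.shape before calling pd.DataFrame(). The function array_to_frame below is hypothetical, written for illustration; it is not part of NumPy or Pandas.

```python
import numpy as np
import pandas as pd

def array_to_frame(arr, names):
    """Hypothetical helper: validate names against the array before converting."""
    if arr.ndim != 2:
        raise ValueError(f"expected a 2D array, got {arr.ndim}D")
    if len(names) != arr.shape[1]:
        raise ValueError(
            f"array has {arr.shape[1]} columns but {len(names)} names were given"
        )
    return pd.DataFrame(arr, columns=list(names))

arr = np.random.rand(5, 2)  # 5 rows, 2 columns
df = array_to_frame(arr, ['X', 'Y'])
print(df.shape)
```

Raising the error yourself lets the message name the actual problem (too many or too few names) before Pandas is involved.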
Specifying Data Types
The dtype parameter forces one data type onto every column; to set types per column, call astype() after creation:
import numpy as np
import pandas as pd
arr = np.array([[1, 2], [3, 4], [5, 6]])
# Force all columns to float
df_float = pd.DataFrame(arr, columns=['A', 'B'], dtype=float)
print(df_float.dtypes)
# A float64
# B float64
# Or specify types per column after creation
df = pd.DataFrame(arr, columns=['A', 'B'])
df = df.astype({'A': 'int32', 'B': 'float64'})
print(df.dtypes)
# A int32
# B float64
Output:
A float64
B float64
dtype: object
A int32
B float64
dtype: object
Handling Different Array Shapes
2D Arrays (Most Common)
import numpy as np
import pandas as pd
# Standard 2D array
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
df = pd.DataFrame(matrix, columns=['Col1', 'Col2', 'Col3'])
print(df)
Output:
Col1 Col2 Col3
0 1 2 3
1 4 5 6
2 7 8 9
3D Arrays (Requires Reshaping)
import numpy as np
import pandas as pd
# 3D array: 2 layers, 3 rows, 4 columns
tensor = np.random.randint(0, 100, size=(2, 3, 4))
# Option 1: Flatten to 2D
flat = tensor.reshape(-1, tensor.shape[-1]) # (6, 4)
df = pd.DataFrame(flat, columns=['A', 'B', 'C', 'D'])
print(f"Shape: {df.shape}") # (6, 4)
# Option 2: Select one layer
df_layer0 = pd.DataFrame(tensor[0], columns=['A', 'B', 'C', 'D'])
print(df_layer0)
Output (no seed is set, so the values differ on each run):
Shape: (6, 4)
A B C D
0 51 60 86 49
1 71 37 2 36
2 38 45 88 46
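Flattening a 3D array discards the information about which layer each row came from. One possible way to keep it is a two-level row index, sketched below with pd.MultiIndex.from_product. The level names layer and row are assumptions made for this example, and np.arange stands in for the random tensor so the result is deterministic.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random tensor: 2 layers, 3 rows, 4 columns
tensor = np.arange(24).reshape(2, 3, 4)
layers, rows, cols = tensor.shape

# One (layer, row) label per flattened row; from_product varies the last
# level fastest, which matches NumPy's default C-order reshape.
index = pd.MultiIndex.from_product(
    [range(layers), range(rows)], names=['layer', 'row']
)
df = pd.DataFrame(
    tensor.reshape(-1, cols), index=index, columns=['A', 'B', 'C', 'D']
)

# Recover a single layer by its label
print(df.loc[1])
```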
Practical Examples
Random Data for Testing
import numpy as np
import pandas as pd
np.random.seed(42)
# Generate random dataset
data = np.random.randn(100, 4)
df = pd.DataFrame(
    data,
    columns=['Feature1', 'Feature2', 'Feature3', 'Target']
)
print(df.head())
print(f"\nShape: {df.shape}")
print(f"\nStatistics:\n{df.describe()}")
Output:
Feature1 Feature2 Feature3 Target
0 0.496714 -0.138264 0.647689 1.523030
1 -0.234153 -0.234137 1.579213 0.767435
2 -0.469474 0.542560 -0.463418 -0.465730
3 0.241962 -1.913280 -1.724918 -0.562288
4 -1.012831 0.314247 -0.908024 -1.412304
Shape: (100, 4)
Statistics:
Feature1 Feature2 Feature3 Target
count 100.000000 100.000000 100.000000 100.000000
mean -0.009811 0.033746 0.022496 0.043764
std 0.868065 0.952234 1.044014 0.982240
min -2.025143 -1.959670 -3.241267 -1.987569
25% -0.716089 -0.564362 -0.616727 -0.727600
50% -0.000248 -0.024646 0.068665 0.075219
75% 0.528231 0.547116 0.701519 0.778891
max 2.314659 3.852731 2.189803 2.720169
Converting Model Predictions
import numpy as np
import pandas as pd
# Simulated ML predictions
sample_ids = np.arange(1, 6)
predictions = np.array([0.92, 0.15, 0.78, 0.34, 0.88])
actual = np.array([1, 0, 1, 0, 1])
results = pd.DataFrame({
    'SampleID': sample_ids,
    'Prediction': predictions,
    'Actual': actual,
    'Correct': (predictions > 0.5).astype(int) == actual
})
print(results)
Output:
SampleID Prediction Actual Correct
0 1 0.92 1 True
1 2 0.15 0 True
2 3 0.78 1 True
3 4 0.34 0 True
4 5 0.88 1 True
Time Series Data
import numpy as np
import pandas as pd
# Create time-indexed data
dates = pd.date_range('2024-01-01', periods=5, freq='D')
values = np.array([[100, 50], [105, 52], [103, 48], [110, 55], [108, 53]])
df = pd.DataFrame(
    values,
    columns=['Price', 'Volume'],
    index=dates
)
print(df)
Output:
Price Volume
2024-01-01 100 50
2024-01-02 105 52
2024-01-03 103 48
2024-01-04 110 55
2024-01-05 108 53
Automatic Column Names
If you omit column names, Pandas labels the columns with integers starting at 0:
import numpy as np
import pandas as pd
arr = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(arr)
print(df)
# 0 1 2
# 0 1 2 3
# 1 4 5 6
# Add names later
df.columns = ['A', 'B', 'C']
print(df)
# A B C
# 0 1 2 3
# 1 4 5 6
Output:
0 1 2
0 1 2 3
1 4 5 6
A B C
0 1 2 3
1 4 5 6
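When the array is wide, typing every name by hand is impractical. A minimal sketch, assuming generic names are acceptable, is to keep the automatic integer labels and prepend text with add_prefix(); the prefix col_ is an arbitrary choice.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6]])

# add_prefix returns a new DataFrame whose column labels are strings
df = pd.DataFrame(arr).add_prefix('col_')
print(list(df.columns))
```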
Parameter Reference
| Parameter | Purpose | Example |
|---|---|---|
| data | The NumPy array | np.array([[1,2],[3,4]]) |
| columns | Column header names | ['X', 'Y'] |
| index | Row labels | ['Row1', 'Row2'] |
| dtype | Force a single data type for all columns | float, 'int32' |
| copy | Whether to copy the input data (the default depends on the pandas version and the input type) | True to decouple the DataFrame from the array |
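Whether a DataFrame shares memory with its source array when copy is left unset has varied between pandas versions. If later changes to the array must not reach the DataFrame, passing copy=True explicitly avoids relying on the default. The sketch below illustrates that case only.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])

# copy=True asks pandas for its own copy of the data
df = pd.DataFrame(arr, columns=['X', 'Y'], copy=True)

arr[0, 0] = 99  # modify the source array afterwards

print(df.loc[0, 'X'])  # still 1: df does not share memory with arr
```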
Summary
Use pd.DataFrame(array, columns=[...]) to convert NumPy arrays into labeled, structured DataFrames. Add index for custom row labels and dtype to control data types. This conversion bridges numerical computation in NumPy with the data analysis capabilities of Pandas, enabling easier visualization, grouping, and export operations.