How to Convert a NumPy Array to a Pandas DataFrame in Python

Converting a NumPy array to a Pandas DataFrame adds column labels and a row index, turning raw numerical data into a structured table suited to analysis, visualization, and export.

Basic Conversion

Pass the array directly to pd.DataFrame() with column names:

import numpy as np
import pandas as pd

arr = np.array([[10, 20], [30, 40], [50, 60]])

df = pd.DataFrame(arr, columns=['A', 'B'])
print(df)

Output:

    A   B
0  10  20
1  30  40
2  50  60

Adding Custom Row Labels

Use the index parameter to assign meaningful row identifiers:

import numpy as np
import pandas as pd

arr = np.array([[10, 20], [30, 40], [50, 60]])

df = pd.DataFrame(
    arr,
    columns=['Sales', 'Profit'],
    index=['Q1', 'Q2', 'Q3']
)
print(df)

Output:

    Sales  Profit
Q1     10      20
Q2     30      40
Q3     50      60
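Once row labels are in place, they can be used for lookups. The following is a minimal sketch that rebuilds the quarterly DataFrame from above and reads values through `.loc`; the variable names `q2_profit` and `q3` are illustrative only.

```python
import numpy as np
import pandas as pd

arr = np.array([[10, 20], [30, 40], [50, 60]])
df = pd.DataFrame(arr, columns=['Sales', 'Profit'], index=['Q1', 'Q2', 'Q3'])

# Look up a single value by row label and column name
q2_profit = df.loc['Q2', 'Profit']
print(q2_profit)  # 40

# Select a whole row by its label; the result is a Series keyed by column name
q3 = df.loc['Q3']
print(q3['Sales'])  # 50
```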

Converting 1D Arrays

A 1D array becomes a single-column DataFrame with one row per element:

import numpy as np
import pandas as pd

arr_1d = np.array([100, 200, 300, 400])

# Creates a single column
df = pd.DataFrame(arr_1d, columns=['Value'])
print(df)

Output:

   Value
0    100
1    200
2    300
3    400
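If the 1D array should instead form a single row, one option is to reshape it to 2D first. This sketch assumes the column names `W` to `Z`, which are placeholders and not part of the original example.

```python
import numpy as np
import pandas as pd

arr_1d = np.array([100, 200, 300, 400])

# reshape(1, -1) turns the 1D array into a 2D array with one row
row_df = pd.DataFrame(arr_1d.reshape(1, -1), columns=['W', 'X', 'Y', 'Z'])
print(row_df.shape)  # (1, 4)

# reshape(-1, 1) is the explicit form of the single-column layout
col_df = pd.DataFrame(arr_1d.reshape(-1, 1), columns=['Value'])
print(col_df.shape)  # (4, 1)
```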

Multiple 1D Arrays as Columns

Combine several 1D arrays into separate columns:

import numpy as np
import pandas as pd

names = np.array(['Alice', 'Bob', 'Charlie'])
ages = np.array([25, 30, 35])
scores = np.array([85.5, 92.0, 78.5])

df = pd.DataFrame({
    'Name': names,
    'Age': ages,
    'Score': scores
})
print(df)

Output:

      Name  Age  Score
0    Alice   25   85.5
1      Bob   30   92.0
2  Charlie   35   78.5
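The dictionary form keeps each array's own dtype. As a rough comparison, the sketch below stacks the numeric arrays with `np.column_stack` first, which forces NumPy to pick one common dtype for the whole 2D array before Pandas sees it.

```python
import numpy as np
import pandas as pd

ages = np.array([25, 30, 35])
scores = np.array([85.5, 92.0, 78.5])

# Stacking first: NumPy upcasts the integers to float to fit one array
stacked = np.column_stack([ages, scores])
df_stacked = pd.DataFrame(stacked, columns=['Age', 'Score'])
print(df_stacked['Age'].dtype.kind)  # 'f' (float)

# Dictionary of arrays: each column keeps its original dtype
df_dict = pd.DataFrame({'Age': ages, 'Score': scores})
print(df_dict['Age'].dtype.kind)  # 'i' (integer)
```

The dictionary approach is usually the safer choice when the arrays hold different kinds of data.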

Column Count Must Match

The number of column names must equal the number of array columns:

import numpy as np
import pandas as pd

arr = np.random.rand(5, 2) # 5 rows, 2 columns

# ✅ Correct: 2 columns, 2 names
df = pd.DataFrame(arr, columns=['X', 'Y'])
print(df.shape) # (5, 2)

# ❌ Error: 3 names for 2 columns
try:
    df = pd.DataFrame(arr, columns=['X', 'Y', 'Z'])
except ValueError as e:
    print(f"Error: {e}")
    # Error: Shape of passed values is (5, 2), indices imply (5, 3)

Warning: Always verify your array's shape with arr.shape before specifying column names to avoid dimension mismatch errors.
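One way to apply this check is to wrap the conversion in a small function. The helper below, `array_to_frame`, is hypothetical and not part of Pandas; it is a sketch that compares `arr.shape[1]` with the number of names and raises a more specific message before calling `pd.DataFrame`.

```python
import numpy as np
import pandas as pd

def array_to_frame(arr, columns):
    """Hypothetical helper: validate the column count, then build the DataFrame."""
    arr = np.asarray(arr)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2D array, got {arr.ndim}D")
    if arr.shape[1] != len(columns):
        raise ValueError(
            f"array has {arr.shape[1]} columns but {len(columns)} names were given"
        )
    return pd.DataFrame(arr, columns=columns)

df = array_to_frame(np.zeros((5, 2)), ['X', 'Y'])
print(df.shape)  # (5, 2)
```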

Specifying Data Types

The dtype parameter of pd.DataFrame() applies one type to every column; for per-column types, call astype() after creation:

import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4], [5, 6]])

# Force all columns to float
df_float = pd.DataFrame(arr, columns=['A', 'B'], dtype=float)
print(df_float.dtypes)
# A    float64
# B    float64

# Or specify types per column after creation
df = pd.DataFrame(arr, columns=['A', 'B'])
df = df.astype({'A': 'int32', 'B': 'float64'})
print(df.dtypes)
# A      int32
# B    float64

Output:

A    float64
B    float64
dtype: object
A      int32
B    float64
dtype: object
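When no dtype is given, the DataFrame takes over the dtype of the array. The sketch below uses a small `float32` array (an assumed example, not from the text above) to show that the narrower type is preserved and uses half the memory of the `float64` version.

```python
import numpy as np
import pandas as pd

arr32 = np.array([[1.5, 2.5], [3.5, 4.5]], dtype=np.float32)

# No dtype argument: the columns inherit float32 from the array
df32 = pd.DataFrame(arr32, columns=['A', 'B'])
print(df32.dtypes)

# Converting to float64 doubles the bytes used by the data
df64 = df32.astype('float64')
print(df32.memory_usage(index=False).sum())  # 16 bytes: 4 values x 4 bytes
print(df64.memory_usage(index=False).sum())  # 32 bytes: 4 values x 8 bytes
```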

Handling Different Array Shapes

2D Arrays (Most Common)

import numpy as np
import pandas as pd

# Standard 2D array
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

df = pd.DataFrame(matrix, columns=['Col1', 'Col2', 'Col3'])
print(df)

Output:

   Col1  Col2  Col3
0     1     2     3
1     4     5     6
2     7     8     9

3D Arrays (Requires Reshaping)

import numpy as np
import pandas as pd

# 3D array: 2 layers, 3 rows, 4 columns
tensor = np.random.randint(0, 100, size=(2, 3, 4))

# Option 1: Flatten to 2D
flat = tensor.reshape(-1, tensor.shape[-1]) # (6, 4)
df = pd.DataFrame(flat, columns=['A', 'B', 'C', 'D'])
print(f"Shape: {df.shape}") # (6, 4)

# Option 2: Select one layer
df_layer0 = pd.DataFrame(tensor[0], columns=['A', 'B', 'C', 'D'])
print(df_layer0)

Output (the values differ between runs because no random seed is set):

Shape: (6, 4)
    A   B   C   D
0  51  60  86  49
1  71  37   2  36
2  38  45  88  46
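Flattening discards the information about which layer each row came from. A possible third option, sketched here with a deterministic `np.arange` tensor instead of random data, is to keep that information in a `MultiIndex`; the level names `layer` and `row` are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Deterministic 3D array: 2 layers, 3 rows, 4 columns
tensor = np.arange(24).reshape(2, 3, 4)
layers, rows, cols = tensor.shape

# One (layer, row) pair per flattened row, in the same order as reshape
index = pd.MultiIndex.from_product(
    [range(layers), range(rows)], names=['layer', 'row']
)
df = pd.DataFrame(
    tensor.reshape(-1, cols), index=index, columns=['A', 'B', 'C', 'D']
)
print(df.shape)  # (6, 4)

# Recover a single layer by its index level
layer1 = df.xs(1, level='layer')
print(layer1.shape)  # (3, 4)
```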

Practical Examples

Random Data for Testing

import numpy as np
import pandas as pd

np.random.seed(42)

# Generate random dataset
data = np.random.randn(100, 4)

df = pd.DataFrame(
    data,
    columns=['Feature1', 'Feature2', 'Feature3', 'Target']
)

print(df.head())
print(f"\nShape: {df.shape}")
print(f"\nStatistics:\n{df.describe()}")

Output:

   Feature1  Feature2  Feature3    Target
0  0.496714 -0.138264  0.647689  1.523030
1 -0.234153 -0.234137  1.579213  0.767435
2 -0.469474  0.542560 -0.463418 -0.465730
3  0.241962 -1.913280 -1.724918 -0.562288
4 -1.012831  0.314247 -0.908024 -1.412304

Shape: (100, 4)

Statistics:
         Feature1    Feature2    Feature3      Target
count  100.000000  100.000000  100.000000  100.000000
mean    -0.009811    0.033746    0.022496    0.043764
std      0.868065    0.952234    1.044014    0.982240
min     -2.025143   -1.959670   -3.241267   -1.987569
25%     -0.716089   -0.564362   -0.616727   -0.727600
50%     -0.000248   -0.024646    0.068665    0.075219
75%      0.528231    0.547116    0.701519    0.778891
max      2.314659    3.852731    2.189803    2.720169

Converting Model Predictions

import numpy as np
import pandas as pd

# Simulated ML predictions
sample_ids = np.arange(1, 6)
predictions = np.array([0.92, 0.15, 0.78, 0.34, 0.88])
actual = np.array([1, 0, 1, 0, 1])

results = pd.DataFrame({
    'SampleID': sample_ids,
    'Prediction': predictions,
    'Actual': actual,
    'Correct': (predictions > 0.5).astype(int) == actual
})

print(results)

Output:

   SampleID  Prediction  Actual  Correct
0         1        0.92       1     True
1         2        0.15       0     True
2         3        0.78       1     True
3         4        0.34       0     True
4         5        0.88       1     True
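With the predictions in a DataFrame, summary figures become one-liners. As a small sketch using the same simulated values, the mean of the boolean Correct column gives the accuracy at the 0.5 threshold.

```python
import numpy as np
import pandas as pd

predictions = np.array([0.92, 0.15, 0.78, 0.34, 0.88])
actual = np.array([1, 0, 1, 0, 1])

results = pd.DataFrame({'Prediction': predictions, 'Actual': actual})

# Threshold the scores, then compare with the true labels
results['Correct'] = (results['Prediction'] > 0.5).astype(int) == results['Actual']

# The mean of a boolean column is the fraction of True values
accuracy = results['Correct'].mean()
print(accuracy)  # 1.0
```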

Time Series Data

import numpy as np
import pandas as pd

# Create time-indexed data
dates = pd.date_range('2024-01-01', periods=5, freq='D')
values = np.array([[100, 50], [105, 52], [103, 48], [110, 55], [108, 53]])

df = pd.DataFrame(
    values,
    columns=['Price', 'Volume'],
    index=dates
)
print(df)

Output:

            Price  Volume
2024-01-01    100      50
2024-01-02    105      52
2024-01-03    103      48
2024-01-04    110      55
2024-01-05    108      53
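A date index also allows selection by date. The sketch below rebuilds the same DataFrame and slices it with date strings; note that label-based slices in `.loc` include both endpoints.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2024-01-01', periods=5, freq='D')
values = np.array([[100, 50], [105, 52], [103, 48], [110, 55], [108, 53]])
df = pd.DataFrame(values, columns=['Price', 'Volume'], index=dates)

# Label-based slicing on a DatetimeIndex includes both end dates
window = df.loc['2024-01-02':'2024-01-04']
print(len(window))  # 3
print(window['Price'].tolist())  # [105, 103, 110]
```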

Automatic Column Names

If you omit column names, Pandas labels the columns with integers starting at 0:

import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6]])

df = pd.DataFrame(arr)
print(df)
#    0  1  2
# 0  1  2  3
# 1  4  5  6

# Add names later
df.columns = ['A', 'B', 'C']
print(df)
#    A  B  C
# 0  1  2  3
# 1  4  5  6

Output:

   0  1  2
0  1  2  3
1  4  5  6
   A  B  C
0  1  2  3
1  4  5  6
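When the exact names do not matter, the integer labels can be turned into strings in one step. This sketch uses `DataFrame.add_prefix`; the prefix `col_` is an arbitrary choice.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(arr)

# add_prefix returns a new DataFrame with string labels built from the integers
df_named = df.add_prefix('col_')
print(list(df_named.columns))  # ['col_0', 'col_1', 'col_2']
```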

Parameter Reference

Parameter   Purpose                                                       Example
data        The NumPy array                                               np.array([[1, 2], [3, 4]])
columns     Column header names                                           ['X', 'Y']
index       Row labels                                                    ['Row1', 'Row2']
dtype       Force one data type for all columns                           float, 'int32'
copy        Copy the input data (default None; whether a NumPy array      copy=False to avoid a copy
            is copied then depends on the pandas version)                 where possible
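The effect of copy is easiest to see by modifying the source array after conversion. The sketch below only relies on the behavior of an explicit `copy=True`; whether `copy=False` actually shares memory with the array depends on the pandas version and settings, so that result is printed without an expected value.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])

# Explicit copy: the DataFrame owns its own data
df_copy = pd.DataFrame(arr, columns=['X', 'Y'], copy=True)

# Changing the source array afterwards does not affect the copy
arr[0, 0] = 99
print(df_copy.loc[0, 'X'])  # 1

# copy=False asks Pandas to reuse the array's memory where it can;
# the printed result is version dependent
df_view = pd.DataFrame(arr, columns=['X', 'Y'], copy=False)
print(np.shares_memory(arr, df_view.to_numpy()))
```

If the array is large and will not be modified later, avoiding the copy can save memory; otherwise an explicit copy is the more predictable choice.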

Summary

Use pd.DataFrame(array, columns=[...]) to convert NumPy arrays into labeled, structured DataFrames. Add index for custom row labels and dtype to control data types. This conversion bridges numerical computation in NumPy with the data analysis capabilities of Pandas, enabling easier visualization, grouping, and export operations.