How to Convert a NumPy Array to a Pandas DataFrame in Python

Converting NumPy arrays to Pandas DataFrames adds labeled columns and indices, transforming raw numerical data into a structured format ideal for analysis, visualization, and export. This conversion is one of the most common operations when bridging numerical computation in NumPy with the data analysis capabilities of Pandas.

In this guide, you will learn how to perform this conversion for different array shapes, add meaningful labels, handle data types, and apply the technique to practical scenarios.

Basic Conversion

Pass the array directly to pd.DataFrame() along with column names:

import numpy as np
import pandas as pd

arr = np.array([[10, 20], [30, 40], [50, 60]])

df = pd.DataFrame(arr, columns=['A', 'B'])
print(df)

Output:

    A   B
0  10  20
1  30  40
2  50  60

Each column in the array maps to the corresponding name in the columns list, and Pandas automatically assigns integer indices starting from 0.
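
As a quick sanity check of that mapping, the sketch below compares each DataFrame column against the matching array column and converts the DataFrame back with to_numpy(). It reuses the same arr as above and is only an illustration, not a required step.

```python
import numpy as np
import pandas as pd

arr = np.array([[10, 20], [30, 40], [50, 60]])
df = pd.DataFrame(arr, columns=['A', 'B'])

# The DataFrame keeps the array's shape: 3 rows, 2 columns
print(df.shape)

# Column 'A' holds the first array column, 'B' the second
print(np.array_equal(df['A'].to_numpy(), arr[:, 0]))
print(np.array_equal(df['B'].to_numpy(), arr[:, 1]))

# to_numpy() gives back an array with the original values
print(np.array_equal(df.to_numpy(), arr))
```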

Adding Custom Row Labels

Use the index parameter to assign meaningful row identifiers instead of default integers:

import numpy as np
import pandas as pd

arr = np.array([[10, 20], [30, 40], [50, 60]])

df = pd.DataFrame(
    arr,
    columns=['Sales', 'Profit'],
    index=['Q1', 'Q2', 'Q3']
)
print(df)

Output:

    Sales  Profit
Q1     10      20
Q2     30      40
Q3     50      60
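
One benefit of custom row labels is label-based lookup. The following minimal sketch, using the same quarterly data, reads values with .loc instead of integer positions.

```python
import numpy as np
import pandas as pd

arr = np.array([[10, 20], [30, 40], [50, 60]])
df = pd.DataFrame(arr, columns=['Sales', 'Profit'], index=['Q1', 'Q2', 'Q3'])

# Look up a single value by row label and column name
q2_profit = df.loc['Q2', 'Profit']
print(q2_profit)  # 40

# Select a whole row by its label (returns a Series)
q3 = df.loc['Q3']
print(q3)
```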

Converting 1D Arrays

A 1D array creates a single-column DataFrame. You must still pass the column name as a list:

import numpy as np
import pandas as pd

arr_1d = np.array([100, 200, 300, 400])

df = pd.DataFrame(arr_1d, columns=['Value'])
print(df)

Output:

   Value
0    100
1    200
2    300
3    400
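
If the 1D array should instead become a single row, one option is to reshape it to shape (1, n) first. The sketch below assumes that layout is what you want; the column names W, X, Y, Z are placeholders chosen for illustration.

```python
import numpy as np
import pandas as pd

arr_1d = np.array([100, 200, 300, 400])

# reshape(1, -1) turns the 4-element vector into a 1 x 4 matrix
row = arr_1d.reshape(1, -1)
print(row.shape)

# Each element now gets its own (hypothetical) column name
df_row = pd.DataFrame(row, columns=['W', 'X', 'Y', 'Z'])
print(df_row)
```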

Combining Multiple 1D Arrays as Columns

When you have several separate 1D arrays that represent different features, pass them as a dictionary to create a multi-column DataFrame:

import numpy as np
import pandas as pd

names = np.array(['Alice', 'Bob', 'Charlie'])
ages = np.array([25, 30, 35])
scores = np.array([85.5, 92.0, 78.5])

df = pd.DataFrame({
    'Name': names,
    'Age': ages,
    'Score': scores
})
print(df)

Output:

      Name  Age  Score
0    Alice   25   85.5
1      Bob   30   92.0
2  Charlie   35   78.5

This dictionary approach is especially useful when each array comes from a different source or computation.
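
A further reason to prefer the dictionary form is that each array keeps its own dtype. The sketch below contrasts it with stacking the arrays into one 2D array via np.column_stack, which forces a single common dtype; the comparison is an illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

ages = np.array([25, 30, 35])
scores = np.array([85.5, 92.0, 78.5])

# Dictionary: every column keeps the dtype of its source array
df_dict = pd.DataFrame({'Age': ages, 'Score': scores})
print(df_dict.dtypes)

# Stacking first: NumPy upcasts the integers to float to fit one dtype
stacked = np.column_stack([ages, scores])
df_stacked = pd.DataFrame(stacked, columns=['Age', 'Score'])
print(df_stacked.dtypes)
```

With the stacked version, Age becomes a float column, so the dictionary route is usually the safer choice for mixed data.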

Common Mistake: Column Count Mismatch

The number of column names must exactly match the number of columns in the array:

import numpy as np
import pandas as pd

arr = np.random.rand(5, 2) # 5 rows, 2 columns

# Correct: 2 columns, 2 names
df = pd.DataFrame(arr, columns=['X', 'Y'])
print(f"Shape: {df.shape}")

# Wrong: 3 names for 2 columns
try:
    df = pd.DataFrame(arr, columns=['X', 'Y', 'Z'])
except ValueError as e:
    print(f"Error: {e}")

Output:

Shape: (5, 2)
Error: Shape of passed values is (5, 2), indices imply (5, 3)

Warning: Always verify your array's shape with arr.shape before specifying column names. This is one of the most frequent errors when converting arrays to DataFrames, especially when the array is generated dynamically.
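
One way to make that check routine is to wrap the conversion in a small helper that compares the shape with the names before calling pd.DataFrame(). The function array_to_frame below is a hypothetical helper written for this sketch; it is not part of Pandas or NumPy.

```python
import numpy as np
import pandas as pd

def array_to_frame(arr, names):
    """Hypothetical helper: validate the shape, then build the DataFrame."""
    arr = np.asarray(arr)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2D array, got {arr.ndim}D")
    if arr.shape[1] != len(names):
        raise ValueError(
            f"array has {arr.shape[1]} columns but {len(names)} names were given"
        )
    return pd.DataFrame(arr, columns=list(names))

arr = np.random.rand(5, 2)

# Matching names: conversion succeeds
df = array_to_frame(arr, ['X', 'Y'])
print(df.shape)

# Mismatched names: the helper fails early with a clearer message
try:
    array_to_frame(arr, ['X', 'Y', 'Z'])
except ValueError as e:
    print(f"Error: {e}")
```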

Specifying Data Types

Control column data types using the dtype parameter or the astype() method:

import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4], [5, 6]])

# Force all columns to float during creation
df_float = pd.DataFrame(arr, columns=['A', 'B'], dtype=float)
print(df_float.dtypes)
print()

# Or set types per column after creation
df = pd.DataFrame(arr, columns=['A', 'B'])
df = df.astype({'A': 'int32', 'B': 'float64'})
print(df.dtypes)

Output:

A    float64
B    float64
dtype: object

A      int32
B    float64
dtype: object

Note: The dtype parameter applies the same type to all columns, while astype() gives you per-column control.
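
When neither option is used, the DataFrame simply keeps the dtype of the source array. The sketch below illustrates this with a float32 array and compares its memory footprint with a float64 version; the tiny array is only for demonstration.

```python
import numpy as np
import pandas as pd

arr32 = np.array([[1.5, 2.5], [3.5, 4.5]], dtype=np.float32)

# No dtype argument: the columns inherit float32 from the array
df32 = pd.DataFrame(arr32, columns=['A', 'B'])
print(df32.dtypes)

# Converting to float64 doubles the bytes used by the column data
df64 = df32.astype('float64')
mem32 = df32.memory_usage(index=False).sum()
mem64 = df64.memory_usage(index=False).sum()
print(mem32, mem64)
```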

Handling Different Array Shapes

2D Arrays (Most Common)

import numpy as np
import pandas as pd

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

df = pd.DataFrame(matrix, columns=['Col1', 'Col2', 'Col3'])
print(df)

Output:

   Col1  Col2  Col3
0     1     2     3
1     4     5     6
2     7     8     9

3D Arrays (Requires Reshaping)

DataFrames are inherently 2D, so 3D arrays must be reshaped before conversion:

import numpy as np
import pandas as pd

# 3D array: 2 layers, 3 rows, 4 columns
np.random.seed(42)
tensor = np.random.randint(0, 100, size=(2, 3, 4))

# Option 1: Flatten all layers into rows
flat = tensor.reshape(-1, tensor.shape[-1]) # (6, 4)
df = pd.DataFrame(flat, columns=['A', 'B', 'C', 'D'])
print(f"Flattened shape: {df.shape}")
print(df)
print()

# Option 2: Select a single layer
df_layer0 = pd.DataFrame(tensor[0], columns=['A', 'B', 'C', 'D'])
print(f"Single layer shape: {df_layer0.shape}")
print(df_layer0)

Output:

Flattened shape: (6, 4)
    A   B   C   D
0  51  92  14  71
1  60  20  82  86
2  74  74  87  99
3  23   2  21  52
4   1  87  29  37
5   1  63  59  20

Single layer shape: (3, 4)
    A   B   C   D
0  51  92  14  71
1  60  20  82  86
2  74  74  87  99
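
Flattening discards the information about which layer each row came from. A possible third option, sketched below, keeps it by labeling the rows with a two-level MultiIndex. The level names layer and row are assumptions made for this example, and the tensor uses np.arange so the values are easy to follow.

```python
import numpy as np
import pandas as pd

# 3D array: 2 layers, 3 rows, 4 columns, filled with 0..23
tensor = np.arange(24).reshape(2, 3, 4)
n_layers, n_rows, n_cols = tensor.shape

# One (layer, row) label per flattened row, in the same order as reshape()
index = pd.MultiIndex.from_product(
    [range(n_layers), range(n_rows)],
    names=['layer', 'row']
)

df = pd.DataFrame(
    tensor.reshape(-1, n_cols),
    index=index,
    columns=['A', 'B', 'C', 'D']
)
print(df)

# Recover a single layer by its label
layer1 = df.loc[1]
print(layer1)
```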

Automatic Column Names

If you omit the columns parameter, Pandas assigns integer column indices. You can rename them later:

import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Default integer column names
df = pd.DataFrame(arr)
print(df)
print()

# Rename columns after creation
df.columns = ['A', 'B', 'C']
print(df)

Output:

   0  1  2
0  1  2  3
1  4  5  6

   A  B  C
0  1  2  3
1  4  5  6
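
For wide arrays, typing every name by hand is impractical. The sketch below shows two possible approaches: generating names from the array's shape, and renaming only selected default columns with rename(). The feature_ prefix and the name id are placeholders.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Build one name per column from the array's shape
names = [f"feature_{i}" for i in range(arr.shape[1])]
df = pd.DataFrame(arr, columns=names)
print(df)

# Rename only some of the default integer columns; the rest stay as they are
df_default = pd.DataFrame(arr)
renamed = df_default.rename(columns={0: 'id'})
print(renamed)
```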

Practical Examples

Creating Test Data with Random Arrays

import numpy as np
import pandas as pd

np.random.seed(42)

data = np.random.randn(100, 4)

df = pd.DataFrame(
    data,
    columns=['Feature1', 'Feature2', 'Feature3', 'Target']
)

print(df.head())
print(f"\nShape: {df.shape}")
print(f"\nSummary Statistics:\n{df.describe().round(2)}")

Output:

   Feature1  Feature2  Feature3    Target
0  0.496714 -0.138264  0.647689  1.523030
1 -0.234153 -0.234137  1.579213  0.767435
2 -0.469474  0.542560 -0.463418 -0.465730
3  0.241962 -1.913280 -1.724918 -0.562288
4 -1.012831  0.314247 -0.908024 -1.412304

Shape: (100, 4)

Summary Statistics:
       Feature1  Feature2  Feature3  Target
count    100.00    100.00    100.00  100.00
mean      -0.01      0.03      0.02    0.04
std        0.87      0.95      1.04    0.98
min       -2.03     -1.96     -3.24   -1.99
25%       -0.72     -0.56     -0.62   -0.73
50%       -0.00     -0.02      0.07    0.08
75%        0.53      0.55      0.70    0.78
max        2.31      3.85      2.19    2.72
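
The same test data can also be produced with NumPy's newer Generator interface, which avoids setting a global seed. This is a sketch of that alternative; the values it generates differ from the np.random.seed(42) output shown above, so no output is listed.

```python
import numpy as np
import pandas as pd

# A local generator with its own seed instead of the global random state
rng = np.random.default_rng(42)
data = rng.standard_normal((100, 4))

df = pd.DataFrame(
    data,
    columns=['Feature1', 'Feature2', 'Feature3', 'Target']
)

print(df.head())
print(f"\nShape: {df.shape}")
```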

Converting Machine Learning Predictions

import numpy as np
import pandas as pd

sample_ids = np.arange(1, 6)
predictions = np.array([0.92, 0.15, 0.78, 0.34, 0.88])
actual = np.array([1, 0, 1, 0, 1])

results = pd.DataFrame({
    'SampleID': sample_ids,
    'Prediction': predictions,
    'Actual': actual,
    'Correct': (predictions > 0.5).astype(int) == actual
})

print(results)
print(f"\nAccuracy: {results['Correct'].mean():.0%}")

Output:

   SampleID  Prediction  Actual  Correct
0         1        0.92       1     True
1         2        0.15       0     True
2         3        0.78       1     True
3         4        0.34       0     True
4         5        0.88       1     True

Accuracy: 100%

Time Series Data with Date Index

import numpy as np
import pandas as pd

dates = pd.date_range('2024-01-01', periods=5, freq='D')
values = np.array([[100, 50], [105, 52], [103, 48], [110, 55], [108, 53]])

df = pd.DataFrame(
    values,
    columns=['Price', 'Volume'],
    index=dates
)
print(df)

Output:

            Price  Volume
2024-01-01    100      50
2024-01-02    105      52
2024-01-03    103      48
2024-01-04    110      55
2024-01-05    108      53

Using a DatetimeIndex enables Pandas time series features like resampling, rolling windows, and date-based slicing.
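
To make those features concrete, here is a brief sketch on the same five-day data showing date-based slicing, a two-day rolling mean, and resampling into two-day bins. The window and bin sizes are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2024-01-01', periods=5, freq='D')
values = np.array([[100, 50], [105, 52], [103, 48], [110, 55], [108, 53]])
df = pd.DataFrame(values, columns=['Price', 'Volume'], index=dates)

# Date-based slicing: string bounds are inclusive on both ends
subset = df.loc['2024-01-02':'2024-01-04']
print(subset)

# Rolling window: mean of each day and the day before it
rolling = df['Price'].rolling(window=2).mean()
print(rolling)

# Resampling: sum the values within consecutive two-day bins
two_day = df.resample('2D').sum()
print(two_day)
```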

Parameter Reference

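| Parameter | Purpose | Example |
|-----------|---------|---------|
| data | The NumPy array to convert | np.array([[1, 2], [3, 4]]) |
| columns | Column header names | ['X', 'Y'] |
| index | Row labels | ['Row1', 'Row2'] |
| dtype | Force a single data type for all columns | float, 'int32' |
| copy | Whether to copy the input data (default None; the effective behavior depends on the input type and Pandas version) | copy=True for independent data, copy=False for memory efficiency |

About the copy Parameter

Setting copy=False can save memory by allowing the DataFrame to share the underlying data with the original NumPy array. However, modifying one may then affect the other. The default is copy=None rather than True: for a 2D NumPy array, Pandas versions before 3.0 treat None like copy=False, while with Copy-on-Write enabled (the default from Pandas 3.0) the array is copied. Pass copy=True explicitly when you need independent data, and use copy=False only when you understand the implications and want to avoid duplicating large arrays.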
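
The difference can be observed by modifying the source array after the conversion. The sketch below passes copy explicitly in both cases; only the copy=True result is guaranteed, while the copy=False result can depend on the Pandas version and the array's dtype, so no output is asserted for it.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])

df_copy = pd.DataFrame(arr, columns=['X', 'Y'], copy=True)
df_view = pd.DataFrame(arr, columns=['X', 'Y'], copy=False)

# Change the source array after both DataFrames exist
arr[0, 0] = 999

# Independent copy: still holds the original value
print(df_copy.loc[0, 'X'])  # 1

# Possibly shared data: may reflect the change made to arr
print(df_view.loc[0, 'X'])

# Ask NumPy directly whether the two objects overlap in memory
print(np.shares_memory(arr, df_view.to_numpy()))
```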

Conclusion

Converting NumPy arrays to Pandas DataFrames is straightforward with pd.DataFrame(array, columns=[...]). Add the index parameter for custom row labels and dtype to control data types. Convert a 1D array to a single-column DataFrame, or combine several 1D arrays of equal length using a dictionary. For 3D arrays, reshape to 2D first since DataFrames are inherently two-dimensional structures.

This conversion bridges the gap between NumPy's efficient numerical computation and Pandas' rich data analysis capabilities, enabling labeled access, grouping, filtering, visualization, and export operations on your array data.