How to Convert a NumPy Array to a Pandas DataFrame in Python

Converting a NumPy array to a Pandas DataFrame adds column labels and a row index, turning raw numerical data into a structured table suited to analysis, visualization, and export.

Basic Conversion

Pass the array directly to pd.DataFrame() with column names:

import numpy as np
import pandas as pd

arr = np.array([[10, 20], [30, 40], [50, 60]])

df = pd.DataFrame(arr, columns=['A', 'B'])
print(df)

Output:

    A   B
0  10  20
1  30  40
2  50  60

Adding Custom Row Labels

Use the index parameter to assign meaningful row identifiers:

import numpy as np
import pandas as pd

arr = np.array([[10, 20], [30, 40], [50, 60]])

df = pd.DataFrame(
    arr,
    columns=['Sales', 'Profit'],
    index=['Q1', 'Q2', 'Q3']
)
print(df)

Output:

    Sales  Profit
Q1     10      20
Q2     30      40
Q3     50      60

Converting 1D Arrays

A 1D array creates a single-column DataFrame:

import numpy as np
import pandas as pd

arr_1d = np.array([100, 200, 300, 400])

# Creates a single column
df = pd.DataFrame(arr_1d, columns=['Value'])
print(df)

Output:

   Value
0    100
1    200
2    300
3    400
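If the 1D array should instead become a single row, it needs to be reshaped to 2D first. The following is a minimal sketch of that idea; the column names W, X, Y, and Z are placeholders chosen for illustration and do not come from the example above.

```python
import numpy as np
import pandas as pd

arr_1d = np.array([100, 200, 300, 400])

# reshape(1, -1) turns the 4-element vector into a 1x4 matrix,
# so each value lands in its own column.
# The column names are placeholders for this illustration.
df_row = pd.DataFrame(arr_1d.reshape(1, -1), columns=['W', 'X', 'Y', 'Z'])
print(df_row.shape)  # (1, 4)
```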

Multiple 1D Arrays as Columns

Combine several 1D arrays into separate columns:

import numpy as np
import pandas as pd

names = np.array(['Alice', 'Bob', 'Charlie'])
ages = np.array([25, 30, 35])
scores = np.array([85.5, 92.0, 78.5])

df = pd.DataFrame({
    'Name': names,
    'Age': ages,
    'Score': scores
})
print(df)

Output:

      Name  Age  Score
0    Alice   25   85.5
1      Bob   30   92.0
2  Charlie   35   78.5
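One reason to prefer the dictionary form is that each column keeps the dtype of its source array. The sketch below compares it with stacking the arrays into one 2D array first, using np.column_stack as an assumed alternative that the text above does not cover. It is limited to the numeric arrays to keep the comparison simple.

```python
import numpy as np
import pandas as pd

ages = np.array([25, 30, 35])
scores = np.array([85.5, 92.0, 78.5])

# Dict of arrays: each column keeps the dtype of its source array.
df_dict = pd.DataFrame({'Age': ages, 'Score': scores})
print(df_dict.dtypes)

# Stacking first builds one 2D array with a single dtype,
# so the integer ages are upcast to float.
stacked = np.column_stack([ages, scores])
df_stacked = pd.DataFrame(stacked, columns=['Age', 'Score'])
print(df_stacked.dtypes)
```

In the stacked version both columns come out as floats, which may or may not matter depending on what the data is used for.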

Column Count Must Match

The number of column names must equal the number of array columns:

import numpy as np
import pandas as pd

arr = np.random.rand(5, 2)  # 5 rows, 2 columns

# ✅ Correct: 2 columns, 2 names
df = pd.DataFrame(arr, columns=['X', 'Y'])
print(df.shape)  # (5, 2)

# ❌ Error: 3 names for 2 columns
try:
    df = pd.DataFrame(arr, columns=['X', 'Y', 'Z'])
except ValueError as e:
    print(f"Error: {e}")
# Error: Shape of passed values is (5, 2), indices imply (5, 3)

Warning: Always verify your array's shape with arr.shape before specifying column names to avoid dimension mismatch errors.
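One way to make the check automatic is to derive the column names from the array's shape. This is a minimal sketch; the "col_" prefix is an arbitrary naming choice for illustration.

```python
import numpy as np
import pandas as pd

arr = np.arange(10).reshape(5, 2)  # 5 rows, 2 columns

# Derive one name per array column so the two counts cannot drift apart.
# The "col_" prefix is an arbitrary choice for this illustration.
n_cols = arr.shape[1]
col_names = [f"col_{i}" for i in range(n_cols)]

df = pd.DataFrame(arr, columns=col_names)
print(df.columns.tolist())  # ['col_0', 'col_1']
```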

Specifying Data Types

Control column types with the dtype parameter:

import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4], [5, 6]])

# Force all columns to float
df_float = pd.DataFrame(arr, columns=['A', 'B'], dtype=float)
print(df_float.dtypes)
# A    float64
# B    float64

# Or specify types per column after creation
df = pd.DataFrame(arr, columns=['A', 'B'])
df = df.astype({'A': 'int32', 'B': 'float64'})
print(df.dtypes)
# A      int32
# B    float64

Output:

A    float64
B    float64
dtype: object
A      int32
B    float64
dtype: object

Handling Different Array Shapes

2D Arrays (Most Common)

import numpy as np
import pandas as pd

# Standard 2D array
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

df = pd.DataFrame(matrix, columns=['Col1', 'Col2', 'Col3'])
print(df)

Output:

   Col1  Col2  Col3
0     1     2     3
1     4     5     6
2     7     8     9

3D Arrays (Requires Reshaping)

import numpy as np
import pandas as pd

# 3D array: 2 layers, 3 rows, 4 columns
tensor = np.random.randint(0, 100, size=(2, 3, 4))

# Option 1: Flatten to 2D
flat = tensor.reshape(-1, tensor.shape[-1])  # (6, 4)
df = pd.DataFrame(flat, columns=['A', 'B', 'C', 'D'])
print(f"Shape: {df.shape}")  # (6, 4)

# Option 2: Select one layer
df_layer0 = pd.DataFrame(tensor[0], columns=['A', 'B', 'C', 'D'])
print(df_layer0)

Output:

Shape: (6, 4)
    A   B   C   D
0  51  60  86  49
1  71  37   2  36
2  38  45  88  46
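Flattening with reshape discards the information about which layer each row came from. One possible way to keep it is to attach a two-level row index. The sketch below assumes pd.MultiIndex.from_product for this, and the level names "layer" and "row" are illustrative. It uses np.arange rather than random values so the result is reproducible.

```python
import numpy as np
import pandas as pd

tensor = np.arange(24).reshape(2, 3, 4)  # 2 layers, 3 rows, 4 columns
n_layers, n_rows, n_cols = tensor.shape

# One (layer, row) pair per flattened row, in the same order reshape produces.
index = pd.MultiIndex.from_product(
    [range(n_layers), range(n_rows)],
    names=['layer', 'row']
)

df = pd.DataFrame(
    tensor.reshape(-1, n_cols),
    index=index,
    columns=['A', 'B', 'C', 'D']
)

# Recover a single layer by its label.
print(df.loc[1])
```

Selecting df.loc[1] returns the three rows that originally formed tensor[1].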

Practical Examples

Random Data for Testing

import numpy as np
import pandas as pd

np.random.seed(42)

# Generate random dataset
data = np.random.randn(100, 4)

df = pd.DataFrame(
    data,
    columns=['Feature1', 'Feature2', 'Feature3', 'Target']
)

print(df.head())
print(f"\nShape: {df.shape}")
print(f"\nStatistics:\n{df.describe()}")

Output:

   Feature1  Feature2  Feature3    Target
0  0.496714 -0.138264  0.647689  1.523030
1 -0.234153 -0.234137  1.579213  0.767435
2 -0.469474  0.542560 -0.463418 -0.465730
3  0.241962 -1.913280 -1.724918 -0.562288
4 -1.012831  0.314247 -0.908024 -1.412304

Shape: (100, 4)

Statistics:
         Feature1    Feature2    Feature3      Target
count  100.000000  100.000000  100.000000  100.000000
mean    -0.009811    0.033746    0.022496    0.043764
std      0.868065    0.952234    1.044014    0.982240
min     -2.025143   -1.959670   -3.241267   -1.987569
25%     -0.716089   -0.564362   -0.616727   -0.727600
50%     -0.000248   -0.024646    0.068665    0.075219
75%      0.528231    0.547116    0.701519    0.778891
max      2.314659    3.852731    2.189803    2.720169

Converting Model Predictions

import numpy as np
import pandas as pd

# Simulated ML predictions
sample_ids = np.arange(1, 6)
predictions = np.array([0.92, 0.15, 0.78, 0.34, 0.88])
actual = np.array([1, 0, 1, 0, 1])

results = pd.DataFrame({
    'SampleID': sample_ids,
    'Prediction': predictions,
    'Actual': actual,
    'Correct': (predictions > 0.5).astype(int) == actual
})

print(results)

Output:

   SampleID  Prediction  Actual  Correct
0         1        0.92       1     True
1         2        0.15       0     True
2         3        0.78       1     True
3         4        0.34       0     True
4         5        0.88       1     True

Time Series Data

import numpy as np
import pandas as pd

# Create time-indexed data
dates = pd.date_range('2024-01-01', periods=5, freq='D')
values = np.array([[100, 50], [105, 52], [103, 48], [110, 55], [108, 53]])

df = pd.DataFrame(
    values,
    columns=['Price', 'Volume'],
    index=dates
)
print(df)

Output:

            Price  Volume
2024-01-01    100      50
2024-01-02    105      52
2024-01-03    103      48
2024-01-04    110      55
2024-01-05    108      53

Automatic Column Names

If you omit column names, Pandas assigns integer labels starting at 0:

import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6]])

df = pd.DataFrame(arr)
print(df)
#    0  1  2
# 0  1  2  3
# 1  4  5  6

# Add names later
df.columns = ['A', 'B', 'C']
print(df)
#    A  B  C
# 0  1  2  3
# 1  4  5  6

Output:

   0  1  2
0  1  2  3
1  4  5  6
   A  B  C
0  1  2  3
1  4  5  6

Parameter Reference

Parameter	Purpose	Example
`data`	The NumPy array	`np.array([[1,2],[3,4]])`
`columns`	Column header names	`['X', 'Y']`
`index`	Row labels	`['Row1', 'Row2']`
`dtype`	Force data type for all columns	`float`, `'int32'`
`copy`	Whether to copy the input data (default `None`; the effect on array input depends on the pandas version)	`True` to force an independent copy
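Whether a DataFrame built from an array shares memory with that array has differed between pandas versions, so it may be safer to request a copy explicitly when independence matters. The following sketch shows one way to do that and to inspect the default behavior on the installed version. The np.shares_memory check is an assumed diagnostic, not something the table above prescribes.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])

# copy=True requests an independent copy of the array's data.
df_copy = pd.DataFrame(arr, columns=['X', 'Y'], copy=True)

# Changing the source array afterwards leaves the copied DataFrame untouched.
arr[0, 0] = 99
print(df_copy.loc[0, 'X'])  # 1

# Without copy=True the result may or may not share memory with arr,
# depending on the pandas version, so check rather than assume.
df_default = pd.DataFrame(arr, columns=['X', 'Y'])
print(np.shares_memory(arr, df_default.to_numpy()))
```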

Summary

Use pd.DataFrame(array, columns=[...]) to convert NumPy arrays into labeled, structured DataFrames. Add index for custom row labels and dtype to control data types. This conversion bridges numerical computation in NumPy with the data analysis capabilities of Pandas, enabling easier visualization, grouping, and export operations.
