How to Convert a NumPy Array to a Pandas DataFrame in Python
Converting NumPy arrays to Pandas DataFrames adds labeled columns and indices, transforming raw numerical data into a structured format ideal for analysis, visualization, and export. This conversion is one of the most common operations when bridging numerical computation in NumPy with the data analysis capabilities of Pandas.
In this guide, you will learn how to perform this conversion for different array shapes, add meaningful labels, handle data types, and apply the technique to practical scenarios.
Basic Conversion
Pass the array directly to pd.DataFrame() along with column names:
import numpy as np
import pandas as pd
arr = np.array([[10, 20], [30, 40], [50, 60]])
df = pd.DataFrame(arr, columns=['A', 'B'])
print(df)
Output:
    A   B
0  10  20
1  30  40
2  50  60
Each column in the array maps to the corresponding name in the columns list, and Pandas automatically assigns integer indices starting from 0.
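For instance, once the columns are labeled, you can select them by name instead of by position — a minimal sketch building on the example above:

```python
import numpy as np
import pandas as pd

arr = np.array([[10, 20], [30, 40], [50, 60]])
df = pd.DataFrame(arr, columns=['A', 'B'])

# Name-based access replaces positional slicing like arr[:, 0]
print(df['A'].tolist())  # [10, 30, 50]
print(df['B'].sum())     # 120
```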
Adding Custom Row Labels
Use the index parameter to assign meaningful row identifiers instead of default integers:
import numpy as np
import pandas as pd
arr = np.array([[10, 20], [30, 40], [50, 60]])
df = pd.DataFrame(
    arr,
    columns=['Sales', 'Profit'],
    index=['Q1', 'Q2', 'Q3']
)
print(df)
Output:
    Sales  Profit
Q1     10      20
Q2     30      40
Q3     50      60
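With custom labels in place, rows can be retrieved by name through .loc rather than by integer position — a short sketch using the same data:

```python
import numpy as np
import pandas as pd

arr = np.array([[10, 20], [30, 40], [50, 60]])
df = pd.DataFrame(arr, columns=['Sales', 'Profit'],
                  index=['Q1', 'Q2', 'Q3'])

# .loc selects by label instead of integer position
print(df.loc['Q2'])           # the whole Q2 row
print(df.loc['Q1', 'Sales'])  # a single cell: 10
```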
Converting 1D Arrays
A 1D array creates a single-column DataFrame. You must still pass the column name as a list:
import numpy as np
import pandas as pd
arr_1d = np.array([100, 200, 300, 400])
df = pd.DataFrame(arr_1d, columns=['Value'])
print(df)
Output:
   Value
0    100
1    200
2    300
3    400
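As a side note, pandas also provides Series — its 1D structure — which can hold the same data and convert to a single-column DataFrame on demand:

```python
import numpy as np
import pandas as pd

arr_1d = np.array([100, 200, 300, 400])

# A Series is the 1D counterpart of a DataFrame
s = pd.Series(arr_1d, name='Value')

# to_frame() turns it into a single-column DataFrame when needed
df = s.to_frame()
print(df.shape)  # (4, 1)
```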
Combining Multiple 1D Arrays as Columns
When you have several separate 1D arrays that represent different features, pass them as a dictionary to create a multi-column DataFrame:
import numpy as np
import pandas as pd
names = np.array(['Alice', 'Bob', 'Charlie'])
ages = np.array([25, 30, 35])
scores = np.array([85.5, 92.0, 78.5])
df = pd.DataFrame({
    'Name': names,
    'Age': ages,
    'Score': scores
})
print(df)
Output:
      Name  Age  Score
0    Alice   25   85.5
1      Bob   30   92.0
2  Charlie   35   78.5
This dictionary approach is especially useful when each array comes from a different source or computation.
Common Mistake: Column Count Mismatch
The number of column names must exactly match the number of columns in the array:
import numpy as np
import pandas as pd
arr = np.random.rand(5, 2) # 5 rows, 2 columns
# Correct: 2 columns, 2 names
df = pd.DataFrame(arr, columns=['X', 'Y'])
print(f"Shape: {df.shape}")
# Wrong: 3 names for 2 columns
try:
    df = pd.DataFrame(arr, columns=['X', 'Y', 'Z'])
except ValueError as e:
    print(f"Error: {e}")
Output:
Shape: (5, 2)
Error: Shape of passed values is (5, 2), indices imply (5, 3)
Always verify your array's shape with arr.shape before specifying column names. This is one of the most frequent errors when converting arrays to DataFrames, especially when the array is generated dynamically.
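When the column count is only known at runtime, generating names from the shape avoids the mismatch entirely; the col prefix below is an arbitrary choice:

```python
import numpy as np
import pandas as pd

arr = np.random.rand(5, 7)  # column count may vary at runtime

# Derive exactly one name per column from the array's shape
cols = [f'col{i}' for i in range(arr.shape[1])]
df = pd.DataFrame(arr, columns=cols)
print(df.columns.tolist())
```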
Specifying Data Types
Control column data types using the dtype parameter or the astype() method:
import numpy as np
import pandas as pd
arr = np.array([[1, 2], [3, 4], [5, 6]])
# Force all columns to float during creation
df_float = pd.DataFrame(arr, columns=['A', 'B'], dtype=float)
print(df_float.dtypes)
print()
# Or set types per column after creation
df = pd.DataFrame(arr, columns=['A', 'B'])
df = df.astype({'A': 'int32', 'B': 'float64'})
print(df.dtypes)
Output:
A    float64
B    float64
dtype: object

A      int32
B    float64
dtype: object
The dtype parameter applies the same type to all columns, while astype() gives you per-column control.
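Note that when you omit dtype entirely, each column simply inherits whatever dtype the array already has — a quick check:

```python
import numpy as np
import pandas as pd

arr_int16 = np.array([[1, 2], [3, 4]], dtype=np.int16)

# Without dtype, the columns keep the array's own type
df = pd.DataFrame(arr_int16, columns=['A', 'B'])
print(df.dtypes)  # both columns are int16
```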
Handling Different Array Shapes
2D Arrays (Most Common)
import numpy as np
import pandas as pd
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
df = pd.DataFrame(matrix, columns=['Col1', 'Col2', 'Col3'])
print(df)
Output:
   Col1  Col2  Col3
0     1     2     3
1     4     5     6
2     7     8     9
3D Arrays (Requires Reshaping)
DataFrames are inherently 2D, so 3D arrays must be reshaped before conversion:
import numpy as np
import pandas as pd
# 3D array: 2 layers, 3 rows, 4 columns
np.random.seed(42)
tensor = np.random.randint(0, 100, size=(2, 3, 4))
# Option 1: Flatten all layers into rows
flat = tensor.reshape(-1, tensor.shape[-1]) # (6, 4)
df = pd.DataFrame(flat, columns=['A', 'B', 'C', 'D'])
print(f"Flattened shape: {df.shape}")
print(df)
print()
# Option 2: Select a single layer
df_layer0 = pd.DataFrame(tensor[0], columns=['A', 'B', 'C', 'D'])
print(f"Single layer shape: {df_layer0.shape}")
print(df_layer0)
Output:
Flattened shape: (6, 4)
    A   B   C   D
0  51  92  14  71
1  60  20  82  86
2  74  74  87  99
3  23   2  21  52
4   1  87  29  37
5   1  63  59  20

Single layer shape: (3, 4)
    A   B   C   D
0  51  92  14  71
1  60  20  82  86
2  74  74  87  99
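A third option, sketched here with illustrative level names, keeps the layer structure visible by labeling each flattened row with a MultiIndex instead of discarding its origin:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
tensor = np.random.randint(0, 100, size=(2, 3, 4))

# Label each flattened row with its (layer, row) origin
idx = pd.MultiIndex.from_product(
    [range(tensor.shape[0]), range(tensor.shape[1])],
    names=['layer', 'row']
)
df = pd.DataFrame(tensor.reshape(-1, tensor.shape[-1]),
                  index=idx, columns=['A', 'B', 'C', 'D'])
print(df)
print(df.loc[1])  # only the rows that came from layer 1
```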
Automatic Column Names
If you omit the columns parameter, Pandas assigns integer column indices. You can rename them later:
import numpy as np
import pandas as pd
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Default integer column names
df = pd.DataFrame(arr)
print(df)
print()
# Rename columns after creation
df.columns = ['A', 'B', 'C']
print(df)
Output:
   0  1  2
0  1  2  3
1  4  5  6

   A  B  C
0  1  2  3
1  4  5  6
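As an alternative to assigning df.columns by hand, the add_prefix method turns the default integer labels into readable names in a single call:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6]])

# add_prefix renames columns 0, 1, 2 to feature_0, feature_1, feature_2
df = pd.DataFrame(arr).add_prefix('feature_')
print(df.columns.tolist())
```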
Practical Examples
Creating Test Data with Random Arrays
import numpy as np
import pandas as pd
np.random.seed(42)
data = np.random.randn(100, 4)
df = pd.DataFrame(
    data,
    columns=['Feature1', 'Feature2', 'Feature3', 'Target']
)
print(df.head())
print(f"\nShape: {df.shape}")
print(f"\nSummary Statistics:\n{df.describe().round(2)}")
Output:
   Feature1  Feature2  Feature3    Target
0  0.496714 -0.138264  0.647689  1.523030
1 -0.234153 -0.234137  1.579213  0.767435
2 -0.469474  0.542560 -0.463418 -0.465730
3  0.241962 -1.913280 -1.724918 -0.562288
4 -1.012831  0.314247 -0.908024 -1.412304

Shape: (100, 4)

Summary Statistics:
       Feature1  Feature2  Feature3  Target
count    100.00    100.00    100.00  100.00
mean      -0.01      0.03      0.02    0.04
std        0.87      0.95      1.04    0.98
min       -2.03     -1.96     -3.24   -1.99
25%       -0.72     -0.56     -0.62   -0.73
50%       -0.00     -0.02      0.07    0.08
75%        0.53      0.55      0.70    0.78
max        2.31      3.85      2.19    2.72
Converting Machine Learning Predictions
import numpy as np
import pandas as pd
sample_ids = np.arange(1, 6)
predictions = np.array([0.92, 0.15, 0.78, 0.34, 0.88])
actual = np.array([1, 0, 1, 0, 1])
results = pd.DataFrame({
    'SampleID': sample_ids,
    'Prediction': predictions,
    'Actual': actual,
    'Correct': (predictions > 0.5).astype(int) == actual
})
print(results)
print(f"\nAccuracy: {results['Correct'].mean():.0%}")
Output:
   SampleID  Prediction  Actual  Correct
0         1        0.92       1     True
1         2        0.15       0     True
2         3        0.78       1     True
3         4        0.34       0     True
4         5        0.88       1     True

Accuracy: 100%
Time Series Data with Date Index
import numpy as np
import pandas as pd
dates = pd.date_range('2024-01-01', periods=5, freq='D')
values = np.array([[100, 50], [105, 52], [103, 48], [110, 55], [108, 53]])
df = pd.DataFrame(
    values,
    columns=['Price', 'Volume'],
    index=dates
)
print(df)
Output:
            Price  Volume
2024-01-01    100      50
2024-01-02    105      52
2024-01-03    103      48
2024-01-04    110      55
2024-01-05    108      53
Using a DatetimeIndex enables Pandas time series features like resampling, rolling windows, and date-based slicing.
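For example, the same DataFrame supports downsampling and rolling statistics out of the box — a brief sketch:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2024-01-01', periods=5, freq='D')
values = np.array([[100, 50], [105, 52], [103, 48], [110, 55], [108, 53]])
df = pd.DataFrame(values, columns=['Price', 'Volume'], index=dates)

# Downsample to 2-day averages
print(df.resample('2D').mean())

# 3-day rolling mean of the price
print(df['Price'].rolling(3).mean())
```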
Parameter Reference
| Parameter | Purpose | Example |
|---|---|---|
| data | The NumPy array to convert | np.array([[1, 2], [3, 4]]) |
| columns | Column header names | ['X', 'Y'] |
| index | Row labels | ['Row1', 'Row2'] |
| dtype | Force a single data type for all columns | float, 'int32' |
| copy | Whether to copy the input data (default depends on input type and pandas version) | copy=True for independent data |
The copy Parameter
Setting copy=False can save memory by allowing the DataFrame to share the underlying data with the original NumPy array. However, modifying one will then affect the other. Note that for 2D array input, older pandas versions treat the default as copy=False rather than True, so the DataFrame may share memory unless you ask otherwise. Pass copy=True explicitly when you need independent data, and use copy=False only when you understand the implications and want to avoid duplicating large arrays.
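A small demonstration of the copy=True case — the DataFrame stays unchanged when the source array is mutated afterward:

```python
import numpy as np
import pandas as pd

arr = np.array([[1.0, 2.0], [3.0, 4.0]])

# copy=True gives the DataFrame its own independent data
df = pd.DataFrame(arr, columns=['A', 'B'], copy=True)

arr[0, 0] = 99.0
print(df.iloc[0, 0])  # still 1.0; the DataFrame was unaffected
```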
Conclusion
Converting NumPy arrays to Pandas DataFrames is straightforward with pd.DataFrame(array, columns=[...]). Add the index parameter for custom row labels and dtype to control data types. A 1D array produces a single-column DataFrame, and multiple 1D arrays can be combined through a dictionary. For 3D arrays, reshape to 2D first, since DataFrames are inherently two-dimensional structures.
This conversion bridges the gap between NumPy's efficient numerical computation and Pandas' rich data analysis capabilities, enabling labeled access, grouping, filtering, visualization, and export operations on your array data.