Python NumPy: How to Convert a DataFrame Column to a NumPy Array

When working with data in Python, you'll frequently need to move data between pandas DataFrames and NumPy arrays. Converting a DataFrame column to a NumPy array is essential when you need to perform mathematical operations, pass data to machine learning models, or use libraries that expect array inputs.

In this guide, you'll learn multiple methods to convert a pandas DataFrame column to a NumPy array, understand the differences between them, and know when to use each approach.

Setting Up the Example DataFrame

Let's create a sample DataFrame to use throughout the examples:

import pandas as pd
import numpy as np

np.random.seed(42)

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})

print(df)

Output:

      Name  Age  Score
0    Alice   25   88.5
1      Bob   30   92.3
2  Charlie   35   76.8
3    David   28   95.1
4      Eve   32   84.7

Using the to_numpy() Method

The to_numpy() method is the recommended way to convert a DataFrame column (or an entire DataFrame) to a NumPy array. It's explicit, well-documented, and offers options for controlling the output.

import numpy as np
import pandas as pd

np.random.seed(42)

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})

# --- --- ---

age_array = df['Age'].to_numpy()

print(age_array)
print(f"Type: {type(age_array)}")
print(f"Dtype: {age_array.dtype}")

Output:

[25 30 35 28 32]
Type: <class 'numpy.ndarray'>
Dtype: int64

Specifying the Data Type

You can control the output array's data type with the dtype parameter:

import numpy as np
import pandas as pd

np.random.seed(42)

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})

# --- --- ---

# Convert to float
age_float = df['Age'].to_numpy(dtype=float)
print(f"As float: {age_float}")

# Convert to float32 for memory efficiency
score_f32 = df['Score'].to_numpy(dtype=np.float32)
print(f"As float32: {score_f32}")
print(f"Dtype: {score_f32.dtype}")

Output:

As float: [25. 30. 35. 28. 32.]
As float32: [88.5 92.3 76.8 95.1 84.7]
Dtype: float32
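
To make the memory point concrete, here is a minimal sketch that compares the size of the two arrays through their nbytes attribute. The Series is a made-up stand-in for the Score column above.

```python
import numpy as np
import pandas as pd

# Illustrative data standing in for the Score column above
scores = pd.Series([88.5, 92.3, 76.8, 95.1, 84.7])

as_f64 = scores.to_numpy()                  # float64: 8 bytes per element
as_f32 = scores.to_numpy(dtype=np.float32)  # float32: 4 bytes per element

print(f"float64 bytes: {as_f64.nbytes}")
print(f"float32 bytes: {as_f32.nbytes}")
```

The saving comes at the cost of precision: float32 keeps roughly seven significant decimal digits, so it suits features and measurements better than values that need exact accumulation.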

Handling Missing Values

When your column contains NaN values, to_numpy() lets you specify a replacement value with na_value:

import numpy as np
import pandas as pd

df_with_nan = pd.DataFrame({
    'Value': [10, np.nan, 30, np.nan, 50]
})

# Default behavior: keeps NaN
print(df_with_nan['Value'].to_numpy())

# Replace NaN with a specific value
print(df_with_nan['Value'].to_numpy(na_value=0))

Output:

[10. nan 30. nan 50.]
[10.  0. 30.  0. 50.]

Tip: to_numpy() is the method the pandas documentation recommends, and it gives you the most control over the conversion.
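
to_numpy() also accepts a copy parameter. With the default copy=False, pandas may return a view of the Series data rather than a fresh array, and depending on the pandas version and its copy-on-write settings that view can be read-only. The sketch below, which uses made-up values, requests an independent copy explicitly so the array can be modified safely:

```python
import pandas as pd

# Illustrative data; any numeric Series behaves the same way
s = pd.Series([1.0, 2.0, 3.0])

# copy=True guarantees the array does not share memory with the Series
independent = s.to_numpy(copy=True)
independent[0] = 99.0

print(f"Array:  {independent}")
print(f"Series: {s.tolist()}")  # the Series keeps its original values
```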

Using the .values Attribute

The .values attribute returns the underlying data of a Series as a NumPy array. It's concise and widely used in existing codebases:

import numpy as np
import pandas as pd

np.random.seed(42)

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})

# --- --- ---

score_array = df['Score'].values

print(score_array)
print(f"Type: {type(score_array)}")

Output:

[88.5 92.3 76.8 95.1 84.7]
Type: <class 'numpy.ndarray'>

.values vs to_numpy()

While .values works well in most cases, the pandas documentation recommends to_numpy() over .values because:

  • .values may return a NumPy array or an ExtensionArray depending on the column type (e.g., for categorical or nullable integer types).
  • to_numpy() always returns a NumPy array, making behavior more predictable.

import pandas as pd

# With categorical data, .values returns a Categorical, not an ndarray
cat_series = pd.Categorical(['a', 'b', 'c'])
s = pd.Series(cat_series)

print(f".values type: {type(s.values)}")
print(f"to_numpy() type: {type(s.to_numpy())}")

Output:

.values type: <class 'pandas.core.arrays.categorical.Categorical'>
to_numpy() type: <class 'numpy.ndarray'>
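
The same distinction applies to nullable integer columns. As a rough sketch with illustrative data, the example below shows .values returning a pandas extension array for an Int64 Series, while to_numpy() with an explicit dtype and na_value yields a plain float array in which the missing entry becomes NaN:

```python
import numpy as np
import pandas as pd

# Nullable integer column with one missing entry (illustrative data)
s = pd.Series([1, None, 3], dtype="Int64")

# .values hands back the pandas extension array, not an ndarray
print(f".values is ndarray: {isinstance(s.values, np.ndarray)}")

# Request float output and map pd.NA to NaN to get a plain float array
arr = s.to_numpy(dtype="float64", na_value=np.nan)
print(arr)
print(f"Dtype: {arr.dtype}")
```

Passing both dtype and na_value is the safer habit here, because the dtype chosen when they are omitted is not something to rely on across pandas versions.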

Using np.asarray()

NumPy's np.asarray() function converts any array-like input - including pandas Series - into a NumPy array:

import numpy as np
import pandas as pd

np.random.seed(42)

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})

# --- --- ---

age_array = np.asarray(df['Age'])

print(age_array)
print(f"Type: {type(age_array)}")

Output:

[25 30 35 28 32]
Type: <class 'numpy.ndarray'>

You can also specify the data type:

age_float = np.asarray(df['Age'], dtype=np.float64)
print(f"As float64: {age_float}")

Output:

As float64: [25. 30. 35. 28. 32.]

When to Use np.asarray()

np.asarray() is useful when writing functions that should accept both pandas Series and NumPy arrays as input. It converts Series to arrays and passes through existing arrays without copying:

import numpy as np
import pandas as pd

np.random.seed(42)

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})

# --- --- ---

def process_data(data):
    """Works with both Series and arrays."""
    arr = np.asarray(data)  # no copy when data is already an ndarray
    return arr.mean()

# Works with a Series
print(process_data(df['Score']))

# Also works with a plain array
print(process_data(np.array([1, 2, 3])))

Output:

87.48
2.0

Using np.array()

np.array() is similar to np.asarray(), but by default it creates a new copy of the data, even when the input is already a NumPy array:

score_array = np.array(df['Score'])

print(score_array)
print(f"Type: {type(score_array)}")

Output:

[88.5 92.3 76.8 95.1 84.7]
Type: <class 'numpy.ndarray'>

Copy Behavior: np.array() vs np.asarray()

import numpy as np
import pandas as pd

original = np.array([1, 2, 3])

# np.asarray: no copy if already an ndarray
via_asarray = np.asarray(original)
via_asarray[0] = 99
print(f"Original after asarray modification: {original}") # Modified!

original = np.array([1, 2, 3])

# np.array: always copies
via_array = np.array(original)
via_array[0] = 99
print(f"Original after array modification: {original}") # Unchanged

Output:

Original after asarray modification: [99  2  3]
Original after array modification: [1 2 3]

Use np.array() when you need a guaranteed independent copy. Use np.asarray() when you want to avoid unnecessary copying for performance.
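
If you are unsure whether a conversion copied the data, np.shares_memory() can check it directly. The sketch below uses a small made-up array and also shows that np.asarray() has to copy when a different dtype is requested, since the existing buffer cannot be reused:

```python
import numpy as np

original = np.array([1, 2, 3])

same_dtype = np.asarray(original)                    # nothing to convert, no copy
new_dtype = np.asarray(original, dtype=np.float64)   # dtype change forces a copy
explicit = np.array(original)                        # copies by default

print(np.shares_memory(original, same_dtype))  # True
print(np.shares_memory(original, new_dtype))   # False
print(np.shares_memory(original, explicit))    # False
```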

Converting Multiple Columns

To convert multiple columns at once, select them and use to_numpy() on the resulting DataFrame:

import numpy as np
import pandas as pd

np.random.seed(42)

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})

# --- --- ---

# Convert multiple columns to a 2D NumPy array
numeric_array = df[['Age', 'Score']].to_numpy()

print(numeric_array)
print(f"Shape: {numeric_array.shape}")

Output:

[[25.  88.5]
 [30.  92.3]
 [35.  76.8]
 [28.  95.1]
 [32.  84.7]]
Shape: (5, 2)

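Converting the Entire DataFrame

A DataFrame that mixes strings and numbers converts to an array of dtype object, which is rarely what numerical code wants, so this example keeps only the numeric columns before calling to_numpy():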

import numpy as np
import pandas as pd

np.random.seed(42)

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})

# --- --- ---

# Convert all numeric columns
all_numeric = df.select_dtypes(include=[np.number]).to_numpy()

print(all_numeric)
print(f"Shape: {all_numeric.shape}")

Output:

[[25.  88.5]
 [30.  92.3]
 [35.  76.8]
 [28.  95.1]
 [32.  84.7]]
Shape: (5, 2)
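
To see why the numeric columns are selected first, the following sketch compares the dtype of the full conversion with the numeric-only one. It uses a shortened, illustrative version of the example DataFrame:

```python
import numpy as np
import pandas as pd

# Shortened, illustrative version of the example DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Score': [88.5, 92.3]
})

# Strings and numbers share no numeric type, so pandas falls back to object
mixed = df.to_numpy()

# Integers and floats share a common numeric type, float64
numeric = df.select_dtypes(include=[np.number]).to_numpy()

print(f"All columns:     {mixed.dtype}, shape {mixed.shape}")
print(f"Numeric columns: {numeric.dtype}, shape {numeric.shape}")
```

An object array stores Python objects rather than raw numbers, so vectorized arithmetic on it is slow or fails outright.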

Practical Example: Preparing Data for Machine Learning

A common real-world use case is preparing DataFrame data for scikit-learn or other ML libraries:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Feature_1': [1.2, 3.4, 5.6, 7.8, 9.0],
    'Feature_2': [2.1, 4.3, 6.5, 8.7, 0.9],
    'Label': [0, 1, 0, 1, 0]
})

# Features as 2D array
X = df[['Feature_1', 'Feature_2']].to_numpy()

# Labels as 1D array
y = df['Label'].to_numpy()

print(f"Features shape: {X.shape}, dtype: {X.dtype}")
print(f"Labels shape: {y.shape}, dtype: {y.dtype}")
print(f"\nFeatures:\n{X}")
print(f"\nLabels: {y}")

Output:

Features shape: (5, 2), dtype: float64
Labels shape: (5,), dtype: int64

Features:
[[1.2 2.1]
 [3.4 4.3]
 [5.6 6.5]
 [7.8 8.7]
 [9.  0.9]]

Labels: [0 1 0 1 0]
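
One detail worth a quick sketch: scikit-learn estimators generally expect the feature matrix to be two-dimensional even when there is only one feature. Assuming a single-feature version of the DataFrame above, there are two common ways to get a column-shaped array:

```python
import pandas as pd

# Single-feature version of the example DataFrame (illustrative)
df = pd.DataFrame({'Feature_1': [1.2, 3.4, 5.6, 7.8, 9.0]})

flat = df['Feature_1'].to_numpy()          # Series    -> 1D, shape (5,)
as_column = df[['Feature_1']].to_numpy()   # DataFrame -> 2D, shape (5, 1)
reshaped = flat.reshape(-1, 1)             # 1D -> 2D with one column

print(flat.shape)
print(as_column.shape)
print(reshaped.shape)
```

Selecting with double brackets keeps the result a DataFrame, so to_numpy() returns a 2D array directly. reshape(-1, 1) is the usual fix when you already hold a 1D array.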

Quick Comparison of Methods

Method             | Returns ndarray Always         | Creates Copy     | Supports dtype | Supports na_value
series.to_numpy()  | ✅                             | Depends          | ✅             | ✅
series.values      | ❌ (may return ExtensionArray) | No               | ❌             | ❌
np.asarray(series) | ✅                             | No (if possible) | ✅             | ❌
np.array(series)   | ✅                             | ✅ Always        | ✅             | ❌

Conclusion

Converting a pandas DataFrame column to a NumPy array is straightforward with several methods available:

  • to_numpy() is the recommended approach - it always returns a NumPy array, supports dtype conversion and na_value handling, and is the most explicit.
  • .values is a quick shorthand but can return non-ndarray types for special column types.
  • np.asarray() is ideal for writing flexible functions that accept both Series and arrays without unnecessary copying.
  • np.array() guarantees an independent copy, useful when you need to modify the array without affecting the original DataFrame.

For most use cases, to_numpy() provides the best combination of reliability, flexibility, and clarity.