Python NumPy: How to Convert a DataFrame Column to a NumPy Array in Python
When working with data in Python, you'll frequently need to move data between pandas DataFrames and NumPy arrays. Converting a DataFrame column to a NumPy array is essential when you need to perform mathematical operations, pass data to machine learning models, or use libraries that expect array inputs.
In this guide, you'll learn multiple methods to convert a pandas DataFrame column to a NumPy array, understand the differences between them, and know when to use each approach.
Setting Up the Example DataFrame
Let's create a sample DataFrame to use throughout the examples:
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 28, 32],
'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})
print(df)
Output:
      Name  Age  Score
0    Alice   25   88.5
1      Bob   30   92.3
2  Charlie   35   76.8
3    David   28   95.1
4      Eve   32   84.7
Using to_numpy() (Recommended)
The to_numpy() method is the recommended way to convert a DataFrame column (or an entire DataFrame) to a NumPy array. It's explicit, well-documented, and offers options for controlling the output.
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 28, 32],
'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})
# --- --- ---
age_array = df['Age'].to_numpy()
print(age_array)
print(f"Type: {type(age_array)}")
print(f"Dtype: {age_array.dtype}")
Output:
[25 30 35 28 32]
Type: <class 'numpy.ndarray'>
Dtype: int64
Specifying the Data Type
You can control the output array's data type with the dtype parameter:
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 28, 32],
'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})
# --- --- ---
# Convert to float
age_float = df['Age'].to_numpy(dtype=float)
print(f"As float: {age_float}")
# Convert to float32 for memory efficiency
score_f32 = df['Score'].to_numpy(dtype=np.float32)
print(f"As float32: {score_f32}")
print(f"Dtype: {score_f32.dtype}")
Output:
As float: [25. 30. 35. 28. 32.]
As float32: [88.5 92.3 76.8 95.1 84.7]
Dtype: float32
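To make the memory-efficiency point concrete, the short sketch below compares the byte size of the same data as float64 and float32 using the array's nbytes attribute. The scores Series is a stand-in for the Score column above, not part of the original example.

```python
import numpy as np
import pandas as pd

# Stand-in for the Score column used in this guide
scores = pd.Series([88.5, 92.3, 76.8, 95.1, 84.7], name="Score")

as_f64 = scores.to_numpy()                  # float data defaults to float64
as_f32 = scores.to_numpy(dtype=np.float32)  # 4 bytes per element instead of 8

print(f"float64 bytes: {as_f64.nbytes}")
print(f"float32 bytes: {as_f32.nbytes}")
```

The saving is not free: float32 holds roughly seven significant decimal digits, so it suits cases where that precision is enough.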
Handling Missing Values
When your column contains NaN values, to_numpy() lets you specify a replacement value with na_value:
import numpy as np
import pandas as pd
df_with_nan = pd.DataFrame({
'Value': [10, np.nan, 30, np.nan, 50]
})
# Default behavior: keeps NaN
print(df_with_nan['Value'].to_numpy())
# Replace NaN with a specific value
print(df_with_nan['Value'].to_numpy(na_value=0))
Output:
[10. nan 30. nan 50.]
[10.  0. 30.  0. 50.]
to_numpy() is the preferred method as recommended by the pandas documentation. It provides the most control over the conversion process.
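na_value is also useful for pandas' nullable dtypes, where missing entries are stored as pd.NA rather than NaN. The sketch below assumes a small, made-up Int64 Series (not from the example DataFrame) and shows how combining dtype with na_value gives a plain float array.

```python
import numpy as np
import pandas as pd

# Hypothetical nullable-integer column: the missing entry is pd.NA, not np.nan
nullable = pd.Series([10, None, 30], dtype="Int64")

# With no arguments, the missing value forces an object array
as_object = nullable.to_numpy()
print(as_object.dtype)

# Requesting float and mapping pd.NA to np.nan yields an ordinary float array
as_float = nullable.to_numpy(dtype="float64", na_value=np.nan)
print(as_float.dtype)
```

An object array is usually slower and less convenient for numeric work, so converting explicitly is often the better choice when the column may contain pd.NA.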
Using the .values Attribute
The .values attribute returns the underlying data of a Series as a NumPy array. It's concise and widely used in existing codebases:
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 28, 32],
'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})
# --- --- ---
score_array = df['Score'].values
print(score_array)
print(f"Type: {type(score_array)}")
Output:
[88.5 92.3 76.8 95.1 84.7]
Type: <class 'numpy.ndarray'>
.values vs to_numpy()
While .values works well in most cases, the pandas documentation recommends to_numpy() over .values because:
- .values may return a NumPy array or an ExtensionArray depending on the column type (e.g., for categorical or nullable integer types).
- to_numpy() always returns a NumPy array, making behavior more predictable.
import pandas as pd
# With categorical data, .values returns a Categorical, not an ndarray
cat_series = pd.Categorical(['a', 'b', 'c'])
s = pd.Series(cat_series)
print(f".values type: {type(s.values)}")
print(f"to_numpy() type: {type(s.to_numpy())}")
Output:
.values type: <class 'pandas.core.arrays.categorical.Categorical'>
to_numpy() type: <class 'numpy.ndarray'>
Using np.asarray()
NumPy's np.asarray() function converts any array-like input - including pandas Series - into a NumPy array:
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 28, 32],
'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})
# --- --- ---
age_array = np.asarray(df['Age'])
print(age_array)
print(f"Type: {type(age_array)}")
Output:
[25 30 35 28 32]
Type: <class 'numpy.ndarray'>
You can also specify the data type:
age_float = np.asarray(df['Age'], dtype=np.float64)
print(f"As float64: {age_float}")
Output:
As float64: [25. 30. 35. 28. 32.]
np.asarray() is useful when writing functions that should accept both pandas Series and NumPy arrays as input. It converts Series to arrays and passes through existing arrays without copying:
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 28, 32],
'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})
# --- --- ---
def process_data(data):
    """Works with both Series and arrays."""
    arr = np.asarray(data)
    return arr.mean()
# Works with a Series
print(process_data(df['Score']))
# Also works with a plain array
print(process_data(np.array([1, 2, 3])))
Output:
87.48
2.0
Using np.array()
np.array() is similar to np.asarray(), but with its default setting (copy=True) it always creates a new copy of the data:
score_array = np.array(df['Score'])
print(score_array)
print(f"Type: {type(score_array)}")
Output:
[88.5 92.3 76.8 95.1 84.7]
Type: <class 'numpy.ndarray'>
Copy Behavior: np.array() vs np.asarray()
import numpy as np
import pandas as pd
original = np.array([1, 2, 3])
# np.asarray: no copy if already an ndarray
via_asarray = np.asarray(original)
via_asarray[0] = 99
print(f"Original after asarray modification: {original}") # Modified!
original = np.array([1, 2, 3])
# np.array: always copies
via_array = np.array(original)
via_array[0] = 99
print(f"Original after array modification: {original}") # Unchanged
Output:
Original after asarray modification: [99 2 3]
Original after array modification: [1 2 3]
Use np.array() when you need a guaranteed independent copy. Use np.asarray() when you want to avoid unnecessary copying for performance.
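An independent copy can also be requested without leaving pandas, since to_numpy() accepts a copy parameter. The following is a minimal sketch using a one-column DataFrame built only for this illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in DataFrame for illustration
df = pd.DataFrame({'Age': [25, 30, 35, 28, 32]})

# copy=True guarantees that the returned array owns its data
ages = df['Age'].to_numpy(copy=True)
ages[0] = 99

print(f"Array: {ages}")
print(f"DataFrame column: {df['Age'].tolist()}")  # still starts with 25
```

With the default copy=False, pandas may return a view of the underlying data or a copy depending on the column type, so copy=True is the safer option when the array will be modified in place.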
Converting Multiple Columns
To convert multiple columns at once, select them and use to_numpy() on the resulting DataFrame:
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 28, 32],
'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})
# --- --- ---
# Convert multiple columns to a 2D NumPy array
numeric_array = df[['Age', 'Score']].to_numpy()
print(numeric_array)
print(f"Shape: {numeric_array.shape}")
Output:
[[25.  88.5]
 [30.  92.3]
 [35.  76.8]
 [28.  95.1]
 [32.  84.7]]
Shape: (5, 2)
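Notice that the integer Age values appear as floats in the output: a NumPy array has a single dtype, so pandas picks one that can hold every selected column. The sketch below uses a two-row version of the example DataFrame to show which dtype results from different column combinations.

```python
import pandas as pd

# Two-row version of the example DataFrame, for illustration
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Score': [88.5, 92.3]
})

numeric = df[['Age', 'Score']].to_numpy()  # int64 + float64 -> float64
mixed = df[['Name', 'Age']].to_numpy()     # strings + integers -> object

print(f"Numeric columns dtype: {numeric.dtype}")
print(f"Mixed columns dtype: {mixed.dtype}")
```

An object array stores generic Python objects, which loses most of NumPy's speed advantages, so it is usually worth selecting only numeric columns before converting.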
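Converting the Entire DataFrame
To convert a whole DataFrame while leaving out non-numeric columns such as Name, filter with select_dtypes() before calling to_numpy():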
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 28, 32],
'Score': [88.5, 92.3, 76.8, 95.1, 84.7]
})
# --- --- ---
# Convert all numeric columns
all_numeric = df.select_dtypes(include=[np.number]).to_numpy()
print(all_numeric)
print(f"Shape: {all_numeric.shape}")
Output:
[[25.  88.5]
 [30.  92.3]
 [35.  76.8]
 [28.  95.1]
 [32.  84.7]]
Shape: (5, 2)
Practical Example: Preparing Data for Machine Learning
A common real-world use case is preparing DataFrame data for scikit-learn or other ML libraries:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Feature_1': [1.2, 3.4, 5.6, 7.8, 9.0],
'Feature_2': [2.1, 4.3, 6.5, 8.7, 0.9],
'Label': [0, 1, 0, 1, 0]
})
# Features as 2D array
X = df[['Feature_1', 'Feature_2']].to_numpy()
# Labels as 1D array
y = df['Label'].to_numpy()
print(f"Features shape: {X.shape}, dtype: {X.dtype}")
print(f"Labels shape: {y.shape}, dtype: {y.dtype}")
print(f"\nFeatures:\n{X}")
print(f"\nLabels: {y}")
Output:
Features shape: (5, 2), dtype: float64
Labels shape: (5,), dtype: int64
Features:
[[1.2 2.1]
 [3.4 4.3]
 [5.6 6.5]
 [7.8 8.7]
 [9.  0.9]]
Labels: [0 1 0 1 0]
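One detail that often trips people up is the single-feature case: scikit-learn estimators generally expect the feature matrix to be 2D even when there is only one feature. The sketch below, which reuses only the Feature_1 column, shows two ways to get a (5, 1) array instead of a (5,) one.

```python
import pandas as pd

# Single-feature version of the example above
df = pd.DataFrame({'Feature_1': [1.2, 3.4, 5.6, 7.8, 9.0]})

as_1d = df['Feature_1'].to_numpy()    # Series -> 1D array
as_2d = df[['Feature_1']].to_numpy()  # one-column DataFrame -> 2D array
reshaped = as_1d.reshape(-1, 1)       # same shape, starting from the 1D array

print(f"1D shape: {as_1d.shape}")
print(f"2D shape: {as_2d.shape}")
print(f"Reshaped: {reshaped.shape}")
```

Selecting with double brackets keeps the data as a DataFrame, which is why to_numpy() returns a 2D array in that case.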
Quick Comparison of Methods
| Method | Always Returns ndarray | Creates Copy | Supports dtype | Supports na_value |
|---|---|---|---|---|
| series.to_numpy() | ✅ | Depends | ✅ | ✅ |
| series.values | ❌ (may return ExtensionArray) | No | ❌ | ❌ |
| np.asarray(series) | ✅ | No (if possible) | ✅ | ❌ |
| np.array(series) | ✅ | ✅ Always | ✅ | ❌ |
Conclusion
Converting a pandas DataFrame column to a NumPy array is straightforward with several methods available:
- to_numpy() is the recommended approach - it always returns a NumPy array, supports dtype conversion and na_value handling, and is the most explicit.
- .values is a quick shorthand but can return non-ndarray types for special column types.
- np.asarray() is ideal for writing flexible functions that accept both Series and arrays without unnecessary copying.
- np.array() guarantees an independent copy, useful when you need to modify the array without affecting the original DataFrame.
For most use cases, to_numpy() provides the best combination of reliability, flexibility, and clarity.