Python NumPy: How to Read CSV Data into a Record Array

A record array (or recarray) in NumPy is a specialized array that stores structured, tabular data with named columns and mixed data types. Unlike regular NumPy arrays that hold a single data type, record arrays let you store integers, floats, and strings in different columns - similar to a spreadsheet or database table - while accessing fields conveniently as attributes.

In this guide, you'll learn three methods to read CSV data into a NumPy record array, understand the differences between each approach, and know when to use which one.

What Is a Record Array?

A record array extends NumPy's structured arrays by providing attribute-style access to named fields:

import numpy as np

# Create a simple record array
data = np.rec.array(
    [(1, 'Alice', 50000.0), (2, 'Bob', 60000.0)],
    dtype=[('ID', 'i4'), ('Name', 'U10'), ('Salary', 'f8')]
)

# Access fields as attributes
print(data.Name) # ['Alice' 'Bob']
print(data.Salary) # [50000. 60000.]
print(data[0]) # (1, 'Alice', 50000.0)

Key features:

  • Named fields: Access columns by name (e.g., data.Name) instead of index.
  • Mixed data types: Different columns can hold integers, floats, strings, etc.
  • NumPy performance: Operates at NumPy speed, faster than Python lists of dictionaries.

Sample CSV File

All examples below use this CSV file (employees.csv):

employees.csv
ID,Name,Salary
1,Alice,50000
2,Bob,60000
3,Charlie,55000

You can create it programmatically:

csv_content = """ID,Name,Salary
1,Alice,50000
2,Bob,60000
3,Charlie,55000
"""

with open('employees.csv', 'w') as f:
    f.write(csv_content)

Method 1: Using numpy.genfromtxt()

The numpy.genfromtxt() function is the most versatile option for reading CSV data into a structured array. It automatically infers data types and handles missing values:

import numpy as np

data = np.genfromtxt(
    'employees.csv',
    delimiter=',',
    dtype=None,
    names=True,
    encoding=None
)

print("Type:", type(data))
print("Data:")
print(data)
print("\nField names:", data.dtype.names)
print("Names:", data['Name'])
print("Salaries:", data['Salary'])

Output:

Type: <class 'numpy.ndarray'>
Data:
[(1, 'Alice', 50000) (2, 'Bob', 60000) (3, 'Charlie', 55000)]

Field names: ('ID', 'Name', 'Salary')
Names: ['Alice' 'Bob' 'Charlie']
Salaries: [50000 60000 55000]

Parameter explanation:

  • delimiter=',': specifies the column separator.
  • dtype=None: lets NumPy infer the data type of each column automatically.
  • names=True: uses the first row as field names.
  • encoding=None: uses the system default encoding.
Tip:

genfromtxt() is the best choice when you need fine-grained control over parsing, such as handling missing values, skipping rows, or specifying custom data types. Use the filling_values parameter to define default values for missing data.
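As a concrete illustration of filling_values, the sketch below reads a hypothetical file (employees_missing.csv, not part of the sample above) in which Bob's salary is blank, and substitutes 0 for the missing field:

```python
import numpy as np

# A made-up CSV with an empty Salary field, created for illustration
with open('employees_missing.csv', 'w') as f:
    f.write("ID,Name,Salary\n1,Alice,50000\n2,Bob,\n")

# filling_values supplies a default for any field that is missing
data = np.genfromtxt(
    'employees_missing.csv',
    delimiter=',',
    dtype=None,
    names=True,
    encoding=None,
    filling_values=0,
)

print(data['Salary'])  # Bob's missing salary becomes 0
```

You can also pass a dict to filling_values to give each column its own default.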

Method 2: Using numpy.recfromcsv() (Removed in NumPy 2.0)

The numpy.recfromcsv() function is a convenience wrapper specifically designed for reading CSV files into record arrays. It sets sensible defaults automatically. Note that it was removed in NumPy 2.0, so this method only works on NumPy versions below 2.0:

import numpy as np

data = np.recfromcsv('employees.csv', encoding=None)

print("Type:", type(data))
print("Data:")
print(data)
print("\nAccess by attribute:")
print("Names:", data.name)
print("Salaries:", data.salary)

Output:

Type: <class 'numpy.recarray'>
Data:
[(1, 'Alice', 50000) (2, 'Bob', 60000) (3, 'Charlie', 55000)]

Access by attribute:
Names: ['Alice' 'Bob' 'Charlie']
Salaries: [50000 60000 55000]
Caution:

recfromcsv() converts column names to lowercase automatically. If your CSV has Name and Salary as headers, the record array fields will be name and salary. This can cause confusion if you expect case-sensitive field names.

# ❌ This raises an AttributeError
try:
    print(data.Name)
except AttributeError as e:
    print(f"Error: {e}")

# ✅ Use lowercase
print(data.name)

Notice that recfromcsv() returns a numpy.recarray directly (not just a structured ndarray), so you can access fields as attributes without any additional conversion.
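On NumPy 2.0 and later, where recfromcsv() no longer exists, you can reproduce its behavior with genfromtxt(). The sketch below uses genfromtxt's case_sensitive parameter to mimic the lowercased field names, then views the result as a recarray (the file-creation step is included only to keep the snippet self-contained):

```python
import numpy as np

# Recreate the sample file so the snippet runs on its own
with open('employees.csv', 'w') as f:
    f.write("ID,Name,Salary\n1,Alice,50000\n2,Bob,60000\n3,Charlie,55000\n")

# Rough equivalent of recfromcsv() on NumPy 2.0+
data = np.genfromtxt(
    'employees.csv',
    delimiter=',',
    dtype=None,
    names=True,
    encoding=None,
    case_sensitive='lower',  # mimic recfromcsv's lowercased field names
).view(np.recarray)

print(data.name)    # attribute access with lowercase names
```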

Method 3: Using Pandas and Converting to a Record Array

If you're already using Pandas in your project, you can read the CSV into a DataFrame first and then convert it to a NumPy record array using to_records():

import pandas as pd

df = pd.read_csv('employees.csv')
print("DataFrame:")
print(df)

# Convert to NumPy record array
data = df.to_records(index=False)
print("\nRecord array:")
print(data)
print("\nNames:", data.Name)
print("Salaries:", data.Salary)

Output:

DataFrame:
   ID     Name  Salary
0   1    Alice   50000
1   2      Bob   60000
2   3  Charlie   55000

Record array:
[(1, 'Alice', 50000) (2, 'Bob', 60000) (3, 'Charlie', 55000)]

Names: ['Alice' 'Bob' 'Charlie']
Salaries: [50000 60000 55000]
Why index=False?

Setting index=False in to_records() excludes the DataFrame's row index from the record array. Without it, an extra index field is added:

import pandas as pd

df = pd.read_csv('employees.csv')
print("DataFrame:")
print(df)

# With index (default)
data_with_index = df.to_records()
print(data_with_index.dtype.names) # ('index', 'ID', 'Name', 'Salary')

# Without index
data_no_index = df.to_records(index=False)
print(data_no_index.dtype.names) # ('ID', 'Name', 'Salary')

Output:

DataFrame:
   ID     Name  Salary
0   1    Alice   50000
1   2      Bob   60000
2   3  Charlie   55000
('index', 'ID', 'Name', 'Salary')
('ID', 'Name', 'Salary')

This approach is especially useful when you want to preprocess data with Pandas (filtering, cleaning, merging) before converting to a NumPy record array for numerical computation.
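For example, you might normalize a text column and filter rows in Pandas before handing the result to NumPy (a minimal sketch using the sample file, recreated here so the snippet runs on its own):

```python
import pandas as pd

# Recreate the sample file so the snippet is self-contained
with open('employees.csv', 'w') as f:
    f.write("ID,Name,Salary\n1,Alice,50000\n2,Bob,60000\n3,Charlie,55000\n")

df = pd.read_csv('employees.csv')

# Preprocess in Pandas: normalize text, then filter rows
df['Name'] = df['Name'].str.upper()
filtered = df[df['Salary'] >= 55000]

# Hand off the cleaned data to NumPy as a record array
data = filtered.to_records(index=False)
print(data.Name)      # ['BOB' 'CHARLIE']
print(data.Salary)    # [60000 55000]
```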

Comparison of Methods

| Feature                | genfromtxt()                | recfromcsv()          | Pandas + to_records()     |
| ---------------------- | --------------------------- | --------------------- | ------------------------- |
| Returns type           | Structured ndarray          | recarray              | recarray                  |
| Attribute access       | ❌ (data['Name'] only)      | ✅ (data.name)        | ✅ (data.Name)            |
| Auto-detects types     | ✅ (with dtype=None)        | ✅                    | ✅                        |
| Preserves column case  | ✅                          | ❌ (lowercased)       | ✅                        |
| Handles missing values | ✅ (filling_values)         | ✅ (filling_values)   | ✅ (more options)         |
| Requires Pandas        | ❌                          | ❌                    | ✅                        |
| Data preprocessing     | Limited                     | Limited               | ✅ (full Pandas power)    |
| Best for               | Fine-grained control        | Quick CSV to recarray | Preprocessing + conversion |

Converting a Structured Array to a Record Array

If you use genfromtxt() and want attribute-style access, convert the result to a record array with view():

import numpy as np

# genfromtxt returns a structured ndarray
structured = np.genfromtxt(
    'employees.csv', delimiter=',', dtype=None, names=True, encoding=None
)

# Convert to record array for attribute access
record = structured.view(np.recarray)

print("Access as attribute:", record.Name)
print("Access as field: ", record['Name'])

Output:

Access as attribute: ['Alice' 'Bob' 'Charlie']
Access as field: ['Alice' 'Bob' 'Charlie']

Common Mistake: Encoding Issues

When reading CSV files, you may encounter UnicodeDecodeError if the file uses a non-UTF-8 encoding:

import numpy as np

# ❌ May fail with encoding errors
try:
    data = np.genfromtxt('data_latin1.csv', delimiter=',', dtype=None, names=True)
except UnicodeDecodeError as e:
    print(f"Error: {e}")

Fix: Specify the correct encoding:

# ✅ Specify encoding explicitly
data = np.genfromtxt(
    'data_latin1.csv', delimiter=',', dtype=None, names=True, encoding='latin-1'
)

With Pandas, use pd.read_csv('file.csv', encoding='latin-1').

Summary

To read CSV data into a NumPy record array:

  • Use numpy.genfromtxt() for the most control over parsing - specify delimiters, handle missing values, and choose data types. Convert to a recarray with .view(np.recarray) if you need attribute access.
  • Use numpy.recfromcsv() for the quickest, simplest approach - it returns a recarray directly with sensible defaults. Be aware that it lowercases column names.
  • Use Pandas with to_records() when you need to preprocess data (filter, clean, merge) before converting - it offers the most flexibility but requires an additional dependency.

All three methods produce functionally equivalent record arrays. Choose based on whether you need preprocessing capabilities, attribute-style access, or minimal dependencies.