Python NumPy: How to Import Text Files Into NumPy Arrays in Python
Loading data from text files into NumPy arrays is one of the most common tasks in data analysis, scientific computing, and machine learning. Whether you're working with CSV files, tab-separated data, or custom-formatted text files, NumPy provides powerful built-in functions that can read and convert structured text data into arrays with just a single line of code.
The two primary functions for this task are numpy.loadtxt() and numpy.genfromtxt(). While loadtxt() is optimized for clean, well-formatted data, genfromtxt() is designed to handle messy, real-world datasets with missing values and mixed data types.
In this guide, you will learn how to use both functions with practical examples covering delimiters, skipping rows, selecting columns, handling comments, and dealing with missing data.
Using numpy.loadtxt() - Fast and Simple
numpy.loadtxt() is the go-to function for loading clean, consistently formatted numerical data from text files. It is fast, efficient, and works well with CSV, TSV, and space-separated files.
Syntax
numpy.loadtxt(
fname, # File name or path
delimiter=None, # Character separating values (default: whitespace)
dtype=float, # Data type of the resulting array
skiprows=0, # Number of initial rows to skip
usecols=None, # Specific columns to read
comments='#', # Character indicating comment lines
max_rows=None # Maximum number of rows to read
)
Loading a Simple Space-Separated Text File
Consider a file example.txt with space-separated integers:
1 2
3 4
5 6
7 8
9 10
import numpy as np
data = np.loadtxt("example.txt", dtype=int)
print("Loaded Data:")
print(data)
Output:
Loaded Data:
[[ 1 2]
[ 3 4]
[ 5 6]
[ 7 8]
[ 9 10]]
Since the file uses whitespace as the default separator, no delimiter parameter is needed. Setting dtype=int ensures all values are loaded as integers rather than the default floats.
Loading a CSV File Using a Delimiter
When values are separated by commas, specify delimiter=",". Consider a file data.csv:
1,2,3
4,5,6
7,8,9
import numpy as np
data = np.loadtxt("data.csv", delimiter=",")
print("CSV Data:")
print(data)
Output:
CSV Data:
[[1. 2. 3.]
[4. 5. 6.]
[7. 8. 9.]]
The default dtype is float, which is why the output shows 1. instead of 1. If your data is strictly integers, pass dtype=int to avoid unnecessary floating-point conversion.
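To confirm the point above, a minimal check (using io.StringIO in place of the file so the snippet is self-contained) shows that passing dtype=int yields an integer array with no trailing dots in the output:

```python
import numpy as np
from io import StringIO

# Same data as data.csv above, as an in-memory file
csv = StringIO("1,2,3\n4,5,6\n7,8,9\n")

# dtype=int loads the values directly as integers
data = np.loadtxt(csv, delimiter=",", dtype=int)
print(data)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
```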
Skipping Comment Lines
Many data files include comment lines (metadata, descriptions, or headers) prefixed with a special character. The comments parameter tells loadtxt() to ignore these lines. Consider a file commented_data.csv:
# This file contains sample data
1,2,3
4,5,6
7,8,9
import numpy as np
data = np.loadtxt("commented_data.csv", delimiter=",", comments="#")
print("Data without comments:")
print(data)
Output:
Data without comments:
[[1. 2. 3.]
[4. 5. 6.]
[7. 8. 9.]]
By default, comments='#' is already set, so lines starting with # are automatically skipped even without explicitly specifying the parameter.
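Not every file uses # for comments. If your data comes from a tool that marks comments differently (the semicolon below is a hypothetical example), pass that character to the comments parameter:

```python
import numpy as np
from io import StringIO

# Hypothetical file whose comment lines start with ';' instead of '#'
text = StringIO("; sensor log header\n1,2,3\n; mid-file note\n4,5,6\n")

# comments=";" tells loadtxt to skip those lines
data = np.loadtxt(text, delimiter=",", comments=";")
print(data)
# [[1. 2. 3.]
#  [4. 5. 6.]]
```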
Skipping Initial Rows
Some files have metadata or headers in the first few rows that are not comment lines. Use skiprows to skip them. Consider a file with_metadata.csv:
Dataset: Temperature Readings
Date: 2025-01-15
Unit: Celsius
23.5,24.1,22.8
25.0,24.7,23.9
22.1,23.3,24.5
import numpy as np
data = np.loadtxt("with_metadata.csv", delimiter=",", skiprows=3)
print("Data after skipping metadata:")
print(data)
Output:
Data after skipping metadata:
[[23.5 24.1 22.8]
[25. 24.7 23.9]
[22.1 23.3 24.5]]
Selecting Specific Columns
When you only need certain columns from a file, use usecols to avoid loading unnecessary data. Consider a file people.txt:
ID Name Score
1 Ankit 85
2 Bunty 92
3 Tinku 78
4 Rina 95
5 Rajesh 88
To extract only the names (column index 1):
import numpy as np
names = np.loadtxt("people.txt", usecols=1, skiprows=1, dtype=str)
for name in names:
print(name)
Output:
Ankit
Bunty
Tinku
Rina
Rajesh
To extract multiple columns, pass a tuple:
# Load columns 0 and 2 (ID and Score)
data = np.loadtxt("people.txt", usecols=(0, 2), skiprows=1, dtype=int)
print(data)
Output:
[[ 1 85]
[ 2 92]
[ 3 78]
[ 4 95]
[ 5 88]]
Warning: loadtxt() fails on missing or inconsistent data. If your file has missing values, blank entries, or inconsistent row lengths, loadtxt() will raise a ValueError. For such cases, use genfromtxt() instead.
# ❌ This will fail if the file has missing values
data = np.loadtxt("messy_data.csv", delimiter=",")
# ValueError: could not convert string '' to float64
Using numpy.genfromtxt() - Flexible and Robust
numpy.genfromtxt() is a more versatile alternative that can handle missing values, mixed data types, and irregular file formats. It is ideal for real-world datasets that are not perfectly clean.
Loading String Data
Consider a file words.txt with comma-separated text:
a,b,c,d
e,f,g,h
import numpy as np
data = np.genfromtxt("words.txt", dtype=str, delimiter=",", encoding="utf-8")
print(data)
Output:
[['a' 'b' 'c' 'd']
['e' 'f' 'g' 'h']]
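genfromtxt() can also load columns of different types into a structured array by passing dtype=None, which tells it to infer each column's type. The data and field names below are hypothetical, chosen to illustrate the technique:

```python
import numpy as np
from io import StringIO

# Hypothetical mixed-type data: name (str), age (int), score (float)
text = StringIO("Ankit,25,85.5\nBunty,30,92.0\n")

# dtype=None infers a per-column type; names assigns field labels
data = np.genfromtxt(text, delimiter=",", dtype=None, encoding="utf-8",
                     names=("name", "age", "score"))

# Each column is now accessible by name
print(data["name"])    # string column
print(data["score"])   # float column
```

Structured arrays like this are the closest NumPy gets to a lightweight DataFrame.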
Skipping Footer Rows
Unlike loadtxt(), genfromtxt() supports skipping lines at the end of a file using skip_footer. Consider a file with_footer.txt:
This is TutorialReference Website
How are You Ready?
tutorial reference . com
--- End of File ---
import numpy as np
data = np.genfromtxt("with_footer.txt", dtype=str, skip_footer=1)
print(data)
Output:
[['This' 'is' 'TutorialReference' 'Website']
['How' 'are' 'You' 'Ready?']
['tutorial' 'reference' '.' 'com']]
Handling Missing Values Automatically
This is where genfromtxt() truly shines. Consider a file incomplete.csv with a missing value:
1,2,3
4,,6
7,8,9
import numpy as np
data = np.genfromtxt("incomplete.csv", delimiter=",", dtype=float)
print(data)
Output:
[[ 1. 2. 3.]
[ 4. nan 6.]
[ 7. 8. 9.]]
The missing value is automatically replaced with nan (Not a Number), allowing the rest of the data to load without errors.
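Once missing entries are loaded as nan, NumPy's nan-aware functions let you locate them and compute statistics that skip them. A short sketch, using io.StringIO in place of incomplete.csv:

```python
import numpy as np
from io import StringIO

# Same data as incomplete.csv above, with one missing value
text = StringIO("1,2,3\n4,,6\n7,8,9\n")
data = np.genfromtxt(text, delimiter=",", dtype=float)

# Count the missing entries and compute the mean over the valid ones
print(np.isnan(data).sum())  # 1 missing value
print(np.nanmean(data))      # mean ignoring nan: 5.0
```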
Replacing Missing Values with a Custom Default
Instead of nan, you can fill missing entries with a specific value using the filling_values parameter:
import numpy as np
data = np.genfromtxt(
"incomplete.csv",
delimiter=",",
dtype=float,
filling_values=0 # Replace missing values with 0
)
print(data)
Output:
[[1. 2. 3.]
[4. 0. 6.]
[7. 8. 9.]]
This is especially useful when missing data could cause errors in downstream calculations or machine learning models.
loadtxt() vs. genfromtxt() - When to Use Which
| Feature | numpy.loadtxt() | numpy.genfromtxt() |
|---|---|---|
| Speed | Faster | Slightly slower |
| Missing values | ❌ Raises an error | ✅ Fills with nan or custom value |
| Skip footer rows | ❌ Not supported | ✅ skip_footer parameter |
| Mixed data types | Limited | ✅ Structured arrays with dtype=None |
| Best for | Clean, well-formatted files | Messy, real-world datasets |
Use loadtxt() when you know your data is clean and complete. Use genfromtxt() when you are unsure about data quality or expect missing entries. If your dataset is very large and complex, consider using pandas.read_csv(), which offers even more flexibility and better performance for tabular data.
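The contrast in the table above can be seen directly by feeding the same flawed data to both functions (io.StringIO stands in for a file here so the snippet is self-contained):

```python
import numpy as np
from io import StringIO

messy = "1,2,3\n4,,6\n"

# loadtxt raises ValueError on the blank field...
try:
    np.loadtxt(StringIO(messy), delimiter=",")
except ValueError as exc:
    print("loadtxt failed:", exc)

# ...while genfromtxt loads it, filling the gap with 0
data = np.genfromtxt(StringIO(messy), delimiter=",", filling_values=0)
print(data)
# [[1. 2. 3.]
#  [4. 0. 6.]]
```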
Common Mistakes and How to Fix Them
Mistake 1: Wrong Delimiter
# File uses commas, but no delimiter is specified
# ❌ loadtxt treats the entire line "1,2,3" as one value
data = np.loadtxt("data.csv")
# ValueError: could not convert string '1,2,3' to float64
Fix: Always specify the correct delimiter:
# ✅ Correct
data = np.loadtxt("data.csv", delimiter=",")
Mistake 2: Not Skipping Headers
# File has a text header: "Name,Age,Score"
# ❌ loadtxt tries to parse "Name" as a float
data = np.loadtxt("students.csv", delimiter=",")
# ValueError: could not convert string 'Name' to float64
Fix: Skip the header row:
# ✅ Skip the first row
data = np.loadtxt("students.csv", delimiter=",", skiprows=1)
Mistake 3: Using loadtxt() on Files with Missing Data
# ❌ Fails because of a missing value in row 2
data = np.loadtxt("incomplete.csv", delimiter=",")
Fix: Switch to genfromtxt():
# ✅ Handles missing values gracefully
data = np.genfromtxt("incomplete.csv", delimiter=",", filling_values=0)
Conclusion
NumPy provides two powerful functions for importing text files into arrays.
numpy.loadtxt() is fast and ideal for clean, consistently formatted numerical data. numpy.genfromtxt() offers the flexibility to handle missing values, mixed types, and irregular formatting found in real-world datasets.
By mastering parameters like delimiter, skiprows, usecols, comments, and filling_values, you can efficiently load virtually any structured text file into a NumPy array for fast numerical processing.