Python NumPy: How to Import Text Files Into NumPy Arrays in Python

Loading data from text files into NumPy arrays is one of the most common tasks in data analysis, scientific computing, and machine learning. Whether you're working with CSV files, tab-separated data, or custom-formatted text files, NumPy provides powerful built-in functions that can read and convert structured text data into arrays with just a single line of code.

The two primary functions for this task are numpy.loadtxt() and numpy.genfromtxt(). While loadtxt() is optimized for clean, well-formatted data, genfromtxt() is designed to handle messy, real-world datasets with missing values and mixed data types.

In this guide, you will learn how to use both functions with practical examples covering delimiters, skipping rows, selecting columns, handling comments, and dealing with missing data.

Using numpy.loadtxt() - Fast and Simple

numpy.loadtxt() is the go-to function for loading clean, consistently formatted numerical data from text files. It is fast, efficient, and works well with CSV, TSV, and space-separated files.

Syntax

numpy.loadtxt(
    fname,            # File name or path
    delimiter=None,   # Character separating values (default: whitespace)
    dtype=float,      # Data type of the resulting array
    skiprows=0,       # Number of initial rows to skip
    usecols=None,     # Specific columns to read
    comments='#',     # Character indicating comment lines
    max_rows=None     # Maximum number of rows to read
)
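
The parameters listed above can be combined in a single call. The following is a minimal sketch, assuming a small comma-separated sensor log whose contents are invented for illustration; io.StringIO stands in for a real file so the snippet runs on its own.

```python
from io import StringIO

import numpy as np

# In-memory stand-in for a text file (hypothetical contents).
text = StringIO(
    "# sensor log\n"
    "time,temp,humidity\n"
    "0,21.5,40\n"
    "1,21.7,41\n"
    "2,21.9,43\n"
    "3,22.0,44\n"
)

data = np.loadtxt(
    text,
    delimiter=",",
    skiprows=2,      # skip the comment line and the header line
    usecols=(1, 2),  # keep only the temp and humidity columns
    max_rows=3,      # stop after three data rows
)

print(data.shape)  # (3, 2)
print(data)
```

Note that skiprows counts every line, including comment lines, which is why it is set to 2 here.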

Loading a Simple Space-Separated Text File

Consider a file example.txt with space-separated integers:

1 2
3 4
5 6
7 8
9 10

import numpy as np

data = np.loadtxt("example.txt", dtype=int)

print("Loaded Data:")
print(data)

Output:

Loaded Data:
[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]]

Since the file uses whitespace as the default separator, no delimiter parameter is needed. Setting dtype=int ensures all values are loaded as integers rather than the default floats.

Loading a CSV File Using a Delimiter

When values are separated by commas, specify delimiter=",". Consider a file data.csv:

1,2,3
4,5,6
7,8,9

import numpy as np

data = np.loadtxt("data.csv", delimiter=",")

print("CSV Data:")
print(data)

Output:

CSV Data:
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

Tip

The default dtype is float, which is why the output shows 1. (a float) rather than 1 (an integer). If your data is strictly integers, pass dtype=int to get an integer array instead.

Skipping Comment Lines

Many data files include comment lines (metadata, descriptions, or headers) prefixed with a special character. The comments parameter tells loadtxt() to ignore these lines. Consider a file commented_data.csv:

# This is a header comment
# Generated on 2025-01-15
1,2,3
4,5,6
7,8,9

import numpy as np

data = np.loadtxt("commented_data.csv", delimiter=",", comments="#")

print("Data without comments:")
print(data)

Output:

Data without comments:
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

By default, comments='#' is already set, so lines starting with # are automatically skipped even without explicitly specifying the parameter.
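
Some tools mark comments with a character other than #. As a small sketch, assuming a file whose comment lines start with % (the contents here are made up, and io.StringIO stands in for the file), the comments parameter can be changed accordingly:

```python
from io import StringIO

import numpy as np

# Hypothetical file contents that use "%" for comment lines.
text = StringIO(
    "% exported from an instrument\n"
    "1,2,3\n"
    "4,5,6\n"
)

# Tell loadtxt which character starts a comment.
data = np.loadtxt(text, delimiter=",", comments="%")

print(data.shape)  # (2, 3)
```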

Skipping Initial Rows

Some files have metadata or headers in the first few rows that are not comment lines. Use skiprows to skip them. Consider a file with_metadata.csv:

Dataset: Temperature Readings
Date: 2025-01-15
Unit: Celsius
23.5,24.1,22.8
25.0,24.7,23.9
22.1,23.3,24.5

import numpy as np

data = np.loadtxt("with_metadata.csv", delimiter=",", skiprows=3)

print("Data after skipping metadata:")
print(data)

Output:

Data after skipping metadata:
[[23.5 24.1 22.8]
 [25.  24.7 23.9]
 [22.1 23.3 24.5]]

Selecting Specific Columns

When you only need certain columns from a file, use usecols to avoid loading unnecessary data. Consider a file people.txt:

ID Name Score
1 Ankit 85
2 Bunty 92
3 Tinku 78
4 Rina 95
5 Rajesh 88

To extract only the names (column index 1):

import numpy as np

names = np.loadtxt("people.txt", usecols=1, skiprows=1, dtype=str)

for name in names:
    print(name)

Output:

Ankit
Bunty
Tinku
Rina
Rajesh

To extract multiple columns, pass a tuple:

# Load columns 0 and 2 (ID and Score)
data = np.loadtxt("people.txt", usecols=(0, 2), skiprows=1, dtype=int)
print(data)

Output:

[[ 1 85]
 [ 2 92]
 [ 3 78]
 [ 4 95]
 [ 5 88]]
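
When each selected column should end up in its own variable, loadtxt() also accepts unpack=True, a parameter not covered elsewhere in this guide. The sketch below reuses the first rows of people.txt through io.StringIO so that it runs without a file on disk.

```python
from io import StringIO

import numpy as np

# First rows of people.txt, held in memory for the example.
text = StringIO(
    "ID Name Score\n"
    "1 Ankit 85\n"
    "2 Bunty 92\n"
    "3 Tinku 78\n"
)

# unpack=True transposes the result, so each column becomes one array.
ids, scores = np.loadtxt(
    text, usecols=(0, 2), skiprows=1, dtype=int, unpack=True
)

print(ids)            # [1 2 3]
print(scores.mean())  # 85.0
```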

loadtxt() fails on missing or inconsistent data

If your file has missing values, blank entries, or inconsistent row lengths, loadtxt() will raise a ValueError. For such cases, use genfromtxt() instead.

# ❌ This will fail if the file has missing values
data = np.loadtxt("messy_data.csv", delimiter=",")
# ValueError: could not convert string '' to float64
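
One way to act on this advice is to try loadtxt() first and fall back to genfromtxt() only when parsing fails. This is a sketch rather than a recommended pattern: load_numeric is a hypothetical helper name, and it takes the file contents as a string so the example is self-contained.

```python
from io import StringIO

import numpy as np


def load_numeric(text, delimiter=","):
    """Try loadtxt first; fall back to genfromtxt if a value cannot be parsed."""
    try:
        return np.loadtxt(StringIO(text), delimiter=delimiter)
    except ValueError:
        # genfromtxt turns empty fields into nan instead of raising.
        return np.genfromtxt(StringIO(text), delimiter=delimiter)


messy = "1,2,3\n4,,6\n7,8,9\n"  # the middle value of row 2 is missing
data = load_numeric(messy)

print(np.isnan(data[1, 1]))  # True
```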

Using numpy.genfromtxt() - Flexible and Robust

numpy.genfromtxt() is a more versatile alternative that can handle missing values, mixed data types, and irregular file formats. It is ideal for real-world datasets that are not perfectly clean.

Loading String Data

Consider a file words.txt with comma-separated text:

a,b,c,d
e,f,g,h

import numpy as np

data = np.genfromtxt("words.txt", dtype=str, delimiter=",", encoding="utf-8")

print(data)

Output:

[['a' 'b' 'c' 'd']
 ['e' 'f' 'g' 'h']]

Unlike loadtxt(), genfromtxt() supports skipping lines at the end of a file using skip_footer. Consider a file with_footer.txt:

This is TutorialReference Website
How are You Ready?
tutorial reference . com
--- End of File ---

import numpy as np

data = np.genfromtxt("with_footer.txt", dtype=str, skip_footer=1)

print(data)

Output:

[['This' 'is' 'TutorialReference' 'Website']
 ['How' 'are' 'You' 'Ready?']
 ['tutorial' 'reference' '.' 'com']]
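
genfromtxt() also has a skip_header parameter, so a header line and a footer line can be dropped in the same call. A minimal sketch, assuming a whitespace-separated file with one line of each (the contents are invented, and io.StringIO stands in for the file):

```python
from io import StringIO

import numpy as np

# Hypothetical file: one header line, three data rows, one footer line.
text = StringIO(
    "x y\n"
    "1 2\n"
    "3 4\n"
    "5 6\n"
    "END OF_DATA\n"
)

# Drop the first line and the last line, then parse the rest as floats.
data = np.genfromtxt(text, skip_header=1, skip_footer=1)

print(data.shape)  # (3, 2)
```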

Handling Missing Values Automatically

This is where genfromtxt() truly shines. Consider a file incomplete.csv with a missing value:

1,2,3
4,,6
7,8,9

import numpy as np

data = np.genfromtxt("incomplete.csv", delimiter=",", dtype=float)

print(data)

Output:

[[ 1.  2.  3.]
 [ 4. nan  6.]
 [ 7.  8.  9.]]

The missing value is automatically replaced with nan (Not a Number), allowing the rest of the data to load without errors.
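
Once nan values are in the array, ordinary reductions such as np.mean() return nan as well. The sketch below shows one way to count the gaps and compute statistics that ignore them; it loads the same contents as incomplete.csv from an in-memory string.

```python
from io import StringIO

import numpy as np

# Same contents as incomplete.csv, held in memory for the example.
text = StringIO("1,2,3\n4,,6\n7,8,9\n")
data = np.genfromtxt(text, delimiter=",")

# Count the missing entries.
missing_count = int(np.isnan(data).sum())

# nan-aware column means skip the missing entry instead of propagating it.
col_means = np.nanmean(data, axis=0)

print(missing_count)  # 1
print(col_means)      # [4. 5. 6.]
```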

Replacing Missing Values with a Custom Default

Instead of nan, you can fill missing entries with a specific value using the filling_values parameter:

import numpy as np

data = np.genfromtxt(
"incomplete.csv",
delimiter=",",
dtype=float,
filling_values=0 # Replace missing values with 0
)

print(data)

Output:

[[1. 2. 3.]
 [4. 0. 6.]
 [7. 8. 9.]]

This is especially useful when missing data could cause errors in downstream calculations or machine learning models.

loadtxt() vs. genfromtxt() - When to Use Which

Feature          | numpy.loadtxt()              | numpy.genfromtxt()
Speed            | Faster                       | Slightly slower
Missing values   | ❌ Raises an error           | ✅ Fills with nan or custom value
Skip footer rows | ❌ Not supported             | ✅ skip_footer parameter
Mixed data types | Limited                      | ✅ Structured arrays with dtype=None
Best for         | Clean, well-formatted files  | Messy, real-world datasets

Rule of thumb

Use loadtxt() when you know your data is clean and complete. Use genfromtxt() when you are unsure about data quality or expect missing entries. If your dataset is very large and complex, consider using pandas.read_csv(), which offers even more flexibility and better performance for tabular data.
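
The "Mixed data types" row of the comparison refers to structured arrays. As a sketch, assuming a small comma-separated file with a header row (the rows are borrowed from the people.txt example, and io.StringIO stands in for the file), dtype=None lets genfromtxt() infer a type per column and names=True takes the field names from the header:

```python
from io import StringIO

import numpy as np

# Hypothetical CSV with one text column and two integer columns.
text = StringIO(
    "ID,Name,Score\n"
    "1,Ankit,85\n"
    "2,Bunty,92\n"
)

data = np.genfromtxt(
    text,
    delimiter=",",
    dtype=None,        # infer a separate type for each column
    names=True,        # read field names from the first line
    encoding="utf-8",  # decode text columns as str
)

print(data.dtype.names)      # ('ID', 'Name', 'Score')
print(data["Name"])          # ['Ankit' 'Bunty']
print(data["Score"].mean())  # 88.5
```

The result is a one-dimensional structured array, so columns are accessed by field name rather than by a second index.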

Common Mistakes and How to Fix Them

Mistake 1: Wrong Delimiter

# File uses commas, but no delimiter is specified
# ❌ loadtxt treats the entire line "1,2,3" as one value
data = np.loadtxt("data.csv")
# ValueError: could not convert string '1,2,3' to float64

Fix: Always specify the correct delimiter:

# ✅ Correct
data = np.loadtxt("data.csv", delimiter=",")

Mistake 2: Not Skipping Headers

# File has a text header: "Name,Age,Score"
# ❌ loadtxt tries to parse "Name" as a float
data = np.loadtxt("students.csv", delimiter=",")
# ValueError: could not convert string 'Name' to float64

Fix: Skip the header row:

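# ✅ Skip the first row (this assumes every remaining column is numeric;
# a text column such as Name would still need usecols or genfromtxt)
data = np.loadtxt("students.csv", delimiter=",", skiprows=1)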
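
If the Name column itself holds text, skipping the header is not enough, because loadtxt() would still try to parse each name as a float. A sketch of one way around this, assuming students.csv has the columns Name, Age, Score (the rows are made up, and io.StringIO stands in for the file), is to read only the numeric columns:

```python
from io import StringIO

import numpy as np

# Hypothetical contents for students.csv.
text = StringIO(
    "Name,Age,Score\n"
    "Ankit,20,85\n"
    "Rina,21,95\n"
)

# Skip the header and read only the numeric columns (Age and Score).
data = np.loadtxt(text, delimiter=",", skiprows=1, usecols=(1, 2))

print(data.shape)  # (2, 2)
```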

Mistake 3: Using loadtxt() on Files with Missing Data

# ❌ Fails because of a missing value in row 2
data = np.loadtxt("incomplete.csv", delimiter=",")

Fix: Switch to genfromtxt():

# ✅ Handles missing values gracefully
data = np.genfromtxt("incomplete.csv", delimiter=",", filling_values=0)

Conclusion

NumPy provides two powerful functions for importing text files into arrays:

  • numpy.loadtxt() is fast and ideal for clean, consistently formatted numerical data.
  • numpy.genfromtxt() offers the flexibility to handle missing values, mixed types, and irregular formatting found in real-world datasets.

By mastering parameters like delimiter, skiprows, usecols, comments, and filling_values, you can efficiently load virtually any structured text file into a NumPy array for fast numerical processing.