How to Load NumPy Data in TensorFlow Using Python
When building machine learning models with TensorFlow, your training data often starts as NumPy arrays, whether loaded from files, generated programmatically, or preprocessed with libraries like Pandas or scikit-learn. TensorFlow provides seamless integration with NumPy through its tf.data.Dataset API, allowing you to convert NumPy arrays into efficient, iterable dataset pipelines ready for model training.
In this guide, you will learn how to load NumPy data into TensorFlow using tf.data.Dataset.from_tensor_slices(), work with different array shapes, pair features with labels, and apply common dataset operations like batching and shuffling.
Using tf.data.Dataset.from_tensor_slices()
The primary method for loading NumPy data into TensorFlow is tf.data.Dataset.from_tensor_slices(). This function takes a NumPy array (or a tuple/dictionary of arrays) and creates a Dataset object where each element corresponds to a slice along the first dimension of the input.
Syntax
tf.data.Dataset.from_tensor_slices(tensors)
- tensors: A NumPy array, a Python list, a TensorFlow tensor, or a tuple/dictionary of these types.
- Returns: A tf.data.Dataset object that yields individual slices.
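For intuition, a minimal sketch with a 1D array: each slice along the first (and only) axis is a single scalar element.

```python
import tensorflow as tf
import numpy as np

# A 1D array yields one scalar element per value
arr = np.array([10, 20, 30])
dataset = tf.data.Dataset.from_tensor_slices(arr)

for element in dataset:
    print(element.numpy())
# Prints 10, 20, 30 on separate lines
```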
Loading a 2D NumPy Array
Each row of the array becomes a separate element in the dataset:
import tensorflow as tf
import numpy as np
# Create a 2D NumPy array
arr = np.array([
[1, 2, 3, 4],
[4, 5, 6, 0],
[2, 0, 7, 8],
[3, 7, 4, 2]
])
# Load into a TensorFlow Dataset
dataset = tf.data.Dataset.from_tensor_slices(arr)
# Iterate and print each element
for element in dataset:
print(element.numpy())
Output:
[1 2 3 4]
[4 5 6 0]
[2 0 7 8]
[3 7 4 2]
The 4×4 array is sliced along the first axis (rows), producing 4 dataset elements, each a 1D array of length 4.
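Slicing always happens along the first axis, whatever the rank of the array. A sketch with a hypothetical stack of 28×28 "images" shows that each element keeps the remaining dimensions:

```python
import tensorflow as tf
import numpy as np

# A batch of 10 "images", each 28x28; slicing yields one 28x28 element per image
images = np.zeros((10, 28, 28), dtype=np.float32)
dataset = tf.data.Dataset.from_tensor_slices(images)

# element_spec describes the shape and dtype of each yielded element
print(dataset.element_spec)  # shape (28, 28), dtype float32
```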
Loading a Python List
from_tensor_slices() also works directly with Python lists, which are internally converted to tensors:
import tensorflow as tf
data = [[5, 10], [3, 6], [1, 2], [5, 0]]
dataset = tf.data.Dataset.from_tensor_slices(data)
for element in dataset:
print(element.numpy())
Output:
[ 5 10]
[3 6]
[1 2]
[5 0]
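One detail worth knowing: when converting Python lists, TensorFlow infers the dtype itself, so plain Python integers become int32 tensors (NumPy arrays, by contrast, keep their own dtype, typically int64 for integers). You can check what a dataset yields via its element_spec:

```python
import tensorflow as tf

data = [[5, 10], [3, 6], [1, 2], [5, 0]]
dataset = tf.data.Dataset.from_tensor_slices(data)

# Python ints are converted to int32 tensors of shape (2,)
print(dataset.element_spec)
```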
Loading Features and Labels Together
In machine learning, you typically have a features array and a corresponding labels array. Pass them as a tuple to create a paired dataset:
import tensorflow as tf
import numpy as np
# Feature data: 5 samples, 3 features each
features = np.array([
[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0],
[7.0, 8.0, 9.0],
[10.0, 11.0, 12.0],
[13.0, 14.0, 15.0]
])
# Labels: one per sample
labels = np.array([0, 1, 0, 1, 1])
# Create a paired dataset
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
for feature, label in dataset:
print(f"Features: {feature.numpy()}, Label: {label.numpy()}")
Output:
Features: [1. 2. 3.], Label: 0
Features: [4. 5. 6.], Label: 1
Features: [7. 8. 9.], Label: 0
Features: [10. 11. 12.], Label: 1
Features: [13. 14. 15.], Label: 1
Each iteration yields a (feature_vector, label) pair: exactly the format that model.fit() expects.
Loading Data as a Dictionary
You can also pass a dictionary of arrays. This is useful when your model expects named inputs:
import tensorflow as tf
import numpy as np
data = {
"temperature": np.array([22.5, 25.0, 19.8, 30.2]),
"humidity": np.array([45, 60, 80, 35]),
"label": np.array([0, 1, 1, 0])
}
dataset = tf.data.Dataset.from_tensor_slices(data)
for sample in dataset:
print({key: val.numpy() for key, val in sample.items()})
Output:
{'temperature': 22.5, 'humidity': 45, 'label': 0}
{'temperature': 25.0, 'humidity': 60, 'label': 1}
{'temperature': 19.8, 'humidity': 80, 'label': 1}
{'temperature': 30.2, 'humidity': 35, 'label': 0}
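Keras models generally expect (features, label) tuples rather than a flat dictionary, so a dictionary dataset is often reshaped with .map() before training. A sketch, assuming the temperature/humidity data above:

```python
import tensorflow as tf
import numpy as np

data = {
    "temperature": np.array([22.5, 25.0, 19.8, 30.2], dtype=np.float32),
    "humidity": np.array([45, 60, 80, 35], dtype=np.float32),
    "label": np.array([0, 1, 1, 0]),
}
dataset = tf.data.Dataset.from_tensor_slices(data)

# Stack the named inputs into one feature vector and split off the label
def to_pair(sample):
    features = tf.stack([sample["temperature"], sample["humidity"]])
    return features, sample["label"]

dataset = dataset.map(to_pair)
for features, label in dataset.take(1):
    print(features.numpy(), label.numpy())  # [22.5 45. ] 0
```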
Applying Dataset Operations
Once your data is in a tf.data.Dataset, you can chain operations to prepare it for training.
Shuffling, Batching, and Prefetching
import tensorflow as tf
import numpy as np
features = np.random.randn(100, 4).astype(np.float32)
labels = np.random.randint(0, 2, size=100).astype(np.int32)
# Create dataset pipeline
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
# Shuffle, batch, and prefetch for optimal training performance
dataset = dataset.shuffle(buffer_size=100) # Randomize order
dataset = dataset.batch(16) # Group into batches of 16
dataset = dataset.prefetch(tf.data.AUTOTUNE) # Overlap data loading with training
# Inspect one batch
for batch_features, batch_labels in dataset.take(1):
print(f"Batch features shape: {batch_features.shape}")
print(f"Batch labels shape: {batch_labels.shape}")
Output:
Batch features shape: (16, 4)
Batch labels shape: (16,)
- Shuffle before batching to ensure each batch has a diverse mix of samples.
- Batch to group samples for efficient GPU utilization.
- Prefetch with tf.data.AUTOTUNE to overlap data preprocessing with model training, reducing idle time.
# ✅ Recommended pipeline order
dataset = (
tf.data.Dataset.from_tensor_slices((features, labels))
.shuffle(buffer_size=len(features))
.batch(32)
.prefetch(tf.data.AUTOTUNE)
)
Applying Transformations with .map()
Use .map() to apply preprocessing functions to each element:
import tensorflow as tf
import numpy as np
features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
labels = np.array([0, 1, 0])
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
# Normalize features by dividing by the maximum value
def normalize(feature, label):
return feature / 6.0, label
dataset = dataset.map(normalize)
for feature, label in dataset:
print(f"Normalized: {feature.numpy()}, Label: {label.numpy()}")
Output:
Normalized: [0.16666667 0.33333334], Label: 0
Normalized: [0.5 0.6666667 ], Label: 1
Normalized: [0.8333333 1. ], Label: 0
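For heavier preprocessing, .map() also accepts a num_parallel_calls argument; passing tf.data.AUTOTUNE lets TensorFlow run the function on several elements concurrently. A sketch using the same normalization:

```python
import tensorflow as tf
import numpy as np

features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32)
labels = np.array([0, 1, 0])
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Apply the transformation to multiple elements in parallel
dataset = dataset.map(
    lambda feature, label: (feature / 6.0, label),
    num_parallel_calls=tf.data.AUTOTUNE,
)
```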
Using the Dataset for Model Training
Here is a complete example showing how to load NumPy data into TensorFlow and train a simple model:
import tensorflow as tf
import numpy as np
# Generate sample data
np.random.seed(42)
X_train = np.random.randn(1000, 10).astype(np.float32)
y_train = np.random.randint(0, 2, size=1000).astype(np.int32)
# Create dataset pipeline
train_dataset = (
tf.data.Dataset.from_tensor_slices((X_train, y_train))
.shuffle(1000)
.batch(32)
.prefetch(tf.data.AUTOTUNE)
)
# Build a simple model
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
tf.keras.layers.Dense(32, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Train using the dataset
model.fit(train_dataset, epochs=3)
Output:
Epoch 1/3
32/32 [==============================] - 1s 2ms/step - loss: 0.7012 - accuracy: 0.5010
Epoch 2/3
32/32 [==============================] - 0s 2ms/step - loss: 0.6920 - accuracy: 0.5250
Epoch 3/3
32/32 [==============================] - 0s 2ms/step - loss: 0.6889 - accuracy: 0.5370
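Evaluation follows the same pipeline pattern. A sketch with hypothetical held-out data (shuffling is unnecessary for evaluation, though batching still speeds things up):

```python
import tensorflow as tf
import numpy as np

# Hypothetical held-out split, built like the training pipeline but unshuffled
X_test = np.random.randn(200, 10).astype(np.float32)
y_test = np.random.randint(0, 2, size=200).astype(np.int32)
test_dataset = tf.data.Dataset.from_tensor_slices((X_test, y_test)).batch(32)

# A small stand-in model with the same input shape as the training example
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# evaluate() accepts a batched tf.data.Dataset directly
loss, accuracy = model.evaluate(test_dataset, verbose=0)
print(f"Test loss: {loss:.4f}, accuracy: {accuracy:.4f}")
```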
Common Mistakes and How to Avoid Them
Mistake 1: Mismatched Array Lengths in Tuples
# ❌ Features has 5 rows but labels has 4 elements
features = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
labels = np.array([0, 1, 0, 1])
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
# Raises ValueError: the arrays' first dimensions (5 vs. 4) do not match
Fix: Ensure all arrays in the tuple have the same length along the first axis:
# ✅ Both have 5 elements
labels = np.array([0, 1, 0, 1, 1])
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
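A cheap way to catch this early, with a clearer error message, is to assert the alignment before building the pipeline:

```python
import numpy as np

features = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
labels = np.array([0, 1, 0, 1, 1])

# Fail fast with a readable message if the splits ever drift apart
assert features.shape[0] == labels.shape[0], (
    f"features/labels length mismatch: {features.shape[0]} vs {labels.shape[0]}"
)
```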
Mistake 2: Forgetting to Batch Before Training
# ❌ Unbatched dataset: model.fit() will process one sample at a time
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
model.fit(dataset, epochs=5) # Extremely slow
Fix: Always batch your dataset:
# ✅ Batched for efficient training
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(32)
model.fit(dataset, epochs=5)
Using an unbatched dataset with model.fit() technically works, but it processes one sample per step, which is orders of magnitude slower than batched training and does not leverage GPU parallelism.
Conclusion
Loading NumPy data into TensorFlow is simple and efficient using tf.data.Dataset.from_tensor_slices(). This function handles 1D arrays, 2D matrices, tuples of feature-label pairs, and dictionaries of named inputs.
Once your data is in a tf.data.Dataset, you can chain operations like shuffle, batch, map, and prefetch to build optimized training pipelines.
This approach is the recommended way to feed NumPy data into TensorFlow models, offering better performance and memory management than passing raw arrays directly to model.fit().