How to Use a DataLoader in PyTorch
When training deep learning models, loading an entire dataset into memory at once is often impractical - datasets can be gigabytes in size, and processing them sequentially is slow. PyTorch's DataLoader solves both problems by automatically batching, shuffling, and parallelizing the data loading process.
This guide explains how to create custom datasets, configure DataLoaders, and use them effectively in training loops.
What a DataLoader Does
A DataLoader wraps a dataset and provides an iterable that yields batches of data. Instead of manually slicing your data into batches and shuffling between epochs, the DataLoader handles this automatically:
Full Dataset (10,000 samples)
↓ DataLoader(batch_size=32, shuffle=True)
↓
Epoch 1: [Batch 1: 32 samples] → [Batch 2: 32 samples] → ... → [Batch 313: 16 samples]
Epoch 2: [Batch 1: 32 samples (different order)] → ...
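The batch count in the diagram follows directly from ceiling division: with `drop_last` off (the default), the final short batch still counts. A quick sanity check in plain Python:

```python
import math

num_samples = 10_000
batch_size = 32

# Without drop_last, the final partial batch still counts, so round up
num_batches = math.ceil(num_samples / batch_size)
# Samples left over for the final batch after 312 full batches
last_batch = num_samples - (num_batches - 1) * batch_size

print(num_batches, last_batch)  # 313 16
```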
DataLoader Syntax
from torch.utils.data import DataLoader
DataLoader(
    dataset,
    batch_size=1,
    shuffle=False,
    num_workers=0,
    drop_last=False,
    pin_memory=False
)
| Parameter | Description | Default |
|---|---|---|
| dataset | The dataset to load (required) | Required |
| batch_size | Number of samples per batch | 1 |
| shuffle | Whether to randomize order each epoch | False |
| num_workers | Number of subprocesses for parallel loading | 0 (main process) |
| drop_last | Drop the last incomplete batch if the dataset isn't evenly divisible | False |
| pin_memory | Copy tensors to CUDA pinned memory for faster GPU transfer | False |
Creating a Custom Dataset
To use a DataLoader, you first need a dataset. Custom datasets extend torch.utils.data.Dataset and must implement two methods:
__len__() - returns the total number of samples
__getitem__(index) - returns a single sample at the given index
import torch
from torch.utils.data import Dataset, DataLoader
class NumberDataset(Dataset):
    """A simple dataset containing numbers 0 to 99."""

    def __init__(self):
        self.data = list(range(100))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]
dataset = NumberDataset()
print(f"Dataset size: {len(dataset)}")
print(f"Sample at index 5: {dataset[5]}")
Output:
Dataset size: 100
Sample at index 5: 5
Using the DataLoader
Wrap the dataset in a DataLoader and iterate over it to get batches:
import torch
from torch.utils.data import Dataset, DataLoader
class NumberDataset(Dataset):
    def __init__(self):
        self.data = list(range(100))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]
dataset = NumberDataset()
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)
# Print the first 3 batches
for i, batch in enumerate(dataloader):
    if i >= 3:
        break
    print(f"Batch {i}: {batch}")
print(f"\nTotal batches: {len(dataloader)}")
Output (varies due to shuffling):
Batch 0: tensor([56, 84, 42, 4, 66, 27, 99, 18, 20, 89])
Batch 1: tensor([ 7, 30, 74, 57, 10, 6, 28, 77, 0, 50])
Batch 2: tensor([32, 22, 73, 97, 26, 98, 85, 17, 8, 16])
Total batches: 10
The DataLoader automatically divides the 100 samples into 10 batches of 10, shuffled randomly.
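For contrast, with `shuffle=False` the batches arrive in dataset order, identically in every epoch. A minimal sketch reusing the same dataset class:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class NumberDataset(Dataset):
    def __init__(self):
        self.data = list(range(100))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]

# shuffle=False: deterministic, sequential batches
loader = DataLoader(NumberDataset(), batch_size=10, shuffle=False)
first_batch = next(iter(loader))
print(first_batch)  # tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

Note that the default collate function converted the plain Python ints into a single tensor per batch.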
Dataset with Features and Labels
Most real-world datasets have both input features and target labels. Return them as a tuple from __getitem__():
import torch
from torch.utils.data import Dataset, DataLoader
class RegressionDataset(Dataset):
    """Simple dataset with input features and target values."""

    def __init__(self, num_samples=200):
        self.X = torch.randn(num_samples, 3)  # 3 features
        self.y = self.X.sum(dim=1) + torch.randn(num_samples) * 0.1  # Target with noise

    def __len__(self):
        return len(self.X)

    def __getitem__(self, index):
        return self.X[index], self.y[index]
dataset = RegressionDataset(200)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
# Get one batch
features, targets = next(iter(dataloader))
print(f"Features shape: {features.shape}")
print(f"Targets shape: {targets.shape}")
Output:
Features shape: torch.Size([32, 3])
Targets shape: torch.Size([32])
Each batch contains 32 samples with 3 features each, along with their corresponding target values.
Using DataLoader with Built-in Datasets
PyTorch and related libraries provide many ready-to-use datasets. Here's how to use a DataLoader with TensorDataset:
import torch
from torch.utils.data import DataLoader, TensorDataset
# Create tensors from data
features = torch.randn(150, 4) # 150 samples, 4 features
labels = torch.randint(0, 3, (150,)) # 3 classes
# Wrap in TensorDataset
dataset = TensorDataset(features, labels)
# Create DataLoader
dataloader = DataLoader(dataset, batch_size=16, shuffle=True)
for batch_features, batch_labels in dataloader:
    print(f"Features: {batch_features.shape}, Labels: {batch_labels.shape}")
    break  # Just show the first batch
Output:
Features: torch.Size([16, 4]), Labels: torch.Size([16])
TensorDataset is a convenient wrapper when your data is already in tensor form. It automatically pairs corresponding elements from multiple tensors when batching.
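TensorDataset accepts any number of tensors as long as their first dimensions match, and indexing returns a tuple with one element per tensor. A small sketch (the per-sample `weights` tensor is an illustrative addition, not part of the example above):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

features = torch.randn(6, 4)
labels = torch.randint(0, 3, (6,))
weights = torch.rand(6)  # hypothetical per-sample weights

dataset = TensorDataset(features, labels, weights)
x, y, w = dataset[0]  # one tuple element per wrapped tensor
print(x.shape, y.shape, w.shape)

# Batches unpack the same way, with a leading batch dimension
bx, by, bw = next(iter(DataLoader(dataset, batch_size=3)))
print(bx.shape, by.shape, bw.shape)
```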
Using DataLoader in a Training Loop
Here's how DataLoaders are typically used in a model training loop:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
# Create a simple dataset
X = torch.randn(500, 10)
y = torch.randint(0, 2, (500,)).float()
dataset = TensorDataset(X, y)
# Split into train and validation sets
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [train_size, val_size])
# Create DataLoaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
# Simple model
model = nn.Linear(10, 1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Training loop
for epoch in range(3):
    model.train()
    total_loss = 0
    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()
        predictions = model(batch_X).squeeze()
        loss = criterion(predictions, batch_y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    avg_loss = total_loss / len(train_loader)
    print(f"Epoch {epoch + 1}, Average Loss: {avg_loss:.4f}")
Output:
Epoch 1, Average Loss: 0.7414
Epoch 2, Average Loss: 0.7155
Epoch 3, Average Loss: 0.7027
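The split above also produced a val_loader that the training loop never touched. A matching evaluation pass, typically run after each epoch, switches the model to eval mode and disables gradient tracking. This sketch repeats the setup so it runs standalone; the model here is untrained, so the printed loss just reflects random initialization:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Same setup as the training example
X = torch.randn(500, 10)
y = torch.randint(0, 2, (500,)).float()
dataset = TensorDataset(X, y)
train_size = int(0.8 * len(dataset))
train_dataset, val_dataset = torch.utils.data.random_split(
    dataset, [train_size, len(dataset) - train_size])
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

model = nn.Linear(10, 1)
criterion = nn.BCEWithLogitsLoss()

# Validation pass: eval mode, no gradient tracking
model.eval()
val_loss = 0.0
with torch.no_grad():
    for batch_X, batch_y in val_loader:
        predictions = model(batch_X).squeeze()
        val_loss += criterion(predictions, batch_y).item()
print(f"Validation Loss: {val_loss / len(val_loader):.4f}")
```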
Key DataLoader Configuration Options
Shuffling
Always shuffle training data to prevent the model from learning the order of samples:
# Training: shuffle to randomize order each epoch
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
# Validation/Testing: no need to shuffle
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
Parallel Data Loading with num_workers
Speed up data loading by using multiple subprocesses:
dataloader = DataLoader(dataset, batch_size=32, num_workers=4)
On Windows and macOS, the default multiprocessing start method is spawn, so multi-worker DataLoaders must be created inside an if __name__ == '__main__': block to avoid spawning errors, and the dataset must be picklable so it can be sent to the worker processes.
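A minimal sketch of the guard pattern, so spawned workers can safely re-import the script (the `num_workers` parameter on `main` is just for convenience, not a DataLoader requirement):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def main(num_workers=2):
    dataset = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))
    # Created inside main(), which only runs under the __main__ guard,
    # so spawned worker processes re-importing this script are safe
    loader = DataLoader(dataset, batch_size=16, num_workers=num_workers)
    return sum(1 for _ in loader)  # iterate all batches

if __name__ == '__main__':
    print(f"Batches: {main()}")
```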
Dropping the Last Incomplete Batch
If your dataset size isn't evenly divisible by the batch size, the last batch will be smaller. Use drop_last=True to discard it:
import torch
from torch.utils.data import Dataset, DataLoader
class NumberDataset(Dataset):
    def __init__(self):
        self.data = list(range(100))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]
dataset = NumberDataset() # 100 samples
dataloader = DataLoader(dataset, batch_size=32, drop_last=True)
print(f"Batches: {len(dataloader)}") # 3 batches of 32, last 4 samples dropped
Output:
Batches: 3
GPU Memory Optimization with pin_memory
When training on GPU, enable pin_memory for faster host-to-device transfers:
dataloader = DataLoader(dataset, batch_size=32, pin_memory=True)
for batch_X, batch_y in dataloader:
    batch_X = batch_X.to('cuda', non_blocking=True)
    batch_y = batch_y.to('cuda', non_blocking=True)
Common Mistake: Forgetting to Set shuffle=True for Training
Training without shuffling can cause the model to learn patterns from the data ordering rather than the data itself:
# WRONG: no shuffling during training, model may learn order patterns
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=False)
# CORRECT: always shuffle training data
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
Always set shuffle=True for training DataLoaders. For validation and test DataLoaders, shuffling is unnecessary since you're only evaluating, not learning from the data order.
Quick Reference
| Configuration | Training | Validation/Testing |
|---|---|---|
| shuffle | True | False |
| batch_size | 16–128 (experiment) | Same or larger |
| num_workers | 2–8 (depends on CPU) | Same as training |
| drop_last | Optional (True for BatchNorm stability) | False |
| pin_memory | True (if using GPU) | True (if using GPU) |
The DataLoader is the backbone of efficient data handling in PyTorch. By configuring batch size, shuffling, and parallel workers, you can significantly speed up training while keeping memory usage under control. Combined with a well-structured Dataset class, it provides a clean and scalable pipeline for feeding data to your models.