
How to Upsample a Tensor in PyTorch

Upsampling is the process of increasing the spatial resolution of a tensor - transforming a low-resolution input into a higher-resolution output. This operation is essential in deep learning for tasks like image super-resolution, semantic segmentation (where decoder networks need to restore spatial dimensions), generative models, and signal processing. PyTorch provides multiple approaches for upsampling, each with different characteristics.

This guide covers the three main methods: torch.nn.Upsample, torch.nn.functional.interpolate, and torch.nn.ConvTranspose2d.

Understanding Upsampling Modes

Before diving into the code, here are the key interpolation modes available:

Mode      | Input Dimensions | Description
----------|------------------|------------
nearest   | 1D, 2D, 3D       | Copies the nearest input value. Fast but produces blocky results
linear    | 1D only          | Linear interpolation between neighboring values
bilinear  | 2D only          | Weighted average of the 4 nearest pixels. Smooth results
bicubic   | 2D only          | Weighted average of the 16 nearest pixels. Smoothest results
trilinear | 3D only          | Linear interpolation across 3 spatial dimensions
area      | Any              | Averages input pixels over each output region (like adaptive average pooling). Primarily useful for downsampling
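To compare the modes on the same input, here is a quick sketch using torch.nn.functional.interpolate (covered in Method 2 below); the 2x2 input values are illustrative:

```python
import torch
import torch.nn.functional as F

# A tiny 2D input: (batch=1, channels=1, height=2, width=2)
x = torch.arange(1., 5.).view(1, 1, 2, 2)

for mode in ["nearest", "bilinear", "bicubic", "area"]:
    # align_corners only applies to the linear-family modes
    kwargs = {"align_corners": False} if mode in ("bilinear", "bicubic") else {}
    out = F.interpolate(x, scale_factor=2, mode=mode, **kwargs)
    print(mode, out.shape)  # every mode doubles H and W: torch.Size([1, 1, 4, 4])
```

All modes produce the same output shape; they differ only in how the new values are computed.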

Method 1: Using torch.nn.Upsample

torch.nn.Upsample is a module-based approach that can be added as a layer in a neural network. It supports various interpolation modes and can specify either a target size or a scale factor.

Syntax

torch.nn.Upsample(size=None, scale_factor=None, mode='nearest', align_corners=None)
Parameter     | Description
--------------|------------
size          | Target output size (int or tuple)
scale_factor  | Multiplier for the spatial dimensions (float or tuple)
mode          | Interpolation algorithm (see the table above)
align_corners | If True, aligns corner pixels of input and output (only valid for linear, bilinear, bicubic, and trilinear modes)

Upsampling a 1D Tensor

import torch

# Create a 1D tensor: (batch=1, channels=1, length=2)
x = torch.tensor([1., 2.]).view(1, 1, 2)
print("Input shape:", x.shape)
print("Input:", x)

# Nearest neighbor: scale by 2
upsample_nearest = torch.nn.Upsample(scale_factor=2, mode='nearest')
output = upsample_nearest(x)
print("\nNearest (2x):", output)

# Linear interpolation: scale by 2
upsample_linear = torch.nn.Upsample(scale_factor=2, mode='linear', align_corners=False)
output = upsample_linear(x)
print("Linear (2x): ", output)

Output:

Input shape: torch.Size([1, 1, 2])
Input: tensor([[[1., 2.]]])

Nearest (2x): tensor([[[1., 1., 2., 2.]]])
Linear (2x): tensor([[[1.0000, 1.2500, 1.7500, 2.0000]]])

With nearest mode, each value is simply duplicated. With linear mode, intermediate values are interpolated smoothly between neighbors.
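Where do 1.25 and 1.75 come from? With align_corners=False, output index i samples the input at coordinate (i + 0.5) / scale - 0.5, clamped to the valid range. Recomputing the linear output by hand:

```python
# Reproduce the Linear (2x) output above by hand (align_corners=False convention).
inp = [1., 2.]
scale = 2
out = []
for i in range(len(inp) * scale):
    src = (i + 0.5) / scale - 0.5            # source coordinate in input space
    src = min(max(src, 0.0), len(inp) - 1)   # clamp to the valid index range
    lo = int(src)                            # left neighbor
    hi = min(lo + 1, len(inp) - 1)           # right neighbor
    frac = src - lo                          # interpolation weight
    out.append(inp[lo] * (1 - frac) + inp[hi] * frac)
print(out)  # [1.0, 1.25, 1.75, 2.0]
```

This matches PyTorch's output: the edge values are clamped copies, and the interior values sit a quarter of the way between the two inputs.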

Upsampling a 2D Tensor

For 2D data (like images), the input must have shape (batch, channels, height, width):

import torch

# 2D tensor: (batch=1, channels=1, height=2, width=3)
x = torch.tensor([
    [1., 2., 3.],
    [4., 5., 6.]
]).view(1, 1, 2, 3)

print("Input shape:", x.shape)

# Nearest neighbor upsampling
upsample = torch.nn.Upsample(scale_factor=2, mode='nearest')
output = upsample(x)
print("\nNearest (2x) shape:", output.shape)
print(output)

# Bilinear upsampling
upsample_bi = torch.nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
output_bi = upsample_bi(x)
print("\nBilinear (2x) shape:", output_bi.shape)
print(output_bi)

Output:

Input shape: torch.Size([1, 1, 2, 3])

Nearest (2x) shape: torch.Size([1, 1, 4, 6])
tensor([[[[1., 1., 2., 2., 3., 3.],
          [1., 1., 2., 2., 3., 3.],
          [4., 4., 5., 5., 6., 6.],
          [4., 4., 5., 5., 6., 6.]]]])

Bilinear (2x) shape: torch.Size([1, 1, 4, 6])
tensor([[[[1.0000, 1.2500, 1.7500, 2.2500, 2.7500, 3.0000],
          [1.7500, 2.0000, 2.5000, 3.0000, 3.5000, 3.7500],
          [3.2500, 3.5000, 4.0000, 4.5000, 5.0000, 5.2500],
          [4.0000, 4.2500, 4.7500, 5.2500, 5.7500, 6.0000]]]])

Bilinear interpolation produces smoother transitions between values compared to nearest neighbor.

Method 2: Using torch.nn.functional.interpolate

This is the functional version of upsampling - a standalone function rather than a module. It provides the same functionality but is used directly in the forward pass without being registered as a layer.

Syntax

torch.nn.functional.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None)

Example

import torch
import torch.nn.functional as F

x = torch.tensor([
    [1., 2., 3.],
    [4., 5., 6.]
]).view(1, 1, 2, 3)

print("Input shape:", x.shape)

# Bilinear upsampling by factor of 2
output = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
print("Output shape:", output.shape)
print(output)

Output:

Input shape: torch.Size([1, 1, 2, 3])
Output shape: torch.Size([1, 1, 4, 6])
tensor([[[[1.0000, 1.2500, 1.7500, 2.2500, 2.7500, 3.0000],
          [1.7500, 2.0000, 2.5000, 3.0000, 3.5000, 3.7500],
          [3.2500, 3.5000, 4.0000, 4.5000, 5.0000, 5.2500],
          [4.0000, 4.2500, 4.7500, 5.2500, 5.7500, 6.0000]]]])

Specifying Target Size Instead of Scale Factor

You can specify the exact output size instead of a scale factor:

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)
print("Input shape:", x.shape)

# Upsample to a specific size (8x12)
output = F.interpolate(x, size=(8, 12), mode='bilinear', align_corners=False)
print("Output shape:", output.shape)

Output:

Input shape: torch.Size([1, 1, 4, 4])
Output shape: torch.Size([1, 1, 8, 12])
tip

Use torch.nn.functional.interpolate when you need upsampling as a one-off operation inside a forward() method. Use torch.nn.Upsample when you want to include it as a named layer in an nn.Sequential model.
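A minimal sketch of both usage patterns (the layer sizes here are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Pattern 1: nn.Upsample as a named layer in nn.Sequential
decoder = nn.Sequential(
    nn.Conv2d(16, 8, kernel_size=3, padding=1),
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
)

# Pattern 2: F.interpolate called directly inside forward()
class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(16, 8, kernel_size=3, padding=1)

    def forward(self, x):
        x = self.conv(x)
        return F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)

x = torch.randn(1, 16, 8, 8)
print(decoder(x).shape)    # torch.Size([1, 8, 16, 16])
print(Decoder()(x).shape)  # torch.Size([1, 8, 16, 16])
```

Both produce identical shapes; the choice is purely about how you want to organize the model.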

Method 3: Using torch.nn.ConvTranspose2d (Learnable Upsampling)

Unlike the previous methods that use fixed interpolation formulas, transposed convolution (also called deconvolution) uses learnable filters to upsample. This means the network learns how to best upsample during training, which can produce better results for specific tasks.

Syntax

torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0)

Example

import torch

# Input: batch=1, 3 channels, 2x4 spatial
x = torch.randn(1, 3, 2, 4)
print("Input shape:", x.shape)

# Transposed convolution: upsample by factor of 2
transposed_conv = torch.nn.ConvTranspose2d(
    in_channels=3,
    out_channels=3,
    kernel_size=3,
    stride=2,
    padding=1,
    output_padding=1
)

output = transposed_conv(x)
print("Output shape:", output.shape)

Output:

Input shape: torch.Size([1, 3, 2, 4])
Output shape: torch.Size([1, 3, 4, 8])

The spatial dimensions are doubled: 2→4 (height) and 4→8 (width).
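The output size follows the standard transposed-convolution formula (with dilation=1): out = (in - 1) * stride - 2 * padding + kernel_size + output_padding. A quick check for the settings above:

```python
def conv_transpose_out(size, kernel_size=3, stride=2, padding=1, output_padding=1):
    # Standard ConvTranspose2d output-size formula (dilation=1)
    return (size - 1) * stride - 2 * padding + kernel_size + output_padding

print(conv_transpose_out(2))  # height: 2 -> 4
print(conv_transpose_out(4))  # width:  4 -> 8
```

This is why output_padding=1 is needed with stride=2: without it, the output would be one pixel short of exactly double.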

info

ConvTranspose2d is the only upsampling method that has learnable parameters. It is commonly used in decoder networks (U-Net, autoencoders, GANs) where the upsampling behavior should be optimized during training. However, it can sometimes produce checkerboard artifacts in the output.
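A common mitigation for checkerboard artifacts is the "resize-convolution" pattern: fixed interpolation followed by a regular learnable convolution. A sketch, reusing the shapes from the example above:

```python
import torch
import torch.nn as nn

# Resize-convolution: fixed upsampling, then a learnable Conv2d.
# Often used as a drop-in alternative to ConvTranspose2d.
up_block = nn.Sequential(
    nn.Upsample(scale_factor=2, mode='nearest'),
    nn.Conv2d(3, 3, kernel_size=3, padding=1),
)

x = torch.randn(1, 3, 2, 4)
print(up_block(x).shape)  # torch.Size([1, 3, 4, 8]) -- same shape as ConvTranspose2d above
```

The upsampling itself is fixed, but the following convolution still gives the network learnable parameters at that stage.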

Common Mistake: Wrong Input Dimensions

A frequent error is passing a tensor without the batch and channel dimensions:

import torch

# WRONG: 2D tensor without batch and channel dimensions
x = torch.tensor([[1., 2.], [3., 4.]])

upsample = torch.nn.Upsample(scale_factor=2)
try:
    output = upsample(x)
except RuntimeError as e:
    print(f"Error: {e}")

Output:

Error: Input Error: Only 3D, 4D and 5D input Tensors supported (got 2D) for the modes: nearest | linear | bilinear | bicubic | trilinear | area | nearest-exact (got nearest)

The correct approach - add batch and channel dimensions:

import torch

# CORRECT: reshape to (batch, channels, height, width)
x = torch.tensor([[1., 2.], [3., 4.]]).view(1, 1, 2, 2)

upsample = torch.nn.Upsample(scale_factor=2)
output = upsample(x)
print("Output shape:", output.shape)

Output:

Output shape: torch.Size([1, 1, 4, 4])
danger

Upsample and interpolate require tensors with at least 3 dimensions: (batch, channels, spatial...). A raw 2D matrix must be reshaped with .view(1, 1, H, W) or .unsqueeze(0).unsqueeze(0) before upsampling.
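The same fix with unsqueeze, plus squeeze to drop the added dimensions again afterwards:

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.]])  # shape (2, 2) -- raw 2D matrix
x4d = x.unsqueeze(0).unsqueeze(0)       # shape (1, 1, 2, 2) -- add batch and channel dims

out = torch.nn.Upsample(scale_factor=2)(x4d)
print(out.squeeze().shape)  # torch.Size([4, 4]) -- back to a plain 2D result
```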

Method Comparison

Method             | Learnable | Use Case                           | Pros                           | Cons
-------------------|-----------|------------------------------------|--------------------------------|-----
nn.Upsample        | No        | Fixed upsampling in nn.Sequential  | Simple, no parameters to train | Fixed interpolation only
F.interpolate      | No        | Functional upsampling in forward() | Flexible, supports target size | Same as Upsample, functional style
nn.ConvTranspose2d | Yes       | Decoder networks (U-Net, GANs)     | Learns optimal upsampling      | More parameters, possible checkerboard artifacts

For most applications, F.interpolate with bilinear mode provides a good balance of quality and simplicity. Use ConvTranspose2d when you need the model to learn the upsampling strategy, and be aware of potential artifact issues.