Python PyTorch: How to Perform Element-Wise Subtraction on Tensors
Element-wise subtraction is a fundamental tensor operation used throughout deep learning - from computing residual connections in ResNets to calculating loss gradients, difference features, and error signals between predicted and actual values. PyTorch provides the torch.sub() function to subtract corresponding elements of two tensors efficiently.
In this guide, you'll learn how to perform element-wise subtraction using torch.sub() and the - operator, leverage the alpha parameter for scaled subtraction, handle broadcasting with different tensor shapes, and avoid common mistakes.
Understanding torch.sub()
PyTorch provides the torch.sub() function for element-wise subtraction:
torch.sub(input, other, *, alpha=1, out=None)
Parameters:
| Parameter | Description |
|---|---|
| input | The first input tensor (minuend) |
| other | The tensor or scalar to subtract (subtrahend) |
| alpha | Optional multiplier applied to other before subtraction (default: 1) |
| out | Optional pre-allocated output tensor |
Returns: A new tensor computed as input - alpha × other.
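As a quick sketch of the out parameter in action (the tensor values here are illustrative), a pre-allocated buffer can be filled in place and reused across calls:

```python
import torch

a = torch.tensor([10.0, 20.0, 30.0])
b = torch.tensor([1.0, 2.0, 3.0])

# Pre-allocate a result buffer and write a - 2 * b into it
buffer = torch.empty(3)
torch.sub(a, b, alpha=2, out=buffer)

print(buffer)  # tensor([ 8., 16., 24.])
```

Reusing an output buffer this way avoids allocating a fresh tensor on every call, which can matter in tight loops.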
Subtracting Two 1D Tensors
The simplest case subtracts corresponding elements of two tensors with the same shape:
import torch
tens_1 = torch.Tensor([10, 20, 30, 40, 50])
tens_2 = torch.Tensor([1, 2, 3, 4, 5])
result = torch.sub(tens_1, tens_2)
print("Tensor 1:", tens_1)
print("Tensor 2:", tens_2)
print("Result: ", result)
Output:
Tensor 1: tensor([10., 20., 30., 40., 50.])
Tensor 2: tensor([1., 2., 3., 4., 5.])
Result: tensor([ 9., 18., 27., 36., 45.])
Each element at index i in tens_2 is subtracted from the element at the same index in tens_1: 10-1=9, 20-2=18, 30-3=27, and so on.
You can also use the - operator, which produces identical results:
result = tens_1 - tens_2
print("Result:", result)
Output:
Result: tensor([ 9., 18., 27., 36., 45.])
Subtracting Two 2D Tensors
Element-wise subtraction works the same way with multi-dimensional tensors - each element at position [i][j] in the second tensor is subtracted from the element at the same position in the first:
import torch
tens_1 = torch.Tensor([[10, 20],
                       [30, 40]])
tens_2 = torch.Tensor([[1, 2],
                       [3, 4]])
result = torch.sub(tens_1, tens_2)
print("Tensor 1:")
print(tens_1)
print("\nTensor 2:")
print(tens_2)
print("\nResult:")
print(result)
Output:
Tensor 1:
tensor([[10., 20.],
        [30., 40.]])
Tensor 2:
tensor([[1., 2.],
        [3., 4.]])
Result:
tensor([[ 9., 18.],
        [27., 36.]])
Subtracting a Scalar from a Tensor
You can subtract a single number from every element of a tensor:
import torch
tens = torch.Tensor([100, 200, 300, 400, 500])
result = torch.sub(tens, 50)
print("Original:", tens)
print("- 50: ", result)
Output:
Original: tensor([100., 200., 300., 400., 500.])
- 50: tensor([ 50., 150., 250., 350., 450.])
This also works with multi-dimensional tensors:
import torch
tens = torch.Tensor([[10, 20], [30, 40]])
result = torch.sub(tens, 5)
print("Original:")
print(tens)
print("\n- 5:")
print(result)
Output:
Original:
tensor([[10., 20.],
        [30., 40.]])
- 5:
tensor([[ 5., 15.],
        [25., 35.]])
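The reverse case, subtracting every tensor element from a scalar, is most easily written with the - operator (the values below are illustrative):

```python
import torch

tens = torch.Tensor([10, 20, 30])

# Scalar-minus-tensor: each element is subtracted from 100
result = 100 - tens
print(result)  # tensor([90., 80., 70.])
```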
Using the alpha Parameter for Scaled Subtraction
The alpha parameter multiplies the second tensor before subtracting it. This computes input - alpha × other, which is particularly useful for gradient descent updates and weighted differences:
import torch
tens_1 = torch.Tensor([100, 200, 300])
tens_2 = torch.Tensor([1, 2, 3])
# Computes: tens_1 - 10 * tens_2
result = torch.sub(tens_1, tens_2, alpha=10)
print("Tensor 1:", tens_1)
print("Tensor 2:", tens_2)
print("Result (input - 10 × other):", result)
Output:
Tensor 1: tensor([100., 200., 300.])
Tensor 2: tensor([1., 2., 3.])
Result (input - 10 × other): tensor([ 90., 180., 270.])
The alpha parameter is commonly used in optimization. For example, a simple gradient descent update can be expressed as:
# weights = weights - learning_rate * gradients
weights = torch.sub(weights, gradients, alpha=learning_rate)
This is more efficient than weights - learning_rate * gradients because it avoids creating an intermediate tensor for the multiplication.
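As an illustrative sketch (the weight and gradient values here are made up), one manual update step might look like:

```python
import torch

# Toy weights and gradients with a small learning rate
weights = torch.tensor([1.0, 2.0, 3.0])
gradients = torch.tensor([0.5, 0.5, 0.5])
learning_rate = 0.1

# One update step: weights - learning_rate * gradients, in a single fused call
weights = torch.sub(weights, gradients, alpha=learning_rate)
print(weights)  # tensor([0.9500, 1.9500, 2.9500])
```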
Broadcasting: Subtracting Tensors of Different Shapes
PyTorch supports broadcasting, which allows operations on tensors with different dimensions. The smaller tensor is automatically expanded to match the larger one.
2D Tensor - 1D Tensor
import torch
tens_2d = torch.Tensor([[100, 200],
                        [300, 400]])
tens_1d = torch.Tensor([10, 20])
result = torch.sub(tens_2d, tens_1d)
print("2D Tensor:")
print(tens_2d)
print("\n1D Tensor:", tens_1d)
print("\nResult:")
print(result)
Output:
2D Tensor:
tensor([[100., 200.],
        [300., 400.]])
1D Tensor: tensor([10., 20.])
Result:
tensor([[ 90., 180.],
        [290., 380.]])
The 1D tensor [10, 20] is broadcast across each row: first row becomes [100-10, 200-20] and second row becomes [300-10, 400-20].
2D Tensor - Column Vector
import torch
a = torch.tensor([[10.0, 20.0, 30.0],
                  [40.0, 50.0, 60.0]])
# Column vector with shape (2, 1)
b = torch.tensor([[5.0],
                  [10.0]])
result = torch.sub(a, b)
print("Tensor a shape:", a.shape)
print("Tensor b shape:", b.shape)
print("\nResult:")
print(result)
Output:
Tensor a shape: torch.Size([2, 3])
Tensor b shape: torch.Size([2, 1])
Result:
tensor([[ 5., 15., 25.],
        [30., 40., 50.]])
Broadcasting follows these rules:
- If tensors have different numbers of dimensions, the smaller tensor's shape is padded with 1s on the left.
- Dimensions with size 1 are stretched to match the other tensor.
- If two dimensions differ and neither is 1, the operation fails with an error.
For example, shapes (2, 3) and (3,) are compatible, but (2, 3) and (4,) are not.
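If you want to check compatibility up front, torch.broadcast_shapes (available in recent PyTorch releases) resolves two shapes to their broadcast result or raises an error:

```python
import torch

# Compatible: (3,) is padded to (1, 3), then stretched to (2, 3)
print(torch.broadcast_shapes((2, 3), (3,)))  # torch.Size([2, 3])

# Incompatible: 3 vs 4 in the last dimension, neither is 1
try:
    torch.broadcast_shapes((2, 3), (4,))
except RuntimeError as e:
    print(f"Incompatible: {e}")
```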
Practical Example: Computing Prediction Errors
A common real-world application of element-wise subtraction is computing the error between predicted and actual values:
import torch
# Predicted values from a model
predictions = torch.tensor([2.5, 0.0, 2.1, 7.8])
# Actual ground truth values
targets = torch.tensor([3.0, -0.5, 2.0, 7.5])
# Compute errors
errors = torch.sub(predictions, targets)
# Compute absolute errors
abs_errors = torch.abs(errors)
# Mean Absolute Error
mae = torch.mean(abs_errors)
print("Predictions:", predictions)
print("Targets: ", targets)
print("Errors: ", errors)
print("Abs Errors: ", abs_errors)
print(f"MAE: {mae.item():.4f}")
Output:
Predictions: tensor([2.5000, 0.0000, 2.1000, 7.8000])
Targets: tensor([ 3.0000, -0.5000, 2.0000, 7.5000])
Errors: tensor([-0.5000, 0.5000, 0.1000, 0.3000])
Abs Errors: tensor([0.5000, 0.5000, 0.1000, 0.3000])
MAE: 0.3500
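For comparison, the same MAE can be obtained with torch.nn.functional.l1_loss, which performs the subtraction and absolute value internally:

```python
import torch
import torch.nn.functional as F

predictions = torch.tensor([2.5, 0.0, 2.1, 7.8])
targets = torch.tensor([3.0, -0.5, 2.0, 7.5])

# l1_loss computes mean(|predictions - targets|) by default
mae = F.l1_loss(predictions, targets)
print(f"MAE: {mae.item():.4f}")  # MAE: 0.3500
```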
Common Mistake: Shape Mismatch
Element-wise subtraction requires tensors to have compatible shapes. If the shapes can't be broadcast, PyTorch raises an error:
import torch
a = torch.tensor([1.0, 2.0, 3.0]) # Shape: (3,)
b = torch.tensor([10.0, 20.0, 30.0, 40.0]) # Shape: (4,)
# ❌ Shapes (3,) and (4,) are not compatible
try:
    result = torch.sub(a, b)
except RuntimeError as e:
    print(f"Error: {e}")
Output:
Error: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 0
Fix: Ensure your tensors have matching or broadcastable shapes before subtracting:
# ✅ Slice to matching size
result = torch.sub(a, b[:3])
print("Result:", result)
Output:
Result: tensor([ -9., -18., -27.])
In-Place Subtraction with sub_()
To modify a tensor in place without allocating new memory, use sub_():
import torch
tens = torch.tensor([100.0, 200.0, 300.0])
print("Before:", tens)
tens.sub_(25)
print("After: ", tens)
Output:
Before: tensor([100., 200., 300.])
After: tensor([ 75., 175., 275.])
Avoid in-place operations on tensors that require gradient computation. They can break PyTorch's autograd computation graph:
x = torch.tensor([10.0, 20.0], requires_grad=True)
x.sub_(5) # RuntimeError: a leaf Variable that requires grad is being used in an in-place operation
Use the standard torch.sub() or - operator instead when working with gradient-tracked tensors.
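A minimal sketch of the autograd-safe alternative: out-of-place subtraction records the operation in the graph, so gradients flow normally:

```python
import torch

x = torch.tensor([10.0, 20.0], requires_grad=True)

# Out-of-place subtraction keeps the autograd graph intact
y = x - 5
loss = y.sum()
loss.backward()

print(x.grad)  # tensor([1., 1.])
```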
torch.sub() vs - Operator
| Feature | torch.sub() | - Operator |
|---|---|---|
| Basic subtraction | ✅ | ✅ |
| alpha parameter (scaled subtraction) | ✅ | ❌ |
| out parameter (pre-allocated output) | ✅ | ❌ |
| Readability | Explicit | Concise |
Use torch.sub() when you need the alpha or out parameters. Use - for simple, readable subtraction.
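Note that torch.subtract is an alias for torch.sub with the same signature, so either name works (a quick illustrative check):

```python
import torch

a = torch.tensor([5.0, 6.0])
b = torch.tensor([1.0, 2.0])

# torch.subtract behaves identically, including the alpha parameter
print(torch.subtract(a, b, alpha=2))  # tensor([3., 2.])
```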
Summary
Element-wise subtraction in PyTorch is performed with torch.sub() or the - operator. Both subtract corresponding elements and support broadcasting for tensors with different shapes. Key points to remember:
- Use torch.sub() for explicit subtraction with optional alpha scaling.
- Use the - operator for concise, readable code.
- The alpha parameter enables efficient scaled subtraction (input - alpha × other), useful for gradient updates.
- Broadcasting automatically expands smaller tensors to match larger ones when shapes are compatible.
- Avoid in-place operations (sub_()) on tensors that participate in gradient computation.
- Always verify that tensor shapes are compatible to prevent runtime errors.