Python PyTorch: How to Perform Element-Wise Subtraction on Tensors
Element-wise subtraction is a fundamental tensor operation used throughout deep learning - from computing residual connections in ResNets to calculating loss gradients, difference features, and error signals between predicted and actual values. PyTorch provides the torch.sub() function to subtract corresponding elements of two tensors efficiently.
In this guide, you'll learn how to perform element-wise subtraction using torch.sub() and the - operator, leverage the alpha parameter for scaled subtraction, handle broadcasting with different tensor shapes, and avoid common mistakes.
Understanding torch.sub()
PyTorch provides the torch.sub() function for element-wise subtraction:
torch.sub(input, other, *, alpha=1, out=None)
Parameters:
| Parameter | Description |
|---|---|
| input | The first input tensor (minuend) |
| other | The tensor or scalar to subtract (subtrahend) |
| alpha | Optional multiplier applied to other before subtraction (default: 1) |
| out | Optional pre-allocated output tensor |
Returns: A new tensor computed as input - alpha × other.
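As a quick sketch of the out parameter in action (the tensor values here are illustrative), a pre-allocated buffer can be filled in place and reused across calls:

```python
import torch

a = torch.tensor([10.0, 20.0, 30.0])
b = torch.tensor([1.0, 2.0, 3.0])

# Pre-allocate a result buffer and write a - 2 * b into it
buffer = torch.empty(3)
torch.sub(a, b, alpha=2, out=buffer)

print(buffer)  # tensor([ 8., 16., 24.])
```

Reusing an output buffer this way avoids allocating a fresh tensor on every call, which can matter in tight loops.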
Subtracting Two 1D Tensors
The simplest case subtracts corresponding elements of two tensors with the same shape:
import torch
tens_1 = torch.Tensor([10, 20, 30, 40, 50])
tens_2 = torch.Tensor([1, 2, 3, 4, 5])
result = torch.sub(tens_1, tens_2)
print("Tensor 1:", tens_1)
print("Tensor 2:", tens_2)
print("Result: ", result)
Output:
Tensor 1: tensor([10., 20., 30., 40., 50.])
Tensor 2: tensor([1., 2., 3., 4., 5.])
Result: tensor([ 9., 18., 27., 36., 45.])
Each element at index i in tens_2 is subtracted from the element at the same index in tens_1: 10-1=9, 20-2=18, 30-3=27, and so on.
You can also use the - operator, which produces identical results:
result = tens_1 - tens_2
print("Result:", result)
Output:
Result: tensor([ 9., 18., 27., 36., 45.])
Subtracting Two 2D Tensors
Element-wise subtraction works the same way with multi-dimensional tensors - each element at position [i][j] in the second tensor is subtracted from the element at the same position in the first:
import torch
tens_1 = torch.Tensor([[10, 20],
                       [30, 40]])
tens_2 = torch.Tensor([[1, 2],
                       [3, 4]])
result = torch.sub(tens_1, tens_2)
print("Tensor 1:")
print(tens_1)
print("\nTensor 2:")
print(tens_2)
print("\nResult:")
print(result)
Output:
Tensor 1:
tensor([[10., 20.],
        [30., 40.]])
Tensor 2:
tensor([[1., 2.],
        [3., 4.]])
Result:
tensor([[ 9., 18.],
        [27., 36.]])
Subtracting a Scalar from a Tensor
You can subtract a single number from every element of a tensor:
import torch
tens = torch.Tensor([100, 200, 300, 400, 500])
result = torch.sub(tens, 50)
print("Original:", tens)
print("- 50: ", result)
Output:
Original: tensor([100., 200., 300., 400., 500.])
- 50: tensor([ 50., 150., 250., 350., 450.])
This also works with multi-dimensional tensors:
import torch
tens = torch.Tensor([[10, 20], [30, 40]])
result = torch.sub(tens, 5)
print("Original:")
print(tens)
print("\n- 5:")
print(result)
Output:
Original:
tensor([[10., 20.],
        [30., 40.]])
- 5:
tensor([[ 5., 15.],
        [25., 35.]])
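The reverse case, subtracting every tensor element from a scalar, is most easily written with the - operator (the values below are illustrative):

```python
import torch

tens = torch.Tensor([10, 20, 30])

# Scalar-minus-tensor: each element is subtracted from 100
result = 100 - tens
print(result)  # tensor([90., 80., 70.])
```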
Using the alpha Parameter for Scaled Subtraction
The alpha parameter multiplies the second tensor before subtracting it. This computes input - alpha × other, which is particularly useful for gradient descent updates and weighted differences:
import torch
tens_1 = torch.Tensor([100, 200, 300])
tens_2 = torch.Tensor([1, 2, 3])
# Computes: tens_1 - 10 * tens_2
result = torch.sub(tens_1, tens_2, alpha=10)
print("Tensor 1:", tens_1)
print("Tensor 2:", tens_2)
print("Result (input - 10 × other):", result)
Output:
Tensor 1: tensor([100., 200., 300.])
Tensor 2: tensor([1., 2., 3.])
Result (input - 10 × other): tensor([ 90., 180., 270.])
The alpha parameter is commonly used in optimization. For example, a simple gradient descent update can be expressed as:
# weights = weights - learning_rate * gradients
weights = torch.sub(weights, gradients, alpha=learning_rate)
This is more efficient than weights - learning_rate * gradients because it avoids creating an intermediate tensor for the multiplication.
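As an illustrative sketch (the weight and gradient values here are made up), one manual update step might look like:

```python
import torch

# Toy weights and gradients with a small learning rate
weights = torch.tensor([1.0, 2.0, 3.0])
gradients = torch.tensor([0.5, 0.5, 0.5])
learning_rate = 0.1

# One update step: weights - learning_rate * gradients, in a single fused call
weights = torch.sub(weights, gradients, alpha=learning_rate)
print(weights)  # tensor([0.9500, 1.9500, 2.9500])
```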
Broadcasting: Subtracting Tensors of Different Shapes
PyTorch supports broadcasting, which allows operations on tensors with different dimensions. The smaller tensor is automatically expanded to match the larger one.
2D Tensor - 1D Tensor
import torch
tens_2d = torch.Tensor([[100, 200],
                        [300, 400]])
tens_1d = torch.Tensor([10, 20])
result = torch.sub(tens_2d, tens_1d)
print("2D Tensor:")
print(tens_2d)
print("\n1D Tensor:", tens_1d)
print("\nResult:")
print(result)
Output:
2D Tensor:
tensor([[100., 200.],
        [300., 400.]])
1D Tensor: tensor([10., 20.])
Result:
tensor([[ 90., 180.],
        [290., 380.]])
The 1D tensor [10, 20] is broadcast across each row: first row becomes [100-10, 200-20] and second row becomes [300-10, 400-20].
2D Tensor - Column Vector
import torch
a = torch.tensor([[10.0, 20.0, 30.0],
                  [40.0, 50.0, 60.0]])
# Column vector with shape (2, 1)
b = torch.tensor([[5.0],
                  [10.0]])
result = torch.sub(a, b)
print("Tensor a shape:", a.shape)
print("Tensor b shape:", b.shape)
print("\nResult:")
print(result)
Output:
Tensor a shape: torch.Size([2, 3])
Tensor b shape: torch.Size([2, 1])
Result:
tensor([[ 5., 15., 25.],
        [30., 40., 50.]])
Broadcasting follows these rules:
- If tensors have different numbers of dimensions, the smaller tensor's shape is padded with 1s on the left.
- Dimensions with size 1 are stretched to match the other tensor.
- If two dimensions differ and neither is 1, the operation fails with an error.
For example, shapes (2, 3) and (3,) are compatible, but (2, 3) and (4,) are not.
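If you want to check compatibility up front, torch.broadcast_shapes (available in recent PyTorch releases) resolves two shapes to their broadcast result or raises an error:

```python
import torch

# Compatible: (3,) is padded to (1, 3), then stretched to (2, 3)
print(torch.broadcast_shapes((2, 3), (3,)))  # torch.Size([2, 3])

# Incompatible: 3 vs 4 in the last dimension, neither is 1
try:
    torch.broadcast_shapes((2, 3), (4,))
except RuntimeError as e:
    print(f"Incompatible: {e}")
```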
Practical Example: Computing Prediction Errors
A common real-world application of element-wise subtraction is computing the error between predicted and actual values:
import torch
# Predicted values from a model
predictions = torch.tensor([2.5, 0.0, 2.1, 7.8])
# Actual ground truth values
targets = torch.tensor([3.0, -0.5, 2.0, 7.5])
# Compute errors
errors = torch.sub(predictions, targets)
# Compute absolute errors
abs_errors = torch.abs(errors)
# Mean Absolute Error
mae = torch.mean(abs_errors)
print("Predictions:", predictions)
print("Targets: ", targets)
print("Errors: ", errors)
print("Abs Errors: ", abs_errors)
print(f"MAE: {mae.item():.4f}")
Output:
Predictions: tensor([2.5000, 0.0000, 2.1000, 7.8000])
Targets: tensor([ 3.0000, -0.5000, 2.0000, 7.5000])
Errors: tensor([-0.5000, 0.5000, 0.1000, 0.3000])
Abs Errors: tensor([0.5000, 0.5000, 0.1000, 0.3000])
MAE: 0.3500
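For comparison, the same MAE can be obtained with torch.nn.functional.l1_loss, which performs the subtraction and absolute value internally:

```python
import torch
import torch.nn.functional as F

predictions = torch.tensor([2.5, 0.0, 2.1, 7.8])
targets = torch.tensor([3.0, -0.5, 2.0, 7.5])

# l1_loss computes mean(|predictions - targets|) by default
mae = F.l1_loss(predictions, targets)
print(f"MAE: {mae.item():.4f}")  # MAE: 0.3500
```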
Common Mistake: Shape Mismatch
Element-wise subtraction requires tensors to have compatible shapes. If the shapes can't be broadcast, PyTorch raises an error:
import torch
a = torch.tensor([1.0, 2.0, 3.0]) # Shape: (3,)
b = torch.tensor([10.0, 20.0, 30.0, 40.0]) # Shape: (4,)
# ❌ Shapes (3,) and (4,) are not compatible
try:
    result = torch.sub(a, b)
except RuntimeError as e:
    print(f"Error: {e}")
Output:
Error: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 0
Fix: Ensure your tensors have matching or broadcastable shapes before subtracting:
# ✅ Slice to matching size
result = torch.sub(a, b[:3])
print("Result:", result)
Output:
Result: tensor([ -9., -18., -27.])
In-Place Subtraction with sub_()
To modify a tensor in place without allocating new memory, use sub_():
import torch
tens = torch.tensor([100.0, 200.0, 300.0])
print("Before:", tens)
tens.sub_(25)
print("After: ", tens)
Output:
Before: tensor([100., 200., 300.])
After: tensor([ 75., 175., 275.])
Avoid in-place operations on tensors that require gradient computation. They can break PyTorch's autograd computation graph:
x = torch.tensor([10.0, 20.0], requires_grad=True)
x.sub_(5) # RuntimeError: a leaf Variable that requires grad is being used in an in-place operation
Use the standard torch.sub() or - operator instead when working with gradient-tracked tensors.
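A minimal sketch of the autograd-safe alternative: out-of-place subtraction records the operation in the graph, so gradients flow normally:

```python
import torch

x = torch.tensor([10.0, 20.0], requires_grad=True)

# Out-of-place subtraction keeps the autograd graph intact
y = x - 5
loss = y.sum()
loss.backward()

print(x.grad)  # tensor([1., 1.])
```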
torch.sub() vs - Operator
| Feature | torch.sub() | - Operator |
|---|---|---|
| Basic subtraction | ✅ | ✅ |
| alpha parameter (scaled subtraction) | ✅ | ❌ |
| out parameter (pre-allocated output) | ✅ | ❌ |
| Readability | Explicit | Concise |
Use torch.sub() when you need the alpha or out parameters. Use - for simple, readable subtraction.
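Note that torch.subtract is an alias for torch.sub with the same signature, so either name works (a quick illustrative check):

```python
import torch

a = torch.tensor([5.0, 6.0])
b = torch.tensor([1.0, 2.0])

# torch.subtract behaves identically, including the alpha parameter
print(torch.subtract(a, b, alpha=2))  # tensor([3., 2.])
```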
Summary
Element-wise subtraction in PyTorch is performed with torch.sub() or the - operator. Both subtract corresponding elements and support broadcasting for tensors with different shapes. Key points to remember:
- Use torch.sub() for explicit subtraction with optional alpha scaling.
- Use the - operator for concise, readable code.
- The alpha parameter enables efficient scaled subtraction (input - alpha × other), useful for gradient updates.
- Broadcasting automatically expands smaller tensors to match larger ones when shapes are compatible.
- Avoid in-place operations (sub_()) on tensors that participate in gradient computation.
- Always verify that tensor shapes are compatible to prevent runtime errors.