Python PyTorch: How to Perform Element-Wise Multiplication on Tensors
Element-wise multiplication (also known as the Hadamard product) is a fundamental operation in deep learning. It multiplies corresponding elements of two tensors, producing a new tensor of the same shape. This operation is widely used in attention mechanisms, feature scaling, masking operations, and gating functions in neural networks like LSTMs and GRUs.
In this guide, you'll learn how to perform element-wise multiplication using torch.mul() and the * operator, handle tensors of different dimensions through broadcasting, and understand the difference between element-wise and matrix multiplication.
Understanding torch.mul()
PyTorch provides the torch.mul() function for element-wise multiplication:
torch.mul(input, other, *, out=None)
Parameters:
| Parameter | Description |
|---|---|
| input | The first input tensor |
| other | The second tensor or a scalar value to multiply with |
| out | Optional pre-allocated output tensor |
Returns: A new tensor containing the element-wise product.
Multiplying Two 1D Tensors
The simplest case multiplies corresponding elements of two tensors with the same shape:
import torch
tens_1 = torch.Tensor([1, 2, 3, 4, 5])
tens_2 = torch.Tensor([10, 20, 30, 40, 50])
result = torch.mul(tens_1, tens_2)
print("Tensor 1:", tens_1)
print("Tensor 2:", tens_2)
print("Result: ", result)
Output:
Tensor 1: tensor([1., 2., 3., 4., 5.])
Tensor 2: tensor([10., 20., 30., 40., 50.])
Result: tensor([ 10., 40., 90., 160., 250.])
Each element at index i in tens_1 is multiplied by the element at the same index in tens_2: 1×10=10, 2×20=40, 3×30=90, and so on.
You can also use the * operator, which produces identical results:
result = tens_1 * tens_2
print("Result:", result)
Output:
Result: tensor([ 10., 40., 90., 160., 250.])
Multiplying Two 2D Tensors
Element-wise multiplication works the same way with multi-dimensional tensors - each element at position [i][j] is multiplied by its counterpart:
import torch
tens_1 = torch.Tensor([[10, 20],
                       [30, 40]])
tens_2 = torch.Tensor([[1, 2],
                       [3, 4]])
result = torch.mul(tens_1, tens_2)
print("Tensor 1:")
print(tens_1)
print("\nTensor 2:")
print(tens_2)
print("\nResult:")
print(result)
Output:
Tensor 1:
tensor([[10., 20.],
        [30., 40.]])
Tensor 2:
tensor([[1., 2.],
        [3., 4.]])
Result:
tensor([[ 10.,  40.],
        [ 90., 160.]])
Multiplying a Tensor by a Scalar
You can multiply every element of a tensor by a single number:
import torch
tens = torch.Tensor([100, 200, 300, 400, 500])
result = torch.mul(tens, 2)
print("Original:", tens)
print("× 2: ", result)
Output:
Original: tensor([100., 200., 300., 400., 500.])
× 2: tensor([ 200., 400., 600., 800., 1000.])
This also works with 2D tensors:
import torch
tens = torch.Tensor([[1, 2], [3, 4]])
result = torch.mul(tens, 0.5)
print("Original:")
print(tens)
print("\n× 0.5:")
print(result)
Output:
Original:
tensor([[1., 2.],
        [3., 4.]])
× 0.5:
tensor([[0.5000, 1.0000],
        [1.5000, 2.0000]])
Broadcasting: Multiplying Tensors of Different Shapes
PyTorch supports broadcasting, which allows element-wise operations on tensors with different dimensions. The smaller tensor is automatically expanded to match the larger one.
2D Tensor × 1D Tensor
import torch
tens_2d = torch.Tensor([[10, 20],
                        [30, 40]])
tens_1d = torch.Tensor([2, 4])
result = torch.mul(tens_2d, tens_1d)
print("2D Tensor:")
print(tens_2d)
print("\n1D Tensor:", tens_1d)
print("\nResult:")
print(result)
Output:
2D Tensor:
tensor([[10., 20.],
        [30., 40.]])
1D Tensor: tensor([2., 4.])
Result:
tensor([[ 20.,  80.],
        [ 60., 160.]])
The 1D tensor [2, 4] is broadcast across each row: first row becomes [10×2, 20×4] and second row becomes [30×2, 40×4].
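Broadcasting also works down columns. As a sketch of this variation: reshaping the 1D tensor to shape (2, 1) with unsqueeze() makes each of its elements scale an entire row instead of a column:

```python
import torch

tens_2d = torch.Tensor([[10, 20],
                        [30, 40]])
# Reshape [2, 4] into a column vector of shape (2, 1) so it
# broadcasts across columns: row 0 is scaled by 2, row 1 by 4.
col = torch.Tensor([2, 4]).unsqueeze(1)

result = tens_2d * col
print(result)
# tensor([[ 20.,  40.],
#         [120., 160.]])
```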
3D Tensor × 1D Tensor
import torch
tens_3d = torch.tensor([[[1.0, 2.0, 3.0],
                         [4.0, 5.0, 6.0]],
                        [[7.0, 8.0, 9.0],
                         [10.0, 11.0, 12.0]]])
tens_1d = torch.tensor([10.0, 100.0, 1000.0])
result = torch.mul(tens_3d, tens_1d)
print("3D Tensor shape:", tens_3d.shape)
print("1D Tensor shape:", tens_1d.shape)
print("Result shape: ", result.shape)
print("\nResult:")
print(result)
Output:
3D Tensor shape: torch.Size([2, 2, 3])
1D Tensor shape: torch.Size([3])
Result shape: torch.Size([2, 2, 3])
Result:
tensor([[[1.0000e+01, 2.0000e+02, 3.0000e+03],
         [4.0000e+01, 5.0000e+02, 6.0000e+03]],

        [[7.0000e+01, 8.0000e+02, 9.0000e+03],
         [1.0000e+02, 1.1000e+03, 1.2000e+04]]])
Broadcasting follows these rules:
- If tensors have different numbers of dimensions, the smaller tensor's shape is padded with 1s on the left.
- Dimensions with size 1 are stretched to match the other tensor.
- If two dimensions differ and neither is 1, the operation fails with an error.
For example, shapes (2, 3) and (3,) are compatible, but (2, 3) and (4,) are not.
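You can check compatibility before multiplying with torch.broadcast_shapes(), which returns the result shape for compatible inputs and raises a RuntimeError for incompatible ones:

```python
import torch

# Compatible: (3,) is padded to (1, 3), then stretched to (2, 3).
print(torch.broadcast_shapes((2, 3), (3,)))  # torch.Size([2, 3])

# Incompatible: 3 vs 4 in the last dimension, and neither is 1.
try:
    torch.broadcast_shapes((2, 3), (4,))
except RuntimeError as e:
    print("Incompatible:", e)
```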
Practical Example: Applying a Mask to a Tensor
A common real-world use of element-wise multiplication is masking - zeroing out specific elements of a tensor. This is frequently used in natural language processing to handle variable-length sequences:
import torch
# Simulated feature tensor (batch_size=2, seq_len=5, features=3)
features = torch.tensor([[[1.0, 2.0, 3.0],
                          [4.0, 5.0, 6.0],
                          [7.0, 8.0, 9.0],
                          [0.0, 0.0, 0.0],
                          [0.0, 0.0, 0.0]],
                         [[1.0, 1.0, 1.0],
                          [2.0, 2.0, 2.0],
                          [0.0, 0.0, 0.0],
                          [0.0, 0.0, 0.0],
                          [0.0, 0.0, 0.0]]])
# Mask: 1 for valid positions, 0 for padding
mask = torch.tensor([[[1], [1], [1], [0], [0]],
                     [[1], [1], [0], [0], [0]]], dtype=torch.float)
# Apply mask via element-wise multiplication
masked_features = torch.mul(features, mask)
print("Masked features:")
print(masked_features)
Output:
Masked features:
tensor([[[1., 2., 3.],
         [4., 5., 6.],
         [7., 8., 9.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[1., 1., 1.],
         [2., 2., 2.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]]])
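In practice you rarely write such a mask by hand. A minimal sketch, assuming the valid lengths of each sequence are known (the `lengths` values here are hypothetical), is to compare position indices against the lengths and add a trailing axis so the mask broadcasts over the feature dimension:

```python
import torch

# Hypothetical valid lengths for a batch of 2 sequences, max length 5
lengths = torch.tensor([3, 2])
seq_len = 5

# positions < lengths[:, None] gives a (2, 5) boolean mask;
# unsqueeze(-1) adds a trailing axis of size 1 so it broadcasts
# over the feature dimension of a (2, 5, features) tensor.
positions = torch.arange(seq_len)               # shape: (5,)
mask = (positions < lengths.unsqueeze(1))       # shape: (2, 5)
mask = mask.unsqueeze(-1).float()               # shape: (2, 5, 1)

print(mask.squeeze(-1))
# tensor([[1., 1., 1., 0., 0.],
#         [1., 1., 0., 0., 0.]])
```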
Common Mistake: Confusing Element-Wise and Matrix Multiplication
A frequent source of bugs is using torch.mul() (or *) when you actually need matrix multiplication (torch.matmul() or @), or vice versa:
import torch
a = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
b = torch.tensor([[5.0, 6.0],
                  [7.0, 8.0]])
# Element-wise multiplication (Hadamard product)
elem_result = torch.mul(a, b)
# Matrix multiplication
mat_result = torch.matmul(a, b)
print("Element-wise (torch.mul):")
print(elem_result)
print("\nMatrix multiplication (torch.matmul):")
print(mat_result)
Output:
Element-wise (torch.mul):
tensor([[ 5., 12.],
        [21., 32.]])
Matrix multiplication (torch.matmul):
tensor([[19., 22.],
        [43., 50.]])
These two operations produce completely different results:
- torch.mul(a, b) multiplies each a[i][j] by b[i][j] independently.
- torch.matmul(a, b) performs standard matrix multiplication (dot products of rows and columns).
Make sure you're using the correct operation for your use case. Element-wise multiplication is for scaling and masking; matrix multiplication is for linear transformations and layer computations.
Common Mistake: Shape Mismatch
Element-wise multiplication requires tensors to have compatible shapes. If they can't be broadcast, PyTorch raises an error:
import torch
a = torch.tensor([1.0, 2.0, 3.0]) # Shape: (3,)
b = torch.tensor([10.0, 20.0, 30.0, 40.0]) # Shape: (4,)
# ❌ Shapes (3,) and (4,) are not compatible
try:
    result = torch.mul(a, b)
except RuntimeError as e:
    print(f"Error: {e}")
Output:
Error: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 0
Fix: Ensure your tensors have matching or broadcastable shapes before multiplying:
# ✅ Slice to matching size
result = torch.mul(a, b[:3])
print("Result:", result)
Output:
Result: tensor([10., 40., 90.])
In-Place Multiplication with mul_()
To modify a tensor in place without allocating new memory, use mul_():
import torch
tens = torch.tensor([1.0, 2.0, 3.0, 4.0])
print("Before:", tens)
tens.mul_(10)
print("After: ", tens)
Output:
Before: tensor([1., 2., 3., 4.])
After: tensor([10., 20., 30., 40.])
Avoid in-place operations on tensors that require gradients. They can break PyTorch's autograd computation graph:
x = torch.tensor([1.0, 2.0], requires_grad=True)
x.mul_(5) # RuntimeError: a leaf Variable that requires grad is being used in an in-place operation
Use the standard torch.mul() or * operator instead when working with gradient-tracked tensors.
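As a small sketch of the safe pattern: the out-of-place version creates a new tensor and keeps the autograd graph intact, so gradients flow back to x as expected:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Out-of-place multiplication: returns a new tensor instead of
# modifying x, so autograd can track the operation.
y = x * 5
y.sum().backward()

# d(sum(5x))/dx = 5 for every element
print(x.grad)  # tensor([5., 5.])
```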
torch.mul() vs * Operator
| Feature | torch.mul() | * Operator |
|---|---|---|
| Basic element-wise multiplication | ✅ | ✅ |
| out parameter (pre-allocated output) | ✅ | ❌ |
| Readability | Explicit | Concise |
| Performance | Same | Same |
Both produce identical results. Use * for concise, readable code and torch.mul() when you need the out parameter or want to be explicit about the operation.
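As a brief illustration of the out parameter: you can allocate the output buffer once and have torch.mul() write into it, avoiding a fresh allocation on each call (useful in tight loops):

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([10.0, 20.0, 30.0])

# Pre-allocate the output buffer; torch.mul writes the result
# into `out` instead of allocating a new tensor.
out = torch.empty(3)
torch.mul(a, b, out=out)
print(out)  # tensor([10., 40., 90.])
```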
Summary
Element-wise multiplication in PyTorch is performed with torch.mul() or the * operator. Both multiply corresponding elements of two tensors and support broadcasting for tensors with different shapes. Key points to remember:
- Use torch.mul() or * for element-wise (Hadamard) multiplication.
- Use torch.matmul() or @ for matrix multiplication - don't confuse the two.
- Broadcasting automatically expands smaller tensors to match larger ones when shapes are compatible.
- Avoid in-place operations (mul_()) on tensors that participate in gradient computation.
- Always verify that tensor shapes are compatible to prevent runtime errors.