Python PyTorch: How to Perform Element-Wise Multiplication on Tensors

Element-wise multiplication (also known as the Hadamard product) is a fundamental operation in deep learning. It multiplies corresponding elements of two tensors, producing a new tensor of the same shape. This operation is widely used in attention mechanisms, feature scaling, masking operations, and gating functions in neural networks like LSTMs and GRUs.

In this guide, you'll learn how to perform element-wise multiplication using torch.mul() and the * operator, handle tensors of different dimensions through broadcasting, and understand the difference between element-wise and matrix multiplication.

Understanding torch.mul()

PyTorch provides the torch.mul() function for element-wise multiplication:

torch.mul(input, other, *, out=None)

Parameters:

Parameter   Description
input       The first input tensor
other       The second tensor or a scalar value to multiply with
out         Optional keyword-only pre-allocated output tensor

Returns: A new tensor containing the element-wise product.
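As a quick sketch of the keyword-only out parameter (the tensor values here are illustrative), you can write the result into a pre-allocated buffer instead of allocating a new tensor:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

# Pre-allocate a result buffer and write the product into it
buffer = torch.empty(3)
torch.mul(a, b, out=buffer)
print(buffer)  # tensor([ 4., 10., 18.])
```

This can be useful in tight loops where you want to reuse the same output tensor across iterations.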

Multiplying Two 1D Tensors

The simplest case multiplies corresponding elements of two tensors with the same shape:

import torch

tens_1 = torch.Tensor([1, 2, 3, 4, 5])
tens_2 = torch.Tensor([10, 20, 30, 40, 50])

result = torch.mul(tens_1, tens_2)

print("Tensor 1:", tens_1)
print("Tensor 2:", tens_2)
print("Result: ", result)

Output:

Tensor 1: tensor([1., 2., 3., 4., 5.])
Tensor 2: tensor([10., 20., 30., 40., 50.])
Result:  tensor([ 10.,  40.,  90., 160., 250.])

Each element at index i in tens_1 is multiplied by the element at the same index in tens_2: 1×10=10, 2×20=40, 3×30=90, and so on.

You can also use the * operator, which produces identical results:

result = tens_1 * tens_2
print("Result:", result)

Output:

Result: tensor([ 10.,  40.,  90., 160., 250.])

Multiplying Two 2D Tensors

Element-wise multiplication works the same way with multi-dimensional tensors: each element at position [i][j] is multiplied by its counterpart:

import torch

tens_1 = torch.Tensor([[10, 20],
                       [30, 40]])
tens_2 = torch.Tensor([[1, 2],
                       [3, 4]])

result = torch.mul(tens_1, tens_2)

print("Tensor 1:")
print(tens_1)
print("\nTensor 2:")
print(tens_2)
print("\nResult:")
print(result)

Output:

Tensor 1:
tensor([[10., 20.],
        [30., 40.]])

Tensor 2:
tensor([[1., 2.],
        [3., 4.]])

Result:
tensor([[ 10.,  40.],
        [ 90., 160.]])

Multiplying a Tensor by a Scalar

You can multiply every element of a tensor by a single number:

import torch

tens = torch.Tensor([100, 200, 300, 400, 500])

result = torch.mul(tens, 2)

print("Original:", tens)
print("× 2: ", result)

Output:

Original: tensor([100., 200., 300., 400., 500.])
× 2:  tensor([ 200.,  400.,  600.,  800., 1000.])

This also works with 2D tensors:

import torch

tens = torch.Tensor([[1, 2], [3, 4]])
result = torch.mul(tens, 0.5)

print("Original:")
print(tens)
print("\n× 0.5:")
print(result)

Output:

Original:
tensor([[1., 2.],
        [3., 4.]])

× 0.5:
tensor([[0.5000, 1.0000],
        [1.5000, 2.0000]])

Broadcasting: Multiplying Tensors of Different Shapes

PyTorch supports broadcasting, which allows element-wise operations on tensors with different dimensions. The smaller tensor is automatically expanded to match the larger one.

2D Tensor × 1D Tensor

import torch

tens_2d = torch.Tensor([[10, 20],
                        [30, 40]])
tens_1d = torch.Tensor([2, 4])

result = torch.mul(tens_2d, tens_1d)

print("2D Tensor:")
print(tens_2d)
print("\n1D Tensor:", tens_1d)
print("\nResult:")
print(result)

Output:

2D Tensor:
tensor([[10., 20.],
        [30., 40.]])

1D Tensor: tensor([2., 4.])

Result:
tensor([[ 20.,  80.],
        [ 60., 160.]])

The 1D tensor [2, 4] is broadcast across each row: first row becomes [10×2, 20×4] and second row becomes [30×2, 40×4].
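By default, a 1D tensor broadcasts across rows. If you want each value to scale a whole row instead (broadcasting down the columns), give the 1D tensor a trailing singleton dimension with unsqueeze — a small sketch reusing the tensors above:

```python
import torch

tens_2d = torch.Tensor([[10, 20],
                        [30, 40]])
col = torch.Tensor([2, 4])

# Shape (2,) -> (2, 1): the first value scales row 0, the second scales row 1
result = torch.mul(tens_2d, col.unsqueeze(1))
print(result)
# tensor([[ 20.,  40.],
#         [120., 160.]])
```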

3D Tensor × 1D Tensor

import torch

tens_3d = torch.tensor([[[1.0, 2.0, 3.0],
                         [4.0, 5.0, 6.0]],
                        [[7.0, 8.0, 9.0],
                         [10.0, 11.0, 12.0]]])

tens_1d = torch.tensor([10.0, 100.0, 1000.0])

result = torch.mul(tens_3d, tens_1d)

print("3D Tensor shape:", tens_3d.shape)
print("1D Tensor shape:", tens_1d.shape)
print("Result shape: ", result.shape)
print("\nResult:")
print(result)

Output:

3D Tensor shape: torch.Size([2, 2, 3])
1D Tensor shape: torch.Size([3])
Result shape: torch.Size([2, 2, 3])

Result:
tensor([[[1.0000e+01, 2.0000e+02, 3.0000e+03],
         [4.0000e+01, 5.0000e+02, 6.0000e+03]],

        [[7.0000e+01, 8.0000e+02, 9.0000e+03],
         [1.0000e+02, 1.1000e+03, 1.2000e+04]]])

How Broadcasting Works

Broadcasting follows these rules:

  1. If tensors have different numbers of dimensions, the smaller tensor's shape is padded with 1s on the left.
  2. Dimensions with size 1 are stretched to match the other tensor.
  3. If two dimensions differ and neither is 1, the operation fails with an error.

For example, shapes (2, 3) and (3,) are compatible, but (2, 3) and (4,) are not.
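You can check these rules directly with torch.broadcast_shapes(), which computes the broadcast result shape without allocating any tensors — a quick sketch:

```python
import torch

# Compatible: (3,) is padded to (1, 3), then stretched to (2, 3)
print(torch.broadcast_shapes((2, 3), (3,)))       # torch.Size([2, 3])

# Size-1 dimensions stretch to match the other tensor
print(torch.broadcast_shapes((2, 1, 4), (3, 1)))  # torch.Size([2, 3, 4])

# Incompatible: 3 vs 4, and neither is 1
try:
    torch.broadcast_shapes((2, 3), (4,))
except RuntimeError as e:
    print("Incompatible:", e)
```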

Practical Example: Applying a Mask to a Tensor

A common real-world use of element-wise multiplication is masking: zeroing out specific elements of a tensor. This is frequently used in natural language processing to handle variable-length sequences:

import torch

# Simulated feature tensor (batch_size=2, seq_len=5, features=3)
features = torch.tensor([[[1.0, 2.0, 3.0],
                          [4.0, 5.0, 6.0],
                          [7.0, 8.0, 9.0],
                          [0.0, 0.0, 0.0],
                          [0.0, 0.0, 0.0]],

                         [[1.0, 1.0, 1.0],
                          [2.0, 2.0, 2.0],
                          [0.0, 0.0, 0.0],
                          [0.0, 0.0, 0.0],
                          [0.0, 0.0, 0.0]]])

# Mask: 1 for valid positions, 0 for padding
mask = torch.tensor([[[1], [1], [1], [0], [0]],
                     [[1], [1], [0], [0], [0]]], dtype=torch.float)

# Apply mask via element-wise multiplication
masked_features = torch.mul(features, mask)

print("Masked features:")
print(masked_features)

Output:

Masked features:
tensor([[[1., 2., 3.],
         [4., 5., 6.],
         [7., 8., 9.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[1., 1., 1.],
         [2., 2., 2.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]]])
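A common follow-up to masking (a sketch, not part of the original example) is computing a mean over only the valid positions — dividing the masked sum by the count of unmasked timesteps rather than the full sequence length:

```python
import torch

# One sequence: 3 valid timesteps followed by 2 padding rows
features = torch.tensor([[[1.0, 2.0, 3.0],
                          [4.0, 5.0, 6.0],
                          [7.0, 8.0, 9.0],
                          [0.0, 0.0, 0.0],
                          [0.0, 0.0, 0.0]]])
mask = torch.tensor([[[1.0], [1.0], [1.0], [0.0], [0.0]]])

# Sum only valid positions, then divide by the number of valid timesteps
masked_sum = (features * mask).sum(dim=1)  # shape (1, 3)
valid_counts = mask.sum(dim=1)             # shape (1, 1), here 3
mean = masked_sum / valid_counts
print(mean)  # tensor([[4., 5., 6.]])
```

Dividing by features.shape[1] instead would wrongly average the padding zeros into the result.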

Common Mistake: Confusing Element-Wise and Matrix Multiplication

A frequent source of bugs is using torch.mul() (or *) when you actually need matrix multiplication (torch.matmul() or @), or vice versa:

import torch

a = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
b = torch.tensor([[5.0, 6.0],
                  [7.0, 8.0]])

# Element-wise multiplication (Hadamard product)
elem_result = torch.mul(a, b)

# Matrix multiplication
mat_result = torch.matmul(a, b)

print("Element-wise (torch.mul):")
print(elem_result)
print("\nMatrix multiplication (torch.matmul):")
print(mat_result)

Output:

Element-wise (torch.mul):
tensor([[ 5., 12.],
        [21., 32.]])

Matrix multiplication (torch.matmul):
tensor([[19., 22.],
        [43., 50.]])

Caution:

These two operations produce completely different results:

  • torch.mul(a, b) multiplies each a[i][j] by b[i][j] independently.
  • torch.matmul(a, b) performs standard matrix multiplication (dot products of rows and columns).

Make sure you're using the correct operation for your use case. Element-wise multiplication is for scaling and masking; matrix multiplication is for linear transformations and layer computations.
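Note that the @ operator is shorthand for torch.matmul(), just as * is shorthand for torch.mul() — a quick check:

```python
import torch

a = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
b = torch.tensor([[5.0, 6.0],
                  [7.0, 8.0]])

# @ is matrix multiplication; * is element-wise multiplication
print(torch.equal(a @ b, torch.matmul(a, b)))  # True
print(torch.equal(a * b, torch.mul(a, b)))     # True
```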

Common Mistake: Shape Mismatch

Element-wise multiplication requires tensors to have compatible shapes. If they can't be broadcast, PyTorch raises an error:

import torch

a = torch.tensor([1.0, 2.0, 3.0]) # Shape: (3,)
b = torch.tensor([10.0, 20.0, 30.0, 40.0]) # Shape: (4,)

# ❌ Shapes (3,) and (4,) are not compatible
try:
    result = torch.mul(a, b)
except RuntimeError as e:
    print(f"Error: {e}")

Output:

Error: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 0

Fix: Ensure your tensors have matching or broadcastable shapes before multiplying:

# ✅ Slice to matching size
result = torch.mul(a, b[:3])
print("Result:", result)

Output:

Result: tensor([10., 40., 90.])
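Alternatively, if the size mismatch is intentional and you want every pairing of elements, you can add a singleton dimension so broadcasting produces an outer product instead (a sketch using the same tensors):

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])               # Shape: (3,)
b = torch.tensor([10.0, 20.0, 30.0, 40.0])      # Shape: (4,)

# (3, 1) * (4,) broadcasts to (3, 4): every a[i] times every b[j]
outer = a.unsqueeze(1) * b
print(outer.shape)  # torch.Size([3, 4])
```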

In-Place Multiplication with mul_()

To modify a tensor in place without allocating new memory, use mul_():

import torch

tens = torch.tensor([1.0, 2.0, 3.0, 4.0])
print("Before:", tens)

tens.mul_(10)
print("After: ", tens)

Output:

Before: tensor([1., 2., 3., 4.])
After: tensor([10., 20., 30., 40.])
Caution:

Avoid in-place operations on tensors that require gradients. They can break PyTorch's autograd computation graph:

x = torch.tensor([1.0, 2.0], requires_grad=True)
x.mul_(5) # RuntimeError: a leaf Variable that requires grad is being used in an in-place operation

Use the standard torch.mul() or * operator instead when working with gradient-tracked tensors.
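For example, the out-of-place version of the same multiplication keeps the autograd graph intact, so backpropagation works as expected (a minimal sketch):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Out-of-place multiplication records the op in the autograd graph
y = (x * 5).sum()
y.backward()
print(x.grad)  # tensor([5., 5.])  -- d(5x)/dx = 5 for each element
```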

torch.mul() vs * Operator

Feature                                torch.mul()   * operator
Basic element-wise multiplication      Yes           Yes
out parameter (pre-allocated output)   Yes           No
Readability                            Explicit      Concise
Performance                            Same          Same

Both produce identical results. Use * for concise, readable code and torch.mul() when you need the out parameter or want to be explicit about the operation.

Summary

Element-wise multiplication in PyTorch is performed with torch.mul() or the * operator. Both multiply corresponding elements of two tensors and support broadcasting for tensors with different shapes. Key points to remember:

  • Use torch.mul() or * for element-wise (Hadamard) multiplication.
  • Use torch.matmul() or @ for matrix multiplication; don't confuse the two.
  • Broadcasting automatically expands smaller tensors to match larger ones when shapes are compatible.
  • Avoid in-place operations (mul_()) on tensors that participate in gradient computation.
  • Always verify that tensor shapes are compatible to prevent runtime errors.