Python PyTorch: How to Perform In-Place Operations
In-place operations in PyTorch allow you to modify a tensor's values directly without creating a new tensor. This can reduce memory usage when working with large tensors, which is especially important in deep learning where models and data can consume significant GPU memory.
In this guide, you'll learn how in-place operations work in PyTorch, see examples across common arithmetic operations, understand the important trade-offs, and learn when you should avoid them.
What Are In-Place Operations?
In PyTorch, every in-place operation is identified by a trailing underscore (_) in the method name. The key difference between normal and in-place operations is:
- Normal operation: Creates and returns a new tensor, leaving the original unchanged.
- In-place operation: Modifies the original tensor directly and returns it.
```python
import torch

a = torch.tensor([1, 2, 3])

# Normal operation: 'a' is unchanged
b = a.add(10)
print("a after normal add:", a)  # tensor([1, 2, 3])
print("b (new tensor):    ", b)  # tensor([11, 12, 13])

# In-place operation: 'a' is modified
a.add_(10)
print("a after in-place add:", a)  # tensor([11, 12, 13])
```

Output:

```
a after normal add: tensor([1, 2, 3])
b (new tensor):     tensor([11, 12, 13])
a after in-place add: tensor([11, 12, 13])
```
Common In-Place Operations
Here's a quick reference of normal operations and their in-place counterparts:
| Operation | Normal | In-Place |
|---|---|---|
| Addition | a.add(b) | a.add_(b) |
| Subtraction | a.sub(b) | a.sub_(b) |
| Multiplication | a.mul(b) | a.mul_(b) |
| Division | a.div(b) | a.div_(b) |
| Fill with value | - | a.fill_(value) |
| Zero out | - | a.zero_() |
| Clamp | a.clamp(min, max) | a.clamp_(min, max) |
| Absolute value | a.abs() | a.abs_() |
| Negation | a.neg() | a.neg_() |
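Beyond the underscore methods, Python's augmented assignment operators (`+=`, `-=`, `*=`, `/=`) also act in place on tensors, which can hide an in-place update in innocent-looking code. A quick check that `+=` reuses the same storage:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
ptr_before = a.data_ptr()  # address of the tensor's underlying storage

a += 10  # equivalent to a.add_(10)
a *= 2   # equivalent to a.mul_(2)

print(a)  # tensor([22., 24., 26.])
print(a.data_ptr() == ptr_before)  # True: same storage, no new allocation
```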
In-Place Addition
```python
import torch

a = torch.tensor([10, 20, 30, 40])
b = torch.tensor([1, 2, 3, 4])

# Normal addition: 'a' stays the same
result = a.add(b)
print("After normal add, a:", a)
print("Result: ", result)

# In-place addition: 'a' is modified
a.add_(b)
print("After in-place add, a:", a)
```

Output:

```
After normal add, a: tensor([10, 20, 30, 40])
Result: tensor([11, 22, 33, 44])
After in-place add, a: tensor([11, 22, 33, 44])
```
In-Place Subtraction
```python
import torch

a = torch.tensor([100, 200, 300])
b = torch.tensor([10, 20, 30])

# Normal subtraction: 'a' stays the same
result = a.sub(b)
print("After normal sub, a:", a)
print("Result: ", result)

# In-place subtraction: 'a' is modified
a.sub_(b)
print("After in-place sub, a:", a)
```

Output:

```
After normal sub, a: tensor([100, 200, 300])
Result: tensor([ 90, 180, 270])
After in-place sub, a: tensor([ 90, 180, 270])
```
In-Place Multiplication
```python
import torch

a = torch.tensor([2, 3, 4, 5])
b = torch.tensor([10, 20, 30, 40])

# Normal multiplication: 'a' stays the same
result = a.mul(b)
print("After normal mul, a:", a)
print("Result: ", result)

# In-place multiplication: 'a' is modified
a.mul_(b)
print("After in-place mul, a:", a)
```

Output:

```
After normal mul, a: tensor([2, 3, 4, 5])
Result: tensor([ 20, 60, 120, 200])
After in-place mul, a: tensor([ 20, 60, 120, 200])
```
In-Place Division
```python
import torch

a = torch.tensor([100.0, 200.0, 300.0])
b = torch.tensor([4.0, 5.0, 6.0])

# Normal division: 'a' stays the same
result = a.div(b)
print("After normal div, a:", a)
print("Result: ", result)

# In-place division: 'a' is modified
a.div_(b)
print("After in-place div, a:", a)
```

Output:

```
After normal div, a: tensor([100., 200., 300.])
Result: tensor([25., 40., 50.])
After in-place div, a: tensor([25., 40., 50.])
```
Chaining Multiple In-Place Operations
In-place operations return the modified tensor, so you can chain them. However, be mindful that each step modifies the original tensor:
```python
import torch

a = torch.tensor([1.0, 2.0, 3.0, 4.0])
print("Original:", a)

# Chain: add 10, then multiply by 2, then subtract 5
a.add_(10).mul_(2).sub_(5)
print("After chaining:", a)
```

Output:

```
Original: tensor([1., 2., 3., 4.])
After chaining: tensor([17., 19., 21., 23.])
```
The computation follows: (1+10)×2-5 = 17, (2+10)×2-5 = 19, etc.
Other Useful In-Place Operations
fill_() - Set All Elements to a Value
```python
import torch

a = torch.tensor([1, 2, 3, 4, 5])
a.fill_(0)
print("After fill_(0):", a)
```

Output:

```
After fill_(0): tensor([0, 0, 0, 0, 0])
```
zero_() - Set All Elements to Zero
```python
import torch

a = torch.tensor([10.0, 20.0, 30.0])
a.zero_()
print("After zero_():", a)
```

Output:

```
After zero_(): tensor([0., 0., 0.])
```
clamp_() - Restrict Values to a Range
```python
import torch

a = torch.tensor([-5.0, -1.0, 0.0, 3.0, 10.0])
a.clamp_(min=0, max=5)
print("After clamp_(0, 5):", a)
```

Output:

```
After clamp_(0, 5): tensor([0., 0., 0., 3., 5.])
```
abs_() and neg_() - Absolute Value and Negation
```python
import torch

a = torch.tensor([-3.0, -1.0, 2.0, 5.0])
a.abs_()
print("After abs_():", a)

a.neg_()
print("After neg_():", a)
```

Output:

```
After abs_(): tensor([3., 1., 2., 5.])
After neg_(): tensor([-3., -1., -2., -5.])
```
When to Avoid In-Place Operations
While in-place operations save memory, they come with important limitations that you must understand.
Problem 1: Breaks Autograd Computation Graphs
An in-place operation on a leaf tensor that requires gradients raises an error immediately, because autograd needs the original values intact to build the computation graph for backpropagation:
```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# ❌ This raises a RuntimeError
try:
    x.add_(10)
except RuntimeError as e:
    print(f"Error: {e}")
```

Output:

```
Error: a leaf Variable that requires grad is being used in an in-place operation.
```
Fix: Use a normal (out-of-place) operation instead:
```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# ✅ Normal operation: safe with autograd
y = x.add(10)
print("y:", y)
```

Output:

```
y: tensor([11., 12., 13.], grad_fn=<AddBackward0>)
```
As a rule, avoid in-place operations on any tensor involved in gradient computation. This includes model parameters and intermediate tensors in the forward pass. In-place modifications can silently produce incorrect gradients or cause runtime errors during loss.backward().
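The standard exception is updating parameters inside a torch.no_grad() block, which suspends graph tracking; this is how optimizers apply their updates in place. A minimal sketch of one manual SGD step:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()  # simple quadratic loss
loss.backward()        # w.grad is now 2*w = [2., 4.]

lr = 0.1
with torch.no_grad():
    # In-place update is allowed here: no_grad() suspends graph tracking
    w.sub_(lr * w.grad)

print(w)  # tensor([0.8000, 1.6000], requires_grad=True)
w.grad.zero_()  # in-place reset, as optimizer.zero_grad() does
```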
Problem 2: Shared References See Changes
If multiple variables reference the same tensor, an in-place operation on one will affect all of them unexpectedly:
```python
import torch

a = torch.tensor([1, 2, 3])
b = a  # b is NOT a copy - it references the same data

a.add_(10)
print("a:", a)
print("b:", b)  # b is also modified!
```

Output:

```
a: tensor([11, 12, 13])
b: tensor([11, 12, 13])
```
Fix: Clone the tensor if you need independent copies:
```python
import torch

a = torch.tensor([1, 2, 3])
b = a.clone()  # ✅ Independent copy

a.add_(10)
print("a:", a)  # tensor([11, 12, 13])
print("b:", b)  # tensor([1, 2, 3]) (unchanged)
```

Output:

```
a: tensor([11, 12, 13])
b: tensor([1, 2, 3])
```
When to Use In-Place Operations
In-place operations are beneficial in specific scenarios:
| Scenario | Why It Helps |
|---|---|
| Zeroing gradients (optimizer.zero_grad()) | Avoids allocating new gradient tensors each step |
| Initializing weights (param.data.fill_()) | Directly sets parameter values without creating temporaries |
| Post-processing outputs (clamping, normalizing) | Saves memory when original values aren't needed |
| Working with very large tensors | Reduces peak memory usage by avoiding copies |
As a general rule, prefer normal operations over in-place operations unless you have a specific reason to save memory. The memory savings are rarely significant enough to justify the risks of breaking autograd or introducing subtle bugs from shared references.
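You can verify the allocation behavior directly: data_ptr() exposes the address of a tensor's underlying storage, and an in-place operation leaves it unchanged, while a normal operation returns a result in a fresh buffer:

```python
import torch

a = torch.rand(1000, 1000)  # ~4 MB of float32 data
ptr = a.data_ptr()          # address of the underlying storage

b = a.mul(2)  # normal op: allocates new storage for the result
a.mul_(2)     # in-place op: writes into the existing storage

print(a.data_ptr() == ptr)  # True: no new allocation
print(b.data_ptr() == ptr)  # False: 'b' lives in a separate buffer
```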
Summary
In-place operations in PyTorch are identified by a trailing underscore (add_(), sub_(), mul_(), etc.) and modify tensors directly without allocating new memory. Key points to remember:
- Normal operations create new tensors; in-place operations modify the original.
- In-place operations can reduce memory usage, which matters for large-scale models.
- Never use in-place operations on tensors that require gradients - this breaks autograd.
- Be aware of shared references: modifying a tensor in place affects all variables pointing to it.
- Use clone() to create independent copies when needed.
- Prefer normal operations by default; use in-place operations only when memory savings are critical.