Python PyTorch: How to Perform In-Place Operations

In-place operations in PyTorch allow you to modify a tensor's values directly without creating a new tensor. This can reduce memory usage when working with large tensors, which is especially important in deep learning where models and data can consume significant GPU memory.

In this guide, you'll learn how in-place operations work in PyTorch, see examples across common arithmetic operations, understand the important trade-offs, and learn when you should avoid them.

What Are In-Place Operations?

In PyTorch, every in-place operation is identified by a trailing underscore (_) in the method name. The key difference between normal and in-place operations is:

  • Normal operation: Creates and returns a new tensor, leaving the original unchanged.
  • In-place operation: Modifies the original tensor directly and returns it.

import torch

a = torch.tensor([1, 2, 3])

# Normal operation: 'a' is unchanged
b = a.add(10)
print("a after normal add:", a) # tensor([1, 2, 3])
print("b (new tensor): ", b) # tensor([11, 12, 13])

# In-place operation: 'a' is modified
a.add_(10)
print("a after in-place add:", a) # tensor([11, 12, 13])

Output:

a after normal add: tensor([1, 2, 3])
b (new tensor): tensor([11, 12, 13])
a after in-place add: tensor([11, 12, 13])
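Besides the underscore methods, many of PyTorch's functional forms accept an `out=` argument that writes the result into an existing tensor instead of allocating a new one. A small sketch, reusing `a` as its own output buffer:

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([10, 10, 10])

# Write the sum directly into 'a' instead of allocating a new tensor.
torch.add(a, b, out=a)
print(a)  # tensor([11, 12, 13])
```

The `out=` form behaves like an in-place operation for memory purposes, so the same autograd and shared-reference caveats discussed later apply to it.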

Common In-Place Operations

Here's a quick reference of normal operations and their in-place counterparts:

| Operation        | Normal              | In-Place             |
| ---------------- | ------------------- | -------------------- |
| Addition         | `a.add(b)`          | `a.add_(b)`          |
| Subtraction      | `a.sub(b)`          | `a.sub_(b)`          |
| Multiplication   | `a.mul(b)`          | `a.mul_(b)`          |
| Division         | `a.div(b)`          | `a.div_(b)`          |
| Fill with value  | -                   | `a.fill_(value)`     |
| Zero out         | -                   | `a.zero_()`          |
| Clamp            | `a.clamp(min, max)` | `a.clamp_(min, max)` |
| Absolute value   | `a.abs()`           | `a.abs_()`           |
| Negation         | `a.neg()`           | `a.neg_()`           |
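A quick way to confirm that an in-place method really reuses memory, while its normal counterpart allocates a fresh buffer, is to compare storage addresses with `data_ptr()`:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
before = a.data_ptr()

# In-place: same underlying storage, same address.
a.mul_(2)
print(a.data_ptr() == before)        # True

# Normal: the result lives in a newly allocated buffer.
print(a.mul(2).data_ptr() == before)  # False
```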

In-Place Addition

import torch

a = torch.tensor([10, 20, 30, 40])
b = torch.tensor([1, 2, 3, 4])

# Normal addition: 'a' stays the same
result = a.add(b)
print("After normal add, a:", a)
print("Result: ", result)

# In-place addition: 'a' is modified
a.add_(b)
print("After in-place add, a:", a)

Output:

After normal add, a: tensor([10, 20, 30, 40])
Result: tensor([11, 22, 33, 44])
After in-place add, a: tensor([11, 22, 33, 44])

In-Place Subtraction

import torch

a = torch.tensor([100, 200, 300])
b = torch.tensor([10, 20, 30])

# Normal subtraction: 'a' stays the same
result = a.sub(b)
print("After normal sub, a:", a)
print("Result: ", result)

# In-place subtraction: 'a' is modified
a.sub_(b)
print("After in-place sub, a:", a)

Output:

After normal sub, a: tensor([100, 200, 300])
Result: tensor([ 90, 180, 270])
After in-place sub, a: tensor([ 90, 180, 270])

In-Place Multiplication

import torch

a = torch.tensor([2, 3, 4, 5])
b = torch.tensor([10, 20, 30, 40])

# Normal multiplication: 'a' stays the same
result = a.mul(b)
print("After normal mul, a:", a)
print("Result: ", result)

# In-place multiplication: 'a' is modified
a.mul_(b)
print("After in-place mul, a:", a)

Output:

After normal mul, a: tensor([2, 3, 4, 5])
Result: tensor([ 20, 60, 120, 200])
After in-place mul, a: tensor([ 20, 60, 120, 200])

In-Place Division

import torch

a = torch.tensor([100.0, 200.0, 300.0])
b = torch.tensor([4.0, 5.0, 6.0])

# Normal division: 'a' stays the same
result = a.div(b)
print("After normal div, a:", a)
print("Result: ", result)

# In-place division: 'a' is modified
a.div_(b)
print("After in-place div, a:", a)

Output:

After normal div, a: tensor([100., 200., 300.])
Result: tensor([25., 40., 50.])
After in-place div, a: tensor([25., 40., 50.])

Chaining Multiple In-Place Operations

In-place operations return the modified tensor, so you can chain them. However, be mindful that each step modifies the original tensor:

import torch

a = torch.tensor([1.0, 2.0, 3.0, 4.0])
print("Original:", a)

# Chain: add 10, then multiply by 2, then subtract 5
a.add_(10).mul_(2).sub_(5)
print("After chaining:", a)

Output:

Original: tensor([1., 2., 3., 4.])
After chaining: tensor([17., 19., 21., 23.])

The computation follows: (1+10)×2-5 = 17, (2+10)×2-5 = 19, etc.
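Chaining works because each in-place method returns the very tensor it modified, not a copy. A quick identity check makes this concrete:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])

# Each step mutates and returns 'a' itself, so the chain never
# creates an intermediate tensor.
result = a.add_(10).mul_(2)
print(result is a)  # True: same object, no new allocation
```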

Other Useful In-Place Operations

fill_() - Set All Elements to a Value

import torch

a = torch.tensor([1, 2, 3, 4, 5])
a.fill_(0)
print("After fill_(0):", a)

Output:

After fill_(0): tensor([0, 0, 0, 0, 0])

zero_() - Set All Elements to Zero

import torch

a = torch.tensor([10.0, 20.0, 30.0])
a.zero_()
print("After zero_():", a)

Output:

After zero_(): tensor([0., 0., 0.])

clamp_() - Restrict Values to a Range

import torch

a = torch.tensor([-5.0, -1.0, 0.0, 3.0, 10.0])
a.clamp_(min=0, max=5)
print("After clamp_(0, 5):", a)

Output:

After clamp_(0, 5): tensor([0., 0., 0., 3., 5.])

abs_() and neg_() - Absolute Value and Negation

import torch

a = torch.tensor([-3.0, -1.0, 2.0, 5.0])

a.abs_()
print("After abs_():", a)

a.neg_()
print("After neg_():", a)

Output:

After abs_(): tensor([3., 1., 2., 5.])
After neg_(): tensor([-3., -1., -2., -5.])

When to Avoid In-Place Operations

While in-place operations save memory, they come with important limitations that you must understand.

Problem 1: Breaks Autograd Computation Graphs

In-place operations on leaf tensors that require gradients will raise an error because they corrupt the computation graph needed for backpropagation:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# ❌ This raises a RuntimeError
try:
    x.add_(10)
except RuntimeError as e:
    print(f"Error: {e}")

Output:

Error: a leaf Variable that requires grad is being used in an in-place operation.

Fix: Use a normal (out-of-place) operation instead:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# ✅ Normal operation: safe with autograd
y = x.add(10)
print("y:", y)

Output:

y: tensor([11., 12., 13.], grad_fn=<AddBackward0>)
Caution:

Never use in-place operations on tensors involved in gradient computation. This includes model parameters and any intermediate tensors in the forward pass. In-place modifications can silently produce incorrect gradients or cause runtime errors during loss.backward().
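There is one standard exception: inside a `torch.no_grad()` block, autograd stops tracking, so an in-place update of a leaf parameter is permitted. This is essentially how optimizers apply gradient steps. A minimal sketch:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()  # w.grad is now 2*w = [2., 4.]

# With tracking suspended, the in-place update no longer raises.
with torch.no_grad():
    w.sub_(0.1 * w.grad)  # gradient-descent step

print(w)  # tensor([0.8000, 1.6000], requires_grad=True)
```

Note that `w` still has `requires_grad=True` afterwards; only the update itself was excluded from the graph.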

Problem 2: Shared References See Changes

If multiple variables reference the same tensor, an in-place operation on one will affect all of them unexpectedly:

import torch

a = torch.tensor([1, 2, 3])
b = a # b is NOT a copy - it references the same data

a.add_(10)

print("a:", a)
print("b:", b) # b is also modified!

Output:

a: tensor([11, 12, 13])
b: tensor([11, 12, 13])

Fix: Clone the tensor if you need independent copies:

import torch

a = torch.tensor([1, 2, 3])
b = a.clone() # ✅ Independent copy

a.add_(10)

print("a:", a) # tensor([11, 12, 13])
print("b:", b) # tensor([1, 2, 3]) (unchanged)

Output:

a: tensor([11, 12, 13])
b: tensor([1, 2, 3])
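Shared references are not limited to plain assignment: slicing a tensor returns a view of the same storage, so an in-place operation on the slice also changes the base tensor. A short sketch:

```python
import torch

a = torch.tensor([1, 2, 3, 4])
view = a[:2]  # a view, not a copy - it shares a's storage

# Mutating the view in place writes through to the base tensor.
view.add_(100)
print(a)  # tensor([101, 102, 3, 4])
```

As with plain aliases, `clone()` the slice first if you need an independent copy.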

When to Use In-Place Operations

In-place operations are beneficial in specific scenarios:

| Scenario                                        | Why It Helps                                                 |
| ----------------------------------------------- | ------------------------------------------------------------ |
| Zeroing gradients (`optimizer.zero_grad()`)     | Avoids allocating new gradient tensors each step             |
| Initializing weights (`param.data.fill_()`)     | Directly sets parameter values without creating temporaries  |
| Post-processing outputs (clamping, normalizing) | Saves memory when original values aren't needed              |
| Working with very large tensors                 | Reduces peak memory usage by avoiding copies                 |

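As a sketch of the first scenario, zeroing a gradient buffer in place reuses the existing allocation, which is roughly what `optimizer.zero_grad(set_to_none=False)` does under the hood:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
(w ** 2).sum().backward()
print(w.grad)  # tensor([2., 4.])

# Clear the gradient in place: the buffer is reused, not reallocated.
w.grad.zero_()
print(w.grad)  # tensor([0., 0.])
```

This is safe because `w.grad` itself is not tracked by autograd, so the in-place write cannot corrupt a computation graph.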
Best Practice

As a general rule, prefer normal operations over in-place operations unless you have a specific reason to save memory. The memory savings are rarely significant enough to justify the risks of breaking autograd or introducing subtle bugs from shared references.

Summary

In-place operations in PyTorch are identified by a trailing underscore (add_(), sub_(), mul_(), etc.) and modify tensors directly without allocating new memory. Key points to remember:

  • Normal operations create new tensors; in-place operations modify the original.
  • In-place operations can reduce memory usage, which matters for large-scale models.
  • Never use in-place operations on tensors that autograd is tracking - this can raise errors or silently corrupt gradients.
  • Be aware of shared references: modifying a tensor in place affects all variables pointing to it.
  • Use clone() to create independent copies when needed.
  • Prefer normal operations by default; use in-place operations only when memory savings are critical.