Python PyTorch: How to Perform In-Place Operations
In-place operations in PyTorch allow you to modify a tensor's values directly without creating a new tensor. This can reduce memory usage when working with large tensors, which is especially important in deep learning where models and data can consume significant GPU memory.
In this guide, you'll learn how in-place operations work in PyTorch, see examples across common arithmetic operations, understand the important trade-offs, and learn when you should avoid them.
What Are In-Place Operations?
In PyTorch, every in-place operation is identified by a trailing underscore (_) in the method name. The key difference between normal and in-place operations is:
- Normal operation: Creates and returns a new tensor, leaving the original unchanged.
- In-place operation: Modifies the original tensor directly and returns it.
```python
import torch

a = torch.tensor([1, 2, 3])

# Normal operation: 'a' is unchanged
b = a.add(10)
print("a after normal add:", a)  # tensor([1, 2, 3])
print("b (new tensor):    ", b)  # tensor([11, 12, 13])

# In-place operation: 'a' is modified
a.add_(10)
print("a after in-place add:", a)  # tensor([11, 12, 13])
```

Output:

```
a after normal add: tensor([1, 2, 3])
b (new tensor):     tensor([11, 12, 13])
a after in-place add: tensor([11, 12, 13])
```
Common In-Place Operations
Here's a quick reference of normal operations and their in-place counterparts:
| Operation | Normal | In-Place |
|---|---|---|
| Addition | a.add(b) | a.add_(b) |
| Subtraction | a.sub(b) | a.sub_(b) |
| Multiplication | a.mul(b) | a.mul_(b) |
| Division | a.div(b) | a.div_(b) |
| Fill with value | - | a.fill_(value) |
| Zero out | - | a.zero_() |
| Clamp | a.clamp(min, max) | a.clamp_(min, max) |
| Absolute value | a.abs() | a.abs_() |
| Negation | a.neg() | a.neg_() |
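Beyond the underscore methods, Python's augmented assignment operators (`+=`, `-=`, `*=`, `/=`) also act in place on tensors, which can hide an in-place update in innocent-looking code. A quick check that `+=` reuses the same storage:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
ptr_before = a.data_ptr()  # address of the tensor's underlying storage

a += 10  # equivalent to a.add_(10)
a *= 2   # equivalent to a.mul_(2)

print(a)  # tensor([22., 24., 26.])
print(a.data_ptr() == ptr_before)  # True: same storage, no new allocation
```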
In-Place Addition
```python
import torch

a = torch.tensor([10, 20, 30, 40])
b = torch.tensor([1, 2, 3, 4])

# Normal addition: 'a' stays the same
result = a.add(b)
print("After normal add, a:", a)
print("Result: ", result)

# In-place addition: 'a' is modified
a.add_(b)
print("After in-place add, a:", a)
```

Output:

```
After normal add, a: tensor([10, 20, 30, 40])
Result: tensor([11, 22, 33, 44])
After in-place add, a: tensor([11, 22, 33, 44])
```
In-Place Subtraction
```python
import torch

a = torch.tensor([100, 200, 300])
b = torch.tensor([10, 20, 30])

# Normal subtraction: 'a' stays the same
result = a.sub(b)
print("After normal sub, a:", a)
print("Result: ", result)

# In-place subtraction: 'a' is modified
a.sub_(b)
print("After in-place sub, a:", a)
```

Output:

```
After normal sub, a: tensor([100, 200, 300])
Result: tensor([ 90, 180, 270])
After in-place sub, a: tensor([ 90, 180, 270])
```
In-Place Multiplication
```python
import torch

a = torch.tensor([2, 3, 4, 5])
b = torch.tensor([10, 20, 30, 40])

# Normal multiplication: 'a' stays the same
result = a.mul(b)
print("After normal mul, a:", a)
print("Result: ", result)

# In-place multiplication: 'a' is modified
a.mul_(b)
print("After in-place mul, a:", a)
```

Output:

```
After normal mul, a: tensor([2, 3, 4, 5])
Result: tensor([ 20, 60, 120, 200])
After in-place mul, a: tensor([ 20, 60, 120, 200])
```
In-Place Division
```python
import torch

a = torch.tensor([100.0, 200.0, 300.0])
b = torch.tensor([4.0, 5.0, 6.0])

# Normal division: 'a' stays the same
result = a.div(b)
print("After normal div, a:", a)
print("Result: ", result)

# In-place division: 'a' is modified
a.div_(b)
print("After in-place div, a:", a)
```

Output:

```
After normal div, a: tensor([100., 200., 300.])
Result: tensor([25., 40., 50.])
After in-place div, a: tensor([25., 40., 50.])
```
Chaining Multiple In-Place Operations
In-place operations return the modified tensor, so you can chain them. However, be mindful that each step modifies the original tensor:
```python
import torch

a = torch.tensor([1.0, 2.0, 3.0, 4.0])
print("Original:", a)

# Chain: add 10, then multiply by 2, then subtract 5
a.add_(10).mul_(2).sub_(5)
print("After chaining:", a)
```

Output:

```
Original: tensor([1., 2., 3., 4.])
After chaining: tensor([17., 19., 21., 23.])
```
The computation follows: (1+10)×2-5 = 17, (2+10)×2-5 = 19, etc.
Other Useful In-Place Operations
fill_() - Set All Elements to a Value
```python
import torch

a = torch.tensor([1, 2, 3, 4, 5])
a.fill_(0)
print("After fill_(0):", a)
```

Output:

```
After fill_(0): tensor([0, 0, 0, 0, 0])
```
zero_() - Set All Elements to Zero
```python
import torch

a = torch.tensor([10.0, 20.0, 30.0])
a.zero_()
print("After zero_():", a)
```

Output:

```
After zero_(): tensor([0., 0., 0.])
```
clamp_() - Restrict Values to a Range
```python
import torch

a = torch.tensor([-5.0, -1.0, 0.0, 3.0, 10.0])
a.clamp_(min=0, max=5)
print("After clamp_(0, 5):", a)
```

Output:

```
After clamp_(0, 5): tensor([0., 0., 0., 3., 5.])
```
abs_() and neg_() - Absolute Value and Negation
```python
import torch

a = torch.tensor([-3.0, -1.0, 2.0, 5.0])
a.abs_()
print("After abs_():", a)

a.neg_()
print("After neg_():", a)
```

Output:

```
After abs_(): tensor([3., 1., 2., 5.])
After neg_(): tensor([-3., -1., -2., -5.])
```
When to Avoid In-Place Operations
While in-place operations save memory, they come with important limitations that you must understand.
Problem 1: Breaks Autograd Computation Graphs
An in-place operation on a leaf tensor that requires gradients raises an error immediately, because autograd needs the original values intact to build the computation graph for backpropagation:
```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# ❌ This raises a RuntimeError
try:
    x.add_(10)
except RuntimeError as e:
    print(f"Error: {e}")
```

Output:

```
Error: a leaf Variable that requires grad is being used in an in-place operation.
```
Fix: Use a normal (out-of-place) operation instead:
```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# ✅ Normal operation: safe with autograd
y = x.add(10)
print("y:", y)
```

Output:

```
y: tensor([11., 12., 13.], grad_fn=<AddBackward0>)
```
As a rule, avoid in-place operations on any tensor involved in gradient computation. This includes model parameters and intermediate tensors in the forward pass. In-place modifications can silently produce incorrect gradients or cause runtime errors during loss.backward().
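The standard exception is updating parameters inside a torch.no_grad() block, which suspends graph tracking; this is how optimizers apply their updates in place. A minimal sketch of one manual SGD step:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()  # simple quadratic loss
loss.backward()        # w.grad is now 2*w = [2., 4.]

lr = 0.1
with torch.no_grad():
    # In-place update is allowed here: no_grad() suspends graph tracking
    w.sub_(lr * w.grad)

print(w)  # tensor([0.8000, 1.6000], requires_grad=True)
w.grad.zero_()  # in-place reset, as optimizer.zero_grad() does
```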
Problem 2: Shared References See Changes
If multiple variables reference the same tensor, an in-place operation on one will affect all of them unexpectedly:
```python
import torch

a = torch.tensor([1, 2, 3])
b = a  # b is NOT a copy - it references the same data

a.add_(10)
print("a:", a)
print("b:", b)  # b is also modified!
```

Output:

```
a: tensor([11, 12, 13])
b: tensor([11, 12, 13])
```
Fix: Clone the tensor if you need independent copies:
```python
import torch

a = torch.tensor([1, 2, 3])
b = a.clone()  # ✅ Independent copy

a.add_(10)
print("a:", a)  # tensor([11, 12, 13])
print("b:", b)  # tensor([1, 2, 3]) (unchanged)
```

Output:

```
a: tensor([11, 12, 13])
b: tensor([1, 2, 3])
```
When to Use In-Place Operations
In-place operations are beneficial in specific scenarios:
| Scenario | Why It Helps |
|---|---|
| Zeroing gradients (optimizer.zero_grad()) | Avoids allocating new gradient tensors each step |
| Initializing weights (param.data.fill_()) | Directly sets parameter values without creating temporaries |
| Post-processing outputs (clamping, normalizing) | Saves memory when original values aren't needed |
| Working with very large tensors | Reduces peak memory usage by avoiding copies |
As a general rule, prefer normal operations over in-place operations unless you have a specific reason to save memory. The memory savings are rarely significant enough to justify the risks of breaking autograd or introducing subtle bugs from shared references.
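You can verify the allocation behavior directly: data_ptr() exposes the address of a tensor's underlying storage, and an in-place operation leaves it unchanged, while a normal operation returns a result in a fresh buffer:

```python
import torch

a = torch.rand(1000, 1000)  # ~4 MB of float32 data
ptr = a.data_ptr()          # address of the underlying storage

b = a.mul(2)  # normal op: allocates new storage for the result
a.mul_(2)     # in-place op: writes into the existing storage

print(a.data_ptr() == ptr)  # True: no new allocation
print(b.data_ptr() == ptr)  # False: 'b' lives in a separate buffer
```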
Summary
In-place operations in PyTorch are identified by a trailing underscore (add_(), sub_(), mul_(), etc.) and modify tensors directly without allocating new memory. Key points to remember:
- Normal operations create new tensors; in-place operations modify the original.
- In-place operations can reduce memory usage, which matters for large-scale models.
- Never use in-place operations on tensors that require gradients - this breaks autograd.
- Be aware of shared references: modifying a tensor in place affects all variables pointing to it.
- Use clone() to create independent copies when needed.
- Prefer normal operations by default; use in-place operations only when memory savings are critical.