Python Pandas: How to Apply a Function to Rows or Columns
The .apply() method in Pandas lets you run custom functions across every row or column of a DataFrame. It is one of the most flexible tools available for data transformation, enabling everything from simple calculations to complex multi-column logic. However, understanding when .apply() is the right choice and when vectorized operations would be faster is key to writing efficient Pandas code.
In this guide, you will learn how to use .apply() along both axes, pass additional arguments to your functions, and recognize situations where a vectorized alternative would be significantly more performant.
Applying a Function to Each Column (axis=0)
By default, .apply() passes each column as a Series to your function. This is useful for computing summary statistics or transformations that operate on an entire column at once:
```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [10, 20, 30]
})

# Calculate the range (max - min) of each column
result = df.apply(lambda col: col.max() - col.min())
print(result)
```
Output:
```
A     2
B    20
dtype: int64
```
The function receives column A as a Series, computes 3 - 1 = 2, then receives column B and computes 30 - 10 = 20.
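To make the mechanics visible, the sketch below records the label (the Series `.name`) of each column the function receives. It also checks that, for this particular reduction, pandas' built-in column reductions give the same result without calling a Python function per column. The helper name `column_range` and the `seen` list are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

seen = []  # collects the label of each column passed to the function


def column_range(col):
    # col is a pandas Series; col.name is the column label ('A' or 'B')
    seen.append(col.name)
    return col.max() - col.min()


via_apply = df.apply(column_range)

# The same range computed with built-in column reductions
via_builtins = df.max() - df.min()

print(seen)                            # ['A', 'B']
print(via_apply.equals(via_builtins))  # True
```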
More Column-Wise Examples
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Standard deviation of each column
print(df.apply(np.std))

# Count non-zero values per column
print(df.apply(lambda col: (col != 0).sum()))

# Normalize each column to zero mean and unit variance
normalized = df.apply(lambda col: (col - col.mean()) / col.std())
print(normalized)
```
Output:
```
A    0.816497
B    0.816497
dtype: float64
A    3
B    3
dtype: int64
     A    B
0 -1.0 -1.0
1  0.0  0.0
2  1.0  1.0
```
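The two standard deviations above do not agree. `np.std` reports 0.816497 for column A, but the normalization divides by `col.std()` and produces exactly -1, 0 and 1, which implies a standard deviation of 1. The cause is the degrees-of-freedom default: `np.std` uses `ddof=0` (population), and pandas' `.std()` uses `ddof=1` (sample). The short check below, using the same data, illustrates the difference:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

population = df.apply(np.std)  # np.std defaults to ddof=0
sample = df.std()              # pandas .std() defaults to ddof=1

print(population.round(6).tolist())  # [0.816497, 0.816497]
print(sample.tolist())               # [1.0, 1.0]

# Passing ddof explicitly makes pandas match np.std
print(np.allclose(df.std(ddof=0).to_numpy(), population.to_numpy()))  # True
```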
Applying a Function to Each Row (axis=1)
Setting axis=1 passes each row as a Series to your function. This is the mode you need when your logic depends on values from multiple columns in the same row:
```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [10, 20, 30]
})

# Sum values across each row
df['Total'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
print(df)
```
Output:
```
   A   B  Total
0  1  10     11
1  2  20     22
2  3  30     33
```
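One detail about `axis=1` is worth knowing. Each row is handed over as a single Series, so values from columns with different dtypes are upcast to a common dtype. The sketch below uses a hypothetical DataFrame with an integer column and a float column, so the integers arrive inside the function as floats. It also shows that `row.name` holds the row's index label.

```python
import pandas as pd

# Hypothetical data: A is int64, B is float64
df = pd.DataFrame({'A': [1, 2, 3], 'B': [0.5, 1.5, 2.5]})

# dtype of the row Series that the function receives
row_dtypes = df.apply(lambda row: str(row.dtype), axis=1)
print(row_dtypes.tolist())  # ['float64', 'float64', 'float64']

# row.name is the index label of the current row
labels = df.apply(lambda row: row.name, axis=1)
print(labels.tolist())  # [0, 1, 2]

# The column itself keeps its integer dtype
print(df['A'].dtype)  # int64
```

When numeric and string columns are mixed, the row Series becomes `object` dtype instead.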
Row-Wise Examples with Multiple Columns
```python
import pandas as pd

df = pd.DataFrame({
    'First': ['John', 'Jane'],
    'Last': ['Doe', 'Smith'],
    'Age': [25, 30]
})

# Combine columns into a full name
df['Full_Name'] = df.apply(
    lambda row: f"{row['First']} {row['Last']}", axis=1
)

# Conditional logic based on row values
df['Category'] = df.apply(
    lambda row: 'Senior' if row['Age'] >= 30 else 'Junior', axis=1
)
print(df)
```
Output:
```
  First   Last  Age   Full_Name Category
0  John    Doe   25    John Doe   Junior
1  Jane  Smith   30  Jane Smith   Senior
```
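Both of these row-wise examples also have column-level equivalents that avoid a Python call per row. String columns support element-wise `+`, and `np.where` picks a value per row from a boolean mask. The sketch below reproduces the two new columns from the example above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First': ['John', 'Jane'],
    'Last': ['Doe', 'Smith'],
    'Age': [25, 30]
})

# Element-wise string concatenation across whole columns
full_name = df['First'] + ' ' + df['Last']

# np.where chooses 'Senior' or 'Junior' for each row from a boolean mask
category = np.where(df['Age'] >= 30, 'Senior', 'Junior')

print(full_name.tolist())  # ['John Doe', 'Jane Smith']
print(category.tolist())   # ['Junior', 'Senior']
```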
Passing Extra Arguments to Your Function
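When your function needs parameters beyond the value it is applied to, pass positional arguments through args (as a tuple) and keyword arguments directly to .apply(). The example below uses Series.apply(), which calls the function once per element. DataFrame.apply() accepts args and keyword arguments in the same way: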
```python
import pandas as pd

df = pd.DataFrame({'Score': [85, 90, 78]})

# Using positional arguments with args
def add_bonus(x, bonus):
    return x + bonus

df['With_Bonus'] = df['Score'].apply(add_bonus, args=(5,))

# Using keyword arguments
def scale_value(x, factor=1, offset=0):
    return x * factor + offset

df['Scaled'] = df['Score'].apply(scale_value, factor=1.1, offset=2)
print(df)
```
Output:
```
   Score  With_Bonus  Scaled
0     85          90    95.5
1     90          95   101.0
2     78          83    87.8
```
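The same two mechanisms work with `DataFrame.apply(..., axis=1)`: extra positional arguments go in `args` and anything else passed as a keyword is forwarded to the function. The sketch below assumes a hypothetical `Attempts` column and a hypothetical `adjusted_score` rule, chosen only to show both kinds of argument together:

```python
import pandas as pd

# 'Attempts' is a hypothetical column added for this sketch
df = pd.DataFrame({'Score': [85, 90, 78], 'Attempts': [1, 2, 1]})


def adjusted_score(row, penalty, cap=100):
    # Deduct `penalty` points per extra attempt, then clip at `cap`
    value = row['Score'] - penalty * (row['Attempts'] - 1)
    return min(value, cap)


# args supplies `penalty` positionally; cap is forwarded as a keyword argument
df['Adjusted'] = df.apply(adjusted_score, axis=1, args=(3,), cap=88)
print(df['Adjusted'].tolist())  # [85, 87, 78]
```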
When to Prefer Vectorized Operations Over .apply()
While .apply() is flexible, it calls your Python function once per element (Series.apply()) or once per row or column (DataFrame.apply()), so the work runs in the Python interpreter. Vectorized operations instead run in optimized C code inside NumPy and pandas, which is much faster. For simple arithmetic, comparisons, or conditions, vectorized alternatives are almost always the better choice.
Common Replacements
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': range(10000)})

# Simple arithmetic
# Slow
df['B'] = df['A'].apply(lambda x: x + 10)
# Fast
df['B'] = df['A'] + 10

# Conditional logic
# Slow
df['C'] = df['A'].apply(lambda x: 'High' if x > 5000 else 'Low')
# Fast
df['C'] = np.where(df['A'] > 5000, 'High', 'Low')

print(df)
```
Output:
```
         A      B     C
0        0     10   Low
1        1     11   Low
2        2     12   Low
3        3     13   Low
4        4     14   Low
...    ...    ...   ...
9995  9995  10005  High
9996  9996  10006  High
9997  9997  10007  High
9998  9998  10008  High
9999  9999  10009  High

[10000 rows x 3 columns]
```
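When there are more than two outcomes, `np.select()` extends the same idea. It takes a list of boolean conditions and a matching list of choices, uses the first condition that is true for each row, and falls back to `default`. The thresholds and labels below are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [10, 4000, 7500, 9999]})

# Conditions are checked in order; the first match wins for each row
conditions = [df['A'] > 7000, df['A'] > 3000]
choices = ['High', 'Medium']

df['Level'] = np.select(conditions, choices, default='Low')
print(df['Level'].tolist())  # ['Low', 'Medium', 'High', 'High']
```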
Performance Comparison
The speed difference becomes substantial on larger DataFrames:
```python
import time

import pandas as pd

df = pd.DataFrame({'A': range(100000)})

# Using apply (one Python function call per element)
start = time.perf_counter()
for _ in range(10):
    df['B'] = df['A'].apply(lambda x: x * 2)
apply_time = time.perf_counter() - start

# Using vectorized multiplication
start = time.perf_counter()
for _ in range(10):
    df['B'] = df['A'] * 2
vector_time = time.perf_counter() - start

print(f"Apply: {apply_time:.4f}s")
print(f"Vectorized: {vector_time:.4f}s")
print(f"Vectorized is ~{apply_time / vector_time:.0f}x faster")
```
Output (from one run; exact timings vary with hardware and pandas version):
```
Apply: 0.4077s
Vectorized: 0.0026s
Vectorized is ~155x faster
```
Reserve .apply() for complex logic that genuinely cannot be expressed with vectorized operations. For simple math, comparisons, and string operations that have .str accessor equivalents, always use the vectorized approach first.
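As a small illustration of the .str accessor point, the sketch below uppercases a hypothetical Series that contains a missing value. `.str.upper()` skips the missing entry. The equivalent `.apply()` needs an explicit guard, because `None` has no `.upper()` method:

```python
import pandas as pd

s = pd.Series(['alice', 'Bob', None])

# .str methods leave missing values missing instead of raising
upper = s.str.upper()
print(upper.iloc[:2].tolist())  # ['ALICE', 'BOB']
print(pd.isna(upper.iloc[2]))   # True

# The .apply() version must guard against non-string values itself
upper_apply = s.apply(lambda x: x.upper() if isinstance(x, str) else x)
print(upper_apply.iloc[:2].tolist())  # ['ALICE', 'BOB']
```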
Good Use Cases for .apply()
Despite the performance considerations, .apply() is the right tool when your transformation involves complex logic, external libraries, or operations that do not have a vectorized equivalent:
Complex String Processing
```python
import pandas as pd
import re

df = pd.DataFrame({'Text': ['Hello World', 'Foo Bar 123', 'Test!']})

def extract_alpha_word_count(text):
    """Count only alphabetic words, ignoring numbers and punctuation."""
    return len(re.findall(r'\b[A-Za-z]+\b', text))

df['Word_Count'] = df['Text'].apply(extract_alpha_word_count)
print(df)
```
Output:
```
          Text  Word_Count
0  Hello World           2
1  Foo Bar 123           2
2        Test!           1
```
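This particular count has a more concise route. `Series.str.count()` counts regex matches in each string, so the same pattern gives the same result without a custom function. The `.apply()` version is still the natural fit once the logic grows beyond what a single pattern can express.

```python
import pandas as pd

df = pd.DataFrame({'Text': ['Hello World', 'Foo Bar 123', 'Test!']})

# str.count counts non-overlapping regex matches in each string
counts = df['Text'].str.count(r'\b[A-Za-z]+\b')
print(counts.tolist())  # [2, 2, 1]
```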
Multi-Column Business Logic
```python
import pandas as pd

df = pd.DataFrame({
    'Base_Price': [100, 250, 50],
    'Quantity': [2, 1, 5],
    'Member': [True, False, True]
})

def calculate_total(row):
    """Apply tiered discounts based on quantity and membership."""
    subtotal = row['Base_Price'] * row['Quantity']
    if row['Member']:
        subtotal *= 0.9  # 10% member discount
        if row['Quantity'] >= 5:
            subtotal *= 0.95  # Additional 5% bulk discount
    return round(subtotal, 2)

df['Total'] = df.apply(calculate_total, axis=1)
print(df)
```
Output:
```
   Base_Price  Quantity  Member   Total
0         100         2    True  180.00
1         250         1   False  250.00
2          50         5    True  213.75
```
In nested rules like these, one discount depends on another condition. As the number of interacting conditions grows, such rules become awkward to express with np.where() or np.select(), which makes .apply() the practical choice.
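For comparison, with only two rules the same logic can still be written without `.apply()` by turning each condition into a per-row multiplier. This sketch assumes the bulk discount applies only to members, as in the nested `calculate_total` function. Whether this reads better than the row function is a judgment call, and it tends to get harder to follow as rules are added.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Base_Price': [100, 250, 50],
    'Quantity': [2, 1, 5],
    'Member': [True, False, True]
})

subtotal = df['Base_Price'] * df['Quantity']

# 10% member discount: multiplier 0.9 for members, 1.0 otherwise
member_factor = np.where(df['Member'], 0.9, 1.0)
# Extra 5% only for members buying 5 or more (mirrors the nested if)
bulk_factor = np.where(df['Member'] & (df['Quantity'] >= 5), 0.95, 1.0)

df['Total'] = (subtotal * member_factor * bulk_factor).round(2)
print(df['Total'].tolist())  # [180.0, 250.0, 213.75]
```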
Quick Reference
| Axis | Direction | Each Function Call Receives |
|---|---|---|
| axis=0 (default) | Column-wise | Entire column as a Series |
| axis=1 | Row-wise | Entire row as a Series |

| Operation Type | Recommended Approach |
|---|---|
| Simple arithmetic | Vectorized: df['A'] + df['B'] |
| Simple conditions | np.where() or np.select() |
| Complex multi-column logic | .apply(func, axis=1) |
| Built-in string methods | .str accessor (e.g., df['col'].str.upper()) |
| Custom string processing | .apply(func) |
- Use .apply(func, axis=0) for column-wise operations and .apply(func, axis=1) for row-wise operations.
- Pass extra arguments via args=(val,) or as keyword arguments directly.
- Always check whether a vectorized alternative exists before reaching for .apply(). For simple transformations, vectorized operations are often one to two orders of magnitude faster.