Python Polars: How to Add a Column to a Polars DataFrame Using .with_columns()
Polars is a high-performance DataFrame library for Python that's becoming a popular alternative to pandas, especially for large datasets. One of its most frequently used methods is .with_columns(), which lets you add one or more new columns to a DataFrame.
Unlike pandas, where columns are commonly added by in-place assignment, Polars follows an immutable design: .with_columns() returns a new DataFrame with the added columns rather than modifying the original. This makes operations safer, more predictable, and easier to chain together.
This guide covers the most common ways to add columns using .with_columns(), from simple constant values to conditional logic and custom functions.
Installation
If you haven't installed Polars yet:
pip install polars
Basic Syntax
The .with_columns() method accepts one or more expressions that define new columns:
import polars as pl
df = pl.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35]
})
# Add a new column derived from an existing one
new_df = df.with_columns(
    (pl.col("Age") + 5).alias("Age_in_5_Years")
)
print(new_df)
Output:
shape: (3, 3)
┌─────────┬─────┬────────────────┐
│ Name    ┆ Age ┆ Age_in_5_Years │
│ ---     ┆ --- ┆ ---            │
│ str     ┆ i64 ┆ i64            │
╞═════════╪═════╪════════════════╡
│ Alice   ┆ 25  ┆ 30             │
│ Bob     ┆ 30  ┆ 35             │
│ Charlie ┆ 35  ┆ 40             │
└─────────┴─────┴────────────────┘
Key elements:
- pl.col("Age"): references the existing Age column.
- + 5: applies an arithmetic operation.
- .alias("Age_in_5_Years"): names the new column.
The original df is not modified. .with_columns() always returns a new DataFrame, preserving the original data.
Adding a Constant Value Column
Add a column where every row has the same value using pl.lit() (literal):
import polars as pl
df = pl.DataFrame({
    "Product": ["Laptop", "Phone", "Tablet"],
    "Price": [999, 699, 449]
})
new_df = df.with_columns(
    pl.lit("USD").alias("Currency"),
    pl.lit(True).alias("In_Stock")
)
print(new_df)
Output:
shape: (3, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Product ┆ Price ┆ Currency ┆ In_Stock │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ bool     │
╞═════════╪═══════╪══════════╪══════════╡
│ Laptop  ┆ 999   ┆ USD      ┆ true     │
│ Phone   ┆ 699   ┆ USD      ┆ true     │
│ Tablet  ┆ 449   ┆ USD      ┆ true     │
└─────────┴───────┴──────────┴──────────┘
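Wrap constant values in pl.lit(). A bare Python string passed to .with_columns() is interpreted as a column name rather than a value, so a string constant without pl.lit() typically fails with a column-not-found error.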
Creating a Column from Multiple Existing Columns
Perform operations that combine data from multiple columns:
import polars as pl
df = pl.DataFrame({
    "Product": ["Laptop", "Phone", "Tablet"],
    "Price": [999, 699, 449],
    "Quantity": [2, 5, 3]
})
new_df = df.with_columns(
    (pl.col("Price") * pl.col("Quantity")).alias("Total_Cost")
)
print(new_df)
Output:
shape: (3, 4)
┌─────────┬───────┬──────────┬────────────┐
│ Product ┆ Price ┆ Quantity ┆ Total_Cost │
│ ---     ┆ ---   ┆ ---      ┆ ---        │
│ str     ┆ i64   ┆ i64      ┆ i64        │
╞═════════╪═══════╪══════════╪════════════╡
│ Laptop  ┆ 999   ┆ 2        ┆ 1998       │
│ Phone   ┆ 699   ┆ 5        ┆ 3495       │
│ Tablet  ┆ 449   ┆ 3        ┆ 1347       │
└─────────┴───────┴──────────┴────────────┘
Conditional Column Creation
Use pl.when().then().otherwise() to create columns based on conditions, similar to SQL's CASE WHEN or NumPy's np.where():
import polars as pl
df = pl.DataFrame({
    "Product": ["Laptop", "Phone", "Tablet"],
    "Price": [999, 699, 449]
})
new_df = df.with_columns(
    pl.when(pl.col("Price") > 500)
    .then(pl.lit("Premium"))
    .otherwise(pl.lit("Budget"))
    .alias("Category")
)
print(new_df)
Output:
shape: (3, 3)
┌─────────┬───────┬──────────┐
│ Product ┆ Price ┆ Category │
│ ---     ┆ ---   ┆ ---      │
│ str     ┆ i64   ┆ str      │
╞═════════╪═══════╪══════════╡
│ Laptop  ┆ 999   ┆ Premium  │
│ Phone   ┆ 699   ┆ Premium  │
│ Tablet  ┆ 449   ┆ Budget   │
└─────────┴───────┴──────────┘
Multiple Conditions
Chain multiple .when() clauses for more complex logic:
import polars as pl
df = pl.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Score": [95, 72, 58, 83]
})
new_df = df.with_columns(
    pl.when(pl.col("Score") >= 90).then(pl.lit("A"))
    .when(pl.col("Score") >= 80).then(pl.lit("B"))
    .when(pl.col("Score") >= 70).then(pl.lit("C"))
    .otherwise(pl.lit("F"))
    .alias("Grade")
)
print(new_df)
Output:
shape: (4, 3)
┌─────────┬───────┬───────┐
│ Name    ┆ Score ┆ Grade │
│ ---     ┆ ---   ┆ ---   │
│ str     ┆ i64   ┆ str   │
╞═════════╪═══════╪═══════╡
│ Alice   ┆ 95    ┆ A     │
│ Bob     ┆ 72    ┆ C     │
│ Charlie ┆ 58    ┆ F     │
│ Diana   ┆ 83    ┆ B     │
└─────────┴───────┴───────┘
Adding Multiple Columns at Once
Pass multiple expressions to .with_columns() to add several columns in a single operation:
import polars as pl
df = pl.DataFrame({
    "Product": ["Laptop", "Phone", "Tablet"],
    "Price": [999, 699, 449]
})
new_df = df.with_columns(
    (pl.col("Price") * 0.1).alias("Tax"),
    (pl.col("Price") * 1.1).alias("Price_with_Tax"),
    (pl.col("Price") * 0.9).alias("Discounted_Price")
)
print(new_df)
Output:
shape: (3, 5)
┌─────────┬───────┬──────┬────────────────┬──────────────────┐
│ Product ┆ Price ┆ Tax  ┆ Price_with_Tax ┆ Discounted_Price │
│ ---     ┆ ---   ┆ ---  ┆ ---            ┆ ---              │
│ str     ┆ i64   ┆ f64  ┆ f64            ┆ f64              │
╞═════════╪═══════╪══════╪════════════════╪══════════════════╡
│ Laptop  ┆ 999   ┆ 99.9 ┆ 1098.9         ┆ 899.1            │
│ Phone   ┆ 699   ┆ 69.9 ┆ 768.9          ┆ 629.1            │
│ Tablet  ┆ 449   ┆ 44.9 ┆ 493.9          ┆ 404.1            │
└─────────┴───────┴──────┴────────────────┴──────────────────┘
Adding multiple columns in a single .with_columns() call is generally more efficient than chaining separate calls, because Polars can evaluate the independent expressions together, in parallel where possible.
Using Custom Functions with .map_elements()
For complex transformations that can't be expressed with built-in expressions, use .map_elements() to apply a custom Python function:
import polars as pl
df = pl.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35]
})
# Map a single age to a label; called once per value in the column
def age_category(age):
    if age < 30:
        return "Young"
    elif age < 35:
        return "Middle"
    else:
        return "Senior"
new_df = df.with_columns(
    pl.col("Age").map_elements(age_category, return_dtype=pl.Utf8).alias("Category")
)
print(new_df)
Output:
shape: (3, 3)
┌─────────┬─────┬──────────┐
│ Name    ┆ Age ┆ Category │
│ ---     ┆ --- ┆ ---      │
│ str     ┆ i64 ┆ str      │
╞═════════╪═════╪══════════╡
│ Alice   ┆ 25  ┆ Young    │
│ Bob     ┆ 30  ┆ Middle   │
│ Charlie ┆ 35  ┆ Senior   │
└─────────┴─────┴──────────┘
.map_elements() calls a Python function once per value, which is significantly slower than native Polars expressions. Whenever possible, use built-in expressions like pl.when().then().otherwise() instead: they run in optimized Rust and are often orders of magnitude faster.
# Slower: Python function
pl.col("Age").map_elements(age_category, return_dtype=pl.Utf8)
# Faster: native Polars expressions (the parentheses let the chain span several lines)
(
    pl.when(pl.col("Age") < 30).then(pl.lit("Young"))
    .when(pl.col("Age") < 35).then(pl.lit("Middle"))
    .otherwise(pl.lit("Senior"))
)
Using String and Date Operations
Polars provides rich expression APIs for strings, dates, and other types:
import polars as pl
df = pl.DataFrame({
    "Name": ["Alice Smith", "Bob Jones", "Charlie Brown"],
    "Email": ["alice@test.com", "bob@test.com", "charlie@test.com"]
})
new_df = df.with_columns(
    pl.col("Name").str.to_uppercase().alias("Name_Upper"),
    pl.col("Name").str.split(" ").list.first().alias("First_Name"),
    pl.col("Email").str.contains("test").alias("Is_Test_Email")
)
print(new_df)
Output:
shape: (3, 5)
┌───────────────┬──────────────────┬───────────────┬────────────┬───────────────┐
│ Name          ┆ Email            ┆ Name_Upper    ┆ First_Name ┆ Is_Test_Email │
│ ---           ┆ ---              ┆ ---           ┆ ---        ┆ ---           │
│ str           ┆ str              ┆ str           ┆ str        ┆ bool          │
╞═══════════════╪══════════════════╪═══════════════╪════════════╪═══════════════╡
│ Alice Smith   ┆ alice@test.com   ┆ ALICE SMITH   ┆ Alice      ┆ true          │
│ Bob Jones     ┆ bob@test.com     ┆ BOB JONES     ┆ Bob        ┆ true          │
│ Charlie Brown ┆ charlie@test.com ┆ CHARLIE BROWN ┆ Charlie    ┆ true          │
└───────────────┴──────────────────┴───────────────┴────────────┴───────────────┘
Quick Reference
| Task | Expression |
|---|---|
| Column from existing | (pl.col("A") + pl.col("B")).alias("C") |
| Constant value | pl.lit("value").alias("C") |
| Conditional | pl.when(cond).then(val).otherwise(val).alias("C") |
| Custom function | pl.col("A").map_elements(fn, return_dtype=...).alias("C") |
| String operations | pl.col("A").str.to_uppercase().alias("C") |
| Multiple columns | Pass multiple expressions separated by commas |
Conclusion
The .with_columns() method is one of the most powerful tools in Polars for DataFrame manipulation.
- It supports constant values, arithmetic operations, conditional logic, string transformations, and custom functions, all while maintaining Polars' immutable design.
- For best performance, prefer native Polars expressions over .map_elements() with Python functions.
- By combining multiple column additions into a single .with_columns() call, you let Polars optimize the execution plan for maximum efficiency.