Python Polars: How to Add a Column with Numerical Values
Polars is a high-performance DataFrame library written in Rust with Python bindings, designed for fast and memory-efficient data manipulation. A common operation when building datasets is adding new columns of numerical values: a constant, a value computed from other columns, or data from an external source.
This guide covers the different ways to add numerical columns to a Polars DataFrame using the .with_columns() method.
Installation
Install Polars via pip if you haven't already:
pip install polars
Setting Up a Sample DataFrame
All examples in this guide use the following base DataFrame:
import polars as pl
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "salary": [50000, 45000, 60000, 52000]
})
print(df)
Output:
shape: (4, 3)
┌─────────┬─────┬────────┐
│ name ┆ age ┆ salary │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════════╪═════╪════════╡
│ Alice ┆ 25 ┆ 50000 │
│ Bob ┆ 30 ┆ 45000 │
│ Charlie ┆ 35 ┆ 60000 │
│ Diana ┆ 28 ┆ 52000 │
└─────────┴─────┴────────┘
Adding a Constant Value Column
Use pl.lit() (literal) to add a column where every row contains the same value:
import polars as pl
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35]
})
new_df = df.with_columns(
    pl.lit(50000).alias("salary"),
    pl.lit(1).alias("department_id")
)
print(new_df)
Output:
shape: (3, 4)
┌─────────┬─────┬────────┬───────────────┐
│ name ┆ age ┆ salary ┆ department_id │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i32 ┆ i32 │
╞═════════╪═════╪════════╪═══════════════╡
│ Alice ┆ 25 ┆ 50000 ┆ 1 │
│ Bob ┆ 30 ┆ 50000 ┆ 1 │
│ Charlie ┆ 35 ┆ 50000 ┆ 1 │
└─────────┴─────┴────────┴───────────────┘
This is useful for adding default values, flags, or metadata to every row.
Adding a Column from a List of Values
Pass a pl.Series to add a column with different values for each row:
import polars as pl
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35]
})
new_df = df.with_columns(
    pl.Series("score", [85, 92, 78]),
    pl.Series("bonus", [5000, 3000, 7000])
)
print(new_df)
Output:
shape: (3, 4)
┌─────────┬─────┬───────┬───────┐
│ name ┆ age ┆ score ┆ bonus │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 │
╞═════════╪═════╪═══════╪═══════╡
│ Alice ┆ 25 ┆ 85 ┆ 5000 │
│ Bob ┆ 30 ┆ 92 ┆ 3000 │
│ Charlie ┆ 35 ┆ 78 ┆ 7000 │
└─────────┴─────┴───────┴───────┘
The list passed to pl.Series must have the same number of elements as the DataFrame has rows. Creating the Series itself always succeeds; the error is raised when a Series of the wrong length is added to the DataFrame:
# DataFrame has 3 rows, but the Series has 4 values
df.with_columns(pl.Series("score", [85, 92, 78, 88]))
# Raises ShapeError (the exact message varies between Polars versions)
Adding a Column Based on Existing Columns
Create new numerical columns by performing calculations on existing columns using pl.col():
import polars as pl
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "salary": [50000, 45000, 60000, 52000]
})
new_df = df.with_columns(
    (pl.col("salary") * 0.10).alias("tax"),
    (pl.col("salary") * 0.05).alias("bonus"),
    (pl.col("salary") - pl.col("salary") * 0.10).alias("net_salary")
)
print(new_df)
Output:
shape: (4, 5)
┌─────────┬────────┬────────┬────────┬────────────┐
│ name ┆ salary ┆ tax ┆ bonus ┆ net_salary │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 ┆ f64 ┆ f64 │
╞═════════╪════════╪════════╪════════╪════════════╡
│ Alice ┆ 50000 ┆ 5000.0 ┆ 2500.0 ┆ 45000.0 │
│ Bob ┆ 45000 ┆ 4500.0 ┆ 2250.0 ┆ 40500.0 │
│ Charlie ┆ 60000 ┆ 6000.0 ┆ 3000.0 ┆ 54000.0 │
│ Diana ┆ 52000 ┆ 5200.0 ┆ 2600.0 ┆ 46800.0 │
└─────────┴────────┴────────┴────────┴────────────┘
Combining Multiple Columns
You can reference and combine multiple columns in a single expression:
import polars as pl
df = pl.DataFrame({
    "product": ["A", "B", "C"],
    "price": [100, 150, 200],
    "quantity": [5, 3, 8]
})
new_df = df.with_columns(
    (pl.col("price") * pl.col("quantity")).alias("total_revenue"),
    (pl.col("price") * pl.col("quantity") * 0.08).alias("sales_tax")
)
print(new_df)
Output:
shape: (3, 5)
┌─────────┬───────┬──────────┬───────────────┬───────────┐
│ product ┆ price ┆ quantity ┆ total_revenue ┆ sales_tax │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 ┆ f64 │
╞═════════╪═══════╪══════════╪═══════════════╪═══════════╡
│ A ┆ 100 ┆ 5 ┆ 500 ┆ 40.0 │
│ B ┆ 150 ┆ 3 ┆ 450 ┆ 36.0 │
│ C ┆ 200 ┆ 8 ┆ 1600 ┆ 128.0 │
└─────────┴───────┴──────────┴───────────────┴───────────┘
Adding a Conditional Numerical Column
Use pl.when().then().otherwise() to assign numerical values based on conditions:
import polars as pl
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "salary": [50000, 45000, 60000, 52000]
})
new_df = df.with_columns(
    pl.when(pl.col("salary") >= 55000)
    .then(pl.lit(10))
    .when(pl.col("salary") >= 48000)
    .then(pl.lit(7))
    .otherwise(pl.lit(5))
    .alias("bonus_percentage")
)
print(new_df)
Output:
shape: (4, 3)
┌─────────┬────────┬──────────────────┐
│ name ┆ salary ┆ bonus_percentage │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i32 │
╞═════════╪════════╪══════════════════╡
│ Alice ┆ 50000 ┆ 7 │
│ Bob ┆ 45000 ┆ 5 │
│ Charlie ┆ 60000 ┆ 10 │
│ Diana ┆ 52000 ┆ 7 │
└─────────┴────────┴──────────────────┘
Adding a Row Number Column
Use pl.arange() with eager=True to build a Series of consecutive integers and attach it as a row number column:
import polars as pl
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "score": [85, 92, 78]
})
new_df = df.with_columns(
    pl.arange(1, len(df) + 1, eager=True).alias("row_number")
)
print(new_df)
Output:
shape: (3, 3)
┌─────────┬───────┬────────────┐
│ name ┆ score ┆ row_number │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════════╪═══════╪════════════╡
│ Alice ┆ 85 ┆ 1 │
│ Bob ┆ 92 ┆ 2 │
│ Charlie ┆ 78 ┆ 3 │
└─────────┴───────┴────────────┘
Generating Numerical Columns with NumPy
For more complex numerical data like random values or mathematical sequences, combine Polars with NumPy:
import polars as pl
import numpy as np
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28]
})
np.random.seed(42)
new_df = df.with_columns(
    pl.Series("random_score", np.random.randint(50, 100, size=len(df))),
    pl.Series("weight", np.round(np.random.uniform(55.0, 90.0, size=len(df)), 1))
)
print(new_df)
Output (random_score may show as i64 instead of i32, depending on the platform and NumPy version):
shape: (4, 4)
┌─────────┬─────┬──────────────┬────────┐
│ name ┆ age ┆ random_score ┆ weight │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i32 ┆ f64 │
╞═════════╪═════╪══════════════╪════════╡
│ Alice ┆ 25 ┆ 88 ┆ 82.3 │
│ Bob ┆ 30 ┆ 78 ┆ 75.9 │
│ Charlie ┆ 35 ┆ 64 ┆ 70.6 │
│ Diana ┆ 28 ┆ 92 ┆ 58.5 │
└─────────┴─────┴──────────────┴────────┘
Quick Reference
| Task | Expression |
|---|---|
| Constant value | pl.lit(100).alias("col") |
| From a list | pl.Series("col", [1, 2, 3]) |
| From existing column | (pl.col("a") * 2).alias("col") |
| Combine columns | (pl.col("a") + pl.col("b")).alias("col") |
| Conditional value | pl.when(cond).then(val).otherwise(val).alias("col") |
| Row numbers | pl.arange(1, n+1, eager=True).alias("col") |
| From NumPy | pl.Series("col", np.array([...])) |
Conclusion
Adding numerical columns to a Polars DataFrame is straightforward and flexible with the .with_columns() method.
Use pl.lit() for constant values, pl.Series() for explicit lists, and pl.col() expressions for computed values based on existing data. For conditional assignments, pl.when().then().otherwise() provides clean, readable logic.
Expression-based operations such as pl.lit(), pl.col(), and pl.when() run in Polars' Rust-based execution engine, which is often faster than the equivalent pandas operations on large datasets.