Python Polars: How to Add a Column with Numerical Values in Polars

Polars is a high-performance DataFrame library written in Rust with Python bindings, designed for fast and memory-efficient data manipulation. A common operation when building datasets is adding new columns with numerical values, whether it's a constant, a computed value, or data from an external source.

This guide covers the different ways to add numerical columns to a Polars DataFrame using the .with_columns() method.

Installation

Install Polars via pip if you haven't already:

pip install polars

Setting Up a Sample DataFrame

All examples in this guide use the following base DataFrame:

import polars as pl

df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "salary": [50000, 45000, 60000, 52000]
})

print(df)

Output:

shape: (4, 3)
┌─────────┬─────┬────────┐
│ name    ┆ age ┆ salary │
│ ---     ┆ --- ┆ ---    │
│ str     ┆ i64 ┆ i64    │
╞═════════╪═════╪════════╡
│ Alice   ┆ 25  ┆ 50000  │
│ Bob     ┆ 30  ┆ 45000  │
│ Charlie ┆ 35  ┆ 60000  │
│ Diana   ┆ 28  ┆ 52000  │
└─────────┴─────┴────────┘

Adding a Constant Value Column

Use pl.lit() (literal) to add a column where every row contains the same value:

import polars as pl

df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35]
})

# pl.lit() broadcasts a single value to every row
new_df = df.with_columns(
    pl.lit(50000).alias("salary"),
    pl.lit(1).alias("department_id")
)

print(new_df)

Output:

shape: (3, 4)
┌─────────┬─────┬────────┬───────────────┐
│ name    ┆ age ┆ salary ┆ department_id │
│ ---     ┆ --- ┆ ---    ┆ ---           │
│ str     ┆ i64 ┆ i32    ┆ i32           │
╞═════════╪═════╪════════╪═══════════════╡
│ Alice   ┆ 25  ┆ 50000  ┆ 1             │
│ Bob     ┆ 30  ┆ 50000  ┆ 1             │
│ Charlie ┆ 35  ┆ 50000  ┆ 1             │
└─────────┴─────┴────────┴───────────────┘

Note: This is useful for adding default values, flags, or metadata to every row.

Adding a Column from a List of Values

Pass a pl.Series to add a column with different values for each row:

import polars as pl

df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35]
})

# Each Series supplies one value per row, in row order
new_df = df.with_columns(
    pl.Series("score", [85, 92, 78]),
    pl.Series("bonus", [5000, 3000, 7000])
)

print(new_df)

Output:

shape: (3, 4)
┌─────────┬─────┬───────┬───────┐
│ name    ┆ age ┆ score ┆ bonus │
│ ---     ┆ --- ┆ ---   ┆ ---   │
│ str     ┆ i64 ┆ i64   ┆ i64   │
╞═════════╪═════╪═══════╪═══════╡
│ Alice   ┆ 25  ┆ 85    ┆ 5000  │
│ Bob     ┆ 30  ┆ 92    ┆ 3000  │
│ Charlie ┆ 35  ┆ 78    ┆ 7000  │
└─────────┴─────┴───────┴───────┘
List length must match DataFrame height

The list passed to pl.Series must have the same number of elements as rows in the DataFrame. A length mismatch will raise an error:

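# DataFrame has 3 rows, but the Series has 4 values.
# Creating the Series succeeds; the error is raised when it is added to the DataFrame.
df.with_columns(pl.Series("score", [85, 92, 78, 88]))
# Raises a ShapeError: the new column's length must match the DataFrame height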

Adding a Column Based on Existing Columns

Create new numerical columns by performing calculations on existing columns using pl.col():

import polars as pl

df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "salary": [50000, 45000, 60000, 52000]
})

# Multiplying an integer column by a float yields a float column
new_df = df.with_columns(
    (pl.col("salary") * 0.10).alias("tax"),
    (pl.col("salary") * 0.05).alias("bonus"),
    (pl.col("salary") - pl.col("salary") * 0.10).alias("net_salary")
)

print(new_df)

Output:

shape: (4, 5)
┌─────────┬────────┬────────┬────────┬────────────┐
│ name    ┆ salary ┆ tax    ┆ bonus  ┆ net_salary │
│ ---     ┆ ---    ┆ ---    ┆ ---    ┆ ---        │
│ str     ┆ i64    ┆ f64    ┆ f64    ┆ f64        │
╞═════════╪════════╪════════╪════════╪════════════╡
│ Alice   ┆ 50000  ┆ 5000.0 ┆ 2500.0 ┆ 45000.0    │
│ Bob     ┆ 45000  ┆ 4500.0 ┆ 2250.0 ┆ 40500.0    │
│ Charlie ┆ 60000  ┆ 6000.0 ┆ 3000.0 ┆ 54000.0    │
│ Diana   ┆ 52000  ┆ 5200.0 ┆ 2600.0 ┆ 46800.0    │
└─────────┴────────┴────────┴────────┴────────────┘

Combining Multiple Columns

You can reference and combine multiple columns in a single expression:

import polars as pl

df = pl.DataFrame({
    "product": ["A", "B", "C"],
    "price": [100, 150, 200],
    "quantity": [5, 3, 8]
})

# Each expression can reference any number of existing columns
new_df = df.with_columns(
    (pl.col("price") * pl.col("quantity")).alias("total_revenue"),
    (pl.col("price") * pl.col("quantity") * 0.08).alias("sales_tax")
)

print(new_df)

Output:

shape: (3, 5)
┌─────────┬───────┬──────────┬───────────────┬───────────┐
│ product ┆ price ┆ quantity ┆ total_revenue ┆ sales_tax │
│ ---     ┆ ---   ┆ ---      ┆ ---           ┆ ---       │
│ str     ┆ i64   ┆ i64      ┆ i64           ┆ f64       │
╞═════════╪═══════╪══════════╪═══════════════╪═══════════╡
│ A       ┆ 100   ┆ 5        ┆ 500           ┆ 40.0      │
│ B       ┆ 150   ┆ 3        ┆ 450           ┆ 36.0      │
│ C       ┆ 200   ┆ 8        ┆ 1600          ┆ 128.0     │
└─────────┴───────┴──────────┴───────────────┴───────────┘

Adding a Conditional Numerical Column

Use pl.when().then().otherwise() to assign numerical values based on conditions:

import polars as pl

df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "salary": [50000, 45000, 60000, 52000]
})

# Conditions are checked in order; the first match determines the value
new_df = df.with_columns(
    pl.when(pl.col("salary") >= 55000)
    .then(pl.lit(10))
    .when(pl.col("salary") >= 48000)
    .then(pl.lit(7))
    .otherwise(pl.lit(5))
    .alias("bonus_percentage")
)

print(new_df)

Output:

shape: (4, 3)
┌─────────┬────────┬──────────────────┐
│ name    ┆ salary ┆ bonus_percentage │
│ ---     ┆ ---    ┆ ---              │
│ str     ┆ i64    ┆ i32              │
╞═════════╪════════╪══════════════════╡
│ Alice   ┆ 50000  ┆ 7                │
│ Bob     ┆ 45000  ┆ 5                │
│ Charlie ┆ 60000  ┆ 10               │
│ Diana   ┆ 52000  ┆ 7                │
└─────────┴────────┴──────────────────┘

Adding a Row Number Column

Add a sequential index or row number column:

import polars as pl

df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "score": [85, 92, 78]
})

# eager=True returns a Series covering 1..len(df) inclusive
new_df = df.with_columns(
    pl.arange(1, len(df) + 1, eager=True).alias("row_number")
)

print(new_df)

Output:

shape: (3, 3)
┌─────────┬───────┬────────────┐
│ name    ┆ score ┆ row_number │
│ ---     ┆ ---   ┆ ---        │
│ str     ┆ i64   ┆ i64        │
╞═════════╪═══════╪════════════╡
│ Alice   ┆ 85    ┆ 1          │
│ Bob     ┆ 92    ┆ 2          │
│ Charlie ┆ 78    ┆ 3          │
└─────────┴───────┴────────────┘

Generating Numerical Columns with NumPy

For more complex numerical data like random values or mathematical sequences, combine Polars with NumPy:

import polars as pl
import numpy as np

df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28]
})

# Fix the seed so the random values are reproducible
np.random.seed(42)

new_df = df.with_columns(
    pl.Series("random_score", np.random.randint(50, 100, size=len(df))),
    pl.Series("weight", np.round(np.random.uniform(55.0, 90.0, size=len(df)), 1))
)

print(new_df)

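Output (the dtype of random_score follows NumPy's default integer type, so it may appear as i64 instead of i32 depending on your platform and NumPy version):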

shape: (4, 4)
┌─────────┬─────┬──────────────┬────────┐
│ name    ┆ age ┆ random_score ┆ weight │
│ ---     ┆ --- ┆ ---          ┆ ---    │
│ str     ┆ i64 ┆ i32          ┆ f64    │
╞═════════╪═════╪══════════════╪════════╡
│ Alice   ┆ 25  ┆ 88           ┆ 82.3   │
│ Bob     ┆ 30  ┆ 78           ┆ 75.9   │
│ Charlie ┆ 35  ┆ 64           ┆ 70.6   │
│ Diana   ┆ 28  ┆ 92           ┆ 58.5   │
└─────────┴─────┴──────────────┴────────┘

Quick Reference

Task                   Expression
Constant value         pl.lit(100).alias("col")
From a list            pl.Series("col", [1, 2, 3])
From existing column   (pl.col("a") * 2).alias("col")
Combine columns        (pl.col("a") + pl.col("b")).alias("col")
Conditional value      pl.when(cond).then(val).otherwise(val).alias("col")
Row numbers            pl.arange(1, n+1, eager=True).alias("col")
From NumPy             pl.Series("col", np.array([...]))

Conclusion

Adding numerical columns to a Polars DataFrame is straightforward and flexible with the .with_columns() method.

Use pl.lit() for constant values, pl.Series() for explicit lists, and pl.col() expressions for computed values based on existing data. For conditional assignments, pl.when().then().otherwise() provides clean, readable logic.

All of these operations run on Polars' Rust-based execution engine, which is often considerably faster than equivalent pandas operations on large datasets.