Python Polars: How to Append or Concatenate DataFrames
Polars is a high-performance DataFrame library built in Rust, designed for efficient data processing in Python. One of the most common tasks when working with multiple datasets is combining them, either by stacking rows vertically or by placing columns side by side horizontally. Polars provides several optimized methods for both scenarios, each suited to different use cases and schema requirements.
This guide covers every major approach to appending and concatenating DataFrames in Polars, including how to handle mismatched schemas gracefully.
Vertical Concatenation with pl.concat()
The most common way to combine DataFrames in Polars is pl.concat(), which stacks DataFrames on top of each other by default (how="vertical"). All of the DataFrames must share the same column names, in the same order, with the same data types:
import polars as pl
df1 = pl.DataFrame({"user": ["Alice"], "score": [95]})
df2 = pl.DataFrame({"user": ["Bob"], "score": [88]})
combined = pl.concat([df1, df2])
print(combined)
Output:
shape: (2, 2)
┌───────┬───────┐
│ user ┆ score │
│ --- ┆ --- │
│ str ┆ i64 │
╞═══════╪═══════╡
│ Alice ┆ 95 │
│ Bob ┆ 88 │
└───────┴───────┘
By default, pl.concat() enforces a strict schema check. Column names and data types must match across all DataFrames. If there is a mismatch, Polars raises an error immediately rather than silently producing inconsistent data.
Concatenating Multiple DataFrames at Once
pl.concat() accepts a list of any length, so you can combine many DataFrames in a single call. This is common when loading data from multiple files or processing data in batches:
import polars as pl
dfs = [
pl.DataFrame({"id": [1], "value": [100]}),
pl.DataFrame({"id": [2], "value": [200]}),
pl.DataFrame({"id": [3], "value": [300]}),
]
result = pl.concat(dfs)
print(result)
Output:
shape: (3, 2)
┌─────┬───────┐
│ id ┆ value │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═══════╡
│ 1 ┆ 100 │
│ 2 ┆ 200 │
│ 3 ┆ 300 │
└─────┴───────┘
This is especially useful in combination with list comprehensions. For example, loading and concatenating a directory of CSV files:
import polars as pl
from pathlib import Path
# Example: combine all CSV files in a directory
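# sorted() makes the file order deterministic; Path.glob() does not guarantee one
dfs = [pl.read_csv(f) for f in sorted(Path("data/").glob("*.csv"))]
# Note: pl.concat() raises an error if the list is empty (no matching files)
combined = pl.concat(dfs)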
Horizontal Concatenation: Adding Columns Side by Side
To place DataFrames next to each other column-wise, pass how="horizontal":
import polars as pl
df_names = pl.DataFrame({"name": ["Alice", "Bob"]})
df_metrics = pl.DataFrame({"id": [101, 102], "age": [25, 30]})
result = pl.concat([df_names, df_metrics], how="horizontal")
print(result)
Output:
shape: (2, 3)
┌───────┬─────┬─────┐
│ name ┆ id ┆ age │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ Alice ┆ 101 ┆ 25 │
│ Bob ┆ 102 ┆ 30 │
└───────┴─────┴─────┘
Horizontal concatenation aligns rows by position, not by any key column. If you need to match rows based on a shared key like user ID, use .join() instead. Using horizontal concatenation with misaligned data will silently pair the wrong rows together.
Fast Row Appending with .vstack()
When you need to append one DataFrame to another and both have exactly the same schema (column names, types, and order), .vstack() is the cheapest option. Rather than copying the data, it attaches the other DataFrame's memory chunks to this one, so each append is very fast:
import polars as pl
df1 = pl.DataFrame({"user": ["Alice"], "score": [95]})
df2 = pl.DataFrame({"user": ["Bob"], "score": [88]})
result = df1.vstack(df2)
print(result)
Output:
shape: (2, 2)
┌───────┬───────┐
│ user ┆ score │
│ --- ┆ --- │
│ str ┆ i64 │
╞═══════╪═══════╡
│ Alice ┆ 95 │
│ Bob ┆ 88 │
└───────┴───────┘
When to Use .vstack() vs pl.concat()
Both produce the same result for two DataFrames with matching schemas. The key differences are:
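- .vstack() operates on exactly two DataFrames (the one it is called on and the one passed in) and is designed for cheap, repeated appends.
- pl.concat() accepts a list of any number of DataFrames and is the more general-purpose tool.
When you need to combine more than two DataFrames that are all available up front, pl.concat() is the better choice. When you append one DataFrame at a time, for example in a loop or streaming context, .vstack() fits well. Because every call adds memory chunks, call .rechunk() once you have finished appending and before running heavy queries on the result.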
.vstack() requires an exact schema match: identical column names, identical data types, and identical column order. If any of these differ, it raises an error. There is no option for lenient matching.
Handling Schema Mismatches with Diagonal Concatenation
In real-world scenarios, DataFrames do not always share the same columns. Perhaps one source includes a column that another does not. The how="diagonal" strategy handles this by including all columns from all DataFrames and filling missing values with null:
import polars as pl
df1 = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pl.DataFrame({"a": [5, 6], "c": [7, 8]})
result = pl.concat([df1, df2], how="diagonal")
print(result)
Output:
shape: (4, 3)
┌─────┬──────┬──────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪══════╪══════╡
│ 1 ┆ 3 ┆ null │
│ 2 ┆ 4 ┆ null │
│ 5 ┆ null ┆ 7 │
│ 6 ┆ null ┆ 8 │
└─────┴──────┴──────┘
Column a exists in both DataFrames, so its values are stacked normally. Column b only exists in df1, so df2 rows get null in that column. Column c only exists in df2, so df1 rows get null there.
This is particularly useful when combining data from different sources or API responses where the schema may evolve over time.
What Happens When Schemas Do Not Match (Without Diagonal)
If you try to vertically concatenate DataFrames with different columns using the default behavior, Polars raises an error:
import polars as pl
df1 = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pl.DataFrame({"a": [5, 6], "c": [7, 8]})
try:
    result = pl.concat([df1, df2])
except Exception as e:
    print(f"Error: {e}")
Output (the exact error text may vary between Polars versions):
Error: unable to vstack, column names don't match: "b" and "c"
This strict behavior is intentional. It prevents you from accidentally combining unrelated data. If the mismatch is expected, switch to how="diagonal".
Quick Reference
| Goal | Method | Notes |
|---|---|---|
| Stack rows (same schema) | pl.concat([df1, df2]) | Default vertical concatenation |
| Stack rows (different columns) | pl.concat([...], how="diagonal") | Fills missing columns with null |
| Add columns side by side | pl.concat([...], how="horizontal") | Rows must be positionally aligned |
| Fast binary append | df1.vstack(df2) | Exact schema match required |
Summary
Polars provides flexible and performant options for combining DataFrames.
- Use pl.concat() as your primary tool for stacking rows vertically (the default) or placing columns side by side (how="horizontal").
- When DataFrames have different columns, use how="diagonal" to include every column and fill missing values with null.
- For the cheapest append of a single DataFrame with an identical schema, use .vstack().
Horizontal concatenation aligns rows purely by position. All of the other methods enforce strict schema checks by default, which helps catch data inconsistencies early instead of letting them propagate silently through your pipeline.