
Python Polars: How to Append or Concatenate DataFrames in Polars in Python

Polars is a high-performance DataFrame library built in Rust, designed for efficient data processing in Python. One of the most common tasks when working with multiple datasets is combining them, either by stacking rows vertically or by placing columns side by side horizontally. Polars provides several optimized methods for both scenarios, each suited to different use cases and schema requirements.

This guide covers every major approach to appending and concatenating DataFrames in Polars, including how to handle mismatched schemas gracefully.

Vertical Concatenation with pl.concat()

The most common way to combine DataFrames in Polars is pl.concat(), which stacks DataFrames on top of each other by default. Both DataFrames must share the same column names and data types:

import polars as pl

df1 = pl.DataFrame({"user": ["Alice"], "score": [95]})
df2 = pl.DataFrame({"user": ["Bob"], "score": [88]})

combined = pl.concat([df1, df2])
print(combined)

Output:

shape: (2, 2)
┌───────┬───────┐
│ user  ┆ score │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ Alice ┆ 95    │
│ Bob   ┆ 88    │
└───────┴───────┘

Note: By default, pl.concat() enforces a strict schema check. Column names and data types must match across all DataFrames. If there is a mismatch, Polars raises an error immediately rather than silently producing inconsistent data.

Concatenating Multiple DataFrames at Once

pl.concat() accepts a list of any length, so you can combine many DataFrames in a single call. This is common when loading data from multiple files or processing data in batches:

import polars as pl

dfs = [
    pl.DataFrame({"id": [1], "value": [100]}),
    pl.DataFrame({"id": [2], "value": [200]}),
    pl.DataFrame({"id": [3], "value": [300]}),
]

result = pl.concat(dfs)
print(result)

Output:

shape: (3, 2)
┌─────┬───────┐
│ id  ┆ value │
│ --- ┆ ---   │
│ i64 ┆ i64   │
╞═════╪═══════╡
│ 1   ┆ 100   │
│ 2   ┆ 200   │
│ 3   ┆ 300   │
└─────┴───────┘

This is especially useful in combination with list comprehensions. For example, loading and concatenating a directory of CSV files:

import polars as pl
from pathlib import Path

# Example: combine all CSV files in a directory.
# sorted() makes the file order, and therefore the row order, reproducible.
dfs = [pl.read_csv(f) for f in sorted(Path("data/").glob("*.csv"))]
combined = pl.concat(dfs)

Horizontal Concatenation: Adding Columns Side by Side

To place DataFrames next to each other column-wise, pass how="horizontal":

import polars as pl

df_names = pl.DataFrame({"name": ["Alice", "Bob"]})
df_metrics = pl.DataFrame({"id": [101, 102], "age": [25, 30]})

result = pl.concat([df_names, df_metrics], how="horizontal")
print(result)

Output:

shape: (2, 3)
┌───────┬─────┬─────┐
│ name  ┆ id  ┆ age │
│ ---   ┆ --- ┆ --- │
│ str   ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ Alice ┆ 101 ┆ 25  │
│ Bob   ┆ 102 ┆ 30  │
└───────┴─────┴─────┘

Tip: Horizontal concatenation aligns rows by position, not by any key column. If you need to match rows based on a shared key such as a user ID, use .join() instead. Horizontal concatenation of misaligned data silently pairs the wrong rows together.

Fast Row Appending with .vstack()

When you need to append one DataFrame to another and both have exactly the same schema (column names, data types, and column order), .vstack() is a very cheap option. Rather than copying data, it adds the chunks of the second DataFrame to those of the first, which makes a single append efficient:

import polars as pl

df1 = pl.DataFrame({"user": ["Alice"], "score": [95]})
df2 = pl.DataFrame({"user": ["Bob"], "score": [88]})

result = df1.vstack(df2)
print(result)

Output:

shape: (2, 2)
┌───────┬───────┐
│ user  ┆ score │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ Alice ┆ 95    │
│ Bob   ┆ 88    │
└───────┴───────┘

When to Use .vstack() vs pl.concat()

Both produce the same result for two DataFrames with matching schemas. The key differences are:

  • .vstack() operates on exactly two DataFrames and is optimized for repeated binary appends.
  • pl.concat() accepts a list of any number of DataFrames and is the more general-purpose tool.

For combining more than two DataFrames, pl.concat() is the better choice. For appending a single DataFrame in a loop or streaming context, .vstack() is ideal.

Warning: .vstack() requires an exact schema match: identical column names, identical data types, and identical column order. If any of these differ, it raises an error. There is no option for lenient matching.

Handling Schema Mismatches with Diagonal Concatenation

In real-world scenarios, DataFrames do not always share the same columns. Perhaps one source includes a column that another does not. The how="diagonal" strategy handles this by including all columns from all DataFrames and filling missing values with null:

import polars as pl

df1 = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pl.DataFrame({"a": [5, 6], "c": [7, 8]})

result = pl.concat([df1, df2], how="diagonal")
print(result)

Output:

shape: (4, 3)
┌─────┬──────┬──────┐
│ a   ┆ b    ┆ c    │
│ --- ┆ ---  ┆ ---  │
│ i64 ┆ i64  ┆ i64  │
╞═════╪══════╪══════╡
│ 1   ┆ 3    ┆ null │
│ 2   ┆ 4    ┆ null │
│ 5   ┆ null ┆ 7    │
│ 6   ┆ null ┆ 8    │
└─────┴──────┴──────┘

Column a exists in both DataFrames, so its values are stacked normally. Column b only exists in df1, so df2 rows get null in that column. Column c only exists in df2, so df1 rows get null there.

This is particularly useful when combining data from different sources or API responses where the schema may evolve over time.

What Happens When Schemas Do Not Match (Without Diagonal)

If you try to vertically concatenate DataFrames with different columns using the default behavior, Polars raises an error:

import polars as pl

df1 = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pl.DataFrame({"a": [5, 6], "c": [7, 8]})

try:
    result = pl.concat([df1, df2])
except Exception as e:
    print(f"Error: {e}")

Output:

Error: unable to vstack, column names don't match: "b" and "c"

This strict behavior is intentional. It prevents you from accidentally combining unrelated data. If the mismatch is expected, switch to how="diagonal".

Quick Reference

| Goal | Method | Notes |
|------|--------|-------|
| Stack rows (same schema) | pl.concat([df1, df2]) | Default vertical concatenation |
| Stack rows (different columns) | pl.concat([...], how="diagonal") | Fills missing columns with null |
| Add columns side by side | pl.concat([...], how="horizontal") | Rows must be positionally aligned |
| Fast binary append | df1.vstack(df2) | Exact schema match required |

Summary

Polars provides flexible and performant options for combining DataFrames.

  • Use pl.concat() as your primary tool for stacking rows vertically (default) or columns horizontally (how="horizontal").
  • When DataFrames have different columns, use how="diagonal" to include all columns with null filling for missing values.
  • For maximum performance when appending a single DataFrame with an identical schema, use .vstack().

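By default, vertical concatenation with pl.concat() and .vstack() both enforce strict schema checks, which helps catch data inconsistencies early rather than letting them propagate silently through your pipeline. Only how="diagonal" relaxes the column requirement, and only when you ask for it explicitly.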