Python Pandas: How to Append Data to an Empty Pandas DataFrame

Building a DataFrame incrementally by adding rows one at a time is a common pattern in data collection workflows. Whether you are gathering results from API calls, processing files in a loop, or running simulations, you often start with nothing and accumulate data row by row. With the removal of the .append() method in Pandas 2.0 (it had been deprecated since version 1.4), it is important to understand the modern alternatives that offer better performance and clearer code.

In this guide, you will learn the recommended approaches for building DataFrames incrementally, understand why the old .append() method was removed, and see the dramatic performance differences between the various techniques.

Collecting Rows in a List of Dictionaries (Recommended)

The most efficient way to build a DataFrame incrementally is to collect all your data as dictionaries in a plain Python list, then create the DataFrame in a single operation at the end:

import pandas as pd

# Collect rows as dictionaries
rows = []
for i in range(3):
    rows.append({'A': i, 'B': i * 10})

# Convert to DataFrame in one step
df = pd.DataFrame(rows)

print(df)

Output:

   A   B
0  0   0
1  1  10
2  2  20

This approach avoids creating any DataFrame objects until the very end, which eliminates unnecessary memory allocations and copies. It is the fastest and cleanest pattern for most use cases.
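A useful detail of this pattern: if the dictionaries do not all share the same keys, pd.DataFrame fills the gaps with NaN, and a columns= argument pins the column order and drops unwanted keys. A minimal sketch:

```python
import pandas as pd

# Rows with inconsistent keys: missing values become NaN
rows = [
    {'A': 1, 'B': 10},
    {'A': 2},            # no 'B' key
    {'B': 30, 'C': 99},  # extra 'C' key
]

# columns= fixes the order and keeps only the columns you ask for
df = pd.DataFrame(rows, columns=['A', 'B'])
print(df)
```

The 'C' key is silently dropped here, so columns= doubles as a lightweight schema check when records come from an external source.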

Using pd.concat() with a List of DataFrames

When you already have multiple DataFrames that need to be combined, collect them in a list and concatenate once:

import pandas as pd

# Collect DataFrames in a list
frames = []

for i in range(3):
    new_frame = pd.DataFrame({'A': [i], 'B': [i * 10]})
    frames.append(new_frame)

# Concatenate all at once
df = pd.concat(frames, ignore_index=True)

print(df)

Output:

   A   B
0  0   0
1  1  10
2  2  20
Tip: The key to performance with pd.concat() is calling it once after collecting all the DataFrames, not calling it repeatedly inside the loop. Concatenating inside the loop recreates the entire DataFrame on every iteration, resulting in quadratic time complexity.
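As a side note, pd.concat() aligns frames by column name, so frames with different columns still combine; cells absent from a frame are filled with NaN. A small sketch:

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})
right = pd.DataFrame({'A': [3], 'C': [99]})

# Columns are unioned; cells absent from a frame become NaN
combined = pd.concat([left, right], ignore_index=True)
print(combined)
```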

Using .loc[] for Adding Single Rows

For situations where you need to add just a few rows interactively or in a small script, you can use .loc[] with explicit index values:

import pandas as pd

df = pd.DataFrame(columns=['A', 'B'])

# Add rows with explicit indices
df.loc[0] = [1, 10]
df.loc[1] = [2, 20]
df.loc[2] = [3, 30]

print(df)

Output:

   A   B
0  1  10
1  2  20
2  3  30
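One caveat worth knowing: a DataFrame created with only columns= starts with object-dtype columns, and rows added via .loc[] typically keep that dtype, so a single astype() call at the end restores numeric types. A sketch:

```python
import pandas as pd

df = pd.DataFrame(columns=['A', 'B'])
df.loc[0] = [1, 10]
df.loc[1] = [2, 20]

# Columns usually remain object dtype because the frame started empty
print(df.dtypes)

# Convert once after all rows are in
df = df.astype({'A': 'int64', 'B': 'int64'})
print(df.dtypes)
```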

Auto-Indexing with len(df)

If you do not want to track index numbers manually, use len(df) to automatically assign the next available index:

import pandas as pd

df = pd.DataFrame(columns=['Name', 'Score'])

df.loc[len(df)] = ['Alice', 85]
df.loc[len(df)] = ['Bob', 92]
df.loc[len(df)] = ['Charlie', 78]

print(df)

Output:

      Name  Score
0    Alice     85
1      Bob     92
2  Charlie     78
Warning: The .loc[] approach works fine for a handful of rows, but it becomes slow when used in a loop with many iterations. Each assignment modifies the DataFrame in place, which can trigger internal reallocations. For anything beyond a small number of rows, prefer the list accumulation pattern.

Why .append() Was Deprecated

If you are working with code written before Pandas 2.0, you may encounter the .append() method. It has been removed and will raise an error in modern versions:

import pandas as pd

df = pd.DataFrame(columns=['A', 'B'])

# This no longer works in Pandas 2.0+
try:
    df = df.append({'A': 1, 'B': 2}, ignore_index=True)
except AttributeError as e:
    print(f"Error: {e}")

Output:

Error: 'DataFrame' object has no attribute 'append'

The .append() method was deprecated for several important reasons:

  • It created a full copy of the entire DataFrame on every call.
  • When used in a loop, it resulted in O(n²) time complexity, making it extremely slow for large datasets.
  • It returned a new DataFrame instead of modifying in place, which was a common source of bugs when the return value was not reassigned.
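That last pitfall has not gone away: pd.concat() also returns a new DataFrame rather than modifying its inputs, so forgetting to reassign the result silently discards the new rows. A sketch of the bug and its fix:

```python
import pandas as pd

df = pd.DataFrame({'A': [1]})

# Bug: the returned DataFrame is discarded, so df is unchanged
pd.concat([df, pd.DataFrame({'A': [2]})], ignore_index=True)
print(len(df))  # still 1

# Fix: reassign the result
df = pd.concat([df, pd.DataFrame({'A': [2]})], ignore_index=True)
print(len(df))  # now 2
```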

Performance Comparison

The difference between the slow and fast approaches is dramatic, especially as the number of rows grows:

import pandas as pd
import time

n_rows = 1000

# Slow: concat inside the loop (creates new DataFrame every iteration)
start = time.time()
df = pd.DataFrame(columns=['A'])
for i in range(n_rows):
    df = pd.concat([df, pd.DataFrame({'A': [i]})])
slow_time = time.time() - start

# Fast: collect as list, then convert once
start = time.time()
rows = [{'A': i} for i in range(n_rows)]
df = pd.DataFrame(rows)
fast_time = time.time() - start

print(f"Concat in loop: {slow_time:.4f}s")
print(f"List + DataFrame: {fast_time:.4f}s")
print(f"List approach is ~{slow_time / fast_time:.0f}x faster")

Output:

Concat in loop: 0.5765s
List + DataFrame: 0.0003s
List approach is ~1922x faster

The list accumulation approach changes the operation from O(n²) to O(n), which means the performance gap widens as the dataset grows. For 10,000 rows, the slow method can take minutes while the fast method finishes in milliseconds.

Practical Example: Collecting Data from a Function

Here is a realistic example showing how to accumulate results from a data-fetching function:

import pandas as pd

def fetch_data(item_id):
    """Simulate fetching data from an API or database."""
    return {'id': item_id, 'value': item_id * 100, 'status': 'OK'}

# Collect all results in a list
results = []
for item_id in range(5):
    data = fetch_data(item_id)
    results.append(data)

# Create DataFrame once at the end
df = pd.DataFrame(results)

print(df)

Output:

   id  value status
0   0      0     OK
1   1    100     OK
2   2    200     OK
3   3    300     OK
4   4    400     OK

This pattern works equally well with API responses, file processing results, simulation outputs, or any other scenario where data arrives one record at a time.
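When the records come back nested, as JSON from an API often does, pd.json_normalize() flattens them into columns in one call; the field names below are hypothetical:

```python
import pandas as pd

# Hypothetical nested responses, as an API might return them
results = [
    {'id': 0, 'meta': {'value': 0, 'status': 'OK'}},
    {'id': 1, 'meta': {'value': 100, 'status': 'OK'}},
]

# Nested keys become dotted column names: meta.value, meta.status
df = pd.json_normalize(results)
print(df)
```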

Common Mistake: Concatenating Inside the Loop

A frequent anti-pattern is placing pd.concat() inside the loop instead of outside it:

import pandas as pd

# Wrong: concatenating on every iteration
df = pd.DataFrame(columns=['A', 'B'])
for i in range(3):
    df = pd.concat([df, pd.DataFrame({'A': [i], 'B': [i * 10]})])

print(df)

Output:

   A   B
0  0   0
0  1  10
0  2  20

This produces the right data (though every row keeps index 0, because ignore_index=True was not passed) but is extremely inefficient, because Pandas copies the entire existing DataFrame on every iteration. The correct approach collects all data first and converts once:

import pandas as pd

# Correct: collect first, convert once
rows = []
for i in range(3):
    rows.append({'A': i, 'B': i * 10})

df = pd.DataFrame(rows)

print(df)

Output:

   A   B
0  0   0
1  1  10
2  2  20

Both produce the same data (pass ignore_index=True to pd.concat to make the indices match as well), but the second approach is orders of magnitude faster for any non-trivial number of rows.

Quick Reference

Method                         Best For                           Performance
pd.DataFrame(list_of_dicts)    Building DataFrames in loops       Best
pd.concat(list_of_dfs)         Combining existing DataFrames      Good (called once)
.loc[len(df)] = values         Adding a few rows interactively    Acceptable for few rows
pd.concat() inside a loop      Never recommended                  Very slow

The pattern to remember is simple: collect data in a Python list, then create the DataFrame in a single call. This approach is the fastest, most readable, and most memory-efficient way to build a DataFrame incrementally in modern Pandas.