Python Pandas: How to Append Data to an Empty Pandas DataFrame

Building a DataFrame incrementally by adding rows one at a time is a common pattern in data collection workflows. Whether you are gathering results from API calls, processing files in a loop, or running simulations, you often start with nothing and accumulate data row by row. With the removal of the .append() method in Pandas 2.0 (it had been deprecated since version 1.4), it is important to understand the modern alternatives that offer better performance and clearer code.

In this guide, you will learn the recommended approaches for building DataFrames incrementally, understand why the old .append() method was removed, and see the dramatic performance differences between the various techniques.

Collecting Rows in a List of Dictionaries (Recommended)

The most efficient way to build a DataFrame incrementally is to collect all your data as dictionaries in a plain Python list, then create the DataFrame in a single operation at the end:

import pandas as pd

# Collect rows as dictionaries
rows = []
for i in range(3):
    rows.append({'A': i, 'B': i * 10})

# Convert to DataFrame in one step
df = pd.DataFrame(rows)

print(df)

Output:

   A   B
0  0   0
1  1  10
2  2  20

This approach avoids creating any DataFrame objects until the very end, which eliminates unnecessary memory allocations and copies. It is the fastest and cleanest pattern for most use cases.
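A useful detail of this pattern: if the dictionaries do not all share the same keys, pd.DataFrame fills the gaps with NaN, and a columns= argument pins the column order and drops unwanted keys. A minimal sketch:

```python
import pandas as pd

# Rows with inconsistent keys: missing values become NaN
rows = [
    {'A': 1, 'B': 10},
    {'A': 2},            # no 'B' key
    {'B': 30, 'C': 99},  # extra 'C' key
]

# columns= fixes the order and keeps only the columns you ask for
df = pd.DataFrame(rows, columns=['A', 'B'])
print(df)
```

The 'C' key is silently dropped here, so columns= doubles as a lightweight schema check when records come from an external source.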

Using pd.concat() with a List of DataFrames

When you already have multiple DataFrames that need to be combined, collect them in a list and concatenate once:

import pandas as pd

# Collect DataFrames in a list
frames = []

for i in range(3):
    new_frame = pd.DataFrame({'A': [i], 'B': [i * 10]})
    frames.append(new_frame)

# Concatenate all at once
df = pd.concat(frames, ignore_index=True)

print(df)

Output:

   A   B
0  0   0
1  1  10
2  2  20
Tip: The key to performance with pd.concat() is calling it once after collecting all the DataFrames, not calling it repeatedly inside the loop. Concatenating inside the loop recreates the entire DataFrame on every iteration, resulting in quadratic time complexity.
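As a side note, pd.concat() aligns frames by column name, so frames with different columns still combine; cells absent from a frame are filled with NaN. A small sketch:

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})
right = pd.DataFrame({'A': [3], 'C': [99]})

# Columns are unioned; cells absent from a frame become NaN
combined = pd.concat([left, right], ignore_index=True)
print(combined)
```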

Using .loc[] for Adding Single Rows

For situations where you need to add just a few rows interactively or in a small script, you can use .loc[] with explicit index values:

import pandas as pd

df = pd.DataFrame(columns=['A', 'B'])

# Add rows with explicit indices
df.loc[0] = [1, 10]
df.loc[1] = [2, 20]
df.loc[2] = [3, 30]

print(df)

Output:

   A   B
0  1  10
1  2  20
2  3  30
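One caveat worth knowing: a DataFrame created with only columns= starts with object-dtype columns, and rows added via .loc[] typically keep that dtype, so a single astype() call at the end restores numeric types. A sketch:

```python
import pandas as pd

df = pd.DataFrame(columns=['A', 'B'])
df.loc[0] = [1, 10]
df.loc[1] = [2, 20]

# Columns usually remain object dtype because the frame started empty
print(df.dtypes)

# Convert once after all rows are in
df = df.astype({'A': 'int64', 'B': 'int64'})
print(df.dtypes)
```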

Auto-Indexing with len(df)

If you do not want to track index numbers manually, use len(df) to automatically assign the next available index:

import pandas as pd

df = pd.DataFrame(columns=['Name', 'Score'])

df.loc[len(df)] = ['Alice', 85]
df.loc[len(df)] = ['Bob', 92]
df.loc[len(df)] = ['Charlie', 78]

print(df)

Output:

      Name  Score
0    Alice     85
1      Bob     92
2  Charlie     78
Warning: The .loc[] approach works fine for a handful of rows, but it becomes slow when used in a loop with many iterations. Each assignment modifies the DataFrame in place, which can trigger internal reallocations. For anything beyond a small number of rows, prefer the list accumulation pattern.

Why .append() Was Deprecated

If you are working with code written before Pandas 2.0, you may encounter the .append() method. It has been removed and will raise an error in modern versions:

import pandas as pd

df = pd.DataFrame(columns=['A', 'B'])

# This no longer works in Pandas 2.0+
try:
    df = df.append({'A': 1, 'B': 2}, ignore_index=True)
except AttributeError as e:
    print(f"Error: {e}")

Output:

Error: 'DataFrame' object has no attribute 'append'

The .append() method was deprecated for several important reasons:

  • It created a full copy of the entire DataFrame on every call.
  • When used in a loop, it resulted in O(n²) time complexity, making it extremely slow for large datasets.
  • It returned a new DataFrame instead of modifying in place, which was a common source of bugs when the return value was not reassigned.
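That last pitfall has not gone away: pd.concat() also returns a new DataFrame rather than modifying its inputs, so forgetting to reassign the result silently discards the new rows. A sketch of the bug and its fix:

```python
import pandas as pd

df = pd.DataFrame({'A': [1]})

# Bug: the returned DataFrame is discarded, so df is unchanged
pd.concat([df, pd.DataFrame({'A': [2]})], ignore_index=True)
print(len(df))  # still 1

# Fix: reassign the result
df = pd.concat([df, pd.DataFrame({'A': [2]})], ignore_index=True)
print(len(df))  # now 2
```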

Performance Comparison

The difference between the slow and fast approaches is dramatic, especially as the number of rows grows:

import pandas as pd
import time

n_rows = 1000

# Slow: concat inside the loop (creates new DataFrame every iteration)
start = time.time()
df = pd.DataFrame(columns=['A'])
for i in range(n_rows):
    df = pd.concat([df, pd.DataFrame({'A': [i]})])
slow_time = time.time() - start

# Fast: collect as list, then convert once
start = time.time()
rows = [{'A': i} for i in range(n_rows)]
df = pd.DataFrame(rows)
fast_time = time.time() - start

print(f"Concat in loop: {slow_time:.4f}s")
print(f"List + DataFrame: {fast_time:.4f}s")
print(f"List approach is ~{slow_time / fast_time:.0f}x faster")

Output:

Concat in loop: 0.5765s
List + DataFrame: 0.0003s
List approach is ~1922x faster

The list accumulation approach changes the operation from O(n²) to O(n), which means the performance gap widens as the dataset grows. For 10,000 rows, the slow method can take minutes while the fast method finishes in milliseconds.

Practical Example: Collecting Data from a Function

Here is a realistic example showing how to accumulate results from a data-fetching function:

import pandas as pd

def fetch_data(item_id):
    """Simulate fetching data from an API or database."""
    return {'id': item_id, 'value': item_id * 100, 'status': 'OK'}

# Collect all results in a list
results = []
for item_id in range(5):
    data = fetch_data(item_id)
    results.append(data)

# Create DataFrame once at the end
df = pd.DataFrame(results)

print(df)

Output:

   id  value status
0   0      0     OK
1   1    100     OK
2   2    200     OK
3   3    300     OK
4   4    400     OK

This pattern works equally well with API responses, file processing results, simulation outputs, or any other scenario where data arrives one record at a time.
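When the records come back nested, as JSON from an API often does, pd.json_normalize() flattens them into columns in one call; the field names below are hypothetical:

```python
import pandas as pd

# Hypothetical nested responses, as an API might return them
results = [
    {'id': 0, 'meta': {'value': 0, 'status': 'OK'}},
    {'id': 1, 'meta': {'value': 100, 'status': 'OK'}},
]

# Nested keys become dotted column names: meta.value, meta.status
df = pd.json_normalize(results)
print(df)
```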

Common Mistake: Concatenating Inside the Loop

A frequent anti-pattern is placing pd.concat() inside the loop instead of outside it:

import pandas as pd

# Wrong: concatenating on every iteration
df = pd.DataFrame(columns=['A', 'B'])
for i in range(3):
    df = pd.concat([df, pd.DataFrame({'A': [i], 'B': [i * 10]})])

print(df)

Output:

   A   B
0  0   0
0  1  10
0  2  20

This produces the right data (though every row keeps index 0, because ignore_index=True was not passed) but is extremely inefficient, because Pandas copies the entire existing DataFrame on every iteration. The correct approach collects all data first and converts once:

import pandas as pd

# Correct: collect first, convert once
rows = []
for i in range(3):
    rows.append({'A': i, 'B': i * 10})

df = pd.DataFrame(rows)

print(df)

Output:

   A   B
0  0   0
1  1  10
2  2  20

Both produce the same data (pass ignore_index=True to pd.concat to make the indices match as well), but the second approach is orders of magnitude faster for any non-trivial number of rows.

Quick Reference

Method                         Best For                           Performance
pd.DataFrame(list_of_dicts)    Building DataFrames in loops       Best
pd.concat(list_of_dfs)         Combining existing DataFrames      Good (called once)
.loc[len(df)] = values         Adding a few rows interactively    Acceptable for few rows
pd.concat() inside a loop      Never recommended                  Very slow

The pattern to remember is simple: collect data in a Python list, then create the DataFrame in a single call. This approach is the fastest, most readable, and most memory-efficient way to build a DataFrame incrementally in modern Pandas.