Python Pandas: How to Compare Two DataFrames with Pandas `compare()`

When working with data in pandas, you'll often need to identify differences between two DataFrames, whether you're tracking changes in datasets over time, validating data transformations, auditing updates, or debugging data pipelines. The DataFrame.compare() method, introduced in pandas 1.1.0, provides a clean and structured way to highlight exactly what changed between two DataFrames.

In this guide, you'll learn how to use compare() effectively, understand each of its parameters, and see practical examples that cover common comparison scenarios.

Prerequisites

The compare() method requires pandas 1.1.0 or later. Check your version and upgrade if needed:

import pandas as pd
print(pd.__version__)

If your version is below 1.1.0, upgrade with:

pip install pandas --upgrade

Understanding `DataFrame.compare()`

The compare() method compares two DataFrames element by element and returns a new DataFrame highlighting only the differences.

Syntax:

DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False)

Parameter	Type	Default	Description
`other`	DataFrame	-	The DataFrame to compare against
`align_axis`	`0` or `1`	`1`	`1`: differences shown as columns; `0`: differences shown as rows
`keep_shape`	bool	`False`	If `True`, keep all rows and columns (not just those with differences)
`keep_equal`	bool	`False`	If `True`, show equal values instead of replacing them with `NaN`

Setting Up the DataFrames

Let's create two DataFrames that share the same structure but have some different values:

import pandas as pd

# Original DataFrame
df_original = pd.DataFrame({
    'Product': ['Pens', 'Scales', 'Pencils', 'Geometry Box', 'Crayon Set'],
    'Price': [100, 50, 25, 100, 65],
    'Quantity': [10, 5, 5, 2, 1]
})

# Updated DataFrame with some changes
df_updated = df_original.copy()
df_updated.loc[0, 'Price'] = 150       # Changed: 100 → 150
df_updated.loc[1, 'Price'] = 70        # Changed: 50 → 70
df_updated.loc[2, 'Price'] = 30        # Changed: 25 → 30
df_updated.loc[0, 'Quantity'] = 15     # Changed: 10 → 15
df_updated.loc[1, 'Quantity'] = 7      # Changed: 5 → 7
df_updated.loc[2, 'Quantity'] = 6      # Changed: 5 → 6

print("Original DataFrame:")
print(df_original)
print("\nUpdated DataFrame:")
print(df_updated)

Output:

Original DataFrame:
        Product  Price  Quantity
        Pens    100        10
      Scales     50         5
     Pencils     25         5
Geometry Box    100         2
  Crayon Set     65         1

Updated DataFrame:
        Product  Price  Quantity
        Pens    150        15
      Scales     70         7
     Pencils     30         6
Geometry Box    100         2
  Crayon Set     65         1

Basic Comparison (Default Settings)

With default settings, compare() shows only the rows and columns that have differences, with self and other labels indicating the old and new values:

import pandas as pd

# Original DataFrame
df_original = pd.DataFrame({
    'Product': ['Pens', 'Scales', 'Pencils', 'Geometry Box', 'Crayon Set'],
    'Price': [100, 50, 25, 100, 65],
    'Quantity': [10, 5, 5, 2, 1]
})

# Updated DataFrame with some changes
df_updated = df_original.copy()
df_updated.loc[0, 'Price'] = 150       # Changed: 100 → 150
df_updated.loc[1, 'Price'] = 70        # Changed: 50 → 70
df_updated.loc[2, 'Price'] = 30        # Changed: 25 → 30
df_updated.loc[0, 'Quantity'] = 15     # Changed: 10 → 15
df_updated.loc[1, 'Quantity'] = 7      # Changed: 5 → 7
df_updated.loc[2, 'Quantity'] = 6      # Changed: 5 → 6

diff = df_original.compare(df_updated)
print(diff)

Output:

   Price        Quantity      
    self  other     self other
0  100.0  150.0     10.0  15.0
1   50.0   70.0      5.0   7.0
2   25.0   30.0      5.0   6.0

Only rows 0, 1, and 2 appear because rows 3 and 4 are identical. Only Price and Quantity columns appear because Product values didn't change. The self column shows the original value and other shows the updated value.

Aligning Differences on Rows (`align_axis=0`)

Setting align_axis=0 displays differences vertically with self and other as row labels:

import pandas as pd

# Original DataFrame
df_original = pd.DataFrame({
    'Product': ['Pens', 'Scales', 'Pencils', 'Geometry Box', 'Crayon Set'],
    'Price': [100, 50, 25, 100, 65],
    'Quantity': [10, 5, 5, 2, 1]
})

# Updated DataFrame with some changes
df_updated = df_original.copy()
df_updated.loc[0, 'Price'] = 150       # Changed: 100 → 150
df_updated.loc[1, 'Price'] = 70        # Changed: 50 → 70
df_updated.loc[2, 'Price'] = 30        # Changed: 25 → 30
df_updated.loc[0, 'Quantity'] = 15     # Changed: 10 → 15
df_updated.loc[1, 'Quantity'] = 7      # Changed: 5 → 7
df_updated.loc[2, 'Quantity'] = 6      # Changed: 5 → 6

diff = df_original.compare(df_updated, align_axis=0)
print(diff)

Output:

         Price  Quantity
0 self   100.0      10.0
  other  150.0      15.0
1 self    50.0       5.0
  other   70.0       7.0
2 self    25.0       5.0
  other   30.0       6.0

This format can be easier to read when comparing many columns, as each row pair shows the before and after values side by side.

Keeping the Full Shape (`keep_shape=True`)

By default, compare() only shows rows and columns with differences. Setting keep_shape=True retains the entire DataFrame structure, filling unchanged values with NaN:

import pandas as pd

# Original DataFrame
df_original = pd.DataFrame({
    'Product': ['Pens', 'Scales', 'Pencils', 'Geometry Box', 'Crayon Set'],
    'Price': [100, 50, 25, 100, 65],
    'Quantity': [10, 5, 5, 2, 1]
})

# Updated DataFrame with some changes
df_updated = df_original.copy()
df_updated.loc[0, 'Price'] = 150       # Changed: 100 → 150
df_updated.loc[1, 'Price'] = 70        # Changed: 50 → 70
df_updated.loc[2, 'Price'] = 30        # Changed: 25 → 30
df_updated.loc[0, 'Quantity'] = 15     # Changed: 10 → 15
df_updated.loc[1, 'Quantity'] = 7      # Changed: 5 → 7
df_updated.loc[2, 'Quantity'] = 6      # Changed: 5 → 6

diff = df_original.compare(df_updated, keep_shape=True)
print(diff)

Output:

  Product        Price        Quantity      
     self other   self  other     self other
   NaN   NaN  100.0  150.0     10.0  15.0
   NaN   NaN   50.0   70.0      5.0   7.0
   NaN   NaN   25.0   30.0      5.0   6.0
   NaN   NaN    NaN    NaN      NaN   NaN
   NaN   NaN    NaN    NaN      NaN   NaN

All rows and columns are present. Unchanged values (like Product and rows 3–4) appear as NaN.

Showing Equal Values (`keep_equal=True`)

Setting keep_equal=True displays the actual values for unchanged cells instead of NaN:

diff = df_original.compare(df_updated, keep_shape=True, keep_equal=True)
print(diff)

Output:

        Product               Price        Quantity
           self        other   self other     self other
        Pens         Pens    100   150       10    15
      Scales       Scales     50    70        5     7
     Pencils      Pencils     25    30        5     6
Geometry Box  Geometry Box   100   100        2     2
  Crayon Set    Crayon Set     65    65        1     1

This gives you a complete side-by-side view of both DataFrames, making it easy to see what changed and what stayed the same.

Comparing Only Differences Without Shape

Using just keep_equal=True (without keep_shape) shows equal values for changed rows but still hides entirely unchanged rows:

import pandas as pd

# Original DataFrame
df_original = pd.DataFrame({
    'Product': ['Pens', 'Scales', 'Pencils', 'Geometry Box', 'Crayon Set'],
    'Price': [100, 50, 25, 100, 65],
    'Quantity': [10, 5, 5, 2, 1]
})

# Updated DataFrame with some changes
df_updated = df_original.copy()
df_updated.loc[0, 'Price'] = 150       # Changed: 100 → 150
df_updated.loc[1, 'Price'] = 70        # Changed: 50 → 70
df_updated.loc[2, 'Price'] = 30        # Changed: 25 → 30
df_updated.loc[0, 'Quantity'] = 15     # Changed: 10 → 15
df_updated.loc[1, 'Quantity'] = 7      # Changed: 5 → 7
df_updated.loc[2, 'Quantity'] = 6      # Changed: 5 → 6

diff = df_original.compare(df_updated, keep_equal=True)
print(diff)

Output:

  Price       Quantity      
   self other     self other
0   100   150       10    15
1    50    70        5     7
2    25    30        5     6

Parameter Combination Summary

`keep_shape`	`keep_equal`	Result
`False`	`False`	Only changed rows/columns, differences only (default)
`False`	`True`	Only changed rows/columns, all values shown
`True`	`False`	All rows/columns, unchanged values as `NaN`
`True`	`True`	All rows/columns, all values shown

Practical Example: Tracking Price Changes

Here's a real-world example that uses compare() to generate a change report:

import pandas as pd

# Weekly price data
week1 = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard'],
    'Price': [999, 699, 329, 249, 79],
    'Stock': [50, 120, 85, 60, 200]
})

week2 = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard'],
    'Price': [949, 699, 349, 249, 69],
    'Stock': [42, 95, 85, 55, 180]
})

# Find differences
changes = week1.compare(week2)

print("Changes between Week 1 and Week 2:")
print(changes)

# Count number of changes per column
print(f"\nTotal changes per column:")
for col in changes.columns.get_level_values(0).unique():
    count = changes[col].dropna().shape[0]
    print(f"  {col}: {count} changes")

Output:

Changes between Week 1 and Week 2:
   Price         Stock       
    self  other   self  other
0  999.0  949.0   50.0   42.0
1    NaN    NaN  120.0   95.0
2  329.0  349.0    NaN    NaN
3    NaN    NaN   60.0   55.0
4   79.0   69.0  200.0  180.0

Total changes per column:
  Price: 3 changes
  Stock: 4 changes

Common Errors and How to Avoid Them

Error: Mismatched DataFrame Shapes

The compare() method requires both DataFrames to have identical indices and columns. If they don't match, you'll get a ValueError:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3]})
df2 = pd.DataFrame({'A': [1, 2, 3, 4]})

df1.compare(df2)

Output:

ValueError: Can only compare identically-labeled DataFrame objects

Solution: Ensure both DataFrames have the same shape and labels before comparing. If needed, align them first:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3]})
df2 = pd.DataFrame({'A': [1, 2, 3, 4]})

# Reset index and trim to matching rows
common_index = df1.index.intersection(df2.index)
df1_aligned = df1.loc[common_index]
df2_aligned = df2.loc[common_index]

diff = df1_aligned.compare(df2_aligned)

Error: Different Column Names

The column names must also match exactly:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice'], 'Age': [30]})
df2 = pd.DataFrame({'Name': ['Alice'], 'age': [31]})  # lowercase 'age'

df1.compare(df2)  # ValueError

Solution: Standardize column names before comparing:

df2.columns = df1.columns  # Or use df.rename()

Alternative: Using `!=` for Quick Difference Detection

For a simpler boolean view of where differences exist, use the != operator:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 9, 3], 'B': [4, 5, 8]})

print("Where values differ:")
print(df1 != df2)

Output:

Where values differ:
       A      B
0  False  False
1   True  False
2  False   True

This tells you where differences exist but not what the differences are. That's where compare() excels.

Conclusion

The DataFrame.compare() method is a powerful tool for identifying and visualizing differences between two DataFrames:

Use the default settings for a concise view of only the changed values.
Set align_axis=0 to display differences vertically for better readability with many columns.
Set keep_shape=True to retain the full DataFrame structure and see which rows were unchanged.
Set keep_equal=True to display actual values instead of NaN for unchanged cells.
Always ensure both DataFrames have identical indices and columns before comparing to avoid ValueError.

For quick boolean detection of where differences exist, the != operator is a simpler alternative. For detailed change tracking with old and new values, compare() is the ideal choice.

Prerequisites​

Understanding DataFrame.compare()​

Setting Up the DataFrames​

Basic Comparison (Default Settings)​

Aligning Differences on Rows (align_axis=0)​

Keeping the Full Shape (keep_shape=True)​

Showing Equal Values (keep_equal=True)​

Comparing Only Differences Without Shape​

Practical Example: Tracking Price Changes​

Common Errors and How to Avoid Them​

Alternative: Using != for Quick Difference Detection​

Conclusion​

Table of Contents

Prerequisites

Understanding `DataFrame.compare()`

Setting Up the DataFrames

Basic Comparison (Default Settings)

Aligning Differences on Rows (`align_axis=0`)

Keeping the Full Shape (`keep_shape=True`)

Showing Equal Values (`keep_equal=True`)

Comparing Only Differences Without Shape

Practical Example: Tracking Price Changes

Common Errors and How to Avoid Them

Alternative: Using `!=` for Quick Difference Detection

Conclusion