Python Pandas: How to Calculate Sum and Max of Excel Columns with Pandas

Pandas is one of the most widely used libraries for Excel automation in Python. Many beginners process Excel sheets with row-by-row loops, but the idiomatic approach is to use vectorized operations. Vectorization lets Pandas apply a calculation to an entire column in a single operation, producing code that is usually much faster and easier to read.

In this guide, you will learn how to read an Excel file into a DataFrame, create new columns based on the sum and maximum values of existing columns, handle missing data, and understand why vectorized operations should always be preferred over loops.

Setup and Prerequisites

To read and write modern .xlsx files, you need both Pandas and the openpyxl engine:

pip install pandas openpyxl

Reading an Excel File

Start by loading the Excel file into a Pandas DataFrame and inspecting its structure:

import pandas as pd

df = pd.read_excel("sales_data.xlsx")

print(df.head())
print(df.columns.tolist())

Example output:

       Product  Selling Price  Cost Price
0       Widget          25.00       15.00
1       Gadget          40.00       22.00
2        Gizmo          18.50       12.75
3    Doohickey           9.99        7.50
4  Thingamajig          55.00       30.00
['Product', 'Selling Price', 'Cost Price']
Tip: Column names in your code must match the Excel headers exactly, including spaces and capitalization. If you get a KeyError, use print(df.columns.tolist()) to see the actual column names in the file.
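If the headers in a file turn out to be inconsistent, one option is to normalize them right after loading. The sketch below uses an in-memory DataFrame with made-up, messy headers in place of a real file, so the exact cleanup steps are an assumption about what your data needs:

```python
import pandas as pd

# Hypothetical headers with stray whitespace and inconsistent capitalization
df = pd.DataFrame({" Selling Price ": [25.00], "cost price": [15.00]})

# Strip surrounding whitespace, then capitalize each word
df.columns = df.columns.str.strip().str.title()

print(df.columns.tolist())
```

After this step, the columns can be referenced as "Selling Price" and "Cost Price" regardless of how they were typed in the spreadsheet.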

Creating a Sum Column with Vectorized Addition

Instead of looping through each row, treat columns as mathematical variables. Adding two columns together creates a new column where every row is the sum of the corresponding values:

import pandas as pd

df = pd.read_excel("sales_data.xlsx")

# Add two columns together - every row is computed simultaneously
df["Total"] = df["Selling Price"] + df["Cost Price"]

print(df.head())

Output:

       Product  Selling Price  Cost Price  Total
0       Widget          25.00       15.00  40.00
1       Gadget          40.00       22.00  62.00
2        Gizmo          18.50       12.75  31.25
3    Doohickey           9.99        7.50  17.49
4  Thingamajig          55.00       30.00  85.00

This single line replaces what would otherwise be a multi-line loop, and it executes at C-level speed internally.
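The same element-wise behavior applies to the other arithmetic operators. As a small sketch on in-memory data (the "Margin %" column name is an illustrative choice, not something from the sample file):

```python
import pandas as pd

# Small in-memory sample mirroring the first two rows of the sales data
df = pd.DataFrame({
    "Selling Price": [25.00, 40.00],
    "Cost Price": [15.00, 22.00],
})

# Subtraction, division, and multiplication work row by row, just like addition
df["Profit"] = df["Selling Price"] - df["Cost Price"]
df["Margin %"] = df["Profit"] / df["Selling Price"] * 100  # hypothetical column name

print(df)
```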

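You can also sum across more than two columns. This example assumes the sheet has an additional Tax column, which the sample data above does not include: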

# Sum multiple columns at once
df["Grand Total"] = df[["Selling Price", "Cost Price", "Tax"]].sum(axis=1)

The axis=1 parameter tells Pandas to sum across columns (horizontally) for each row, rather than down a single column.
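To make the difference between the two axes concrete, here is a minimal sketch on a two-row DataFrame. The Tax values are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Selling Price": [25.00, 40.00],
    "Cost Price": [15.00, 22.00],
    "Tax": [2.50, 4.00],  # made-up values for illustration
})

# axis=1: one total per row (adds across the columns)
row_totals = df.sum(axis=1)

# axis=0: one total per column (adds down the rows)
column_totals = df.sum(axis=0)

print(row_totals)
print(column_totals)
```

Here row_totals has one entry per row (42.5 and 66.0), while column_totals has one entry per column.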

Creating a Max Column

To find the higher value between two or more columns for each row, use .max(axis=1) on a subset of columns:

import pandas as pd

df = pd.read_excel("sales_data.xlsx")

# Find the higher value between Selling Price and Cost Price for each row
df["Higher Value"] = df[["Selling Price", "Cost Price"]].max(axis=1)

print(df.head())

Output:

       Product  Selling Price  Cost Price  Higher Value
0       Widget          25.00       15.00         25.00
1       Gadget          40.00       22.00         40.00
2        Gizmo          18.50       12.75         18.50
3    Doohickey           9.99        7.50          9.99
4  Thingamajig          55.00       30.00         55.00

The same pattern works for .min(axis=1), .mean(axis=1), and other aggregation functions.
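For instance, a row-wise minimum and mean could be sketched like this (the "Lower Value" and "Average" column names are illustrative assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "Selling Price": [25.00, 40.00],
    "Cost Price": [15.00, 22.00],
})

price_columns = ["Selling Price", "Cost Price"]

# Row-wise minimum and mean over the same subset of columns
df["Lower Value"] = df[price_columns].min(axis=1)
df["Average"] = df[price_columns].mean(axis=1)

print(df)
```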

Handling Missing Data (NaN)

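If an Excel cell is empty, Pandas loads it as NaN (Not a Number). Arithmetic with operators such as + propagates NaN, which can silently break your calculations. Aggregations such as .sum(axis=1) and .max(axis=1) skip NaN by default, so the two styles can give different results on the same row: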

import pandas as pd
import numpy as np

# Simulating data with a missing value
df = pd.DataFrame({
    "Selling Price": [25.00, 40.00, np.nan],
    "Cost Price": [15.00, 22.00, 12.75],
})

# NaN propagates through addition
df["Total"] = df["Selling Price"] + df["Cost Price"]
print(df)

Output:

   Selling Price  Cost Price  Total
0           25.0       15.00   40.0
1           40.0       22.00   62.0
2            NaN       12.75    NaN

The third row's total is NaN because NaN + 12.75 equals NaN. To prevent this, fill missing values before calculating:

# Fill empty cells with 0 before calculating
df["Selling Price"] = df["Selling Price"].fillna(0)
df["Cost Price"] = df["Cost Price"].fillna(0)

df["Total"] = df["Selling Price"] + df["Cost Price"]
print(df)

Output:

   Selling Price  Cost Price  Total
0           25.0       15.00  40.00
1           40.0       22.00  62.00
2            0.0       12.75  12.75

Alternatively, you can use dropna() to remove rows with missing values entirely if they should not be included in the analysis.
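As a rough sketch of these alternatives, the example below drops incomplete rows with dropna(), and also shows Series.add() with fill_value=0, which treats a missing operand as 0 for a single calculation without modifying the original columns. Which option is appropriate depends on what a blank cell means in your data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Selling Price": [25.00, 40.00, np.nan],
    "Cost Price": [15.00, 22.00, 12.75],
})

# Option 1: drop rows where either price is missing
complete_rows = df.dropna(subset=["Selling Price", "Cost Price"])

# Option 2: keep every row and treat a missing operand as 0 for this addition only
totals = df["Selling Price"].add(df["Cost Price"], fill_value=0)

print(complete_rows)
print(totals)
```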

Saving Results Back to Excel

After computing new columns, save the updated DataFrame to a new Excel file:

df.to_excel("processed_sales.xlsx", index=False)
print("File saved successfully")

The index=False parameter prevents Pandas from writing the DataFrame index as an extra column in the Excel file.

Why You Should Avoid Loops

You might find tutorials using for index, row in df.iterrows() to process rows one at a time. This works for small files, but iterrows() builds a new Series object for every row, so it becomes very slow as the row count grows.

Here is the loop-based approach compared to the vectorized approach:

import pandas as pd

df = pd.read_excel("sales_data.xlsx")

# Slow: loop-based approach (avoid this)
for index, row in df.iterrows():
    df.at[index, "Total"] = row["Selling Price"] + row["Cost Price"]

# Fast: vectorized approach (use this)
df["Total"] = df["Selling Price"] + df["Cost Price"]
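To measure the gap on your own machine, a rough timing sketch like the one below can help. It uses randomly generated prices in place of a real file, and the row count of 5,000 is an arbitrary choice, so treat the timings as indicative only:

```python
import time

import numpy as np
import pandas as pd

# Synthetic data standing in for a real spreadsheet
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Selling Price": rng.uniform(5, 100, size=5_000),
    "Cost Price": rng.uniform(1, 50, size=5_000),
})

# Time the row-by-row loop
start = time.perf_counter()
loop_totals = []
for index, row in df.iterrows():
    loop_totals.append(row["Selling Price"] + row["Cost Price"])
loop_seconds = time.perf_counter() - start

# Time the vectorized addition
start = time.perf_counter()
vector_totals = df["Selling Price"] + df["Cost Price"]
vector_seconds = time.perf_counter() - start

print(f"Loop:       {loop_seconds:.4f} s")
print(f"Vectorized: {vector_seconds:.4f} s")
```

The exact numbers depend on hardware and Pandas version, but the two approaches compute the same totals.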

Both produce the same result, but the performance difference is substantial:

Feature      Vectorized (Recommended)                Iterative (Loops)
Speed        Extremely fast (C-level optimization)   Slow (Python-level looping)
Readability  df['A'] + df['B'] (one line)            Multiple lines of loop logic
Scalability  Handles millions of rows easily         Slows dramatically on large files

Memory Usage with Large Files

When reading large Excel files (100 MB+), consider using the usecols parameter in read_excel() to load only the columns you need:

df = pd.read_excel("huge_file.xlsx", usecols=["Selling Price", "Cost Price"])

This reduces memory usage significantly by skipping columns that are not relevant to your calculation.
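To get a feel for how much memory individual columns take, you can inspect a DataFrame with memory_usage(deep=True). The sketch below uses a synthetic in-memory DataFrame instead of an actual Excel read, so it only illustrates the effect of keeping fewer columns:

```python
import pandas as pd

# Synthetic data: one text column and two numeric columns
df = pd.DataFrame({
    "Product": ["Widget", "Gadget", "Gizmo"] * 1000,
    "Selling Price": [25.00, 40.00, 18.50] * 1000,
    "Cost Price": [15.00, 22.00, 12.75] * 1000,
})

# deep=True also counts the memory held by the text values
full_bytes = df.memory_usage(deep=True).sum()

# Keep only the columns needed for the calculation
needed = df[["Selling Price", "Cost Price"]]
needed_bytes = needed.memory_usage(deep=True).sum()

print(f"All columns:          {full_bytes:,} bytes")
print(f"Numeric columns only: {needed_bytes:,} bytes")
```

Text columns are often the largest contributors, which is why skipping unneeded ones with usecols tends to pay off.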

Complete Example

Here is a full workflow that reads an Excel file, computes sum and max columns, handles missing data, and saves the result:

import pandas as pd

# Read the source file
df = pd.read_excel("sales_data.xlsx")

# Handle missing values
df["Selling Price"] = df["Selling Price"].fillna(0)
df["Cost Price"] = df["Cost Price"].fillna(0)

# Compute new columns
df["Total"] = df["Selling Price"] + df["Cost Price"]
df["Higher Value"] = df[["Selling Price", "Cost Price"]].max(axis=1)
df["Profit"] = df["Selling Price"] - df["Cost Price"]

# Save results
df.to_excel("processed_sales.xlsx", index=False)

print(f"Processed {len(df)} rows")
print(df.head())

Example output:

Processed 5 rows
       Product  Selling Price  Cost Price  Total  Higher Value  Profit
0       Widget          25.00       15.00  40.00         25.00   10.00
1       Gadget          40.00       22.00  62.00         40.00   18.00
2        Gizmo          18.50       12.75  31.25         18.50    5.75
3    Doohickey           9.99        7.50  17.49          9.99    2.49
4  Thingamajig          55.00       30.00  85.00         55.00   25.00

Conclusion

By using vectorized Pandas operations instead of row-by-row iteration, you can transform complex Excel spreadsheets into clean, actionable data with just a few lines of code.

  • Use df['A'] + df['B'] to add two columns element by element.
  • Use df[columns].max(axis=1) for the row-wise maximum.
  • Handle NaN values with .fillna() or dropna() before performing calculations.

This approach is not only faster but significantly more robust for real-world business data at any scale.