Skip to main content

Python Pandas: How to Collapse or Combine Multiple Columns in Pandas

Merging multiple columns into a single column is a common operation in data transformation. You might need to create totals from quarterly figures, combine first and last names into a full name, or reshape wide data into a long format for visualization. Pandas provides several approaches depending on whether you are working with numeric data, text data, or restructuring the DataFrame layout entirely.

In this guide, you will learn how to sum and aggregate numeric columns, concatenate string columns while handling missing values, reshape wide DataFrames into long format, and apply custom combination logic.

Summing Numeric Columns

The simplest way to combine numeric columns is to select them and sum across each row using axis=1:

import pandas as pd

df = pd.DataFrame({
'Q1': [100, 150],
'Q2': [120, 180],
'Q3': [130, 190]
})

# Sum all three quarters into a total
df['Total'] = df[['Q1', 'Q2', 'Q3']].sum(axis=1)

# Sum only the first two quarters
df['H1_Total'] = df[['Q1', 'Q2']].sum(axis=1)

print(df)

Output:

    Q1   Q2   Q3  Total  H1_Total
0 100 120 130 350 220
1 150 180 190 520 330

The axis=1 parameter tells Pandas to sum across columns (horizontally) rather than down rows.

Collapsing Columns by Category with groupby()

When you have multiple columns that belong to the same logical category, such as quarterly sales and quarterly returns, you can collapse them into aggregated totals using a column mapping:

import pandas as pd

df = pd.DataFrame({
'ID': [1, 2],
'Q1_Sales': [100, 150],
'Q2_Sales': [120, 180],
'Q1_Returns': [10, 15],
'Q2_Returns': [8, 12]
})

# Map each column to its target category
mapping = {
'Q1_Sales': 'Total_Sales',
'Q2_Sales': 'Total_Sales',
'Q1_Returns': 'Total_Returns',
'Q2_Returns': 'Total_Returns'
}

collapsed = df.set_index('ID').groupby(mapping, axis=1).sum().reset_index()

print(collapsed)

Output:

   ID  Total_Returns  Total_Sales
0 1 18 220
1 2 27 330

The mapping dictionary tells groupby() which columns should be combined together. All columns mapped to 'Total_Sales' are summed, and all columns mapped to 'Total_Returns' are summed independently.

info

The axis=1 parameter in groupby() has been deprecated in recent Pandas versions. For newer versions, you can achieve the same result by transposing, grouping, and transposing back, or by manually selecting and summing the relevant column groups.

Concatenating String Columns

For text columns, use the + operator to join values with a separator:

import pandas as pd

df = pd.DataFrame({
'First': ['John', 'Jane'],
'Last': ['Doe', 'Smith'],
'Title': ['Mr.', 'Dr.']
})

# Simple two-column concatenation
df['Full_Name'] = df['First'] + ' ' + df['Last']

# Combining three columns
df['Formal'] = df['Title'] + ' ' + df['First'] + ' ' + df['Last']

print(df)

Output:

  First   Last Title   Full_Name          Formal
0 John Doe Mr. John Doe Mr. John Doe
1 Jane Smith Dr. Jane Smith Dr. Jane Smith

Handling Missing Values in String Concatenation

When any of the columns contain NaN, the + operator propagates the missing value and produces NaN in the result. You need to handle this explicitly:

import pandas as pd
import numpy as np

df = pd.DataFrame({
'First': ['John', 'Jane'],
'Middle': [np.nan, 'Marie'],
'Last': ['Doe', 'Smith']
})

# Without handling NaN: the result for John would be NaN
print("Without fillna:")
print(df['First'] + ' ' + df['Middle'] + ' ' + df['Last'])
print()

# With fillna: replace NaN with empty string before concatenating
df['Full'] = (df['First'] + ' ' +
df['Middle'].fillna('') + ' ' +
df['Last'])

print("With fillna:")
print(df['Full'])

Output:

Without fillna:
0 NaN
1 Jane Marie Smith
dtype: object

With fillna:
0 John Doe
1 Jane Marie Smith
Name: Full, dtype: object

For a cleaner approach that avoids double spaces when middle names are missing, use .agg() with a join function:

import pandas as pd
import numpy as np

df = pd.DataFrame({
'First': ['John', 'Jane'],
'Middle': [np.nan, 'Marie'],
'Last': ['Doe', 'Smith']
})

# Join non-empty values with a space
df['Full'] = df[['First', 'Middle', 'Last']].fillna('').agg(
lambda row: ' '.join(part for part in row if part), axis=1
)

print(df)

Output:

  First Middle   Last              Full
0 John NaN Doe John Doe
1 Jane Marie Smith Jane Marie Smith

Reshaping Wide to Long Format with melt()

Sometimes "combining columns" really means restructuring the data so that column headers become values in a single column. The melt() function converts wide format to long format, which is often required for visualization libraries like Seaborn and Plotly:

import pandas as pd

df = pd.DataFrame({
'Year': [2023, 2024],
'Q1': [100, 150],
'Q2': [120, 180],
'Q3': [130, 190],
'Q4': [140, 200]
})

melted = df.melt(
id_vars=['Year'],
var_name='Quarter',
value_name='Sales'
)

print(melted)

Output:

   Year Quarter  Sales
0 2023 Q1 100
1 2024 Q1 150
2 2023 Q2 120
3 2024 Q2 180
4 2023 Q3 130
5 2024 Q3 190
6 2023 Q4 140
7 2024 Q4 200

The four quarter columns have been collapsed into two columns: Quarter (holding the former column names) and Sales (holding the values). The Year column is preserved as an identifier.

Aggregating Across Columns with .agg()

When you need different aggregation functions beyond a simple sum, select the columns and apply the desired function:

import pandas as pd

df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})

cols = ['A', 'B', 'C']

df['Sum'] = df[cols].sum(axis=1)
df['Mean'] = df[cols].mean(axis=1)
df['Range'] = df[cols].max(axis=1) - df[cols].min(axis=1)

print(df)

Output:

   A  B  C  Sum  Mean  Range
0 1 4 7 12 4.0 6
1 2 5 8 15 5.0 6
2 3 6 9 18 6.0 6

Using .apply() for Complex Combination Logic

When the combination logic involves formatting or conditional rules that cannot be expressed with simple operators, use .apply() with axis=1:

import pandas as pd

df = pd.DataFrame({
'City': ['New York', 'Los Angeles'],
'State': ['NY', 'CA'],
'Zip': ['10001', '90001']
})

df['Location'] = df.apply(
lambda row: f"{row['City']}, {row['State']} {row['Zip']}",
axis=1
)

print(df)

Output:

          City State    Zip               Location
0 New York NY 10001 New York, NY 10001
1 Los Angeles CA 90001 Los Angeles, CA 90001
tip

Reserve .apply() for cases where the combination logic is too complex for vectorized operations. For simple string concatenation or arithmetic, the + operator and .sum() are significantly faster on large datasets.

Quick Reference

GoalMethodExample
Sum columns.sum(axis=1)df[cols].sum(axis=1)
Collapse by categorygroupby(mapping, axis=1)Map subcategories to totals
Join strings+ operatordf['A'] + ' ' + df['B']
Join strings (with NaN).fillna('') + ...Handle missing before joining
Reshape wide to long.melt()Column headers become row values
Custom aggregation.mean(), .max(), .min()df[cols].mean(axis=1)
Complex combination logic.apply(func, axis=1)Formatted strings, conditionals
  • Use df[cols].sum(axis=1) for straightforward numeric totals.
  • Use string concatenation with + for merging text columns, adding .fillna('') when missing values are possible.
  • Use melt() to reshape wide data into long format for visualization.

For complex multi-column logic that does not fit neatly into vectorized operations, use .apply() with axis=1.