Python Pandas: How to Collapse or Combine Multiple Columns in Pandas
Merging multiple columns into a single column is a common operation in data transformation. You might need to create totals from quarterly figures, combine first and last names into a full name, or reshape wide data into a long format for visualization. Pandas provides several approaches depending on whether you are working with numeric data, text data, or restructuring the DataFrame layout entirely.
In this guide, you will learn how to sum and aggregate numeric columns, concatenate string columns while handling missing values, reshape wide DataFrames into long format, and apply custom combination logic.
Summing Numeric Columns
The simplest way to combine numeric columns is to select them and sum across each row using axis=1:
import pandas as pd
df = pd.DataFrame({
'Q1': [100, 150],
'Q2': [120, 180],
'Q3': [130, 190]
})
# Sum all three quarters into a total
df['Total'] = df[['Q1', 'Q2', 'Q3']].sum(axis=1)
# Sum only the first two quarters
df['H1_Total'] = df[['Q1', 'Q2']].sum(axis=1)
print(df)
Output:
Q1 Q2 Q3 Total H1_Total
0 100 120 130 350 220
1 150 180 190 520 330
The axis=1 parameter tells Pandas to sum across columns (horizontally) rather than down rows.
Collapsing Columns by Category with groupby()
When you have multiple columns that belong to the same logical category, such as quarterly sales and quarterly returns, you can collapse them into aggregated totals using a column mapping:
import pandas as pd
df = pd.DataFrame({
'ID': [1, 2],
'Q1_Sales': [100, 150],
'Q2_Sales': [120, 180],
'Q1_Returns': [10, 15],
'Q2_Returns': [8, 12]
})
# Map each column to its target category
mapping = {
'Q1_Sales': 'Total_Sales',
'Q2_Sales': 'Total_Sales',
'Q1_Returns': 'Total_Returns',
'Q2_Returns': 'Total_Returns'
}
collapsed = df.set_index('ID').groupby(mapping, axis=1).sum().reset_index()
print(collapsed)
Output:
ID Total_Returns Total_Sales
0 1 18 220
1 2 27 330
The mapping dictionary tells groupby() which columns should be combined together. All columns mapped to 'Total_Sales' are summed, and all columns mapped to 'Total_Returns' are summed independently.
The axis=1 parameter in groupby() has been deprecated in recent Pandas versions. For newer versions, you can achieve the same result by transposing, grouping, and transposing back, or by manually selecting and summing the relevant column groups.
Concatenating String Columns
For text columns, use the + operator to join values with a separator:
import pandas as pd
df = pd.DataFrame({
'First': ['John', 'Jane'],
'Last': ['Doe', 'Smith'],
'Title': ['Mr.', 'Dr.']
})
# Simple two-column concatenation
df['Full_Name'] = df['First'] + ' ' + df['Last']
# Combining three columns
df['Formal'] = df['Title'] + ' ' + df['First'] + ' ' + df['Last']
print(df)
Output:
First Last Title Full_Name Formal
0 John Doe Mr. John Doe Mr. John Doe
1 Jane Smith Dr. Jane Smith Dr. Jane Smith
Handling Missing Values in String Concatenation
When any of the columns contain NaN, the + operator propagates the missing value and produces NaN in the result. You need to handle this explicitly:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'First': ['John', 'Jane'],
'Middle': [np.nan, 'Marie'],
'Last': ['Doe', 'Smith']
})
# Without handling NaN: the result for John would be NaN
print("Without fillna:")
print(df['First'] + ' ' + df['Middle'] + ' ' + df['Last'])
print()
# With fillna: replace NaN with empty string before concatenating
df['Full'] = (df['First'] + ' ' +
df['Middle'].fillna('') + ' ' +
df['Last'])
print("With fillna:")
print(df['Full'])
Output:
Without fillna:
0 NaN
1 Jane Marie Smith
dtype: object
With fillna:
0 John Doe
1 Jane Marie Smith
Name: Full, dtype: object
For a cleaner approach that avoids double spaces when middle names are missing, use .agg() with a join function:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'First': ['John', 'Jane'],
'Middle': [np.nan, 'Marie'],
'Last': ['Doe', 'Smith']
})
# Join non-empty values with a space
df['Full'] = df[['First', 'Middle', 'Last']].fillna('').agg(
lambda row: ' '.join(part for part in row if part), axis=1
)
print(df)
Output:
First Middle Last Full
0 John NaN Doe John Doe
1 Jane Marie Smith Jane Marie Smith
Reshaping Wide to Long Format with melt()
Sometimes "combining columns" really means restructuring the data so that column headers become values in a single column. The melt() function converts wide format to long format, which is often required for visualization libraries like Seaborn and Plotly:
import pandas as pd
df = pd.DataFrame({
'Year': [2023, 2024],
'Q1': [100, 150],
'Q2': [120, 180],
'Q3': [130, 190],
'Q4': [140, 200]
})
melted = df.melt(
id_vars=['Year'],
var_name='Quarter',
value_name='Sales'
)
print(melted)
Output:
Year Quarter Sales
0 2023 Q1 100
1 2024 Q1 150
2 2023 Q2 120
3 2024 Q2 180
4 2023 Q3 130
5 2024 Q3 190
6 2023 Q4 140
7 2024 Q4 200
The four quarter columns have been collapsed into two columns: Quarter (holding the former column names) and Sales (holding the values). The Year column is preserved as an identifier.
Aggregating Across Columns with .agg()
When you need different aggregation functions beyond a simple sum, select the columns and apply the desired function:
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
cols = ['A', 'B', 'C']
df['Sum'] = df[cols].sum(axis=1)
df['Mean'] = df[cols].mean(axis=1)
df['Range'] = df[cols].max(axis=1) - df[cols].min(axis=1)
print(df)
Output:
A B C Sum Mean Range
0 1 4 7 12 4.0 6
1 2 5 8 15 5.0 6
2 3 6 9 18 6.0 6
Using .apply() for Complex Combination Logic
When the combination logic involves formatting or conditional rules that cannot be expressed with simple operators, use .apply() with axis=1:
import pandas as pd
df = pd.DataFrame({
'City': ['New York', 'Los Angeles'],
'State': ['NY', 'CA'],
'Zip': ['10001', '90001']
})
df['Location'] = df.apply(
lambda row: f"{row['City']}, {row['State']} {row['Zip']}",
axis=1
)
print(df)
Output:
City State Zip Location
0 New York NY 10001 New York, NY 10001
1 Los Angeles CA 90001 Los Angeles, CA 90001
Reserve .apply() for cases where the combination logic is too complex for vectorized operations. For simple string concatenation or arithmetic, the + operator and .sum() are significantly faster on large datasets.
Quick Reference
| Goal | Method | Example |
|---|---|---|
| Sum columns | .sum(axis=1) | df[cols].sum(axis=1) |
| Collapse by category | groupby(mapping, axis=1) | Map subcategories to totals |
| Join strings | + operator | df['A'] + ' ' + df['B'] |
| Join strings (with NaN) | .fillna('') + ... | Handle missing before joining |
| Reshape wide to long | .melt() | Column headers become row values |
| Custom aggregation | .mean(), .max(), .min() | df[cols].mean(axis=1) |
| Complex combination logic | .apply(func, axis=1) | Formatted strings, conditionals |
- Use
df[cols].sum(axis=1)for straightforward numeric totals. - Use string concatenation with
+for merging text columns, adding.fillna('')when missing values are possible. - Use
melt()to reshape wide data into long format for visualization.
For complex multi-column logic that does not fit neatly into vectorized operations, use .apply() with axis=1.