Python Pandas: Create New Column of Tuples (or Lists) from Two Columns
In Pandas, a common data transformation task is to combine values from two (or more) existing columns for each row into a single tuple or list, and then store this collection as a new column in the DataFrame. This can be useful for creating composite keys, feature engineering, or preparing data for functions that expect grouped inputs.
This guide explains several effective methods to create a new DataFrame column containing tuples or lists derived from two existing columns, using techniques like zip(), apply(), itertuples(), and values.tolist().
The Goal: Combining Two Columns into Tuples/Lists Row-wise
Given a Pandas DataFrame, we want to take two specific columns, say 'ColumnA' and 'ColumnB'. For each row, we want to create a tuple (value_from_A, value_from_B) or a list [value_from_A, value_from_B] and store this new collection in a new column, say 'Combined_AB'.
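To make the goal concrete, here is a minimal sketch that uses the placeholder names from the paragraph above ('ColumnA', 'ColumnB', 'Combined_AB'). The two-row frame `df_demo` and its values are invented purely for illustration and are not part of the example data used in the rest of this guide.

```python
import pandas as pd

# Hypothetical two-row frame using the placeholder column names from the text
df_demo = pd.DataFrame({'ColumnA': [1, 2], 'ColumnB': ['x', 'y']})

# Build one tuple per row by pairing the values of the two columns
df_demo['Combined_AB'] = list(zip(df_demo['ColumnA'], df_demo['ColumnB']))

# Each cell of 'Combined_AB' now holds a (value_from_A, value_from_B) tuple
print(df_demo)
```

The new column has the generic `object` dtype, because each cell stores a Python tuple rather than a scalar value.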
Example DataFrame
We'll use the following DataFrame for our examples:
import pandas as pd
data = {
    'EmployeeID': [101, 102, 103, 104],
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
    'Department': ['HR', 'Engineering', 'HR', 'Sales'],
    'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)
Output:
Original DataFrame:
   EmployeeID FirstName LastName   Department  Salary
0         101     Alice    Smith           HR   60000
1         102       Bob  Johnson  Engineering   85000
2         103   Charlie    Brown           HR   62000
3         104     David      Lee        Sales   70000
Creating a New Column of Tuples from Two Columns
Let's say we want to create a new column FullNameTuple from FirstName and LastName, and another ContactList from FirstName and Department.
Using zip() and list() (Recommended for Tuples)
The built-in zip() function is excellent for pairing up elements from multiple iterables. When applied to two DataFrame columns (which are Pandas Series), it yields tuples of corresponding elements.
import pandas as pd
data = {
    'EmployeeID': [101, 102, 103, 104],
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
    'Department': ['HR', 'Engineering', 'HR', 'Sales'],
    'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
df = df_original.copy()
# Combine 'FirstName' and 'LastName' into tuples
# df['FirstName'] and df['LastName'] are Series
zipped_values = zip(df['FirstName'], df['LastName'])
# ✅ Assign the list of tuples to a new column
df['FullNameTuple'] = list(zipped_values)
print("DataFrame with 'FullNameTuple' column (using zip):")
print(df[['FirstName', 'LastName', 'FullNameTuple']])
Output:
DataFrame with 'FullNameTuple' column (using zip):
  FirstName LastName     FullNameTuple
0     Alice    Smith    (Alice, Smith)
1       Bob  Johnson    (Bob, Johnson)
2   Charlie    Brown  (Charlie, Brown)
3     David      Lee      (David, Lee)
- zip(df['col1'], df['col2']) creates an iterator of tuples.
- list(...) converts this iterator into a list of tuples, which can then be assigned as a new Series/column.
- This method is Pythonic and readable, and it is usually faster than a row-wise apply() because it iterates over the two columns directly.
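The same zip() pattern can produce lists instead of tuples, as with the ContactList column mentioned at the start of this section. The following is a minimal sketch rather than a definitive recipe: `df_sketch` is a hypothetical two-row subset of the example data, and the list comprehension is just one way to convert each zipped pair into a list.

```python
import pandas as pd

# Hypothetical two-row subset of the example data, for illustration only
df_sketch = pd.DataFrame({
    'FirstName': ['Alice', 'Bob'],
    'Department': ['HR', 'Engineering'],
})

# zip() yields (FirstName, Department) tuples; list(pair) turns each into a list
df_sketch['ContactList'] = [
    list(pair) for pair in zip(df_sketch['FirstName'], df_sketch['Department'])
]

print(df_sketch)
```

Tuples are immutable and hashable, so they are usually the better fit for composite keys. Lists make sense when the combined values need to be modified later.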