Python Pandas: How to Get All Combinations of Two Columns in a Pandas DataFrame
When working with data analysis, you sometimes need to generate all possible combinations (the Cartesian product) between the values of two columns. This is useful for scenarios like pairing participants in experiments, creating feature combinations for machine learning, generating test cases, or building comparison matrices.
In this guide, you will learn how to compute all combinations of two DataFrame columns using Python's itertools.product(), as well as two alternatives built into Pandas: merge() with a cross join and pd.MultiIndex.from_product().
Setting Up the Example
import pandas as pd
df = pd.DataFrame({
'gents': ['Michael', 'Daniel'],
'ladies': ['Emily', 'Olivia']
})
print(df)
Output:
gents ladies
0 Michael Emily
1 Daniel Olivia
Our goal is to generate every possible pairing between the gents column and the ladies column - that is, all 2 × 2 = 4 combinations.
Method 1: Using itertools.product()
The itertools.product() function computes the Cartesian product of input iterables - every element from the first iterable is paired with every element from the second:
import pandas as pd
from itertools import product
df = pd.DataFrame({
'gents': ['Michael', 'Daniel'],
'ladies': ['Emily', 'Olivia']
})
# Generate all combinations
combinations = list(product(df['gents'], df['ladies']))
print("All combinations:")
for combo in combinations:
    print(combo)
Output:
All combinations:
('Michael', 'Emily')
('Michael', 'Olivia')
('Daniel', 'Emily')
('Daniel', 'Olivia')
Each value from gents is paired with every value from ladies.
Converting the Result to a DataFrame
To work with the combinations as tabular data, convert the list of tuples to a DataFrame:
import pandas as pd
from itertools import product
df = pd.DataFrame({
'gents': ['Michael', 'Daniel'],
'ladies': ['Emily', 'Olivia']
})
combinations = list(product(df['gents'], df['ladies']))
result = pd.DataFrame(combinations, columns=['gents', 'ladies'])
print(result)
Output:
gents ladies
0 Michael Emily
1 Michael Olivia
2 Daniel Emily
3 Daniel Olivia
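Note that product() pairs values by position, so a value that appears twice in a column produces repeated pairs. If you only want distinct combinations, one option is to deduplicate each column first. The following is a minimal sketch; the sample data with repeated names is hypothetical and not part of the example above:

```python
from itertools import product

import pandas as pd

# Hypothetical data: 'Michael' and 'Emily' each appear twice
df = pd.DataFrame({
    'gents': ['Michael', 'Daniel', 'Michael'],
    'ladies': ['Emily', 'Olivia', 'Emily'],
})

# unique() keeps the first occurrence of each value, in order of appearance
unique_gents = df['gents'].unique()
unique_ladies = df['ladies'].unique()

# Cartesian product of the distinct values only
pairs = list(product(unique_gents, unique_ladies))
result = pd.DataFrame(pairs, columns=['gents', 'ladies'])
print(result)
```

Without the unique() calls, the same input would yield 3 × 3 = 9 rows, several of them duplicates.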
Method 2: Using Pandas merge() With a Cross Join
Pandas provides a built-in cross join (available since Pandas 1.2.0) that computes the Cartesian product directly:
import pandas as pd
df = pd.DataFrame({
'gents': ['Michael', 'Daniel'],
'ladies': ['Emily', 'Olivia']
})
# Create separate DataFrames for each column
gents_df = df[['gents']]
ladies_df = df[['ladies']]
# Cross join produces all combinations
result = gents_df.merge(ladies_df, how='cross')
print(result)
Output:
gents ladies
0 Michael Emily
1 Michael Olivia
2 Daniel Emily
3 Daniel Olivia
If you are using a Pandas version older than 1.2.0, how='cross' is not available, but you can simulate it by adding a temporary key column to both DataFrames and merging on it:
# For Pandas < 1.2.0
gents_df = df[['gents']].assign(key=1)
ladies_df = df[['ladies']].assign(key=1)
result = gents_df.merge(ladies_df, on='key').drop('key', axis=1)
Method 3: Using pd.MultiIndex.from_product()
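pd.MultiIndex.from_product() generates the combinations as a MultiIndex, which is useful for building matrices or pivot-like structures. Calling reset_index() turns the index levels back into ordinary columns: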
import pandas as pd
df = pd.DataFrame({
'gents': ['Michael', 'Daniel'],
'ladies': ['Emily', 'Olivia']
})
# Create a MultiIndex from the Cartesian product
index = pd.MultiIndex.from_product(
[df['gents'], df['ladies']],
names=['gents', 'ladies']
)
result = pd.DataFrame(index=index).reset_index()
print(result)
Output:
gents ladies
0 Michael Emily
1 Michael Olivia
2 Daniel Emily
3 Daniel Olivia
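One way the MultiIndex form can help with matrix-like structures is by reindexing sparse data onto the full Cartesian product. The sketch below assumes a hypothetical 'dances' column that records a count for only some pairs; the missing pairs are filled with 0 and the result is pivoted into a matrix:

```python
import pandas as pd

# Hypothetical observed data: only two of the four possible pairs have a count
observed = pd.DataFrame({
    'gents': ['Michael', 'Daniel'],
    'ladies': ['Emily', 'Olivia'],
    'dances': [3, 1],
})

# Full Cartesian product of the distinct values in each column
full_index = pd.MultiIndex.from_product(
    [observed['gents'].unique(), observed['ladies'].unique()],
    names=['gents', 'ladies'],
)

# Reindex onto the full product; pairs that were never observed get 0
counts = (
    observed.set_index(['gents', 'ladies'])['dances']
    .reindex(full_index, fill_value=0)
)

# Move the 'ladies' level into columns to get a gents x ladies matrix
matrix = counts.unstack('ladies')
print(matrix)
```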
Practical Example: Student-Course Enrollment
Generating every possible student-course combination gives you the full set of potential enrollments, which is the starting point for identifying which ones are missing:
import pandas as pd
from itertools import product
students = pd.DataFrame({
'student_id': [101, 102, 103],
'student_name': ['Alice', 'Bob', 'Charlie']
})
courses = pd.DataFrame({
'course_id': ['CS101', 'MATH201'],
'course_name': ['Intro to CS', 'Calculus']
})
# All possible enrollments
all_combos = list(product(students['student_name'], courses['course_name']))
enrollment_matrix = pd.DataFrame(all_combos, columns=['Student', 'Course'])
print(enrollment_matrix)
Output:
Student Course
0 Alice Intro to CS
1 Alice Calculus
2 Bob Intro to CS
3 Bob Calculus
4 Charlie Intro to CS
5 Charlie Calculus
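To actually find the missing enrollments, the full set of pairs has to be compared against the enrollments that already exist. The sketch below is one possible approach: it uses a cross join (Pandas 1.2.0 or later) so the ID columns are kept, then a left merge with indicator=True. The 'enrolled' table is hypothetical data invented for illustration:

```python
import pandas as pd

students = pd.DataFrame({
    'student_id': [101, 102, 103],
    'student_name': ['Alice', 'Bob', 'Charlie'],
})
courses = pd.DataFrame({
    'course_id': ['CS101', 'MATH201'],
    'course_name': ['Intro to CS', 'Calculus'],
})

# Hypothetical table of enrollments that already exist
enrolled = pd.DataFrame({
    'student_id': [101, 101, 102],
    'course_id': ['CS101', 'MATH201', 'CS101'],
})

# Every possible student-course pair, keeping both ID and name columns
all_pairs = students.merge(courses, how='cross')

# Left-join against existing enrollments; indicator=True adds a '_merge' column
flagged = all_pairs.merge(
    enrolled, on=['student_id', 'course_id'], how='left', indicator=True
)

# Rows that exist only on the left side are the missing enrollments
missing = (
    flagged[flagged['_merge'] == 'left_only']
    .drop(columns='_merge')
    .reset_index(drop=True)
)
print(missing)
```

With this sample data, Bob is missing Calculus and Charlie is missing both courses.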
Combinations of a Column With Itself
To generate all pairs from a single column (e.g., for round-robin matchups), use itertools.combinations() to avoid pairing an element with itself:
from itertools import combinations
import pandas as pd
df = pd.DataFrame({'players': ['Alice', 'Bob', 'Charlie', 'Diana']})
# All unique pairs (no self-pairing, no duplicates)
matchups = list(combinations(df['players'], 2))
result = pd.DataFrame(matchups, columns=['Player 1', 'Player 2'])
print(result)
Output:
Player 1 Player 2
0 Alice Bob
1 Alice Charlie
2 Alice Diana
3 Bob Charlie
4 Bob Diana
5 Charlie Diana
product() vs. combinations() vs. permutations()
| Function | Self-pairing | Order matters | Example for [A, B] |
|---|---|---|---|
| product(col, col) | ✅ Yes | ✅ Yes | (A,A), (A,B), (B,A), (B,B) |
| combinations(col, 2) | ❌ No | ❌ No | (A,B) |
| permutations(col, 2) | ❌ No | ✅ Yes | (A,B), (B,A) |
Choose the function that matches your specific pairing requirements.
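The differences are easy to check on a two-element list. This short sketch runs all three itertools functions on the same input:

```python
from itertools import combinations, permutations, product

col = ['A', 'B']

# Self-pairs included, and (A, B) is distinct from (B, A)
product_pairs = list(product(col, col))

# No self-pairs, and each unordered pair appears once
combination_pairs = list(combinations(col, 2))

# No self-pairs, but both orderings of each pair appear
permutation_pairs = list(permutations(col, 2))

print(product_pairs)
print(combination_pairs)
print(permutation_pairs)
```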
Performance Considerations
The Cartesian product grows multiplicatively. If column A has m values and column B has n values, the result has m × n rows:
| Column A Size | Column B Size | Result Rows |
|---|---|---|
| 10 | 10 | 100 |
| 100 | 100 | 10,000 |
| 1,000 | 1,000 | 1,000,000 |
| 10,000 | 10,000 | 100,000,000 |
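Because the result size is known before anything is generated, you can check it up front. The following is a minimal sketch; the cross_size() helper and the MAX_ROWS budget are hypothetical names chosen for illustration, and the right threshold depends on your machine and data:

```python
import pandas as pd

MAX_ROWS = 1_000_000  # hypothetical budget; tune for your environment


def cross_size(left: pd.Series, right: pd.Series) -> int:
    """Return the number of rows the Cartesian product of two columns would have."""
    return len(left) * len(right)


df = pd.DataFrame({
    'gents': ['Michael', 'Daniel'],
    'ladies': ['Emily', 'Olivia'],
})

n_rows = cross_size(df['gents'], df['ladies'])
if n_rows > MAX_ROWS:
    raise ValueError(f"Cross product would have {n_rows:,} rows")
print(f"Cross product will have {n_rows} rows")
```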
For large columns, the cross product can consume significant memory. Consider:
- Filtering before generating combinations (reduce input size).
- Processing in chunks using itertools.product() as a lazy iterator instead of converting to a list.
- Using database joins if the data is in a database.
from itertools import product
# ✅ Lazy: yields one combo at a time instead of building the full list
for gent, lady in product(df['gents'], df['ladies']):
    process(gent, lady)  # process() is a placeholder for your own per-pair logic
# ❌ Eager: loads ALL combos into memory at once
all_combos = list(product(df['gents'], df['ladies']))
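If you need DataFrames rather than individual tuples, the lazy iterator can be consumed in fixed-size batches. The sketch below shows one way to do this with itertools.islice; the iter_product_chunks() helper is a hypothetical name, not a Pandas or itertools function:

```python
from itertools import islice, product

import pandas as pd


def iter_product_chunks(left, right, columns, chunk_size):
    """Yield DataFrames of at most chunk_size rows from the Cartesian product."""
    pairs = product(left, right)
    while True:
        # Pull the next chunk_size pairs from the lazy iterator
        chunk = list(islice(pairs, chunk_size))
        if not chunk:
            break
        yield pd.DataFrame(chunk, columns=columns)


df = pd.DataFrame({
    'gents': ['Michael', 'Daniel'],
    'ladies': ['Emily', 'Olivia'],
})

# Only one chunk is held in memory at a time
for chunk_df in iter_product_chunks(
    df['gents'], df['ladies'], ['gents', 'ladies'], chunk_size=3
):
    print(len(chunk_df), "rows in this chunk")
```

Each chunk can be filtered, aggregated, or written to disk before the next one is generated, which keeps peak memory proportional to chunk_size rather than to the full product.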
Comparison of Methods
| Method | Returns | Best For |
|---|---|---|
| itertools.product() | Iterator of tuples (a list once wrapped in list()) | General-purpose, works with any iterables |
| merge(how='cross') | DataFrame | Staying within Pandas, clean DataFrame output |
| MultiIndex.from_product() | MultiIndex / DataFrame | Creating structured indices or pivot tables |
Conclusion
Generating all combinations of two DataFrame columns is straightforward in Python.
- itertools.product() is the most versatile approach and works with any iterable, while pd.merge(how='cross') keeps everything within the Pandas ecosystem and produces a DataFrame directly.
- For structured indexing scenarios, pd.MultiIndex.from_product() provides an elegant solution.
Whichever method you choose, be mindful of the output size: the Cartesian product grows multiplicatively, so always consider filtering your data before generating combinations when working with large datasets.