
Python Pandas: How to Get All Combinations of Two Columns in a Pandas DataFrame

When working with data analysis, you sometimes need to generate all possible combinations (the Cartesian product) between the values of two columns. This is useful for scenarios like pairing participants in experiments, creating feature combinations for machine learning, generating test cases, or building comparison matrices.

In this guide, you will learn how to compute all combinations of two DataFrame columns using Python's itertools.product(), Pandas' built-in merge() with a cross join, and pd.MultiIndex.from_product().

Setting Up the Example

import pandas as pd

df = pd.DataFrame({
    'gents': ['Michael', 'Daniel'],
    'ladies': ['Emily', 'Olivia']
})

print(df)

Output:

     gents  ladies
0  Michael   Emily
1   Daniel  Olivia

Our goal is to generate every possible pairing between the gents column and the ladies column - that is, all 2 × 2 = 4 combinations.

Method 1: Using itertools.product()

The itertools.product() function computes the Cartesian product of input iterables - every element from the first iterable is paired with every element from the second:

import pandas as pd
from itertools import product

df = pd.DataFrame({
    'gents': ['Michael', 'Daniel'],
    'ladies': ['Emily', 'Olivia']
})

# Generate all combinations
combinations = list(product(df['gents'], df['ladies']))

print("All combinations:")
for combo in combinations:
    print(combo)

Output:

All combinations:
('Michael', 'Emily')
('Michael', 'Olivia')
('Daniel', 'Emily')
('Daniel', 'Olivia')

Each value from gents is paired with every value from ladies.

Converting the Result to a DataFrame

To work with the combinations as tabular data, convert the list of tuples to a DataFrame:

import pandas as pd
from itertools import product

df = pd.DataFrame({
    'gents': ['Michael', 'Daniel'],
    'ladies': ['Emily', 'Olivia']
})

combinations = list(product(df['gents'], df['ladies']))
result = pd.DataFrame(combinations, columns=['gents', 'ladies'])

print(result)

Output:

     gents  ladies
0  Michael   Emily
1  Michael  Olivia
2   Daniel   Emily
3   Daniel  Olivia

Method 2: Using Pandas merge() With a Cross Join

Pandas provides a built-in cross join (available since Pandas 1.2.0) that computes the Cartesian product directly:

import pandas as pd

df = pd.DataFrame({
    'gents': ['Michael', 'Daniel'],
    'ladies': ['Emily', 'Olivia']
})

# Create separate DataFrames for each column
gents_df = df[['gents']]
ladies_df = df[['ladies']]

# Cross join produces all combinations
result = gents_df.merge(ladies_df, how='cross')

print(result)

Output:

     gents  ladies
0  Michael   Emily
1  Michael  Olivia
2   Daniel   Emily
3   Daniel  Olivia

Tip

The how='cross' parameter was introduced in Pandas 1.2.0. If you are using an older version, you can simulate it by adding a temporary key column:

# For Pandas < 1.2.0
gents_df = df[['gents']].assign(key=1)
ladies_df = df[['ladies']].assign(key=1)
result = gents_df.merge(ladies_df, on='key').drop('key', axis=1)
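
As a quick sanity check, the temporary-key fallback can be compared against itertools.product(). The sketch below reuses the example DataFrame from this guide; the names via_key, via_product, and same_pairs are illustrative only. It compares the pairs as sets, so it makes no assumption about row order.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({
    'gents': ['Michael', 'Daniel'],
    'ladies': ['Emily', 'Olivia']
})

# Temporary-key cross join (does not rely on how='cross')
left = df[['gents']].assign(key=1)
right = df[['ladies']].assign(key=1)
via_key = left.merge(right, on='key').drop(columns='key')

# Reference result built with itertools.product()
via_product = pd.DataFrame(
    list(product(df['gents'], df['ladies'])),
    columns=['gents', 'ladies']
)

# Compare the two results as sets of (gent, lady) tuples
same_pairs = (
    set(via_key.itertuples(index=False, name=None))
    == set(via_product.itertuples(index=False, name=None))
)
print(same_pairs)
```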

Method 3: Using pd.MultiIndex.from_product()

To generate the combinations as a MultiIndex (useful for building matrices or pivot-like structures), use pd.MultiIndex.from_product():

import pandas as pd

df = pd.DataFrame({
    'gents': ['Michael', 'Daniel'],
    'ladies': ['Emily', 'Olivia']
})

# Create a MultiIndex from the Cartesian product
index = pd.MultiIndex.from_product(
    [df['gents'], df['ladies']],
    names=['gents', 'ladies']
)

result = pd.DataFrame(index=index).reset_index()
print(result)

Output:

     gents  ladies
0  Michael   Emily
1  Michael  Olivia
2   Daniel   Emily
3   Daniel  Olivia
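
One place where the MultiIndex form may be handy is expanding sparse observations onto the full grid of pairings. The following is a minimal sketch under assumed data: the observed DataFrame and its score column are hypothetical and not part of the example above.

```python
import pandas as pd

# Hypothetical observed data: only some gent/lady pairs have a score
observed = pd.DataFrame({
    'gents': ['Michael', 'Daniel'],
    'ladies': ['Emily', 'Olivia'],
    'score': [7, 9]
})

# Full grid of every gent/lady pairing
full_index = pd.MultiIndex.from_product(
    [observed['gents'].unique(), observed['ladies'].unique()],
    names=['gents', 'ladies']
)

# Reindex onto the full grid; pairs without an observation get NaN
grid = (
    observed.set_index(['gents', 'ladies'])
    .reindex(full_index)
    .reset_index()
)
print(grid)
```

The reindex step requires the (gents, ladies) pairs in the observed data to be unique; duplicated pairs would need to be aggregated first.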

Practical Example: Student-Course Enrollment

Generate all possible student-course combinations, for example as a starting point for identifying which enrollments are missing:

import pandas as pd
from itertools import product

students = pd.DataFrame({
    'student_id': [101, 102, 103],
    'student_name': ['Alice', 'Bob', 'Charlie']
})

courses = pd.DataFrame({
    'course_id': ['CS101', 'MATH201'],
    'course_name': ['Intro to CS', 'Calculus']
})

# All possible enrollments
all_combos = list(product(students['student_name'], courses['course_name']))
enrollment_matrix = pd.DataFrame(all_combos, columns=['Student', 'Course'])

print(enrollment_matrix)

Output:

   Student       Course
0    Alice  Intro to CS
1    Alice     Calculus
2      Bob  Intro to CS
3      Bob     Calculus
4  Charlie  Intro to CS
5  Charlie     Calculus
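
To actually find the missing enrollments, the full set of combinations can be compared against the enrollments that already exist. The sketch below is one possible way to do this, using a left merge with indicator=True; the enrolled DataFrame is hypothetical data invented for illustration.

```python
from itertools import product

import pandas as pd

students = ['Alice', 'Bob', 'Charlie']
courses = ['Intro to CS', 'Calculus']

# Hypothetical table of enrollments that already exist
enrolled = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Bob'],
    'Course': ['Intro to CS', 'Intro to CS', 'Calculus']
})

# Every possible student-course pair
all_pairs = pd.DataFrame(
    list(product(students, courses)),
    columns=['Student', 'Course']
)

# indicator=True adds a '_merge' column; 'left_only' marks pairs
# that appear in all_pairs but not in enrolled
merged = all_pairs.merge(
    enrolled, on=['Student', 'Course'], how='left', indicator=True
)
missing = (
    merged[merged['_merge'] == 'left_only']
    .drop(columns='_merge')
    .reset_index(drop=True)
)
print(missing)
```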

Combinations of a Column With Itself

To generate all pairs from a single column (e.g., for round-robin matchups), use itertools.combinations() to avoid pairing an element with itself:

from itertools import combinations
import pandas as pd

df = pd.DataFrame({'players': ['Alice', 'Bob', 'Charlie', 'Diana']})

# All unique pairs (no self-pairing, no duplicates)
matchups = list(combinations(df['players'], 2))
result = pd.DataFrame(matchups, columns=['Player 1', 'Player 2'])

print(result)

Output:

   Player 1  Player 2
0    Alice       Bob
1    Alice   Charlie
2    Alice     Diana
3      Bob   Charlie
4      Bob     Diana
5  Charlie     Diana

product() vs. combinations() vs. permutations()

Function              Self-pairing  Order matters  Example for [A, B]
product(col, col)     ✅ Yes        ✅ Yes         (A,A), (A,B), (B,A), (B,B)
combinations(col, 2)  ❌ No         ❌ No          (A,B)
permutations(col, 2)  ❌ No         ✅ Yes         (A,B), (B,A)

Choose the function that matches your specific pairing requirements.
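
The difference between the three functions can be checked directly on a two-element list. This short sketch uses a plain Python list named col as a stand-in for a DataFrame column:

```python
from itertools import combinations, permutations, product

col = ['A', 'B']

# Self-pairing allowed, order matters
prod_pairs = list(product(col, col))
# No self-pairing, order ignored
comb_pairs = list(combinations(col, 2))
# No self-pairing, order matters
perm_pairs = list(permutations(col, 2))

print(prod_pairs)  # [('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B')]
print(comb_pairs)  # [('A', 'B')]
print(perm_pairs)  # [('A', 'B'), ('B', 'A')]
```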

Performance Considerations

The Cartesian product grows multiplicatively. If column A has m values and column B has n values, the result has m × n rows:

Column A Size  Column B Size  Result Rows
10             10             100
100            100            10,000
1,000          1,000          1,000,000
10,000         10,000         100,000,000

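
Because the result size is simply m × n, it can be computed before any combinations are generated. The following sketch shows one way to guard against an unexpectedly large result; MAX_ROWS is a hypothetical budget chosen for illustration, not a Pandas setting.

```python
import pandas as pd

df = pd.DataFrame({
    'gents': ['Michael', 'Daniel'],
    'ladies': ['Emily', 'Olivia']
})

# Hypothetical row budget; tune it to the memory you have available
MAX_ROWS = 1_000_000

# The Cartesian product has len(A) * len(B) rows
expected_rows = len(df['gents']) * len(df['ladies'])

if expected_rows > MAX_ROWS:
    raise ValueError(
        f"Cross product would have {expected_rows:,} rows; filter the inputs first"
    )

print(expected_rows)  # 4
```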
Watch out for memory with large columns

For large columns, the cross product can consume significant memory. Consider:

  • Filtering before generating combinations (reduce input size).
  • Processing in chunks using itertools.product() as a lazy iterator instead of converting to a list.
  • Using database joins if the data is in a database.
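
from itertools import product

# df is the example DataFrame defined earlier in this guide

def process(gent, lady):
    """Placeholder: replace with your own per-pair logic."""
    print(gent, lady)

# ✅ Lazy: yields one combo at a time, so the full result is never held in memory
for gent, lady in product(df['gents'], df['ladies']):
    process(gent, lady)

# ❌ Eager: loads ALL combos into memory at once
all_combos = list(product(df['gents'], df['ladies']))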
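
Processing in chunks can sit between the lazy and eager extremes. The sketch below batches the lazy iterator with itertools.islice so that only one small DataFrame exists at a time; iter_chunks is a hypothetical helper written for this example, not a Pandas or itertools function.

```python
from itertools import islice, product

import pandas as pd

df = pd.DataFrame({
    'gents': ['Michael', 'Daniel'],
    'ladies': ['Emily', 'Olivia']
})


def iter_chunks(pairs, chunk_size):
    """Yield DataFrames of at most chunk_size rows from an iterable of pairs."""
    pairs = iter(pairs)
    while True:
        # Pull the next chunk_size pairs without materializing the rest
        chunk = list(islice(pairs, chunk_size))
        if not chunk:
            break
        yield pd.DataFrame(chunk, columns=['gents', 'ladies'])


chunk_sizes = [
    len(chunk)
    for chunk in iter_chunks(product(df['gents'], df['ladies']), chunk_size=3)
]
print(chunk_sizes)  # [3, 1]
```

Note that product() still copies its input iterables internally, so the savings come from not storing the m × n output, not from the inputs themselves.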

Comparison of Methods

Method                     Returns                                  Best For
itertools.product()        Iterator of tuples (list() to collect)   General-purpose, works with any iterables
merge(how='cross')         DataFrame                                Staying within Pandas, clean DataFrame output
MultiIndex.from_product()  MultiIndex (convertible to a DataFrame)  Creating structured indices or pivot tables

Conclusion

Generating all combinations of two DataFrame columns is straightforward in Python.

  • itertools.product() is the most versatile approach and works with any iterable.
  • merge(how='cross') keeps everything within the Pandas ecosystem and produces a DataFrame directly.
  • For structured indexing scenarios, pd.MultiIndex.from_product() provides an elegant solution.

Whichever method you choose, be mindful of the output size: the Cartesian product grows multiplicatively, so always consider filtering your data before generating combinations when working with large datasets.