Skip to main content

Python Pandas: How to Find Unique Values From Multiple Columns in Pandas

When analyzing data in Pandas, you often need to extract unique values - not just from a single column, but across multiple columns simultaneously. This is useful for tasks like identifying all distinct categories, finding all unique names in a dataset, or building dropdown options from several fields.

In this guide, you will learn multiple methods to extract unique values from multiple DataFrame columns, understand the trade-offs of each approach, and see when to use which method.

Setting Up the Example

import pandas as pd
import numpy as np

df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})

print(df)

Output:

  FirstName  LastName  Age
0 Michael Smith 26
1 Emily Johnson 25
2 Jessica Williams 25
3 Daniel Brown 27
4 James Davis 28
5 Daniel Miller 30

Notice that 'Prateek' appears twice in FirstName and 'Yadav' appears twice in LastName. We want to find unique values across these columns.

Unique Values From a Single Column

Before working with multiple columns, here is the basic approach for a single column using the unique() method:

import pandas as pd
import numpy as np

df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})

unique_first_names = df['FirstName'].unique()
print(unique_first_names)

Output:

['Michael' 'Emily' 'Jessica' 'Daniel' 'James']

'Daniel' appears only once in the result despite being in the column twice.

Method 1: Using pd.concat() and unique()

Concatenate the desired columns into a single Series, then call unique() on the combined result. This works with any combination of data types:

import pandas as pd

df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})


unique_values = pd.concat([df['FirstName'], df['LastName']]).unique()
print(unique_values)

Output:

['Michael' 'Emily' 'Jessica' 'Daniel' 'James' 'Smith' 'Johnson' 'Williams'
'Brown' 'Davis' 'Miller']

Including Numeric Columns

You can include columns of any type - strings, integers, floats - in the concatenation:

import pandas as pd

df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})

unique_all = pd.concat([df['FirstName'], df['LastName'], df['Age']]).unique()
print(unique_all)

Output:

['Michael' 'Emily' 'Jessica' 'Daniel' 'James' 'Smith' 'Johnson' 'Williams'
'Brown' 'Davis' 'Miller' 26 25 27 28 30]
tip

This is the most versatile method - it handles mixed data types (strings and numbers together) without issues.

Method 2: Using numpy.unique()

NumPy's unique() function works on the underlying array values. Select the desired columns and call np.unique() on the .values array:

import numpy as np
import pandas as pd

df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})


unique_names = np.unique(df[['FirstName', 'LastName']].values)
print(unique_names)

Output:

['Brown' 'Daniel' 'Davis' 'Emily' 'James' 'Jessica' 'Johnson' 'Michael'
'Miller' 'Smith' 'Williams']

Notice that the results are sorted alphabetically: np.unique() always returns sorted values, unlike Pandas' unique() which preserves the order of first appearance.

Limitation: Cannot mix string and numeric columns

np.unique() cannot handle columns with different data types in the same operation:

# ❌ This will raise an error or produce unexpected results
np.unique(df[['LastName', 'Age']].values)
# TypeError: '<' not supported between instances of 'int' and 'str'

Use Method 1 (pd.concat().unique()) when you need to combine columns of different types.

Method 3: Using Python Sets

Convert each column to a set and use the union operator (|) to combine them. Sets inherently contain only unique values:

import pandas as pd

df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})

unique_values = set(df['FirstName']) | set(df['LastName']) | set(df['Age'])
print(unique_values)

Output:

{'Johnson', 'Miller', 'James', 'Smith', 'Emily', 'Williams', 'Davis', 'Michael', 30, 'Brown', 'Daniel', 'Jessica', 26, 27, 28, 25}
info

Sets are unordered - the output order is not predictable. If order matters, use Method 1 instead.

This method works with any combination of data types and is particularly concise for combining many columns:

# Union of all columns at once
unique_values = set().union(*[set(df[col]) for col in df.columns])
print(sorted(unique_values, key=str))

Method 4: Using DataFrame.melt() and unique()

The melt() function unpivots the DataFrame, stacking selected columns into a single column. This is clean and expressive:

import pandas as pd

df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})

unique_values = df[['FirstName', 'LastName']].melt()['value'].unique()
print(unique_values)

Output:

['Michael' 'Emily' 'Jessica' 'Daniel' 'James' 'Smith' 'Johnson' 'Williams'
'Brown' 'Davis' 'Miller']

melt() transforms the two columns into a single value column, and unique() extracts the distinct values.

Method 5: Using DataFrame.stack() and unique()

Similar to melt(), stack() pivots columns into rows:

import pandas as pd

df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})

unique_values = df[['FirstName', 'LastName']].stack().unique()
print(unique_values)

Output:

['Michael' 'Smith' 'Emily' 'Johnson' 'Jessica' 'Williams' 'Daniel' 'Brown'
'James' 'Davis' 'Miller']

Finding Unique Rows (Combinations)

If you need unique combinations across columns (unique rows) rather than unique individual values, use drop_duplicates():

import pandas as pd

df_names = pd.DataFrame({
'First': ['Alice', 'Bob', 'Alice', 'Bob'],
'Last': ['Smith', 'Jones', 'Smith', 'Brown']
})

unique_combos = df_names.drop_duplicates()
print(unique_combos)

Output:

   First   Last
0 Alice Smith
1 Bob Jones
3 Bob Brown

Row index 2 (Alice Smith) is removed because it is a duplicate of row 0.

Counting Unique Values Per Column

To see how many unique values each column has, use nunique():

import pandas as pd

df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})

print(df.nunique())

Output:

FirstName    5
LastName 6
Age 5
dtype: int64

Comparison of Methods

MethodMixed TypesSorted OutputPreserves OrderReturns
pd.concat().unique()✅ Yes❌ No✅ First appearanceNumPy array
np.unique()❌ No✅ Yes❌ NoNumPy array
Python set union✅ Yes❌ No❌ No (unordered)Python set
melt().unique()✅ Yes❌ No✅ First appearanceNumPy array
stack().unique()✅ Yes❌ No✅ InterleavedNumPy array

Complete Example

import pandas as pd
import numpy as np

df = pd.DataFrame({
'City': ['NYC', 'LA', 'NYC', 'Chicago', 'LA'],
'State': ['NY', 'CA', 'NY', 'IL', 'CA'],
'Country': ['USA', 'USA', 'USA', 'USA', 'USA']
})

# All unique location values across City and State
unique_locations = pd.concat([df['City'], df['State']]).unique()
print("Unique locations:", unique_locations)
print(f"Count: {len(unique_locations)}")

Output:

Unique locations: ['NYC' 'LA' 'Chicago' 'NY' 'CA' 'IL']
Count: 6

Conclusion

Extracting unique values from multiple Pandas columns can be accomplished with several approaches.

  • pd.concat().unique() is the most versatile: it works with any combination of data types and preserves the order of first appearance.
  • np.unique() is fast and returns sorted results but cannot mix string and numeric columns.
  • Python sets are concise and handle mixed types but produce unordered output.
  • melt() and stack() offer clean Pandas-native alternatives.

Choose the method that best fits your data types, ordering requirements, and coding style.