Python Pandas: How to Find Unique Values From Multiple Columns in Pandas
When analyzing data in Pandas, you often need to extract unique values - not just from a single column, but across multiple columns simultaneously. This is useful for tasks like identifying all distinct categories, finding all unique names in a dataset, or building dropdown options from several fields.
In this guide, you will learn multiple methods to extract unique values from multiple DataFrame columns, understand the trade-offs of each approach, and see when to use which method.
Setting Up the Example
import pandas as pd
import numpy as np
df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})
print(df)
Output:
FirstName LastName Age
0 Michael Smith 26
1 Emily Johnson 25
2 Jessica Williams 25
3 Daniel Brown 27
4 James Davis 28
5 Daniel Miller 30
Notice that 'Prateek' appears twice in FirstName and 'Yadav' appears twice in LastName. We want to find unique values across these columns.
Unique Values From a Single Column
Before working with multiple columns, here is the basic approach for a single column using the unique() method:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})
unique_first_names = df['FirstName'].unique()
print(unique_first_names)
Output:
['Michael' 'Emily' 'Jessica' 'Daniel' 'James']
'Daniel' appears only once in the result despite being in the column twice.
Method 1: Using pd.concat() and unique()
Concatenate the desired columns into a single Series, then call unique() on the combined result. This works with any combination of data types:
import pandas as pd
df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})
unique_values = pd.concat([df['FirstName'], df['LastName']]).unique()
print(unique_values)
Output:
['Michael' 'Emily' 'Jessica' 'Daniel' 'James' 'Smith' 'Johnson' 'Williams'
'Brown' 'Davis' 'Miller']
Including Numeric Columns
You can include columns of any type - strings, integers, floats - in the concatenation:
import pandas as pd
df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})
unique_all = pd.concat([df['FirstName'], df['LastName'], df['Age']]).unique()
print(unique_all)
Output:
['Michael' 'Emily' 'Jessica' 'Daniel' 'James' 'Smith' 'Johnson' 'Williams'
'Brown' 'Davis' 'Miller' 26 25 27 28 30]
This is the most versatile method - it handles mixed data types (strings and numbers together) without issues.
Method 2: Using numpy.unique()
NumPy's unique() function works on the underlying array values. Select the desired columns and call np.unique() on the .values array:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})
unique_names = np.unique(df[['FirstName', 'LastName']].values)
print(unique_names)
Output:
['Brown' 'Daniel' 'Davis' 'Emily' 'James' 'Jessica' 'Johnson' 'Michael'
'Miller' 'Smith' 'Williams']
Notice that the results are sorted alphabetically: np.unique() always returns sorted values, unlike Pandas' unique() which preserves the order of first appearance.
np.unique() cannot handle columns with different data types in the same operation:
# ❌ This will raise an error or produce unexpected results
np.unique(df[['LastName', 'Age']].values)
# TypeError: '<' not supported between instances of 'int' and 'str'
Use Method 1 (pd.concat().unique()) when you need to combine columns of different types.
Method 3: Using Python Sets
Convert each column to a set and use the union operator (|) to combine them. Sets inherently contain only unique values:
import pandas as pd
df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})
unique_values = set(df['FirstName']) | set(df['LastName']) | set(df['Age'])
print(unique_values)
Output:
{'Johnson', 'Miller', 'James', 'Smith', 'Emily', 'Williams', 'Davis', 'Michael', 30, 'Brown', 'Daniel', 'Jessica', 26, 27, 28, 25}
Sets are unordered - the output order is not predictable. If order matters, use Method 1 instead.
This method works with any combination of data types and is particularly concise for combining many columns:
# Union of all columns at once
unique_values = set().union(*[set(df[col]) for col in df.columns])
print(sorted(unique_values, key=str))
Method 4: Using DataFrame.melt() and unique()
The melt() function unpivots the DataFrame, stacking selected columns into a single column. This is clean and expressive:
import pandas as pd
df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})
unique_values = df[['FirstName', 'LastName']].melt()['value'].unique()
print(unique_values)
Output:
['Michael' 'Emily' 'Jessica' 'Daniel' 'James' 'Smith' 'Johnson' 'Williams'
'Brown' 'Davis' 'Miller']
melt() transforms the two columns into a single value column, and unique() extracts the distinct values.
Method 5: Using DataFrame.stack() and unique()
Similar to melt(), stack() pivots columns into rows:
import pandas as pd
df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})
unique_values = df[['FirstName', 'LastName']].stack().unique()
print(unique_values)
Output:
['Michael' 'Smith' 'Emily' 'Johnson' 'Jessica' 'Williams' 'Daniel' 'Brown'
'James' 'Davis' 'Miller']
Finding Unique Rows (Combinations)
If you need unique combinations across columns (unique rows) rather than unique individual values, use drop_duplicates():
import pandas as pd
df_names = pd.DataFrame({
'First': ['Alice', 'Bob', 'Alice', 'Bob'],
'Last': ['Smith', 'Jones', 'Smith', 'Brown']
})
unique_combos = df_names.drop_duplicates()
print(unique_combos)
Output:
First Last
0 Alice Smith
1 Bob Jones
3 Bob Brown
Row index 2 (Alice Smith) is removed because it is a duplicate of row 0.
Counting Unique Values Per Column
To see how many unique values each column has, use nunique():
import pandas as pd
df = pd.DataFrame({
'FirstName': ['Michael', 'Emily', 'Jessica', 'Daniel', 'James', 'Daniel'],
'LastName': ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Miller'],
'Age': [26, 25, 25, 27, 28, 30]
})
print(df.nunique())
Output:
FirstName 5
LastName 6
Age 5
dtype: int64
Comparison of Methods
| Method | Mixed Types | Sorted Output | Preserves Order | Returns |
|---|---|---|---|---|
pd.concat().unique() | ✅ Yes | ❌ No | ✅ First appearance | NumPy array |
np.unique() | ❌ No | ✅ Yes | ❌ No | NumPy array |
Python set union | ✅ Yes | ❌ No | ❌ No (unordered) | Python set |
melt().unique() | ✅ Yes | ❌ No | ✅ First appearance | NumPy array |
stack().unique() | ✅ Yes | ❌ No | ✅ Interleaved | NumPy array |
Complete Example
import pandas as pd
import numpy as np
df = pd.DataFrame({
'City': ['NYC', 'LA', 'NYC', 'Chicago', 'LA'],
'State': ['NY', 'CA', 'NY', 'IL', 'CA'],
'Country': ['USA', 'USA', 'USA', 'USA', 'USA']
})
# All unique location values across City and State
unique_locations = pd.concat([df['City'], df['State']]).unique()
print("Unique locations:", unique_locations)
print(f"Count: {len(unique_locations)}")
Output:
Unique locations: ['NYC' 'LA' 'Chicago' 'NY' 'CA' 'IL']
Count: 6
Conclusion
Extracting unique values from multiple Pandas columns can be accomplished with several approaches.
pd.concat().unique()is the most versatile: it works with any combination of data types and preserves the order of first appearance.np.unique()is fast and returns sorted results but cannot mix string and numeric columns.- Python sets are concise and handle mixed types but produce unordered output.
melt()andstack()offer clean Pandas-native alternatives.
Choose the method that best fits your data types, ordering requirements, and coding style.