Python Pandas: How to Get Unique Values from a DataFrame Column
When analyzing data with Pandas, a common task is identifying the distinct (unique) values in a column. Whether you're exploring categorical data, detecting duplicates, validating data quality, or preparing features for machine learning, knowing how to extract unique values efficiently is essential.
For example, if a column contains ['B1', 'B2', 'B3', 'B4', 'B4'], the unique values are ['B1', 'B2', 'B3', 'B4'].
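That behavior is easy to verify with a minimal standalone Series before working with a full DataFrame:

```python
import pandas as pd

# A Series with one duplicated value ('B4')
s = pd.Series(['B1', 'B2', 'B3', 'B4', 'B4'])

# .unique() drops the duplicate and keeps first-appearance order
print(s.unique().tolist())  # ['B1', 'B2', 'B3', 'B4']
```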
This guide covers multiple approaches, from the most common Pandas methods to Python built-ins, so you can choose the right one for your situation.
Sample DataFrame
All examples in this guide use the following DataFrame:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
print(df)
Output:
      Name   Department     City  Score
0    Alice  Engineering      NYC     85
1      Bob    Marketing       LA     92
2  Charlie  Engineering      NYC     85
3    Diana    Marketing      NYC     78
4      Eve  Engineering  Chicago     92
Using .unique() (Recommended)
The .unique() method is the most straightforward way to get unique values from a column. It returns a NumPy array of distinct values in the order they first appear.
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
unique_depts = df['Department'].unique()
print(unique_depts)
print(type(unique_depts))
Output:
['Engineering' 'Marketing']
<class 'numpy.ndarray'>
Getting Unique Values as a List
If you need a Python list instead of a NumPy array:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
unique_depts = df['Department'].unique().tolist()
print(unique_depts)
Output:
['Engineering', 'Marketing']
Unique Values from Multiple Columns
To get unique values across multiple columns, use pd.unique() with the values from those columns:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
# Unique values across both 'Department' and 'City'
unique_combined = pd.unique(df[['Department', 'City']].to_numpy().ravel())
print(unique_combined)
Output:
['Engineering' 'NYC' 'Marketing' 'LA' 'Chicago']
.unique() preserves the order of first occurrence, unlike Python's set() which is unordered. This is useful when the order of appearance matters in your analysis.
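One detail worth knowing for data-quality work: `.unique()` includes missing values (NaN/None) as one of the distinct values it returns. A quick sketch with a hypothetical Series containing a missing city:

```python
import pandas as pd

# None/NaN is treated as a distinct value by .unique()
s = pd.Series(['NYC', 'LA', None, 'NYC'])
vals = s.unique()
print(vals)       # ['NYC' 'LA' None]
print(len(vals))  # 3
```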
Using .nunique() for Counting Unique Values
When you only need to know how many unique values exist (not the values themselves), .nunique() is the most efficient option:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
# Count unique values in a single column
print("Unique departments:", df['Department'].nunique())
print("Unique cities:", df['City'].nunique())
print("Unique scores:", df['Score'].nunique())
Output:
Unique departments: 2
Unique cities: 3
Unique scores: 3
Counting Unique Values Across All Columns
Apply .nunique() to the entire DataFrame to get counts for every column at once:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
print(df.nunique())
Output:
Name          5
Department    2
City          3
Score         3
dtype: int64
This is especially useful for quick data profiling to understand the cardinality of each column.
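Note that `.nunique()` ignores missing values by default, which can make its result differ from `len(col.unique())` on columns with NaN. Pass `dropna=False` to count missing values as their own category; a small sketch with a hypothetical Series:

```python
import pandas as pd

s = pd.Series(['NYC', 'LA', None, 'NYC'])

# By default, .nunique() ignores missing values...
print(s.nunique())              # 2
# ...but dropna=False counts NaN/None as its own value
print(s.nunique(dropna=False))  # 3
```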
Using .value_counts() for Unique Values with Frequencies
When you need both the unique values and how often each appears, .value_counts() is the go-to method:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
dept_counts = df['Department'].value_counts()
print(dept_counts)
Output:
Department
Engineering    3
Marketing      2
Name: count, dtype: int64
Extracting Just the Unique Values from Counts
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
unique_values = df['Department'].value_counts().index.tolist()
print("Unique values:", unique_values)
Output:
Unique values: ['Engineering', 'Marketing']
Unlike .unique(), .value_counts() returns values sorted by frequency (most common first) rather than order of first appearance. This is useful for understanding data distribution at a glance.
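If you want the unique values in alphabetical (or numerical) order instead of frequency or first-appearance order, wrapping the result in Python's `sorted()` is a simple option:

```python
import pandas as pd

df = pd.DataFrame({'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago']})

# sorted() gives alphabetical order, regardless of appearance or frequency
print(sorted(df['City'].unique()))  # ['Chicago', 'LA', 'NYC']
```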
Normalized Counts (Proportions)
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
dept_pct = df['Department'].value_counts(normalize=True)
print(dept_pct)
Output:
Department
Engineering    0.6
Marketing      0.4
Name: proportion, dtype: float64
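The `normalize=True` output is a proportion between 0 and 1. If you prefer readable percentages, one common approach is to scale and round the result:

```python
import pandas as pd

df = pd.DataFrame({'Department': ['Engineering', 'Marketing', 'Engineering',
                                  'Marketing', 'Engineering']})

# Scale proportions by 100 to express them as percentages
pct = df['Department'].value_counts(normalize=True).mul(100).round(1)
print(pct)  # Engineering 60.0, Marketing 40.0
```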
Using .drop_duplicates()
The .drop_duplicates() method returns a Series (or DataFrame) with duplicate values removed, preserving the original index:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
unique_cities = df['City'].drop_duplicates()
print(unique_cities)
Output:
0        NYC
1         LA
4    Chicago
Name: City, dtype: object
This is particularly useful when you want to keep the index to trace back to the original rows, or when working with DataFrames (not just Series):
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
# Get unique combinations of Department and City
unique_combos = df[['Department', 'City']].drop_duplicates()
print(unique_combos)
Output:
    Department     City
0  Engineering      NYC
1    Marketing       LA
3    Marketing      NYC
4  Engineering  Chicago
Use .drop_duplicates() when you need unique combinations across multiple columns, something .unique() can't do directly.
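By default, .drop_duplicates() keeps the first occurrence of each value. The `keep` parameter lets you keep the last occurrence instead, which changes the surviving index labels, useful when the most recent row is the one you want to trace back to:

```python
import pandas as pd

df = pd.DataFrame({'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago']})

# keep='last' retains the final occurrence of each value,
# so the surviving index labels differ from the default keep='first'
print(df['City'].drop_duplicates(keep='last').index.tolist())  # [1, 3, 4]
```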
Using Python's set()
Python's built-in set() converts column values into a set, automatically removing duplicates:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
unique_scores = set(df['Score'])
print(unique_scores)
Output:
{92, 85, 78}
set() does not preserve the order of first appearance. If order matters, use .unique() instead:
# Order NOT preserved
print(set(df['City'])) # Could be: {'LA', 'NYC', 'Chicago'}
# Order preserved
print(df['City'].unique()) # Always: ['NYC' 'LA' 'Chicago']
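If you want pure-Python deduplication that still preserves first-appearance order, `dict.fromkeys()` is a handy idiom: dictionary keys are unique and preserve insertion order (guaranteed since Python 3.7):

```python
# dict keys are unique and preserve insertion order,
# so this deduplicates while keeping first-appearance order
cities = ['NYC', 'LA', 'NYC', 'NYC', 'Chicago']
ordered_unique = list(dict.fromkeys(cities))
print(ordered_unique)  # ['NYC', 'LA', 'Chicago']
```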
Practical Example: Data Quality Check
Here's a real-world example that uses unique value analysis for data profiling:
import pandas as pd
sales = pd.DataFrame({
'product': ['Widget', 'Gadget', 'Widget', 'Doohickey', 'Gadget', 'Widget'],
'region': ['North', 'South', 'North', 'East', 'South', 'West'],
'status': ['completed', 'pending', 'completed', 'completed', 'failed', 'completed']
})
print("=== Data Quality Report ===\n")
for col in sales.columns:
    unique_vals = sales[col].unique()
    print(f"Column '{col}':")
    print(f"  Unique count: {sales[col].nunique()}")
    print(f"  Values: {unique_vals.tolist()}")
    print(f"  Most common: {sales[col].value_counts().index[0]}")
    print()
Output:
=== Data Quality Report ===

Column 'product':
  Unique count: 3
  Values: ['Widget', 'Gadget', 'Doohickey']
  Most common: Widget

Column 'region':
  Unique count: 4
  Values: ['North', 'South', 'East', 'West']
  Most common: North

Column 'status':
  Unique count: 3
  Values: ['completed', 'pending', 'failed']
  Most common: completed
Comparison of Methods
| Method | Returns | Preserves Order | Includes Counts | Best For |
|---|---|---|---|---|
| `.unique()` | NumPy array | ✅ | ❌ | Getting distinct values (recommended) |
| `.nunique()` | Integer | N/A | ❌ | Counting distinct values |
| `.value_counts()` | Series | ❌ (sorted by freq) | ✅ | Values with frequency analysis |
| `.drop_duplicates()` | Series/DataFrame | ✅ | ❌ | Keeping index or multi-column uniqueness |
| `set()` | Python set | ❌ | ❌ | Quick deduplication |
Conclusion
For most use cases, .unique() is the best choice: it's fast, preserves order, and returns a clean array of distinct values.
Use .nunique() when you only need the count, .value_counts() when you also want frequencies, and .drop_duplicates() when you need unique combinations across multiple columns or want to preserve the DataFrame index. Python's set() works in a pinch but loses ordering.
Choose the method that best fits whether you need the values, the count, or the frequency distribution.