
Python Pandas: How to Get Unique Values from a Column in Pandas DataFrame

When analyzing data with Pandas, a common task is identifying the distinct (unique) values in a column. Whether you're exploring categorical data, detecting duplicates, validating data quality, or preparing features for machine learning, knowing how to extract unique values efficiently is essential.

For example, if a column contains ['B1', 'B2', 'B3', 'B4', 'B4'], the unique values are ['B1', 'B2', 'B3', 'B4'].
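That example can be reproduced in a couple of lines (the series name `codes` is just for illustration):

```python
import pandas as pd

codes = pd.Series(['B1', 'B2', 'B3', 'B4', 'B4'])
print(codes.unique().tolist())  # ['B1', 'B2', 'B3', 'B4']
```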

This guide covers multiple approaches, from the most common Pandas methods to Python built-ins, so you can choose the right one for your situation.

Sample DataFrame

All examples in this guide use the following DataFrame:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
    'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
    'Score': [85, 92, 85, 78, 92]
}

df = pd.DataFrame(data)
print(df)

Output:

      Name   Department     City  Score
0    Alice  Engineering      NYC     85
1      Bob    Marketing       LA     92
2  Charlie  Engineering      NYC     85
3    Diana    Marketing      NYC     78
4      Eve  Engineering  Chicago     92

Using the .unique() Method

The .unique() method is the most straightforward way to get unique values from a column. For most column dtypes it returns a NumPy array of distinct values in the order they first appear.

# df is the sample DataFrame defined above

unique_depts = df['Department'].unique()
print(unique_depts)
print(type(unique_depts))

Output:

['Engineering' 'Marketing']
<class 'numpy.ndarray'>

Getting Unique Values as a List

If you need a Python list instead of a NumPy array:

# df is the sample DataFrame defined above

unique_depts = df['Department'].unique().tolist()
print(unique_depts)

Output:

['Engineering', 'Marketing']

Unique Values from Multiple Columns

To get unique values across multiple columns, use pd.unique() with the values from those columns:

# df is the sample DataFrame defined above

# Unique values across both 'Department' and 'City'
unique_combined = pd.unique(df[['Department', 'City']].values.ravel())
print(unique_combined)

Output:

['Engineering' 'NYC' 'Marketing' 'LA' 'Chicago']
Tip: .unique() preserves the order of first occurrence, unlike Python's set(), which is unordered. This is useful when the order of appearance matters in your analysis.
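If the order of first appearance doesn't matter, a plain set union over the same columns gives the same distinct values. A sketch using just the two relevant columns of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
    'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
})

# Set union removes duplicates across both columns (iteration order varies)
combined = set(df['Department']) | set(df['City'])
print(combined)
```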

Using .nunique() for Counting Unique Values

When you only need to know how many unique values exist (not the values themselves), .nunique() is the most efficient option:

# df is the sample DataFrame defined above

# Count unique values in a single column
print("Unique departments:", df['Department'].nunique())
print("Unique cities:", df['City'].nunique())
print("Unique scores:", df['Score'].nunique())

Output:

Unique departments: 2
Unique cities: 3
Unique scores: 3
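One caveat: by default .nunique() ignores NaN, while .unique() includes it. A small sketch (the NaN here is added for illustration, not part of the sample data):

```python
import pandas as pd
import numpy as np

s = pd.Series(['NYC', 'LA', np.nan, 'NYC'])
print(s.unique())               # ['NYC' 'LA' nan] -- NaN included
print(s.nunique())              # 2 -- NaN ignored by default
print(s.nunique(dropna=False))  # 3 -- NaN counted
```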

Counting Unique Values Across All Columns

Apply .nunique() to the entire DataFrame to get counts for every column at once:

# df is the sample DataFrame defined above

print(df.nunique())

Output:

Name          5
Department    2
City          3
Score         3
dtype: int64

This is especially useful for quick data profiling to understand the cardinality of each column.

Using .value_counts() for Unique Values with Frequencies

When you need both the unique values and how often each appears, .value_counts() is the go-to method:

# df is the sample DataFrame defined above

dept_counts = df['Department'].value_counts()
print(dept_counts)

Output:

Department
Engineering    3
Marketing      2
Name: count, dtype: int64

Extracting Just the Unique Values from Counts

# df is the sample DataFrame defined above

unique_values = df['Department'].value_counts().index.tolist()
print("Unique values:", unique_values)

Output:

Unique values: ['Engineering', 'Marketing']
Note: unlike .unique(), .value_counts() returns values sorted by frequency (most common first) rather than by order of first appearance. This is useful for understanding the data distribution at a glance.

Normalized Counts (Percentages)

# df is the sample DataFrame defined above

dept_pct = df['Department'].value_counts(normalize=True)
print(dept_pct)

Output:

Department
Engineering    0.6
Marketing      0.4
Name: proportion, dtype: float64
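To display those proportions as percentages, multiply by 100 (a small formatting sketch on the same column):

```python
import pandas as pd

df = pd.DataFrame({'Department': ['Engineering', 'Marketing', 'Engineering',
                                  'Marketing', 'Engineering']})

# Proportions scaled to percentages, rounded for display
dept_pct = df['Department'].value_counts(normalize=True) * 100
print(dept_pct.round(1))
```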

Using .drop_duplicates()

The .drop_duplicates() method returns a Series (or DataFrame) with duplicate values removed, preserving the original index:

# df is the sample DataFrame defined above

unique_cities = df['City'].drop_duplicates()
print(unique_cities)

Output:

0        NYC
1         LA
4    Chicago
Name: City, dtype: object

This is particularly useful when you want to keep the index to trace back to the original rows, or when working with DataFrames (not just Series):

# df is the sample DataFrame defined above

# Get unique combinations of Department and City
unique_combos = df[['Department', 'City']].drop_duplicates()
print(unique_combos)

Output:

    Department     City
0  Engineering      NYC
1    Marketing       LA
3    Marketing      NYC
4  Engineering  Chicago
Tip: use drop_duplicates() when you need unique combinations across multiple columns, something .unique() can't do directly.
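drop_duplicates() also accepts a keep parameter ('first', 'last', or False) that controls which occurrence survives. A sketch re-creating the sample City column as a standalone Series:

```python
import pandas as pd

city = pd.Series(['NYC', 'LA', 'NYC', 'NYC', 'Chicago'], name='City')

print(city.drop_duplicates(keep='first').tolist())  # ['NYC', 'LA', 'Chicago']
print(city.drop_duplicates(keep='last').tolist())   # ['LA', 'NYC', 'Chicago']
print(city.drop_duplicates(keep=False).tolist())    # ['LA', 'Chicago'] -- drops all rows that have any duplicate
```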

Using Python's set()

Python's built-in set() converts column values into a set, automatically removing duplicates:

# df is the sample DataFrame defined above

unique_scores = set(df['Score'])
print(unique_scores)

Output:

{92, 85, 78}
Note: set() does not preserve the order of first appearance. If order matters, use .unique() instead:

# Order NOT preserved
print(set(df['City'])) # Could be: {'LA', 'NYC', 'Chicago'}

# Order preserved
print(df['City'].unique()) # Always: ['NYC' 'LA' 'Chicago']
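If you want the unique values in sorted order rather than order of appearance, wrap the result in sorted() (a sketch; `city` re-creates the sample City column):

```python
import pandas as pd

city = pd.Series(['NYC', 'LA', 'NYC', 'NYC', 'Chicago'])
print(sorted(city.unique()))  # ['Chicago', 'LA', 'NYC']
```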

Practical Example: Data Quality Check

Here's a real-world example that uses unique value analysis for data profiling:

import pandas as pd

sales = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Widget', 'Doohickey', 'Gadget', 'Widget'],
    'region': ['North', 'South', 'North', 'East', 'South', 'West'],
    'status': ['completed', 'pending', 'completed', 'completed', 'failed', 'completed']
})

print("=== Data Quality Report ===\n")
for col in sales.columns:
    unique_vals = sales[col].unique()
    print(f"Column '{col}':")
    print(f"  Unique count: {sales[col].nunique()}")
    print(f"  Values: {unique_vals.tolist()}")
    print(f"  Most common: {sales[col].value_counts().index[0]}")
    print()

Output:

=== Data Quality Report ===

Column 'product':
  Unique count: 3
  Values: ['Widget', 'Gadget', 'Doohickey']
  Most common: Widget

Column 'region':
  Unique count: 4
  Values: ['North', 'South', 'East', 'West']
  Most common: North

Column 'status':
  Unique count: 3
  Values: ['completed', 'pending', 'failed']
  Most common: completed

Comparison of Methods

| Method             | Returns          | Preserves Order          | Includes Counts | Best For                                  |
|--------------------|------------------|--------------------------|-----------------|-------------------------------------------|
| .unique()          | NumPy array      | Yes                      | No              | Getting distinct values (recommended)     |
| .nunique()         | Integer          | N/A                      | No              | Counting distinct values                  |
| .value_counts()    | Series           | No (sorted by frequency) | Yes             | Values with frequency analysis            |
| .drop_duplicates() | Series/DataFrame | Yes                      | No              | Keeping index or multi-column uniqueness  |
| set()              | Python set       | No                       | No              | Quick deduplication                       |

Conclusion

For most use cases, .unique() is the best choice: it's fast, preserves order, and returns a clean array of distinct values.

Use .nunique() when you only need the count, .value_counts() when you also want frequencies, and .drop_duplicates() when you need unique combinations across multiple columns or want to preserve the DataFrame index. Python's set() works in a pinch but loses ordering.

Choose the method that best fits whether you need the values, the count, or the frequency distribution.