Python Pandas: How to Get Unique Values from a DataFrame Column
When analyzing data with Pandas, a common task is identifying the distinct (unique) values in a column. Whether you're exploring categorical data, detecting duplicates, validating data quality, or preparing features for machine learning, knowing how to extract unique values efficiently is essential.
For example, if a column contains ['B1', 'B2', 'B3', 'B4', 'B4'], the unique values are ['B1', 'B2', 'B3', 'B4'].
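That behavior is easy to verify with a minimal standalone Series before working with a full DataFrame:

```python
import pandas as pd

# A Series with one duplicated value ('B4')
s = pd.Series(['B1', 'B2', 'B3', 'B4', 'B4'])

# .unique() drops the duplicate and keeps first-appearance order
print(s.unique().tolist())  # ['B1', 'B2', 'B3', 'B4']
```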
This guide covers multiple approaches, from the most common Pandas methods to Python built-ins, so you can choose the right one for your situation.
Sample DataFrame
All examples in this guide use the following DataFrame:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
print(df)
Output:
      Name   Department     City  Score
0    Alice  Engineering      NYC     85
1      Bob    Marketing       LA     92
2  Charlie  Engineering      NYC     85
3    Diana    Marketing      NYC     78
4      Eve  Engineering  Chicago     92
Using .unique() (Recommended)
The .unique() method is the most straightforward way to get unique values from a column. It returns a NumPy array of distinct values in the order they first appear.
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
unique_depts = df['Department'].unique()
print(unique_depts)
print(type(unique_depts))
Output:
['Engineering' 'Marketing']
<class 'numpy.ndarray'>
Getting Unique Values as a List
If you need a Python list instead of a NumPy array:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
unique_depts = df['Department'].unique().tolist()
print(unique_depts)
Output:
['Engineering', 'Marketing']
Unique Values from Multiple Columns
To get unique values across multiple columns, use pd.unique() with the values from those columns:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
# Unique values across both 'Department' and 'City'
unique_combined = pd.unique(df[['Department', 'City']].to_numpy().ravel())
print(unique_combined)
Output:
['Engineering' 'NYC' 'Marketing' 'LA' 'Chicago']
.unique() preserves the order of first occurrence, unlike Python's set() which is unordered. This is useful when the order of appearance matters in your analysis.
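One detail worth knowing for data-quality work: `.unique()` includes missing values (NaN/None) as one of the distinct values it returns. A quick sketch with a hypothetical Series containing a missing city:

```python
import pandas as pd

# None/NaN is treated as a distinct value by .unique()
s = pd.Series(['NYC', 'LA', None, 'NYC'])
vals = s.unique()
print(vals)       # ['NYC' 'LA' None]
print(len(vals))  # 3
```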
Using .nunique() for Counting Unique Values
When you only need to know how many unique values exist (not the values themselves), .nunique() is the most efficient option:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
# Count unique values in a single column
print("Unique departments:", df['Department'].nunique())
print("Unique cities:", df['City'].nunique())
print("Unique scores:", df['Score'].nunique())
Output:
Unique departments: 2
Unique cities: 3
Unique scores: 3
Counting Unique Values Across All Columns
Apply .nunique() to the entire DataFrame to get counts for every column at once:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
print(df.nunique())
Output:
Name          5
Department    2
City          3
Score         3
dtype: int64
This is especially useful for quick data profiling to understand the cardinality of each column.
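Note that `.nunique()` ignores missing values by default, which can make its result differ from `len(col.unique())` on columns with NaN. Pass `dropna=False` to count missing values as their own category; a small sketch with a hypothetical Series:

```python
import pandas as pd

s = pd.Series(['NYC', 'LA', None, 'NYC'])

# By default, .nunique() ignores missing values...
print(s.nunique())              # 2
# ...but dropna=False counts NaN/None as its own value
print(s.nunique(dropna=False))  # 3
```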
Using .value_counts() for Unique Values with Frequencies
When you need both the unique values and how often each appears, .value_counts() is the go-to method:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
dept_counts = df['Department'].value_counts()
print(dept_counts)
Output:
Department
Engineering    3
Marketing      2
Name: count, dtype: int64
Extracting Just the Unique Values from Counts
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
unique_values = df['Department'].value_counts().index.tolist()
print("Unique values:", unique_values)
Output:
Unique values: ['Engineering', 'Marketing']
Unlike .unique(), .value_counts() returns values sorted by frequency (most common first) rather than order of first appearance. This is useful for understanding data distribution at a glance.
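If you want the unique values in alphabetical (or numerical) order instead of frequency or first-appearance order, wrapping the result in Python's `sorted()` is a simple option:

```python
import pandas as pd

df = pd.DataFrame({'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago']})

# sorted() gives alphabetical order, regardless of appearance or frequency
print(sorted(df['City'].unique()))  # ['Chicago', 'LA', 'NYC']
```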
Normalized Counts (Proportions)
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
dept_pct = df['Department'].value_counts(normalize=True)
print(dept_pct)
Output:
Department
Engineering    0.6
Marketing      0.4
Name: proportion, dtype: float64
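The `normalize=True` output is a proportion between 0 and 1. If you prefer readable percentages, one common approach is to scale and round the result:

```python
import pandas as pd

df = pd.DataFrame({'Department': ['Engineering', 'Marketing', 'Engineering',
                                  'Marketing', 'Engineering']})

# Scale proportions by 100 to express them as percentages
pct = df['Department'].value_counts(normalize=True).mul(100).round(1)
print(pct)  # Engineering 60.0, Marketing 40.0
```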
Using .drop_duplicates()
The .drop_duplicates() method returns a Series (or DataFrame) with duplicate values removed, preserving the original index:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
unique_cities = df['City'].drop_duplicates()
print(unique_cities)
Output:
0        NYC
1         LA
4    Chicago
Name: City, dtype: object
This is particularly useful when you want to keep the index to trace back to the original rows, or when working with DataFrames (not just Series):
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
# Get unique combinations of Department and City
unique_combos = df[['Department', 'City']].drop_duplicates()
print(unique_combos)
Output:
    Department     City
0  Engineering      NYC
1    Marketing       LA
3    Marketing      NYC
4  Engineering  Chicago
Use .drop_duplicates() when you need unique combinations across multiple columns, something .unique() can't do directly.
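By default, .drop_duplicates() keeps the first occurrence of each value. The `keep` parameter lets you keep the last occurrence instead, which changes the surviving index labels, useful when the most recent row is the one you want to trace back to:

```python
import pandas as pd

df = pd.DataFrame({'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago']})

# keep='last' retains the final occurrence of each value,
# so the surviving index labels differ from the default keep='first'
print(df['City'].drop_duplicates(keep='last').index.tolist())  # [1, 3, 4]
```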
Using Python's set()
Python's built-in set() converts column values into a set, automatically removing duplicates:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'Department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering'],
'City': ['NYC', 'LA', 'NYC', 'NYC', 'Chicago'],
'Score': [85, 92, 85, 78, 92]
}
df = pd.DataFrame(data)
unique_scores = set(df['Score'])
print(unique_scores)
Output:
{92, 85, 78}
set() does not preserve the order of first appearance. If order matters, use .unique() instead:
# Order NOT preserved
print(set(df['City'])) # Could be: {'LA', 'NYC', 'Chicago'}
# Order preserved
print(df['City'].unique()) # Always: ['NYC' 'LA' 'Chicago']
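If you want pure-Python deduplication that still preserves first-appearance order, `dict.fromkeys()` is a handy idiom: dictionary keys are unique and preserve insertion order (guaranteed since Python 3.7):

```python
# dict keys are unique and preserve insertion order,
# so this deduplicates while keeping first-appearance order
cities = ['NYC', 'LA', 'NYC', 'NYC', 'Chicago']
ordered_unique = list(dict.fromkeys(cities))
print(ordered_unique)  # ['NYC', 'LA', 'Chicago']
```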
Practical Example: Data Quality Check
Here's a real-world example that uses unique value analysis for data profiling:
import pandas as pd
sales = pd.DataFrame({
'product': ['Widget', 'Gadget', 'Widget', 'Doohickey', 'Gadget', 'Widget'],
'region': ['North', 'South', 'North', 'East', 'South', 'West'],
'status': ['completed', 'pending', 'completed', 'completed', 'failed', 'completed']
})
print("=== Data Quality Report ===\n")
for col in sales.columns:
    unique_vals = sales[col].unique()
    print(f"Column '{col}':")
    print(f"  Unique count: {sales[col].nunique()}")
    print(f"  Values: {unique_vals.tolist()}")
    print(f"  Most common: {sales[col].value_counts().index[0]}")
    print()
Output:
=== Data Quality Report ===

Column 'product':
  Unique count: 3
  Values: ['Widget', 'Gadget', 'Doohickey']
  Most common: Widget

Column 'region':
  Unique count: 4
  Values: ['North', 'South', 'East', 'West']
  Most common: North

Column 'status':
  Unique count: 3
  Values: ['completed', 'pending', 'failed']
  Most common: completed
Comparison of Methods
| Method | Returns | Preserves Order | Includes Counts | Best For |
|---|---|---|---|---|
| `.unique()` | NumPy array | ✅ | ❌ | Getting distinct values (recommended) |
| `.nunique()` | Integer | N/A | ❌ | Counting distinct values |
| `.value_counts()` | Series | ❌ (sorted by freq) | ✅ | Values with frequency analysis |
| `.drop_duplicates()` | Series/DataFrame | ✅ | ❌ | Keeping index or multi-column uniqueness |
| `set()` | Python set | ❌ | ❌ | Quick deduplication |
Conclusion
For most use cases, .unique() is the best choice: it's fast, preserves order, and returns a clean array of distinct values.
Use .nunique() when you only need the count, .value_counts() when you also want frequencies, and .drop_duplicates() when you need unique combinations across multiple columns or want to preserve the DataFrame index. Python's set() works in a pinch but loses ordering.
Choose the method that best fits whether you need the values, the count, or the frequency distribution.