Skip to main content

SQL DISTINCT for Removing Duplicates Values from Results

As you start querying real-world data, you will quickly encounter a common problem: duplicate values in your results. You ask the database "which cities do my customers live in?" and instead of getting a clean list of unique cities, you get the same city repeated multiple times because several customers share the same location. You ask "what statuses do my orders have?" and you see "pending" listed three times.

The DISTINCT keyword solves this problem. It tells the database to remove duplicate rows from your results and return only unique values. It is a simple but powerful tool that turns noisy, repetitive result sets into clean, meaningful lists.

This guide covers everything you need to know about DISTINCT: how it works on single columns, how it behaves with multiple columns, when it is the right tool to use, and when reaching for DISTINCT is actually a sign that something else in your query needs fixing. Every example uses the ShopSmart sample database with full outputs so you can follow along (we defined it in a previous guide here).

The Duplicate Problem

Before learning the solution, let us clearly see the problem. Suppose you want to know which cities your customers are located in:

SELECT city
FROM customers;

Output:

city
New York
Los Angeles
Chicago
New York
Seattle
Chicago
NULL

New York appears twice. Chicago appears twice. You have 7 rows, but only 5 unique cities (including NULL). If you are building a dropdown filter for your website or generating a report of customer locations, these duplicates are unwanted noise.

The same issue appears with other columns. What order statuses exist in the system?

SELECT status
FROM orders;

Output:

status
completed
completed
completed
shipped
shipped
pending
pending
pending

There are only 3 distinct statuses, but the query returns 8 rows with heavy repetition. This is where DISTINCT comes in.

How SELECT DISTINCT Works

Adding the DISTINCT keyword immediately after SELECT tells the database to eliminate duplicate rows from the results:

SELECT DISTINCT column_name
FROM table_name;

Let us fix the city query:

SELECT DISTINCT city
FROM customers;

Output:

city
New York
Los Angeles
Chicago
Seattle
NULL

Each city now appears exactly once. The duplicates are gone. Notice that NULL is included as its own distinct value.

And the status query:

SELECT DISTINCT status
FROM orders;

Output:

status
completed
shipped
pending

Clean, simple, and exactly what you need. Three statuses, three rows.

How DISTINCT Works Internally

When you use DISTINCT, the database processes all matching rows first, then removes duplicates before returning the results. It compares every column in the SELECT list to determine whether two rows are identical. If all selected column values match between two rows, one of them is eliminated.

More Single-Column Examples

Find all unique categories that have products assigned:

SELECT DISTINCT category_id
FROM products;

Output:

category_id
1
2
3
4

Find all unique order dates (days on which at least one order was placed):

SELECT DISTINCT order_date
FROM orders
ORDER BY order_date;

Output:

order_date
2024-01-10
2024-01-15
2024-02-20
2024-03-05
2024-03-12
2024-03-20
2024-04-01
2024-04-10

In this case, every order happened to be on a different date, so DISTINCT did not remove any rows. That is perfectly fine. DISTINCT does not fail or behave differently when there are no duplicates; it simply returns all rows as they are.

DISTINCT on Multiple Columns

When you use DISTINCT with multiple columns, the database treats the combination of all listed columns as the unit for determining uniqueness. Two rows are considered duplicates only if every column in the SELECT list matches.

SELECT DISTINCT column1, column2
FROM table_name;

This is an important distinction from single-column DISTINCT, so let us explore it carefully.

Understanding Multi-Column Uniqueness

Find the unique combinations of city and signup year:

SELECT DISTINCT city, signup_date
FROM customers
ORDER BY city;

Output:

citysignup_date
Chicago2023-06-10
Chicago2024-02-28
Los Angeles2023-03-22
New York2023-01-15
New York2023-08-05
Seattle2024-01-18
NULL2024-04-01

Notice that Chicago appears twice and New York appears twice. Unlike the single-column example where each city appeared only once, here they repeat because the combination of city and signup_date is different for each row. "Chicago + 2023-06-10" is not the same as "Chicago + 2024-02-28", so both are considered distinct.

Another Multi-Column Example

Find unique combinations of customer and order status:

SELECT DISTINCT customer_id, status
FROM orders
ORDER BY customer_id, status;

Output:

customer_idstatus
1completed
2completed
2pending
3shipped
4shipped
5pending
6pending

Customer 2 appears twice because they have orders with two different statuses (completed and pending). Each unique pair of customer_id and status gets its own row.

Comparing Single vs Multi-Column DISTINCT

Let us make the difference crystal clear with a side-by-side comparison:

Single-column DISTINCT on status:

SELECT DISTINCT status
FROM orders;
status
completed
shipped
pending

3 rows. Each unique status appears once.

Two-column DISTINCT on customer_id and status:

SELECT DISTINCT customer_id, status
FROM orders
ORDER BY customer_id;
customer_idstatus
1completed
2completed
2pending
3shipped
4shipped
5pending
6pending

7 rows. Each unique combination of customer and status appears once.

Key Insight

DISTINCT does not operate on individual columns in isolation. It operates on the entire row as defined by your SELECT list. Adding more columns to a DISTINCT query almost always increases the number of results because there are more possible unique combinations.

DISTINCT with ORDER BY

You can combine DISTINCT with ORDER BY to get sorted unique values:

SELECT DISTINCT city
FROM customers
WHERE city IS NOT NULL
ORDER BY city ASC;

Output:

city
Chicago
Los Angeles
New York
Seattle

A clean, alphabetically sorted list of customer cities with no duplicates and no NULLs. This is exactly what you would need to populate a city filter dropdown in a web application.

-- Unique product prices, sorted from highest to lowest
SELECT DISTINCT price
FROM products
ORDER BY price DESC;

Output:

price
129.99
110.00
89.99
65.00
45.00
42.50
38.00
34.99
29.99
24.99
ORDER BY Columns Must Be in the SELECT List (Usually)

When using DISTINCT, most databases require that any column in the ORDER BY clause also appears in the SELECT list. This makes sense because if you are eliminating duplicates, the database needs to know which value to use for sorting.

-- Error in most databases: sorting by a column not in the SELECT list
SELECT DISTINCT city
FROM customers
ORDER BY signup_date;
-- ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list

-- Correct: sort by a column that is in the SELECT list
SELECT DISTINCT city
FROM customers
ORDER BY city;

DISTINCT with WHERE

DISTINCT works seamlessly with WHERE. The database first filters rows based on the WHERE condition, then removes duplicates from the remaining rows:

-- Unique statuses of orders placed in 2024
SELECT DISTINCT status
FROM orders
WHERE order_date >= '2024-03-01';

Output:

status
shipped
pending

Only shipped and pending orders were placed from March 2024 onward. No completed orders fall in that range.

-- Unique categories that have products priced over $50
SELECT DISTINCT category_id
FROM products
WHERE price > 50;

Output:

category_id
1
3
4

Categories 1 (Electronics), 3 (Home & Kitchen), and 4 (Sports) each have at least one product priced over $50. Category 2 (Books) does not.

DISTINCT with LIMIT

You can also combine DISTINCT with LIMIT to get a limited set of unique values:

-- The first 3 unique cities alphabetically
SELECT DISTINCT city
FROM customers
WHERE city IS NOT NULL
ORDER BY city ASC
LIMIT 3;

Output:

city
Chicago
Los Angeles
New York

The database finds all unique cities, sorts them alphabetically, and then returns only the first 3.

DISTINCT and NULL Values

DISTINCT treats all NULL values as equal to each other. This means that if a column contains multiple NULL values, DISTINCT will collapse them into a single NULL row.

-- Without DISTINCT: NULL might appear multiple times
SELECT city
FROM customers;

If multiple customers had NULL cities, you would see NULL listed multiple times. With DISTINCT:

SELECT DISTINCT city
FROM customers;

Output:

city
New York
Los Angeles
Chicago
Seattle
NULL

NULL appears exactly once, just like any other value. This behavior is consistent across all major databases.

NULL Equality in DISTINCT vs WHERE

This is an interesting quirk worth noting. In the WHERE clause, NULL = NULL evaluates to UNKNOWN (not true). But for the purposes of DISTINCT, two NULL values are considered equal and collapsed into one. This is an intentional design choice in the SQL standard. Without it, DISTINCT would never be able to remove duplicate NULL values, which would be impractical.

When to Use DISTINCT

DISTINCT is the right tool in several common scenarios.

Building Filter Options

When your application needs a list of unique values for a dropdown, checkbox filter, or autocomplete:

-- Populate a "Filter by City" dropdown
SELECT DISTINCT city
FROM customers
WHERE city IS NOT NULL
ORDER BY city;

-- Populate a "Filter by Status" dropdown
SELECT DISTINCT status
FROM orders
ORDER BY status;

-- Populate a "Filter by Category" dropdown
SELECT DISTINCT c.name
FROM categories c
JOIN products p ON c.id = p.category_id
ORDER BY c.name;

Answering "Which" or "What" Questions

When your question is about existence rather than quantity:

-- Which customers have placed orders?
SELECT DISTINCT customer_id
FROM orders;

-- What products have been ordered?
SELECT DISTINCT product_id
FROM order_items;

-- Which dates had order activity?
SELECT DISTINCT order_date
FROM orders
ORDER BY order_date;

Data Exploration and Quality Checks

When you want to understand what values exist in a column:

-- What are all the possible values in the status column?
SELECT DISTINCT status
FROM orders;

-- Are there any unexpected category IDs?
SELECT DISTINCT category_id
FROM products
ORDER BY category_id;

Removing Incidental Duplicates from Joins

When joining tables, you sometimes get duplicate rows because of one-to-many relationships:

-- Without DISTINCT: Customer names repeat for each of their orders
SELECT c.first_name, c.last_name
FROM customers c
JOIN orders o ON c.id = o.customer_id;

Output:

first_namelast_name
AliceJohnson
AliceJohnson
BobMartinez
BobMartinez
CarolSingh
DavidChen
EvaBrown
FrankWilson

Alice and Bob each appear twice because they each have 2 orders. If you only want the list of customers who have placed at least one order:

-- With DISTINCT: Each customer appears once
SELECT DISTINCT c.first_name, c.last_name
FROM customers c
JOIN orders o ON c.id = o.customer_id;

Output:

first_namelast_name
AliceJohnson
BobMartinez
CarolSingh
DavidChen
EvaBrown
FrankWilson

When NOT to Use DISTINCT

This is equally important. DISTINCT is sometimes used as a "band-aid" to hide problems that should be fixed at the query or schema level. Here are the situations where DISTINCT is a code smell.

Masking a Bad JOIN

If you add DISTINCT because a JOIN produces unexpected duplicates, the real fix is usually to correct the join logic, not to slap DISTINCT on top.

-- Bad: DISTINCT hides the fact that the JOIN is producing too many rows
SELECT DISTINCT p.name, p.price
FROM products p
JOIN order_items oi ON p.id = oi.product_id
JOIN orders o ON oi.order_id = o.id;

The query above produces duplicates because a product can appear in multiple order items. DISTINCT removes them, but it also masks the underlying issue. Ask yourself: "Why is my join creating duplicates? Is my join condition correct? Am I joining to the right table?"

If you truly want a list of products that have been ordered, a clearer approach might be:

-- Better: EXISTS clearly expresses the intent
SELECT p.name, p.price
FROM products p
WHERE EXISTS (
SELECT 1 FROM order_items oi WHERE oi.product_id = p.id
);

When You Actually Need Counts or Aggregations

If you need to know how many unique values exist rather than what they are, use COUNT(DISTINCT ...) instead of SELECT DISTINCT:

-- Wrong approach: SELECT DISTINCT, then count rows in application code
SELECT DISTINCT city
FROM customers;
-- Then count the rows in your application... inefficient

-- Right approach: Let the database count for you
SELECT COUNT(DISTINCT city) AS unique_cities
FROM customers;

Output:

unique_cities
4

This returns a single number directly, without transferring all the unique values to your application.

When All Rows Are Already Unique

If your query includes a primary key or a UNIQUE column, adding DISTINCT is redundant. Every row is already guaranteed to be unique.

-- Redundant: The id column is a primary key, so every row is unique anyway
SELECT DISTINCT id, name, price
FROM products;

-- Cleaner: No DISTINCT needed
SELECT id, name, price
FROM products;

Both queries return identical results, but the first one forces the database to perform an unnecessary deduplication operation. On large tables, this wasted work can measurably slow down your query.

The DISTINCT Code Smell

If you find yourself adding DISTINCT to "fix" duplicate rows in a query that should not have duplicates, stop and investigate. Common root causes include:

  • Incorrect JOIN conditions that create a cartesian product or unintended many-to-many relationships
  • Missing WHERE clauses that should filter out unwanted rows
  • Joining to the wrong table or joining on the wrong columns
  • Schema issues where data should have been normalized into separate tables

DISTINCT should be a deliberate choice to get unique values, not a quick fix for broken query logic.

DISTINCT vs GROUP BY

Beginners often wonder about the difference between DISTINCT and GROUP BY, since both can return unique values. Let us clarify.

DISTINCT removes duplicate rows from the final result:

SELECT DISTINCT city
FROM customers;

Output:

city
New York
Los Angeles
Chicago
Seattle
NULL

GROUP BY groups rows together, typically for use with aggregate functions like COUNT, SUM, and AVG:

SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city;

Output:

citycustomer_count
New York2
Los Angeles1
Chicago2
Seattle1
NULL1

When used without aggregate functions, GROUP BY and DISTINCT produce the same results:

-- These two queries return identical results
SELECT DISTINCT city FROM customers;
SELECT city FROM customers GROUP BY city;

The difference is intent and capability:

FeatureDISTINCTGROUP BY
PurposeRemove duplicate rowsGroup rows for aggregation
AggregationCannot use aggregate functionsDesigned for aggregate functions
ReadabilityClearly states "I want unique values"Implies "I want to group and summarize"
PerformanceUsually identical to GROUP BYUsually identical to DISTINCT
Which Should You Use?
  • Use DISTINCT when you simply want unique values with no calculations
  • Use GROUP BY when you need to aggregate data (count, sum, average, etc.)

If you are not using aggregate functions, DISTINCT is the clearer choice because it explicitly communicates your intent: "I want unique rows."

Practical Exercises

Exercise 1

Find all unique categories that currently have available products.

SELECT DISTINCT category_id
FROM products
WHERE is_available = true
ORDER BY category_id;

Expected output:

category_id
1
2
3
4

Exercise 2

Find all unique combinations of order status and the month they were placed.

SELECT DISTINCT status, order_date
FROM orders
ORDER BY order_date;

Expected output:

statusorder_date
completed2024-01-10
completed2024-01-15
completed2024-02-20
shipped2024-03-05
shipped2024-03-12
pending2024-03-20
pending2024-04-01
pending2024-04-10

Exercise 3

Find which customers have placed orders (show unique customer IDs only).

SELECT DISTINCT customer_id
FROM orders
ORDER BY customer_id;

Expected output:

customer_id
1
2
3
4
5
6

Exercise 4

Count how many unique cities your customers are from (excluding unknown cities).

SELECT COUNT(DISTINCT city) AS unique_cities
FROM customers
WHERE city IS NOT NULL;

Expected output:

unique_cities
4

Exercise 5

Find the unique product prices in the Sports category (category_id = 4), sorted from lowest to highest.

SELECT DISTINCT price
FROM products
WHERE category_id = 4
ORDER BY price ASC;

Expected output:

price
24.99
38.00
110.00

Key Takeaways

DISTINCT is a focused, powerful tool for eliminating duplicate rows from your query results. Here is what you should remember:

  • SELECT DISTINCT removes duplicate rows from results, returning only unique values
  • With multiple columns, DISTINCT evaluates uniqueness based on the combination of all selected columns, not each column individually
  • DISTINCT treats all NULL values as equal, collapsing multiple NULLs into a single row
  • Use DISTINCT for building filter dropdowns, answering "which/what" questions, exploring data, and removing incidental duplicates from joins
  • Avoid DISTINCT as a band-aid for broken joins, when a primary key already guarantees uniqueness, or when you actually need aggregation (COUNT, SUM)
  • If you need to count unique values, use COUNT(DISTINCT column) instead of selecting distinct values and counting in your application
  • DISTINCT and GROUP BY produce the same results when no aggregate functions are involved, but DISTINCT better communicates the intent of "give me unique rows"
  • If adding DISTINCT "fixes" unexpected duplicates, investigate the root cause before accepting it as the solution

DISTINCT rounds out your foundational query toolkit. Combined with SELECT, FROM, WHERE, logical operators, ORDER BY, and LIMIT, you now have everything you need to write precise, clean, and efficient queries that retrieve exactly the data you want, filtered, sorted, deduplicated, and sized to your needs.