Skip to main content

SQL GROUP BY for Grouping Results

In the previous guide, you learned how aggregate functions condense an entire table into a single summary row. COUNT(*) tells you how many total products you have. AVG(price) gives you one average across all products. But what if you need the average price per category? Or the number of orders per customer? Or the total revenue per month?

These questions require the database to split the data into groups first, then aggregate each group separately. This is exactly what GROUP BY does. It is one of the most powerful clauses in SQL and the foundation of every meaningful report, dashboard, and analytical query.

This guide covers how GROUP BY works conceptually, how to group by one or multiple columns, the critical error that every beginner encounters (selecting non-aggregated columns), and practical real-world examples. Everything uses the ShopSmart sample database with full outputs (we defined it in a previous guide here).

How GROUP BY Works Conceptually

Without GROUP BY, aggregate functions process all matching rows as one big group and return a single result:

-- One group, one result: the average price across ALL products
SELECT AVG(price) AS avg_price
FROM products;
avg_price
61.045

With GROUP BY, the database first divides the rows into groups based on the values in one or more columns, then applies the aggregate function to each group separately, returning one result row per group:

-- Multiple groups, one result per group: average price PER category
SELECT category_id, AVG(price) AS avg_price
FROM products
GROUP BY category_id;
category_idavg_price
157.495
238.745000000000005
3129.99
457.663333333333334

Instead of one average for all products, you now have a separate average for each category.

The Mental Model

Think of GROUP BY as a three-step process:

Step 1: Sort rows into groups based on the GROUP BY column:

Group (category_id = 1)Group (category_id = 2)Group (category_id = 3)Group (category_id = 4)
Wireless Mouse ($29.99)SQL for Beginners ($34.99)Coffee Maker Pro ($129.99)Yoga Mat Premium ($38.00)
Mechanical Keyboard ($89.99)Data Science Handbook ($42.50)Running Shoes X1 ($110.00)
USB-C Hub ($45.00)Stainless Water Bottle ($24.99)
Bluetooth Speaker ($65.00)

Step 2: Apply the aggregate function to each group independently:

  • Category 1: AVG(29.99, 89.99, 45.00, 65.00) = 57.50
  • Category 2: AVG(34.99, 42.50) = 38.75
  • Category 3: AVG(129.99) = 129.99
  • Category 4: AVG(38.00, 110.00, 24.99) = 57.66

Step 3: Return one row per group with the group identifier and the aggregated value.

The Key Insight

GROUP BY does not change which rows are included. It changes how the rows are organized before aggregation. Without it, all rows form one group. With it, rows are split into multiple groups, and each group gets its own summary.

Grouping by One Column

The most common pattern is grouping by a single column to get summaries per category, per status, per customer, or per any other dimension.

Products Per Category

SELECT category_id, COUNT(*) AS product_count
FROM products
GROUP BY category_id
ORDER BY product_count DESC;

Output:

category_idproduct_count
14
43
22
31

Electronics (category 1) has the most products. Home & Kitchen (category 3) has only one.

Orders Per Status

SELECT status, COUNT(*) AS order_count
FROM orders
GROUP BY status
ORDER BY order_count DESC;

Output:

statusorder_count
completed3
pending3
shipped2

Revenue Per Status

SELECT status,
COUNT(*) AS order_count,
SUM(total_amount) AS total_revenue,
ROUND(AVG(total_amount), 2) AS avg_order_value
FROM orders
GROUP BY status
ORDER BY total_revenue DESC;

Output:

statusorder_counttotal_revenueavg_order_value
pending3297.9699999999999799.32
completed3287.4695.82
shipped2284.99142.50

Each status group gets its own count, total, and average. This single query could power an order management dashboard.

Orders Per Customer

SELECT customer_id,
COUNT(*) AS order_count,
SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC;

Output:

customer_idorder_counttotal_spent
61199.98
12197.47
41155.00
22152.99
31129.99
5134.99

Customer 1 (Alice) has spent the most overall with 2 orders totaling $197.47. Customer 6 (Frank) has the highest single order at $199.98.

Customers Per City

SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
ORDER BY customer_count DESC;

Output:

citycustomer_count
New York2
Chicago2
Los Angeles1
Seattle1

Notice the WHERE city IS NOT NULL filter. Without it, NULL would form its own group. Whether that is desirable depends on your use case.

What Happens to NULL in GROUP BY?

Unlike most SQL operations where NULL creates confusion, GROUP BY treats all NULL values as belonging to the same group:

-- NULL forms its own group
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city
ORDER BY customer_count DESC;

Output:

citycustomer_count
New York2
Chicago2
Los Angeles1
Seattle1
NULL1

Grace's NULL city creates a group of its own. This is consistent with how DISTINCT treats NULL values.

Grouping by Multiple Columns

You can group by two or more columns to create more specific groupings. The database creates a separate group for each unique combination of values across all grouped columns.

Orders Per Customer Per Status

SELECT customer_id, status, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id, status
ORDER BY customer_id, status;

Output:

customer_idstatusorder_count
1completed2
2completed1
2pending1
3shipped1
4shipped1
5pending1
6pending1

Customer 2 (Bob) appears in two rows because he has orders in two different statuses. Each unique combination of customer_id and status gets its own group.

Compare this with the single-column grouping:

-- Single-column: 6 groups (one per customer)
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
-- Returns 6 rows

-- Multi-column: 7 groups (one per customer-status combination)
SELECT customer_id, status, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id, status;
-- Returns 7 rows (Bob has 2 rows now)

Products Per Category and Availability

SELECT category_id,
is_available,
COUNT(*) AS product_count,
ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category_id, is_available
ORDER BY category_id, is_available DESC;

Output:

category_idis_availableproduct_countavg_price
1true354.99
1false165.00
2true238.75
3true1129.99
4true357.66

Category 1 (Electronics) splits into two groups: 3 available products averaging $54.99 and 1 unavailable product at $65.00 (the Bluetooth Speaker).

Orders Per Month and Status

-- Extract year-month from order dates for monthly grouping
-- PostgreSQL syntax
SELECT TO_CHAR(order_date, 'YYYY-MM') AS order_month,
status,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue
FROM orders
GROUP BY TO_CHAR(order_date, 'YYYY-MM'), status
ORDER BY order_month, status;

Output:

order_monthstatusorder_countrevenue
2024-01completed2209.97
2024-02completed177.49
2024-03pending134.99
2024-03shipped2284.99
2024-04pending2262.98
Date Grouping Syntax Varies by Database

The function to extract a formatted date string differs across databases:

-- PostgreSQL
TO_CHAR(order_date, 'YYYY-MM')

-- MySQL
DATE_FORMAT(order_date, '%Y-%m')

-- SQLite
STRFTIME('%Y-%m', order_date)

-- SQL Server
FORMAT(order_date, 'yyyy-MM')

The concept is the same in every database. Only the function name and format string differ.

Adding Context with JOINs

Grouping by category_id or customer_id produces correct numbers, but raw IDs are not very readable. Joining with related tables lets you display names alongside the aggregated data.

Product Counts with Category Names

SELECT c.name AS category,
COUNT(p.id) AS product_count,
ROUND(AVG(p.price), 2) AS avg_price,
MIN(p.price) AS cheapest,
MAX(p.price) AS most_expensive
FROM categories c
LEFT JOIN products p ON c.id = p.category_id
GROUP BY c.name
ORDER BY product_count DESC;

Output:

categoryproduct_countavg_pricecheapestmost_expensive
Electronics457.5029.9989.99
Sports357.6624.99110.00
Books238.7534.9942.50
Home & Kitchen1129.99129.99129.99

Now the report shows "Electronics" instead of "1", which is far more useful for anyone reading it.

Customer Order Summary with Names

SELECT c.first_name || ' ' || c.last_name AS customer,
c.city,
COUNT(o.id) AS order_count,
COALESCE(SUM(o.total_amount), 0) AS total_spent,
COALESCE(ROUND(AVG(o.total_amount), 2), 0) AS avg_order_value
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.first_name, c.last_name, c.city
ORDER BY total_spent DESC;

Output:

customercityorder_counttotal_spentavg_order_value
Frank WilsonChicago1199.98199.98
Alice JohnsonNew York2197.4798.73
David ChenNew York1155.00155.00
Bob MartinezLos Angeles2152.9976.50
Carol SinghChicago1129.99129.99
Eva BrownSeattle134.9934.99
Grace TaylorNULL000

The LEFT JOIN ensures Grace appears even though she has no orders. COALESCE converts her NULL totals to 0.

Why GROUP BY c.id?

When grouping by customer, include the primary key (c.id) in the GROUP BY clause. This guarantees each customer gets their own group, even if two customers share the same name. The additional columns (first_name, last_name, city) are functionally dependent on c.id, so including them is safe and necessary for display.

The Most Common GROUP BY Error

This is the single most important section of this guide. Almost every SQL beginner encounters this error, and understanding why it happens will permanently improve your SQL writing.

The Error

-- This query FAILS in most databases
SELECT name, category_id, AVG(price) AS avg_price
FROM products
GROUP BY category_id;

Error message (PostgreSQL):

ERROR: column "products.name" must appear in the GROUP BY clause
or be used in an aggregate function

Error message (SQL Server):

Column 'products.name' is invalid in the select list because it is
not contained in either an aggregate function or the GROUP BY clause.

Why It Fails

Let us trace through the logic. When you write GROUP BY category_id, the database creates one group per category:

Group: category_id = 1Group: category_id = 2
Wireless MouseSQL for Beginners
Mechanical KeyboardData Science Handbook
USB-C Hub
Bluetooth Speaker

For category 1, AVG(price) produces one number: 57.50. But name has four different values in this group. Which name should the database display? "Wireless Mouse"? "Mechanical Keyboard"? There is no logical answer. The database refuses to guess and raises an error instead.

The Rule

Every column in your SELECT list must either:

  1. Appear in the GROUP BY clause, or
  2. Be used inside an aggregate function (COUNT, SUM, AVG, MIN, MAX)

No exceptions (with one caveat for MySQL, discussed below).

How to Fix It

There are several valid approaches depending on what you actually need:

Fix 1: Remove the non-aggregated column

-- If you just want the average price per category, you don't need the name
SELECT category_id, AVG(price) AS avg_price
FROM products
GROUP BY category_id;

Fix 2: Add the column to GROUP BY

-- This creates a group for each unique (category_id, name) combination
-- Probably NOT what you want, as each product becomes its own group
SELECT name, category_id, AVG(price) AS avg_price
FROM products
GROUP BY category_id, name;

Each product is now its own group, and the "average" is just each product's own price. This is technically valid but defeats the purpose of aggregation.

Fix 3: Aggregate the column

-- Use MIN or MAX on the name column if you want a representative name
SELECT MIN(name) AS first_product_name,
category_id,
ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category_id;

Fix 4: Use a JOIN or subquery to bring in non-grouped data

This is the most robust approach when you need detailed information alongside grouped summaries, but it requires more advanced techniques covered in future guides.

Multiple Wrong Examples and Their Fixes

-- Wrong: 'email' is not grouped or aggregated
SELECT city, email, COUNT(*) AS customer_count
FROM customers
GROUP BY city;
-- ERROR

-- Fix: Remove email or aggregate it
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city;
-- Wrong: 'order_date' is not grouped or aggregated
SELECT customer_id, order_date, SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id;
-- ERROR

-- Fix option 1: Add order_date to GROUP BY (groups by customer AND date)
SELECT customer_id, order_date, SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id, order_date;

-- Fix option 2: Aggregate order_date
SELECT customer_id,
MIN(order_date) AS first_order,
MAX(order_date) AS last_order,
SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id;
MySQL's Dangerous Exception

MySQL with the default sql_mode setting may not raise an error when you select non-aggregated columns without grouping them. Instead, it silently returns an arbitrary value from the group.

-- MySQL (default settings): No error, but WRONG results
SELECT name, category_id, AVG(price) AS avg_price
FROM products
GROUP BY category_id;
-- Returns one row per category, but the 'name' value is random/unpredictable

This is not a feature; it is a trap. The name value you see has no logical relationship to the average. Enable ONLY_FULL_GROUP_BY mode in MySQL to enforce the standard behavior:

SET sql_mode = 'ONLY_FULL_GROUP_BY';

With this mode enabled, MySQL behaves like PostgreSQL and SQL Server, rejecting queries with non-aggregated columns not in GROUP BY.

GROUP BY with WHERE

WHERE filters rows before grouping. The database removes rows that do not match the WHERE condition, then groups the remaining rows.

-- Count available products per category
SELECT category_id, COUNT(*) AS available_count
FROM products
WHERE is_available = true
GROUP BY category_id
ORDER BY available_count DESC;

Output:

category_idavailable_count
43
13
22
31

Compare this with the unfiltered count:

-- Count ALL products per category (including unavailable)
SELECT category_id, COUNT(*) AS total_count
FROM products
GROUP BY category_id
ORDER BY total_count DESC;
category_idtotal_count
14
43
22
31

Category 1 drops from 4 to 3 because the Bluetooth Speaker (unavailable) is excluded by WHERE.

Revenue from Completed Orders Per Customer

SELECT customer_id,
COUNT(*) AS completed_orders,
SUM(total_amount) AS completed_revenue
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
ORDER BY completed_revenue DESC;

Output:

customer_idcompleted_orderscompleted_revenue
12197.47
2189.99

Only Alice and Bob have completed orders.

The Processing Order

Understanding the order in which SQL processes clauses is critical when using GROUP BY:

StepClauseWhat Happens
1FROMIdentify the source table(s)
2WHEREFilter individual rows
3GROUP BYDivide remaining rows into groups
4HAVINGFilter groups (covered in the next guide)
5SELECTCalculate expressions, apply aliases
6ORDER BYSort the final results
7LIMITRestrict the number of rows returned

This order explains several important behaviors:

  • WHERE cannot use aggregate functions (aggregation has not happened yet at step 2)
  • WHERE cannot use column aliases (aliases are not created until step 5)
  • ORDER BY can use aliases (it runs after SELECT)
  • GROUP BY cannot use column aliases in most databases (it runs before SELECT)

GROUP BY with ORDER BY

You can sort grouped results just like any other query. Sorting by the aggregate result is particularly useful:

Categories Ranked by Average Price

SELECT category_id,
ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category_id
ORDER BY avg_price DESC;

Output:

category_idavg_price
3129.99
457.66
157.49
238.75

Customers Ranked by Total Spending

SELECT customer_id,
SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC;

Output:

customer_idtotal_spent
6199.98
1197.47
4155.00
2152.99
3129.99
534.99

Top 3 Categories by Product Count

Combining GROUP BY, ORDER BY, and LIMIT:

SELECT category_id,
COUNT(*) AS product_count
FROM products
GROUP BY category_id
ORDER BY product_count DESC
LIMIT 3;

Output:

category_idproduct_count
14
43
22

Real-World GROUP BY Scenarios

Sales Report by Month

-- PostgreSQL syntax
SELECT TO_CHAR(order_date, 'YYYY-MM') AS month,
COUNT(*) AS orders,
SUM(total_amount) AS revenue,
ROUND(AVG(total_amount), 2) AS avg_order
FROM orders
GROUP BY TO_CHAR(order_date, 'YYYY-MM')
ORDER BY month;

Output:

monthordersrevenueavg_order
2024-012209.97104.99
2024-02177.4977.49
2024-033319.98106.66
2024-042262.98131.49

Inventory Summary by Category

SELECT c.name AS category,
COUNT(p.id) AS products,
SUM(p.stock_quantity) AS total_stock,
ROUND(SUM(p.price * p.stock_quantity), 2) AS inventory_value
FROM categories c
JOIN products p ON c.id = p.category_id
GROUP BY c.name
ORDER BY inventory_value DESC;

Output:

categoryproductstotal_stockinventory_value
Electronics442520247.75
Sports334014898.20
Home & Kitchen1253249.75
Books2803024.50

Price Distribution Analysis

SELECT CASE
WHEN price < 30 THEN 'Under $30'
WHEN price < 50 THEN '$30 - $49.99'
WHEN price < 100 THEN '$50 - $99.99'
ELSE '$100 and above'
END AS price_range,
COUNT(*) AS product_count,
ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY CASE
WHEN price < 30 THEN 'Under $30'
WHEN price < 50 THEN '$30 - $49.99'
WHEN price < 100 THEN '$50 - $99.99'
ELSE '$100 and above'
END
ORDER BY avg_price;

Output:

price_rangeproduct_countavg_price
Under $30227.49
$30 - $49.99440.12
$50 - $99.99277.50
$100 and above2120.00
Grouping by Expressions

Notice that the GROUP BY clause contains the same CASE expression as the SELECT list. When you group by an expression, you must repeat the full expression in GROUP BY. Some databases (MySQL, PostgreSQL) allow you to reference the alias instead, but repeating the expression is the most portable approach.

-- PostgreSQL / MySQL: Can use alias in GROUP BY
GROUP BY price_range

-- ANSI standard / SQL Server: Must repeat the expression
GROUP BY CASE WHEN price < 30 THEN 'Under $30' ... END

Order Activity by Day of Week

-- PostgreSQL syntax
SELECT TO_CHAR(order_date, 'Day') AS day_of_week,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue
FROM orders
GROUP BY TO_CHAR(order_date, 'Day'), EXTRACT(DOW FROM order_date)
ORDER BY EXTRACT(DOW FROM order_date);

This kind of query helps businesses understand their busiest days.

Common GROUP BY Mistakes

Mistake 1: Forgetting GROUP BY with Aggregates

-- Wrong: Aggregate mixed with non-aggregated column, no GROUP BY
SELECT category_id, COUNT(*)
FROM products;
-- ERROR: column "products.category_id" must appear in the GROUP BY clause

Fix:

SELECT category_id, COUNT(*)
FROM products
GROUP BY category_id;

Mistake 2: Using WHERE Instead of HAVING to Filter Groups

-- Wrong: WHERE cannot filter on aggregate results
SELECT category_id, COUNT(*) AS product_count
FROM products
WHERE COUNT(*) > 2
GROUP BY category_id;
-- ERROR: aggregate functions are not allowed in WHERE

WHERE filters individual rows before grouping. To filter after grouping based on aggregate results, you need HAVING (covered in the next guide). For now, here is a preview:

-- Correct: HAVING filters groups after aggregation
SELECT category_id, COUNT(*) AS product_count
FROM products
GROUP BY category_id
HAVING COUNT(*) > 2;

Mistake 3: Selecting Too Many Columns

-- Wrong: price, name, and stock_quantity are not in GROUP BY
SELECT category_id, name, price, stock_quantity, COUNT(*)
FROM products
GROUP BY category_id;
-- ERROR

Fix: Only select what you can aggregate or group:

SELECT category_id,
COUNT(*) AS product_count,
MIN(price) AS cheapest,
MAX(price) AS most_expensive,
SUM(stock_quantity) AS total_stock
FROM products
GROUP BY category_id;

Output:

category_idproduct_countcheapestmost_expensivetotal_stock
1429.9989.99425
2234.9942.5080
31129.99129.9925
4324.99110.00340

Mistake 4: Grouping Defeats the Purpose

-- Technically valid, but pointless: every row is its own group
SELECT id, name, price, COUNT(*) AS count
FROM products
GROUP BY id, name, price;

If you group by the primary key (or all columns), every row becomes its own group and COUNT(*) is always 1. The grouping adds nothing. Ask yourself: "Am I actually summarizing anything?"

Practical Exercises

Exercise 1

Count the number of customers in each city (exclude customers with unknown cities).

SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
ORDER BY customer_count DESC;

Expected output:

citycustomer_count
New York2
Chicago2
Los Angeles1
Seattle1

Exercise 2

Find the total revenue and average order value for each order status.

SELECT status,
SUM(total_amount) AS total_revenue,
ROUND(AVG(total_amount), 2) AS avg_order_value
FROM orders
GROUP BY status
ORDER BY total_revenue DESC;

Expected output:

statustotal_revenueavg_order_value
pending297.9699999999999799.32
completed287.4695.82
shipped2​84.99142.50

Exercise 3

Find the cheapest and most expensive product in each category, along with the category name.

SELECT c.name AS category,
MIN(p.price) AS cheapest,
MAX(p.price) AS most_expensive,
MAX(p.price) - MIN(p.price) AS price_range
FROM products p
JOIN categories c ON p.category_id = c.id
GROUP BY c.name
ORDER BY price_range DESC;

Expected output:

categorycheapestmost_expensiveprice_range
Sports24.99110.0085.01
Electronics29.9989.9960.00
Books34.9942.507.509999999999998
Home & Kitchen129.99129.990.00

Exercise 4

Count how many orders each customer has placed, showing the customer name. Include customers with no orders.

SELECT c.first_name || ' ' || c.last_name AS customer,
COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.first_name, c.last_name
ORDER BY order_count DESC, customer;

Expected output:

customerorder_count
Alice Johnson2
Bob Martinez2
Carol Singh1
David Chen1
Eva Brown1
Frank Wilson1
Grace Taylor0

Exercise 5

Find the total quantity of items ordered for each product, and show only products that have been ordered.

SELECT p.name AS product,
SUM(oi.quantity) AS total_ordered,
SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.id
GROUP BY p.id, p.name
ORDER BY total_revenue DESC;

Expected output:

producttotal_orderedtotal_revenue
Running Shoes X12220.00
Mechanical Keyboard2179.98
Coffee Maker Pro1129.99
USB-C Hub290.00
SQL for Beginners269.98
Wireless Mouse259.98
Stainless Water Bottle249.98
Yoga Mat Premium138.00

Key Takeaways

GROUP BY transforms aggregate functions from whole-table summaries into per-group summaries, enabling the analytical queries that power reports, dashboards, and business intelligence. Here is what you should remember:

  • GROUP BY divides rows into groups based on one or more columns, then applies aggregate functions to each group separately
  • Each unique value (or combination of values) in the GROUP BY columns creates one group and produces one output row
  • Every column in your SELECT list must either appear in GROUP BY or be inside an aggregate function. No exceptions in standard SQL.
  • GROUP BY with multiple columns creates groups based on each unique combination of values
  • WHERE filters rows before grouping. Use it to exclude individual rows
  • HAVING (next guide) filters groups after aggregation. Use it to exclude groups based on aggregate results
  • The processing order is: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
  • NULL values form their own group in GROUP BY
  • Combine GROUP BY with JOIN to display meaningful names instead of raw IDs
  • Combine with ORDER BY to rank groups by their aggregate values
  • Combine with LIMIT to get "top N" groups
  • MySQL may silently allow non-aggregated columns without GROUP BY, returning arbitrary values. Enable ONLY_FULL_GROUP_BY mode to prevent this

GROUP BY is where SQL transitions from simple data retrieval into genuine data analysis. Combined with aggregate functions, it lets you answer complex business questions in a single query. The next guide covers HAVING, which adds the ability to filter groups after aggregation, completing your analytical toolkit.