
SQL Query Optimization Best Practices

Writing a query that returns correct results is only half the job. Making that query fast is the other half, and at scale, it is the half that matters most. A query that takes 200ms on your development machine with 500 rows can take 45 seconds in production with 10 million rows. The difference between a well-optimized query and a naive one is often not 2x or 5x but 100x or more.

SQL query optimization is not about memorizing tricks. It is about understanding how databases process your queries and writing SQL that works with the database engine rather than against it. This guide covers the most impactful optimization practices: avoiding SELECT *, choosing the right data types, writing SARGable conditions, optimizing joins, eliminating N+1 patterns, and implementing efficient pagination. Each practice is explained with concrete examples showing both the wrong way and the right way, so you can apply these improvements immediately.

The Sample Data

We will reference these tables throughout the guide. Imagine them at production scale.

customers table (~500,000 rows):

id | email             | first_name | last_name | country | signup_date | is_active | profile_json
1  | alice@example.com | Alice      | Morgan    | US      | 2022-03-15  | true      | {"theme":"dark",...}
2  | bob@mail.co       | Bob        | Chen      | UK      | 2022-06-01  | true      | {"theme":"light",...}
...

orders table (~5,000,000 rows):

id | customer_id | order_date | amount | status    | notes
1  | 1           | 2024-01-05 | 250.00 | completed | Rush delivery requested
2  | 2           | 2024-01-12 | 430.00 | completed | NULL
...

order_items table (~15,000,000 rows):

id | order_id | product_id | quantity | unit_price
1  | 1        | 101        | 2        | 79.99
2  | 1        | 205        | 1        | 45.00
...

products table (~10,000 rows):

id  | sku     | name       | category    | price | description
101 | WDG-001 | Widget Pro | Electronics | 79.99 | A premium widget with...
...

Avoid SELECT *

SELECT * is the most common performance anti-pattern in SQL. It is convenient during development, but in production it creates real costs that compound at scale.

Why SELECT * Is Harmful

1. Transfers Unnecessary Data

-- BAD: Fetches ALL columns, including a large TEXT/JSON field
SELECT * FROM customers WHERE country = 'US';

If customers has 20 columns including profile_json (which averages 2 KB per row), and the query matches 150,000 rows, you are transferring:

  • SELECT *: 150,000 rows x ~2.5 KB = ~375 MB
  • SELECT id, first_name, email: 150,000 rows x ~100 bytes = ~15 MB

That is 25x more data transferred over the network, parsed by the application, and held in memory.

2. Prevents Index-Only Scans

When you request all columns, the database must visit the actual table even if an index covers the columns you actually need:

-- BAD: Forces table access even though an index on (country, email) exists
EXPLAIN ANALYZE
SELECT * FROM customers WHERE country = 'US';
Index Scan using idx_customers_country on customers
(actual time=0.050..285.300 rows=150000 loops=1)
-- GOOD: Can use an index-only scan
EXPLAIN ANALYZE
SELECT email FROM customers WHERE country = 'US';
Index Only Scan using idx_customers_country_email on customers
(actual time=0.030..95.200 rows=150000 loops=1)
Heap Fetches: 0

The optimized version is 3x faster because it never touches the table.

3. Breaks When Schema Changes

If someone adds a column to the table, SELECT * silently returns additional data your application may not expect. Explicit column lists make your code resilient to schema evolution.

The Fix: Always Name Your Columns

-- BAD
SELECT * FROM customers WHERE country = 'US';

-- GOOD: Only request what you need
SELECT id, first_name, last_name, email
FROM customers
WHERE country = 'US';
Tip: The only acceptable places for SELECT * are ad-hoc exploration during development and inside an EXISTS subquery, where the columns are never actually read:

-- This is fine: EXISTS only checks for row existence, columns are irrelevant
SELECT c.first_name
FROM customers c
WHERE EXISTS (SELECT * FROM orders o WHERE o.customer_id = c.id);

Use Appropriate Data Types

Choosing the wrong data type for a column silently degrades performance across every query that touches it. Data types affect storage size, comparison speed, index efficiency, and memory usage.

Common Data Type Mistakes

Storing Numbers as Strings

-- BAD: phone_number stored as VARCHAR
CREATE TABLE customers (
    id INT PRIMARY KEY,
    phone_number VARCHAR(20), -- Stores "5551234567"
    zip_code VARCHAR(10)      -- Stores "90210"
);

While this might seem logical (phone numbers can have leading zeros, zip codes have dashes), the real problem is when people store numeric IDs or amounts as strings:

-- BAD: Storing numeric data as strings
CREATE TABLE orders (
    id VARCHAR(50),      -- Should be INT or BIGINT
    amount VARCHAR(20),  -- Should be DECIMAL/NUMERIC
    quantity VARCHAR(10) -- Should be INT
);

Why this hurts:

  • Larger storage: storing '12345' as VARCHAR takes 5+ bytes; as a 4-byte INT it takes exactly 4.
  • Slower comparisons: String comparison is character-by-character. Numeric comparison is a single CPU operation.
  • Wrong sort order: '9' > '10' in string sorting (because '9' > '1' character-by-character).
  • Wider indexes: Indexes on wider columns are less efficient (fewer keys per B-tree page).
-- With amount stored as VARCHAR: string sorting is WRONG for numbers
SELECT amount FROM orders ORDER BY amount;
-- Returns: '100', '1000', '200', '50', '500' (alphabetical, not numerical!)

-- With amount stored as DECIMAL: numeric sorting is CORRECT
SELECT amount FROM orders ORDER BY amount;
-- Returns: 50, 100, 200, 500, 1000
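The wrong-sort-order problem is easy to reproduce. A minimal sketch using Python's built-in sqlite3 as a stand-in database (table and column names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The same values stored twice: once as TEXT, once as a numeric type
conn.execute("CREATE TABLE amounts (as_text TEXT, as_number INTEGER)")
conn.executemany(
    "INSERT INTO amounts VALUES (?, ?)",
    [(str(v), v) for v in (100, 1000, 200, 50, 500)],
)

text_order = [r[0] for r in conn.execute("SELECT as_text FROM amounts ORDER BY as_text")]
number_order = [r[0] for r in conn.execute("SELECT as_number FROM amounts ORDER BY as_number")]

print(text_order)    # ['100', '1000', '200', '50', '500'] -- alphabetical
print(number_order)  # [50, 100, 200, 500, 1000] -- numerical
```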

Oversized String Columns

-- BAD: VARCHAR(4000) for a country code that is always 2 characters
CREATE TABLE customers (
    country VARCHAR(4000)
);

-- GOOD: Sized appropriately
CREATE TABLE customers (
    country CHAR(2) -- Fixed 2-character ISO country code
);

While modern databases often store only the actual characters used regardless of the VARCHAR limit, the declared size can affect memory allocation during sorts and joins, and statistics estimates that influence query planning.

Using TEXT/BLOB When VARCHAR Suffices

-- BAD: TEXT for a field that never exceeds 255 characters
CREATE TABLE products (
    name TEXT -- Overkill, and some databases handle TEXT differently
);

-- GOOD
CREATE TABLE products (
    name VARCHAR(255) -- Appropriately sized
);

In some databases, TEXT columns are stored out-of-line (in a separate TOAST table in PostgreSQL, or off-page in SQL Server), adding overhead to every read.

Data Type Quick Reference

Data               | Bad Choice                           | Good Choice                    | Why
Currency amounts   | VARCHAR, FLOAT                       | DECIMAL(10,2) / NUMERIC        | Exact precision, no rounding errors
Dates              | VARCHAR                              | DATE, TIMESTAMP                | Enables date math, range queries, proper sorting
Boolean flags      | VARCHAR('yes'/'no'), INT             | BOOLEAN                        | 1 byte, clear semantics
Short codes        | VARCHAR(4000)                        | CHAR(2) or VARCHAR(10)         | Smaller storage, tighter indexes
Auto-increment IDs | VARCHAR, or BIGINT when INT suffices | INT or BIGINT (based on scale) | 4 bytes vs 8 bytes vs variable string
UUIDs              | VARCHAR(36)                          | UUID (native type)             | 16 bytes vs 36 bytes, faster comparison
Info: FLOAT and DOUBLE are approximate types that can introduce rounding errors. Never use them for financial data:

-- FLOAT rounding error
SELECT CAST(0.1 AS FLOAT) + CAST(0.2 AS FLOAT);
-- Returns: 0.30000000000000004 (not exactly 0.3!)

-- DECIMAL is exact
SELECT CAST(0.1 AS DECIMAL(10,2)) + CAST(0.2 AS DECIMAL(10,2));
-- Returns: 0.30
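The same drift shows up in any binary floating-point arithmetic. A quick check with Python floats (which are IEEE 754 doubles, like SQL DOUBLE) against the exact decimal module (which behaves like SQL DECIMAL/NUMERIC):

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum drifts -- the same issue FLOAT/DOUBLE columns have
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# Decimal arithmetic is exact, like SQL DECIMAL/NUMERIC
print(Decimal("0.1") + Decimal("0.2"))                    # 0.3
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```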

SARGability: Minimize Functions on Indexed Columns

SARGable stands for Search ARGument able. A condition is SARGable if the database can use an index to evaluate it. When you apply a function to an indexed column in a WHERE clause, you make the condition non-SARGable, forcing a full table scan even when a perfectly good index exists.

The Problem

-- Index exists on order_date
CREATE INDEX idx_orders_date ON orders(order_date);
-- NON-SARGABLE: Function wraps the indexed column
SELECT * FROM orders WHERE YEAR(order_date) = 2024;
Seq Scan on orders  (cost=0.00..125000.00 rows=1650000 width=48)
Filter: (date_part('year', order_date) = 2024)
Rows Removed by Filter: 3350000

The database evaluates YEAR() on every single row because the index stores raw order_date values, not the year extracted from them. The index is completely useless here.

The Fix: Rewrite as Range Conditions

-- SARGABLE: The indexed column stands alone on one side
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2025-01-01';
Index Scan using idx_orders_date on orders  (cost=0.43..45200.00 rows=1650000 width=48)
Index Cond: (order_date >= '2024-01-01' AND order_date < '2025-01-01')

Same results, but now the index is used. The database navigates the B-tree to 2024-01-01 and scans forward to 2025-01-01.
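You can watch SARGability change the query plan with Python's built-in sqlite3 as a stand-in database. SQLite has no YEAR(), so strftime('%Y', ...) plays that role here, and the exact plan wording varies by SQLite version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_orders_date ON orders(order_date)")

def plan(sql):
    # EXPLAIN QUERY PLAN summarizes how SQLite will execute the statement
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Non-SARGable: a function wraps the indexed column, forcing a full scan
print(plan("SELECT * FROM orders WHERE strftime('%Y', order_date) = '2024'"))
# e.g. SCAN orders

# SARGable: the bare column with a range condition can use the index
print(plan("SELECT * FROM orders "
           "WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'"))
# e.g. SEARCH orders USING INDEX idx_orders_date (order_date>? AND order_date<?)
```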

Common Non-SARGable Patterns and Their Fixes

Functions on Columns

-- ❌ NON-SARGABLE
SELECT * FROM customers WHERE LOWER(email) = 'alice@example.com';
SELECT * FROM orders WHERE DATE(order_date) = '2024-03-15';
SELECT * FROM orders WHERE ROUND(amount) = 250;
SELECT * FROM customers WHERE LENGTH(last_name) > 10;

-- ✅ SARGABLE alternatives
SELECT * FROM customers WHERE email = 'alice@example.com'; -- Store consistently cased
SELECT * FROM orders WHERE order_date >= '2024-03-15' AND order_date < '2024-03-16';
SELECT * FROM orders WHERE amount >= 249.50 AND amount < 250.50;
SELECT * FROM customers WHERE last_name > 'AAAAAAAAAA'; -- Not a true equivalent; for length filters, prefer an expression index on LENGTH(last_name)

Arithmetic on Columns

-- ❌ NON-SARGABLE: math on the indexed column
SELECT * FROM orders WHERE amount * 1.1 > 500;
SELECT * FROM products WHERE price - discount > 100;

-- ✅ SARGABLE: move the math to the other side
SELECT * FROM orders WHERE amount > 500 / 1.1; -- amount > 454.54
SELECT * FROM products WHERE price > 100 + discount; -- Depends on whether discount is constant

Implicit Type Conversions

-- ❌ NON-SARGABLE: column is VARCHAR, compared to INT
-- Database must convert every row's phone_number to INT for comparison
SELECT * FROM customers WHERE phone_number = 5551234;

-- ✅ SARGABLE: matching types
SELECT * FROM customers WHERE phone_number = '5551234';

Wildcard at the Start of LIKE

-- ❌ NON-SARGABLE: leading wildcard
SELECT * FROM customers WHERE email LIKE '%@gmail.com';

-- ✅ SARGABLE: trailing wildcard only
SELECT * FROM customers WHERE email LIKE 'alice%';

-- ✅ Alternative for leading wildcard: store reversed value and use trailing wildcard
-- Or use a full-text index

Negation Conditions

-- ❌ Often NON-SARGABLE: negation typically requires scanning all rows
SELECT * FROM orders WHERE status != 'completed';
SELECT * FROM orders WHERE customer_id NOT IN (1, 2, 3);

-- ✅ SARGABLE alternative: rewrite as positive condition when possible
SELECT * FROM orders WHERE status IN ('pending', 'cancelled', 'processing');

Expression Indexes as a Last Resort

When you must use a function (e.g., case-insensitive search), create an expression index instead of rewriting the query:

-- PostgreSQL / SQLite
CREATE INDEX idx_customers_lower_email ON customers(LOWER(email));

-- Now this is SARGable
SELECT * FROM customers WHERE LOWER(email) = 'alice@example.com';
-- MySQL 8+
CREATE INDEX idx_customers_lower_email ON customers((LOWER(email)));

This is a perfectly valid approach when you cannot control the casing of stored data.
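A sketch of the same idea in SQLite, which also supports expression indexes (since 3.9). The exact plan text varies by version, but the expression index makes the function-wrapped condition indexable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
# Expression index: the index stores lower(email), not email itself
conn.execute("CREATE INDEX idx_customers_lower_email ON customers(lower(email))")

plan = " ".join(
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id FROM customers WHERE lower(email) = 'alice@example.com'"
    )
)
print(plan)  # e.g. SEARCH customers USING COVERING INDEX idx_customers_lower_email (<expr>=?)
```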

Optimize JOINs

Joins are where SQL query optimization matters most because they combine data from multiple tables, and the number of potential row comparisons can be enormous. A poorly optimized join between two million-row tables can bring a database to its knees.

Index Your Join Keys

This is the single most impactful join optimization. Without an index on the join column, the database must scan the entire table for every match.

-- The query
SELECT c.first_name, o.order_date, o.amount
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.country = 'US';
-- Essential index: the foreign key column used in the join
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

Without the index:

Hash Join  (cost=15000..285000 rows=750000 width=28)
  -> Seq Scan on orders (cost=0..125000 rows=5000000 width=16) -- 5M row scan!
  -> Hash
       -> Seq Scan on customers (cost=0..12500 rows=150000 width=12)
Execution Time: 4250 ms

With the index:

Nested Loop  (cost=0.86..185000 rows=750000 width=28)
  -> Seq Scan on customers (cost=0..12500 rows=150000 width=12)
       Filter: (country = 'US')
  -> Index Scan using idx_orders_customer_id on orders (cost=0.43..1.15 rows=5 width=16)
       Index Cond: (customer_id = c.id)
Execution Time: 890 ms

From 4250ms to 890ms, nearly a 5x improvement from a single index.
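The effect can be reproduced in miniature with sqlite3 standing in for the production database (the schema mirrors the guide's tables; plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
""")

SQL = """SELECT c.id, o.amount
         FROM customers c JOIN orders o ON c.id = o.customer_id
         WHERE c.country = 'US'"""

def plan():
    # EXPLAIN QUERY PLAN summarizes SQLite's chosen join strategy
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + SQL))

print(plan())  # before: no index on the join key, so one side is fully scanned

conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")
print(plan())  # after: e.g. SEARCH orders USING INDEX idx_orders_customer_id (customer_id=?)
```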

Caution: Always index foreign key columns. Many databases do not automatically create indexes on foreign key columns; PostgreSQL does not, while MySQL's InnoDB is a notable exception that does. This is one of the most overlooked performance problems.

-- Check for unindexed foreign key columns (PostgreSQL)
SELECT
    c.conname AS constraint_name,
    c.conrelid::regclass AS table_name,
    a.attname AS column_name
FROM pg_constraint c
JOIN pg_attribute a ON a.attnum = ANY(c.conkey) AND a.attrelid = c.conrelid
WHERE c.contype = 'f'
  AND NOT EXISTS (
    SELECT 1 FROM pg_index i
    WHERE i.indrelid = c.conrelid
      AND a.attnum = ANY(i.indkey)
  );

Join Only What You Need

Do not join tables whose data you do not use in the result:

-- BAD: Joins products table but never uses any product columns
SELECT o.id, o.order_date, o.amount
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.customer_id = 42;

-- GOOD: Remove the unnecessary join
SELECT o.id, o.order_date, o.amount
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
WHERE o.customer_id = 42;

-- EVEN BETTER: If you don't need order_items data either
SELECT id, order_date, amount
FROM orders
WHERE customer_id = 42;

Each unnecessary join multiplies the work the database must do. If a JOIN does not contribute to the SELECT, WHERE, or GROUP BY, remove it.

Use EXISTS Instead of JOIN for Existence Checks

When you only need to know whether related rows exist (not what they contain), EXISTS is often faster than JOIN because it can stop scanning after finding the first match:

-- LESS OPTIMAL: JOIN returns all matching orders, then DISTINCT removes duplicates
SELECT DISTINCT c.id, c.first_name
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.status = 'completed';

-- BETTER: EXISTS stops at the first match per customer
SELECT c.id, c.first_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.id
      AND o.status = 'completed'
);

For a customer with 500 completed orders, the JOIN approach produces 500 intermediate rows that must be deduplicated. EXISTS finds the first completed order and moves on. The difference scales dramatically with data volume.
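Both forms return the same customers. A small sqlite3 sketch with toy data (Alice has two completed orders, so the JOIN needs DISTINCT to collapse her duplicate rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO orders VALUES (1, 1, 'completed'), (2, 1, 'completed'), (3, 2, 'pending');
""")

# JOIN produces one row per completed order, then deduplicates
dedup_join = conn.execute("""
    SELECT DISTINCT c.id, c.first_name
    FROM customers c JOIN orders o ON c.id = o.customer_id
    WHERE o.status = 'completed'
""").fetchall()

# EXISTS stops at the first completed order per customer
exists_q = conn.execute("""
    SELECT c.id, c.first_name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o
                  WHERE o.customer_id = c.id AND o.status = 'completed')
""").fetchall()

print(dedup_join)  # [(1, 'Alice')]
print(exists_q)    # [(1, 'Alice')]
```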

Be Careful with Cartesian Joins

A missing or incorrect join condition creates a Cartesian product (cross join), which can produce an astronomically large intermediate result:

-- WRONG: Missing join condition between orders and products
SELECT o.id, p.name
FROM orders o, products p
WHERE o.status = 'completed';
-- If orders has 5M rows and products has 10K rows, this produces 50 BILLION rows!

-- CORRECT: Proper join condition
SELECT o.id, p.name
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.status = 'completed';
Info: Modern query optimizers sometimes detect accidental Cartesian joins and warn you or refuse to execute them. But do not rely on this. Always verify your JOIN ... ON conditions are correct.

Reduce Rows Before Joining

Filter tables before joining them to reduce the number of rows that participate in the join:

-- LESS OPTIMAL: Joins all 5M orders, then filters
SELECT c.first_name, o.amount
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= '2024-01-01'
AND c.country = 'US';

-- BETTER (for readability with CTEs; optimizer usually handles this automatically)
WITH us_customers AS (
    SELECT id, first_name FROM customers WHERE country = 'US'
),
recent_orders AS (
    SELECT customer_id, amount FROM orders WHERE order_date >= '2024-01-01'
)
SELECT uc.first_name, ro.amount
FROM us_customers uc
JOIN recent_orders ro ON uc.id = ro.customer_id;

In practice, most modern optimizers push filters down automatically (a technique called predicate pushdown). However, writing pre-filtered CTEs can sometimes help the optimizer and always improves readability.

Avoid N+1 Query Patterns

The N+1 problem is one of the most devastating performance anti-patterns. It occurs when your application executes 1 query to fetch a list of N items, then executes N additional queries to fetch related data for each item. Instead of 1 or 2 queries, you end up with hundreds or thousands.

The Problem

Imagine an application that displays customers with their recent orders:

Application code (pseudocode):

customers = query("SELECT id, first_name FROM customers WHERE country = 'US' LIMIT 100")

for each customer in customers:
    orders = query("SELECT * FROM orders WHERE customer_id = ?", customer.id)
    display(customer, orders)

This executes 101 queries: 1 for customers + 100 for orders. Each query has network round-trip overhead, parsing overhead, and planning overhead. With 100 customers, this might take 500ms. With 1000 customers, it takes 5 seconds.

-- Query 1: Get customers
SELECT id, first_name FROM customers WHERE country = 'US' LIMIT 100;

-- Queries 2-101: One per customer (N+1 pattern)
SELECT * FROM orders WHERE customer_id = 1;
SELECT * FROM orders WHERE customer_id = 2;
SELECT * FROM orders WHERE customer_id = 3;
-- ... 97 more queries

Fix 1: Use a JOIN

Replace the N+1 with a single query that gets everything at once:

-- ONE query replaces 101 queries
SELECT c.id, c.first_name, o.order_date, o.amount, o.status
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE c.country = 'US'
ORDER BY c.id, o.order_date DESC;

One query, one network round-trip, one planning step. The database handles the association efficiently using indexes and join algorithms.

Fix 2: Use IN with a Batch Query

If a JOIN is not practical (e.g., different services own each table), at minimum batch the second query:

-- Query 1: Get customer IDs
SELECT id, first_name FROM customers WHERE country = 'US' LIMIT 100;
-- Application collects IDs: [1, 2, 3, ..., 100]

-- Query 2: Get ALL orders for those customers in ONE query
SELECT customer_id, order_date, amount, status
FROM orders
WHERE customer_id IN (1, 2, 3, ..., 100)
ORDER BY customer_id, order_date DESC;

Now you have 2 queries instead of 101. The application groups the orders by customer_id in memory.
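A minimal sketch of the batched pattern in Python with sqlite3 (the table contents are toy data):

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, country TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Alice', 'US'), (2, 'Bob', 'US'), (3, 'Chen', 'UK');
INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 430.0), (3, 2, 99.0);
""")

# Query 1: the page of customers
customers = conn.execute(
    "SELECT id, first_name FROM customers WHERE country = 'US'"
).fetchall()
ids = [c[0] for c in customers]

# Query 2: ONE batched query for all their orders, not one query per customer
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT customer_id, amount FROM orders "
    f"WHERE customer_id IN ({placeholders}) ORDER BY customer_id, id",
    ids,
).fetchall()

# Group by customer in application memory
orders_by_customer = defaultdict(list)
for customer_id, amount in rows:
    orders_by_customer[customer_id].append(amount)

print(dict(orders_by_customer))  # {1: [250.0, 430.0], 2: [99.0]}
```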

Fix 3: Lateral Join (Per-Group Limit)

If you need only the latest 3 orders per customer, a LATERAL JOIN (PostgreSQL) or CROSS APPLY (SQL Server) avoids fetching all historical orders:

-- PostgreSQL: LATERAL JOIN
SELECT c.id, c.first_name, recent.order_date, recent.amount
FROM customers c
LEFT JOIN LATERAL (
    SELECT order_date, amount
    FROM orders o
    WHERE o.customer_id = c.id
    ORDER BY o.order_date DESC
    LIMIT 3
) recent ON true
WHERE c.country = 'US';

-- SQL Server: CROSS APPLY
SELECT c.id, c.first_name, recent.order_date, recent.amount
FROM customers c
CROSS APPLY (
    SELECT TOP 3 order_date, amount
    FROM orders o
    WHERE o.customer_id = c.id
    ORDER BY o.order_date DESC
) recent
WHERE c.country = 'US';

This efficiently fetches exactly 3 orders per customer in a single query.

Detecting N+1 in Query Logs

Enable slow query logging or query monitoring and look for:

  • Many identical query patterns with different parameter values
  • Queries that repeat hundreds of times within a short window
  • Query count that scales linearly with result set size

-- PostgreSQL: Enable query logging temporarily
SET log_min_duration_statement = 0; -- Log all queries
-- Check pg_stat_statements for repeated patterns
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 20;

Pagination Strategies

Displaying large result sets in pages is universal in web applications. The naive approach works fine for early pages but collapses on later pages. Choosing the right pagination strategy can mean the difference between a 5ms response and a 30-second timeout.

The Naive Approach: OFFSET/LIMIT

-- Page 1 (rows 1-20)
SELECT id, order_date, amount
FROM orders
ORDER BY order_date DESC
LIMIT 20 OFFSET 0;

-- Page 2 (rows 21-40)
SELECT id, order_date, amount
FROM orders
ORDER BY order_date DESC
LIMIT 20 OFFSET 20;

-- Page 500 (rows 9981-10000)
SELECT id, order_date, amount
FROM orders
ORDER BY order_date DESC
LIMIT 20 OFFSET 9980;

Why OFFSET Is Slow for Deep Pages

OFFSET does not skip rows magically. The database must read and discard all the offset rows before returning the ones you want:

EXPLAIN ANALYZE
SELECT id, order_date, amount
FROM orders
ORDER BY order_date DESC
LIMIT 20 OFFSET 100000;
Limit  (cost=52000..52005 rows=20 width=16)
(actual time=1250.300..1250.350 rows=20 loops=1)
-> Sort (cost=50000..62500 rows=5000000 width=16)
(actual time=850.200..1245.100 rows=100020 loops=1)
Sort Key: order_date DESC
Sort Method: top-N heapsort Memory: 12845kB

The database sorted and processed 100,020 rows to return just 20. On page 1, it processes 20 rows. On page 5000, it processes 100,020 rows. Performance degrades linearly with the page number.

Page | OFFSET | Rows Processed | Approximate Time
1    | 0      | 20             | 2ms
10   | 180    | 200            | 5ms
100  | 1980   | 2000           | 25ms
1000 | 19980  | 20000          | 180ms
5000 | 99980  | 100000         | 1250ms

Better Approach: Keyset Pagination (Seek Method)

Keyset pagination (also called cursor-based pagination or the seek method) uses the last row's values from the current page to find the starting point of the next page. Instead of "skip N rows," it says "give me rows after this specific value."

-- Page 1: No cursor, just get the first 20
SELECT id, order_date, amount
FROM orders
ORDER BY order_date DESC, id DESC
LIMIT 20;

The application remembers the last row's values: order_date = '2024-04-08', id = 498732.

-- Page 2: Start after the last row of page 1
SELECT id, order_date, amount
FROM orders
WHERE (order_date, id) < ('2024-04-08', 498732)
ORDER BY order_date DESC, id DESC
LIMIT 20;
-- Page 3: Start after the last row of page 2
SELECT id, order_date, amount
FROM orders
WHERE (order_date, id) < ('2024-04-05', 498210)
ORDER BY order_date DESC, id DESC
LIMIT 20;

Why this is fast: With an index on (order_date DESC, id DESC), the database navigates the B-tree directly to the cursor position and reads exactly 20 rows. It never processes rows from previous pages.

Page | Rows Processed | Approximate Time
1    | 20             | 2ms
10   | 20             | 2ms
100  | 20             | 2ms
5000 | 20             | 2ms

Constant time regardless of page depth.

-- Required index for keyset pagination
CREATE INDEX idx_orders_date_id ON orders(order_date DESC, id DESC);
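A runnable miniature of the seek method using sqlite3 (SQLite 3.15+ supports the row-value comparison used here; 30 toy rows, 5 per page):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
    [(f"2024-01-{d:02d}", d * 10.0) for d in range(1, 31)],
)
conn.execute("CREATE INDEX idx_orders_date_id ON orders(order_date DESC, id DESC)")

PAGE = 5

# Page 1: no cursor yet
page1 = conn.execute(
    "SELECT id, order_date FROM orders ORDER BY order_date DESC, id DESC LIMIT ?",
    (PAGE,),
).fetchall()

# Page 2: seek past the last row of page 1 with a row-value comparison
last_id, last_date = page1[-1]
page2 = conn.execute(
    """SELECT id, order_date FROM orders
       WHERE (order_date, id) < (?, ?)
       ORDER BY order_date DESC, id DESC LIMIT ?""",
    (last_date, last_id, PAGE),
).fetchall()

print([d for _, d in page1])  # ['2024-01-30', ..., '2024-01-26']
print([d for _, d in page2])  # ['2024-01-25', ..., '2024-01-21']
```

Note that the cursor (last_date, last_id) is all the application needs to remember between requests; no page counter, no OFFSET.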

Keyset Pagination Trade-offs

Aspect                      | OFFSET/LIMIT                          | Keyset Pagination
Performance on deep pages   | Degrades linearly                     | Constant
Can jump to arbitrary page? | Yes (OFFSET 5000)                     | No (must traverse sequentially)
UI pattern                  | Page numbers (1, 2, 3...)             | "Load More" / infinite scroll
Implementation complexity   | Simple                                | Moderate
Handles concurrent inserts? | Poorly (rows can shift between pages) | Well (cursor is stable)

When to Use Which

  • OFFSET/LIMIT: Fine when you know users will rarely go past page 10, or when the total dataset is small (under 10,000 rows). Also required when users need to jump to arbitrary page numbers.
  • Keyset pagination: Best for infinite scroll, API endpoints, feeds, or any scenario where users navigate sequentially through large datasets.

Hybrid Approach: OFFSET for Early Pages, Keyset for Deep Pages

Some applications use OFFSET for the first ~10 pages (where it is fast enough) and switch to keyset pagination for deeper access:

-- Pages 1-10: Use OFFSET (fast enough, supports page numbers)
SELECT id, order_date, amount
FROM orders
ORDER BY order_date DESC, id DESC
LIMIT 20 OFFSET 80; -- Page 5

-- Pages 10+: Switch to keyset (no performance degradation)
SELECT id, order_date, amount
FROM orders
WHERE (order_date, id) < ('2024-03-15', 497500)
ORDER BY order_date DESC, id DESC
LIMIT 20;

Additional Quick Wins

Use COUNT Wisely

-- COUNT(*) itself is well optimized in most databases:
SELECT COUNT(*) FROM orders;

-- The anti-pattern is using COUNT to answer a yes/no question.
-- BAD: Counts every matching row just to check existence
SELECT CASE WHEN COUNT(*) > 0 THEN true ELSE false END
FROM orders WHERE customer_id = 42;

-- GOOD: Stops at first match
SELECT EXISTS (SELECT 1 FROM orders WHERE customer_id = 42);

Avoid SELECT DISTINCT as a Band-Aid

If you need DISTINCT, it often means your joins are producing duplicates, which signals a query structure problem:

-- BAD: DISTINCT hides a join problem
SELECT DISTINCT c.id, c.first_name
FROM customers c
JOIN orders o ON c.id = o.customer_id;
-- The join produces one row per ORDER, then DISTINCT collapses them

-- GOOD: Use EXISTS if you just need customers who have orders
SELECT c.id, c.first_name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

-- GOOD: Use GROUP BY if you need aggregate data
SELECT c.id, c.first_name, COUNT(o.id) AS order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.first_name;

Batch Your INSERTs

-- BAD: One INSERT per row (one round-trip per row)
INSERT INTO orders (customer_id, amount) VALUES (1, 100);
INSERT INTO orders (customer_id, amount) VALUES (2, 200);
INSERT INTO orders (customer_id, amount) VALUES (3, 300);

-- GOOD: Batch INSERT (one round-trip for all rows)
INSERT INTO orders (customer_id, amount)
VALUES (1, 100), (2, 200), (3, 300);

Batched inserts can be 10 to 50x faster because they eliminate per-statement overhead (parsing, planning, network round-trips).
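In application code the same win usually comes from the driver's batch API. A sqlite3 sketch using executemany, which reuses one prepared statement inside a single transaction instead of paying per-statement overhead 10,000 times:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")

rows = [(i, i * 10.0) for i in range(1, 10_001)]

# One prepared statement, one transaction, 10,000 rows:
# the parse/plan/commit overhead is paid once, not per row
with conn:
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)", rows
    )

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 10000
```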

Use UNION ALL Instead of UNION When Possible

-- BAD: UNION removes duplicates (requires sorting the entire result)
SELECT first_name FROM customers WHERE country = 'US'
UNION
SELECT first_name FROM customers WHERE country = 'UK';

-- GOOD: UNION ALL skips deduplication (faster)
SELECT first_name FROM customers WHERE country = 'US'
UNION ALL
SELECT first_name FROM customers WHERE country = 'UK';

-- BEST: Single query when possible
SELECT first_name FROM customers WHERE country IN ('US', 'UK');

UNION must sort and deduplicate the entire combined result set. If you know there are no duplicates (or duplicates are acceptable), use UNION ALL.
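A quick sqlite3 check of the difference, using a toy table with a duplicate value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (1,), (2,)])

# UNION deduplicates the combined result (extra sort/hash work)
union = conn.execute("SELECT v FROM t UNION SELECT v FROM t").fetchall()
# UNION ALL concatenates the branches as-is
union_all = conn.execute("SELECT v FROM t UNION ALL SELECT v FROM t").fetchall()

print(sorted(union))   # [(1,), (2,)] -- two distinct values survive
print(len(union_all))  # 6 -- all rows from both branches kept
```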

Optimization Checklist

Use this checklist when reviewing a slow query:

  • Replace SELECT * with an explicit column list
  • Ensure data types match between compared columns
  • Make WHERE conditions SARGable (no functions on indexed columns)
  • Verify indexes exist on JOIN key columns
  • Verify indexes exist on frequently filtered columns
  • Check EXPLAIN ANALYZE for unexpected Sequential Scans
  • Compare estimated vs actual row counts
  • Look for high Rows Removed by Filter
  • Check for N+1 query patterns in application logs
  • Use keyset pagination for deep pagination
  • Remove unnecessary JOINs and DISTINCT
  • Batch INSERT and UPDATE operations
  • Use EXISTS instead of COUNT for existence checks
  • Run ANALYZE / UPDATE STATISTICS if estimates are inaccurate

Summary

SQL query optimization is a systematic practice, not a bag of tricks. The most impactful improvements come from these core principles:

  • Avoid SELECT *: Request only the columns you need. This reduces data transfer, enables index-only scans, and protects against schema changes.
  • Use appropriate data types: Properly sized types reduce storage, speed up comparisons, and create more efficient indexes. Never store numbers as strings or use oversized VARCHAR columns.
  • Write SARGable conditions: Keep indexed columns "clean" in WHERE clauses. Move functions, arithmetic, and type conversions to the other side of the comparison so the database can use indexes.
  • Optimize JOINs: Always index foreign key columns. Remove unnecessary joins. Use EXISTS for existence checks. Ensure the optimizer has accurate statistics to choose the right join strategy.
  • Eliminate N+1 patterns: Replace per-row queries with JOINs or batched IN queries. One query returning 1000 rows is dramatically faster than 1000 queries returning 1 row each.
  • Choose the right pagination strategy: OFFSET/LIMIT degrades linearly with page depth. Keyset pagination offers constant performance regardless of how deep the user scrolls.

The 80/20 Rule of SQL Optimization

In most applications, 80% of performance problems come from:

  1. Missing indexes on JOIN and WHERE columns
  2. N+1 query patterns
  3. SELECT * fetching unnecessary data
  4. Non-SARGable WHERE conditions

Fix these four categories first. They are the highest-impact, lowest-effort optimizations available.