Skip to main content

What Are SQL Indexes and How Do They Work?

Imagine searching for a specific topic in a 1000-page textbook. You could read every page from start to finish until you find it, or you could flip to the index at the back, look up the topic alphabetically, and jump directly to the right page. A database index works exactly the same way. Without one, the database must scan every row in a table to find what you need. With one, it jumps straight to the matching rows.

Understanding SQL indexes explained at a practical level is one of the most impactful things you can do for your database skills. Indexes are the single biggest factor in query performance, and knowing when to create them, what type to use, and when they actually hurt is what separates a developer who writes queries from a developer who writes fast queries.

This guide covers how indexes work under the hood (with a simplified look at B-tree structures), the difference between clustered and non-clustered indexes, and practical guidance on when indexes help and when they become a liability. Every concept includes examples, analogies, and clear explanations designed for immediate understanding.

What Is an Index?

An index is a separate data structure that the database maintains alongside your table. It contains a sorted copy of one or more columns from the table, paired with pointers back to the corresponding rows in the actual table. When a query needs to find rows matching a condition, the database can search this sorted structure much faster than scanning the entire table.

The Textbook Analogy

Without an IndexWith an Index
Read every page of the bookLook up the term in the back-of-book index
Time grows linearly with book sizeTime grows logarithmically (barely increases)
Simple but slowRequires extra pages (the index itself) but dramatically faster

In database terms:

Without an IndexWith an Index
Full table scan: examine every rowIndex lookup: jump to matching rows
O(n) time complexityO(log n) time complexity
Fine for small tablesEssential for large tables

A Quick Demonstration

Consider a users table with 1 million rows:

-- Without an index: scans all 1,000,000 rows
SELECT * FROM users WHERE email = 'alice@example.com';
-- Execution time: ~850ms
-- Create an index on the email column
CREATE INDEX idx_users_email ON users(email);
-- With the index: jumps directly to the matching row
SELECT * FROM users WHERE email = 'alice@example.com';
-- Execution time: ~2ms

The query went from 850ms to 2ms. That is a 425x improvement from a single CREATE INDEX statement. This is why indexes matter.

The Sample Data

We will reference this orders table throughout the guide:

idcustomer_idorder_dateamountstatusregion
11012024-01-05250completedNorth
21022024-01-12430completedSouth
31012024-01-20180pendingNorth
41032024-02-08520completedWest
51022024-02-15310cancelledSouth
61012024-03-01275completedNorth
71042024-03-18640completedEast
81032024-04-02190pendingWest
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT NOT NULL,
order_date DATE NOT NULL,
amount DECIMAL(12,2) NOT NULL,
status VARCHAR(20) NOT NULL,
region VARCHAR(50) NOT NULL
);
INSERT INTO orders (id, customer_id, order_date, amount, status, region) VALUES
(1, 101, '2024-01-05', 250, 'completed', 'North'),
(2, 102, '2024-01-12', 430, 'completed', 'South'),
(3, 101, '2024-01-20', 180, 'pending', 'North'),
(4, 103, '2024-02-08', 520, 'completed', 'West'),
(5, 102, '2024-02-15', 310, 'cancelled', 'South'),
(6, 101, '2024-03-01', 275, 'completed', 'North'),
(7, 104, '2024-03-18', 640, 'completed', 'East'),
(8, 103, '2024-04-02', 190, 'pending', 'West');

In reality, imagine this table with millions of rows. At small scale, indexes do not matter much. At large scale, they are the difference between a query completing in milliseconds and one that takes minutes.

B-Tree Index Concept (Simplified)

The most common type of index in every major database is the B-tree index (short for balanced tree). Understanding its structure, even at a simplified level, will help you reason about when indexes are effective and when they are not.

How a B-Tree Works

A B-tree organizes data into a tree structure with multiple levels. Each level narrows down the search space dramatically, similar to a binary search but with more branches at each node.

Imagine you have an index on the customer_id column, which contains values from 100 to 110. The B-tree might look like this:

                    [105]                    ← Root node
/ \
[102, 104] [107, 109] ← Branch nodes
/ | \ / | \
[100,101][102,103][104] [105,106][107,108][109,110] ← Leaf nodes

Pointers to actual
table rows

When you search for customer_id = 107:

  1. Root node: Is 107 >= 105? Yes. Go right.
  2. Branch node: Is 107 >= 107? Yes. Go to the middle/right child.
  3. Leaf node: Found 107! Follow the pointer to the actual row in the table.

Three steps to find the row, regardless of whether the table has 1000 or 10 million rows. This is the power of logarithmic search: doubling the data size adds only one more step.

Why "Balanced" Matters

The "B" in B-tree stands for balanced (among other interpretations). A balanced tree ensures that every leaf node is at the same depth. This guarantees consistent performance: no matter what value you search for, the number of steps is always the same. There are no "lucky" fast searches and "unlucky" slow ones.

What Leaf Nodes Store

Each leaf node in the B-tree contains:

  1. The indexed column value (e.g., customer_id = 107)
  2. A pointer (sometimes called a row locator or row ID) to the actual row in the table

The leaf nodes are also linked together in order, forming a doubly-linked list. This is what makes range queries efficient:

-- Range query: the database finds 102 in the B-tree, then follows
-- the linked leaf nodes to 103, 104, 105 without going back to the root
SELECT * FROM orders WHERE customer_id BETWEEN 102 AND 105;

How Many Levels Deep?

B-trees are remarkably shallow. A typical B-tree node can hold hundreds of keys. With a branching factor of 200:

Table RowsB-Tree LevelsLookups to Find Any Row
20011
40,00022
8,000,00033
1,600,000,00044

Even with 1.6 billion rows, the index only needs 4 lookups. This is why B-tree indexes are so effective.

Other Index Types (Brief Overview)

While B-tree is the default and most common, databases support other index types for specific use cases:

Index TypeBest ForSupported By
B-treeEquality, range queries, sortingAll databases (default)
HashExact equality lookups onlyPostgreSQL, MySQL (Memory engine)
GINFull-text search, arrays, JSONBPostgreSQL
GiSTGeometric data, full-text searchPostgreSQL
BRINVery large tables with naturally ordered dataPostgreSQL
BitmapLow-cardinality columns in data warehousesOracle
Full-textText searchMySQL, SQL Server

For the vast majority of everyday queries, B-tree indexes are what you need and what this guide focuses on.

Clustered vs Non-Clustered Indexes

This is one of the most important distinctions in index architecture. Understanding the difference helps you make better decisions about which indexes to create and how many.

Clustered Index

A clustered index determines the physical order of data in the table. The table's rows are literally stored on disk sorted by the clustered index key. Because of this, a table can have only one clustered index, just like a physical book can only be sorted in one order.

Think of a dictionary: the words (the data) are physically arranged in alphabetical order. The alphabetical order is the clustered index. You do not need a separate lookup structure because the data itself is organized for fast searching.

Clustered Index on "id":

Page 1: [id=1, data...] [id=2, data...] [id=3, data...]
Page 2: [id=4, data...] [id=5, data...] [id=6, data...]
Page 3: [id=7, data...] [id=8, data...] [id=9, data...]

↑ The actual table data is stored in this order

Behavior by Database

DatabaseClustered Index Behavior
SQL ServerPrimary key creates a clustered index by default
MySQL (InnoDB)Primary key is the clustered index (always). Called the "cluster key"
PostgreSQLNo automatic clustered index. Tables are heap-organized by default. You can use CLUSTER command, but it is a one-time physical reorder, not automatically maintained
OracleTables are heap-organized by default. Use Index-Organized Tables (IOT) for clustered behavior

Creating a Clustered Index (SQL Server)

-- SQL Server: Explicitly create a clustered index
CREATE CLUSTERED INDEX idx_orders_date
ON orders(order_date);

Now the rows in the orders table are physically stored sorted by order_date. Range queries on order_date become extremely fast because the database reads contiguous pages from disk.

Why Only One?

A table's rows can only be physically sorted in one order. You cannot sort the same rows by order_date and simultaneously by customer_id. That is why each table is limited to one clustered index.

Non-Clustered Index

A non-clustered index is a separate structure that points back to the table's rows. The table data stays in its original order (or the clustered index order). The non-clustered index is like the index at the back of a textbook: it is a sorted list of terms with page numbers, completely separate from the main content.

Non-Clustered Index on "customer_id":

Index structure (sorted by customer_id):
[101 → Row 1] [101 → Row 3] [101 → Row 6]
[102 → Row 2] [102 → Row 5]
[103 → Row 4] [103 → Row 8]
[104 → Row 7]

Actual table (sorted by id / clustered key):
Row 1: [id=1, customer_id=101, ...]
Row 2: [id=2, customer_id=102, ...]
Row 3: [id=3, customer_id=101, ...]
...

A table can have many non-clustered indexes (typically up to 999 in SQL Server, no hard limit in PostgreSQL and MySQL, though practical limits apply).

Creating a Non-Clustered Index

-- This is non-clustered by default in all databases
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_region ON orders(region);

Side-by-Side Comparison

AspectClustered IndexNon-Clustered Index
Physical data orderDetermines the row order on diskDoes not affect row order
Number per tableOne onlyMany allowed
Lookup speedFastest for range queries on the indexed columnFast, but requires an extra step to reach the actual row
Contains row data?Yes (the index is the table)No (contains pointers to the table)
Extra storageNone (replaces default heap)Yes (separate structure)
Best forThe most commonly queried range column (e.g., date)Equality lookups, joins, filters

The "Bookmark Lookup" (Key Lookup)

When a non-clustered index is used, the database finds the matching entries in the index, then must follow the pointer to the actual table row to retrieve any columns not included in the index. This extra step is called a bookmark lookup (SQL Server) or heap fetch (PostgreSQL).

-- Index on customer_id, but we need amount and status
SELECT amount, status FROM orders WHERE customer_id = 101;

Steps:

  1. Search the non-clustered index on customer_id to find rows with value 101.
  2. For each match, follow the pointer to the actual table row.
  3. Read amount and status from the table row.

Step 2 is the bookmark lookup. If the query returns many rows, these lookups can be expensive because they involve random I/O to different parts of the table.

Covering Indexes Eliminate Bookmark Lookups

A covering index includes all the columns the query needs, so the database never has to visit the table at all:

-- Covering index: includes customer_id, amount, and status
CREATE INDEX idx_orders_covering
ON orders(customer_id) INCLUDE (amount, status);

Now the query can be answered entirely from the index without touching the table. This is called an index-only scan and is the fastest possible query execution.

The INCLUDE syntax is supported in PostgreSQL 11+ and SQL Server. In MySQL, you achieve a similar effect by adding all needed columns to the index key itself.

When Indexes Help

Indexes dramatically improve performance in specific, well-defined scenarios. Here are the situations where they provide the most benefit.

1. WHERE Clause Filtering

The most common and impactful use case. Indexes accelerate lookups on filtered columns:

-- Fast with an index on customer_id
SELECT * FROM orders WHERE customer_id = 103;

-- Fast with an index on status
SELECT * FROM orders WHERE status = 'pending';

-- Fast with an index on order_date
SELECT * FROM orders WHERE order_date >= '2024-03-01';

Without an index, each of these requires scanning every row in the table.

2. JOIN Operations

Indexes on join columns allow the database to match rows between tables efficiently:

-- Index on orders.customer_id speeds up this join significantly
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON c.id = o.customer_id;

Without an index on orders.customer_id, the database must scan the entire orders table for every customer.

3. ORDER BY and Sorting

If an index matches the ORDER BY columns, the database can read rows in sorted order directly from the index instead of sorting them in memory:

-- Index on order_date eliminates the need for a separate sort step
SELECT * FROM orders ORDER BY order_date DESC;

Without the index, the database must read all rows and then sort them, which can be expensive for large tables.

4. GROUP BY Aggregations

Similar to sorting, an index on the grouped column(s) can speed up aggregations:

-- Index on region helps the database group rows efficiently
SELECT region, COUNT(*), SUM(amount)
FROM orders
GROUP BY region;

5. Uniqueness Enforcement

Unique indexes enforce data integrity and provide fast lookups:

-- Unique index on email ensures no duplicates and fast lookups
CREATE UNIQUE INDEX idx_users_email ON users(email);

6. MIN/MAX Queries

Finding the minimum or maximum value is nearly instantaneous with a B-tree index because the leftmost (MIN) or rightmost (MAX) leaf node contains the answer:

-- With an index on order_date, this is a single B-tree lookup
SELECT MIN(order_date) FROM orders;
SELECT MAX(amount) FROM orders;

Without an index, the database must scan the entire table to find the extreme values.

When Indexes Hurt

Indexes are not free. They come with costs, and in certain scenarios, they actually degrade performance. Understanding when not to index is just as important as knowing when to index.

1. Write Performance Overhead

Every INSERT, UPDATE, and DELETE must also update all indexes on the table. More indexes mean slower writes.

-- If this table has 8 indexes, this single INSERT triggers
-- 8 index updates in addition to the table insert
INSERT INTO orders (id, customer_id, order_date, amount, status, region)
VALUES (9, 105, '2024-04-15', 380, 'completed', 'North');
Indexes on TableRelative Insert Speed
0 (heap only)Fastest
1-3Slightly slower
5-8Noticeably slower
10+Significantly slower
caution

For write-heavy workloads (logging tables, event streams, IoT data), excessive indexes can become a major bottleneck. Every insert must update every index, and this overhead compounds at scale.

2. Storage Space

Each index is a separate data structure stored on disk. For large tables, indexes can consume as much or more space than the table itself.

-- PostgreSQL: Check table and index sizes
SELECT
pg_size_pretty(pg_relation_size('orders')) AS table_size,
pg_size_pretty(pg_indexes_size('orders')) AS total_index_size;

Example output for a large table:

table_sizetotal_index_size
2 GB3.5 GB

The indexes are using more disk space than the actual data. This is not unusual for heavily indexed tables.

3. Small Tables

For tables with fewer than a few thousand rows, a full table scan is often faster than an index lookup. The entire table fits in a single disk read, and the overhead of traversing the B-tree and following pointers is wasted effort.

-- On a table with 50 rows, an index scan is actually slower
-- than just reading all 50 rows directly
SELECT * FROM small_lookup_table WHERE code = 'US';

The database optimizer usually recognizes this and chooses a full scan anyway, but creating indexes on tiny tables is pointless.

4. Low-Selectivity Columns

Selectivity refers to how many distinct values a column has relative to the total number of rows. A column with low selectivity has very few distinct values, meaning each value matches a large percentage of rows.

-- Bad candidate for an index: only 3 distinct values
-- An index lookup would return ~33% of the table per value
SELECT * FROM orders WHERE status = 'completed';

If status has only three possible values (completed, pending, cancelled) and the table has 1 million rows, searching for completed might return 600,000 rows. At that point, the database is better off scanning the entire table than doing 600,000 individual index lookups followed by 600,000 bookmark lookups.

Good index candidates have high selectivity: columns like email, order_id, phone_number, or timestamp where each value appears in very few rows.

ColumnDistinct ValuesSelectivityGood Index?
email~1,000,000Very highYes
order_date~365 per yearHighYes
customer_id~50,000HighYes
region4LowUsually no
status3Very lowUsually no
is_active2Extremely lowNo
Partial Exception: Partial Indexes

In PostgreSQL, you can create partial indexes that only index rows matching a condition. This makes low-selectivity columns indexable when you only query a small subset:

-- Only index the rare 'pending' status (5% of rows)
CREATE INDEX idx_orders_pending
ON orders(order_date)
WHERE status = 'pending';

This index is small, fast, and only used for queries that include WHERE status = 'pending'.

5. Columns That Are Frequently Updated

Updating an indexed column is essentially a delete + insert in the index. If a column is updated very frequently, the index maintenance overhead can outweigh the read benefits.

-- If status changes on every order multiple times (pending → processing → shipped → delivered),
-- an index on status is constantly being rebuilt
UPDATE orders SET status = 'shipped' WHERE id = 42;

6. Queries That the Index Cannot Help

An index is only useful if the query uses the indexed column in a way the index supports. Several common patterns prevent index usage:

Functions on Indexed Columns

-- WRONG: Index on order_date is NOT used because of the function
SELECT * FROM orders WHERE YEAR(order_date) = 2024;

-- CORRECT: Rewrite to keep the indexed column "clean"
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

When you wrap an indexed column in a function, the database cannot use the B-tree because the sorted values in the index are the raw order_date values, not the YEAR() of those values.

Leading Wildcard in LIKE

-- WRONG: Index on name is NOT used (leading wildcard)
SELECT * FROM users WHERE name LIKE '%smith';

-- CORRECT: Index IS used (trailing wildcard only)
SELECT * FROM users WHERE name LIKE 'smith%';

A B-tree index is sorted left to right. Searching for values that start with a known prefix works (like looking up "Sm..." in a dictionary). Searching for values that end with a known suffix requires checking every entry.

Implicit Type Conversion

-- WRONG: phone_number is VARCHAR, but you're comparing to an integer
-- The database converts every value, preventing index use
SELECT * FROM users WHERE phone_number = 5551234;

-- CORRECT: Compare with the correct type
SELECT * FROM users WHERE phone_number = '5551234';

OR Conditions on Different Columns

-- Index on customer_id alone may not help here
SELECT * FROM orders WHERE customer_id = 101 OR region = 'North';

The optimizer may struggle to combine two separate indexes efficiently. In some databases, this results in a full table scan. Consider rewriting as UNION:

SELECT * FROM orders WHERE customer_id = 101
UNION
SELECT * FROM orders WHERE region = 'North';

Creating and Managing Indexes

Basic Syntax

-- Single-column index
CREATE INDEX index_name ON table_name(column_name);

-- Multi-column (composite) index
CREATE INDEX index_name ON table_name(column1, column2);

-- Unique index
CREATE UNIQUE INDEX index_name ON table_name(column_name);

-- Descending order index
CREATE INDEX index_name ON table_name(column_name DESC);

Composite Indexes and Column Order

A composite index (also called a multi-column index) covers more than one column. The order of columns matters enormously because of the leftmost prefix rule: the index can only be used if the query filters on a leading subset of the indexed columns.

CREATE INDEX idx_orders_composite ON orders(customer_id, order_date, status);

This single index can satisfy queries that filter on:

Query FilterUses This Index?
WHERE customer_id = 101Yes (leftmost column)
WHERE customer_id = 101 AND order_date = '2024-01-05'Yes (first two columns)
WHERE customer_id = 101 AND order_date = '2024-01-05' AND status = 'completed'Yes (all three columns)
WHERE order_date = '2024-01-05'No (skipped customer_id)
WHERE status = 'completed'No (skipped customer_id and order_date)
WHERE customer_id = 101 AND status = 'completed'Partial (uses customer_id, then scans for status)
Column Order Rule of Thumb

In a composite index, put columns in this order:

  1. Equality conditions first (column = value)
  2. Range conditions last (column > value, BETWEEN, etc.)
-- Good: equality column first, range column second
CREATE INDEX idx_orders_cust_date ON orders(customer_id, order_date);

-- Supports: WHERE customer_id = 101 AND order_date >= '2024-01-01'
-- The database uses customer_id for exact lookup, then order_date for range scan

Viewing Existing Indexes

-- PostgreSQL
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'orders';

-- MySQL
SHOW INDEX FROM orders;

-- SQL Server
EXEC sp_helpindex 'orders';

-- SQLite
SELECT name, sql FROM sqlite_master
WHERE type = 'index' AND tbl_name = 'orders';

Dropping Indexes

DROP INDEX index_name;

-- PostgreSQL: Need to specify IF EXISTS for safety
DROP INDEX IF EXISTS idx_orders_customer;

-- MySQL: Must specify the table
DROP INDEX idx_orders_customer ON orders;

-- SQL Server
DROP INDEX idx_orders_customer ON orders;

How to Know If an Index Is Being Used

Creating an index does not guarantee the database will use it. The query optimizer decides whether using an index is faster than a full table scan. You can check the optimizer's decision using EXPLAIN.

PostgreSQL

EXPLAIN SELECT * FROM orders WHERE customer_id = 101;

Output with index:

Index Scan using idx_orders_customer on orders  (cost=0.28..8.30 rows=3 width=44)
Index Cond: (customer_id = 101)

Output without index:

Seq Scan on orders  (cost=0.00..18.50 rows=3 width=44)
Filter: (customer_id = 101)

Index Scan means the index is being used. Seq Scan (sequential scan) means a full table scan.

MySQL

EXPLAIN SELECT * FROM orders WHERE customer_id = 101;

Look at the type and key columns. type=ref with a key name means an index is used. type=ALL means a full table scan.

SQL Server

SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM orders WHERE customer_id = 101;
GO
SET SHOWPLAN_TEXT OFF;

Look for Index Seek (good) vs. Table Scan (bad).

tip

Run EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL for actual timing) on your slow queries before creating indexes. This shows you exactly what the database is doing and helps you identify which column needs an index.

Index Maintenance

Indexes are not "set and forget." Over time, as data is inserted, updated, and deleted, indexes can become fragmented or bloated, reducing their effectiveness.

PostgreSQL: REINDEX

-- Rebuild a specific index
REINDEX INDEX idx_orders_customer;

-- Rebuild all indexes on a table
REINDEX TABLE orders;

SQL Server: Rebuild or Reorganize

-- Rebuild (full reconstruction, more thorough)
ALTER INDEX idx_orders_customer ON orders REBUILD;

-- Reorganize (lighter maintenance, less locking)
ALTER INDEX idx_orders_customer ON orders REORGANIZE;

MySQL: OPTIMIZE TABLE

-- Rebuilds the table and all its indexes
OPTIMIZE TABLE orders;

When to Maintain Indexes

  • After bulk loading large amounts of data
  • After deleting a large percentage of rows
  • When EXPLAIN shows the optimizer choosing full scans despite existing indexes
  • As part of a regular weekly or monthly maintenance schedule

Common Mistakes to Avoid

Mistake 1: Indexing Every Column

-- WRONG: Creating indexes on every single column
CREATE INDEX idx1 ON orders(id);
CREATE INDEX idx2 ON orders(customer_id);
CREATE INDEX idx3 ON orders(order_date);
CREATE INDEX idx4 ON orders(amount);
CREATE INDEX idx5 ON orders(status);
CREATE INDEX idx6 ON orders(region);

Each index consumes disk space and slows down every write operation. Instead, analyze your actual query patterns and index only the columns that appear in WHERE, JOIN, and ORDER BY clauses of your most common or slowest queries.

Better approach: Start with no indexes (beyond the primary key), identify slow queries, and add indexes surgically.

Mistake 2: Duplicate and Redundant Indexes

-- These are redundant: idx_a covers what idx_b would do
CREATE INDEX idx_a ON orders(customer_id, order_date);
CREATE INDEX idx_b ON orders(customer_id); -- Redundant!

The composite index (customer_id, order_date) already satisfies queries that filter on customer_id alone (due to the leftmost prefix rule). The single-column index on customer_id is wasted space.

Mistake 3: Wrapping Indexed Columns in Functions

-- WRONG: Prevents index usage
SELECT * FROM orders WHERE LOWER(status) = 'completed';
SELECT * FROM orders WHERE order_date + INTERVAL '30 days' > NOW();
SELECT * FROM users WHERE CONCAT(first_name, ' ', last_name) = 'Alice Smith';

Fix: Rewrite to keep the indexed column on one side without any function:

SELECT * FROM orders WHERE status = 'completed';
SELECT * FROM orders WHERE order_date > NOW() - INTERVAL '30 days';

If you must use a function, create an expression index (PostgreSQL) or computed column index (SQL Server):

-- PostgreSQL: Expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- Now this uses the index
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

Mistake 4: Ignoring Composite Index Column Order

-- Your most common query filters by status first
SELECT * FROM orders WHERE status = 'completed' AND customer_id = 101;

-- But your index leads with customer_id
CREATE INDEX idx_wrong_order ON orders(customer_id, status);

This index works, but an index with (status, customer_id) would better match the query if status is the more selective filter. In practice, analyze your query patterns and design composite indexes to match.

Summary

SQL indexes explained in practical terms: they are sorted data structures that let the database find rows without scanning the entire table.

  • B-tree indexes are the default and most common type. They organize data into a balanced tree structure that enables searches in O(log n) time, meaning even tables with billions of rows require only a handful of lookups.
  • Clustered indexes determine the physical order of data on disk. A table can have only one. In MySQL InnoDB, the primary key is always the clustered index. In SQL Server, you can choose which column to cluster on.
  • Non-clustered indexes are separate structures with pointers back to the table rows. A table can have many. They require an extra "bookmark lookup" step unless the index covers all needed columns.
  • Indexes help with WHERE filters, JOIN conditions, ORDER BY sorting, GROUP BY grouping, uniqueness enforcement, and MIN/MAX lookups.
  • Indexes hurt when there are too many (slowing writes), on small tables (unnecessary overhead), on low-selectivity columns (too many matches), and when queries wrap indexed columns in functions (preventing index use).
  • Composite indexes follow the leftmost prefix rule: the query must filter on a leading subset of the indexed columns for the index to be used.
  • Always use EXPLAIN to verify that your indexes are actually being used by the query optimizer.
Index Strategy Checklist
  1. Start with a primary key index (automatic in most databases)
  2. Add indexes on foreign key columns used in joins
  3. Identify your slowest and most frequent queries
  4. Run EXPLAIN to see if they use full table scans
  5. Add indexes on the columns in their WHERE and JOIN clauses
  6. Verify with EXPLAIN that the new index is used
  7. Monitor write performance to ensure indexes are not causing bottlenecks
  8. Periodically review and drop unused indexes