SQL NTILE() and Percentiles

When analyzing data, you frequently need to divide rows into equal groups, classify values into quartiles, or determine where a specific value falls relative to all others. These are the kinds of questions that power salary band analysis, student grading curves, customer segmentation, and performance benchmarking. In SQL, the tools for this are NTILE(), PERCENT_RANK(), and CUME_DIST().

This guide walks you through each function in detail. You will learn how SQL NTILE splits rows into evenly sized buckets, how PERCENT_RANK() tells you a row's relative standing as a percentage, and how CUME_DIST() calculates cumulative distribution. Every concept is paired with practical examples, clear outputs, and real-world use cases so you can apply them immediately.

Prerequisites

You should be comfortable with SQL window function basics, including the OVER() clause, PARTITION BY, and ORDER BY inside OVER(). All three functions covered here are dedicated window functions. They can only be used with the OVER() clause and have no standalone aggregate equivalent.

The Sample Data

We will use the following employees table throughout this guide:

id	name	department	salary
1	Alice	Engineering	95000
2	Bob	Engineering	88000
3	Charlie	Engineering	82000
4	Diana	Engineering	78000
5	Eve	Sales	72000
6	Frank	Sales	68000
7	Grace	Sales	65000
8	Hank	Sales	60000
9	Ivy	Marketing	74000
10	Jack	Marketing	70000
11	Karen	Marketing	66000
12	Leo	Marketing	62000

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    department VARCHAR(50) NOT NULL,
    salary DECIMAL(10,2) NOT NULL
);
INSERT INTO employees (id, name, department, salary) VALUES
(1,  'Alice',   'Engineering', 95000),
(2,  'Bob',     'Engineering', 88000),
(3,  'Charlie', 'Engineering', 82000),
(4,  'Diana',   'Engineering', 78000),
(5,  'Eve',     'Sales',       72000),
(6,  'Frank',   'Sales',       68000),
(7,  'Grace',   'Sales',       65000),
(8,  'Hank',    'Sales',       60000),
(9,  'Ivy',     'Marketing',   74000),
(10, 'Jack',    'Marketing',   70000),
(11, 'Karen',   'Marketing',   66000),
(12, 'Leo',     'Marketing',   62000);

Twelve employees spread across three departments, each with four employees. The salaries are deliberately unique to make the examples easier to follow.

NTILE(): Dividing Results into Buckets

NTILE() distributes rows in an ordered partition into a specified number of roughly equal groups (called buckets or tiles). Each row is assigned a bucket number from 1 to N, where N is the number you pass to the function.

Syntax

NTILE(number_of_buckets) OVER (
    [PARTITION BY column1, ...]
    ORDER BY column2 [ASC|DESC]
)

Parameter	Required	Description
`number_of_buckets`	Yes	The number of groups to divide the rows into
`PARTITION BY`	No	Divides rows into independent groups before bucketing
`ORDER BY`	Yes	Determines the order in which rows are assigned to buckets

info

ORDER BY inside OVER() is required for NTILE() to produce meaningful results. Without it, the assignment of rows to buckets is arbitrary and non-deterministic.

Basic Example: Split All Employees into 4 Groups

SELECT
    name,
    salary,
    NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;

Output:

name	salary	quartile
Alice	95000.00	1
Bob	88000.00	1
Charlie	82000.00	1
Diana	78000.00	2
Ivy	74000.00	2
Eve	72000.00	2
Jack	70000.00	3
Frank	68000.00	3
Karen	66000.00	3
Grace	65000.00	4
Leo	62000.00	4
Hank	60000.00	4

We have 12 rows and requested 4 buckets, so each bucket gets exactly 3 rows (12 / 4 = 3). The employees are sorted by salary descending, so bucket 1 contains the top earners and bucket 4 contains the lowest earners.

How NTILE Handles Uneven Distribution

What happens when the number of rows does not divide evenly into buckets? SQL NTILE distributes the remainder rows across the first buckets, giving them one extra row each.

Let's use only the first 10 employees and split them into 3 buckets:

SELECT
    name,
    salary,
    NTILE(3) OVER (ORDER BY salary DESC) AS bucket
FROM employees
WHERE id <= 10;

Output:

name	salary	bucket
Alice	95000.00	1
Bob	88000.00	1
Charlie	82000.00	1
Diana	78000.00	1
Ivy	74000.00	2
Eve	72000.00	2
Jack	70000.00	2
Frank	68000.00	3
Grace	65000.00	3
Hank	60000.00	3

With 10 rows and 3 buckets: 10 / 3 = 3 with a remainder of 1. The first bucket gets an extra row, resulting in bucket sizes of 4, 3, 3.

If the remainder were 2, the first two buckets would each get an extra row. The rule is simple: larger buckets always come first.

Total Rows	Buckets	Distribution
12	4	3, 3, 3, 3
10	3	4, 3, 3
11	4	3, 3, 3, 2
7	3	3, 2, 2
5	4	2, 1, 1, 1

caution

NTILE() distributes rows as evenly as possible but it does not consider the actual values. If you have 100 employees and 95 of them earn between 50000 and 55000, while 5 earn over 200000, NTILE(4) will still put 25 employees in each bucket. The top bucket will mix vastly different salary ranges. For value-based bucketing, use CASE expressions with explicit thresholds or WIDTH_BUCKET() (where supported).

NTILE with PARTITION BY

When you add PARTITION BY, SQL NTILE restarts the bucketing independently within each partition:

SELECT
    name,
    department,
    salary,
    NTILE(2) OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS salary_half
FROM employees;

Output:

name	department	salary	salary_half
Alice	Engineering	95000.00	1
Bob	Engineering	88000.00	1
Charlie	Engineering	82000.00	2
Diana	Engineering	78000.00	2
Ivy	Marketing	74000.00	1
Jack	Marketing	70000.00	1
Karen	Marketing	66000.00	2
Leo	Marketing	62000.00	2
Eve	Sales	72000.00	1
Frank	Sales	68000.00	1
Grace	Sales	65000.00	2
Hank	Sales	60000.00	2

Each department is independently split into two halves: the upper half (salary_half = 1) and the lower half (salary_half = 2). This is useful for comparing the top performers against the bottom performers within each team.

Quartile Analysis

A quartile divides data into four equal groups. This is one of the most common applications of NTILE() in real-world analytics, used extensively in salary analysis, student performance, and customer segmentation.

Assigning Quartiles

SELECT
    name,
    department,
    salary,
    NTILE(4) OVER (ORDER BY salary) AS salary_quartile
FROM employees;

Output:

name	department	salary	salary_quartile
Hank	Sales	60000.00	1
Leo	Marketing	62000.00	1
Grace	Sales	65000.00	1
Karen	Marketing	66000.00	2
Frank	Sales	68000.00	2
Jack	Marketing	70000.00	2
Eve	Sales	72000.00	3
Ivy	Marketing	74000.00	3
Diana	Engineering	78000.00	3
Charlie	Engineering	82000.00	4
Bob	Engineering	88000.00	4
Alice	Engineering	95000.00	4

Notice the ORDER BY salary is ascending here (the default). Quartile 1 contains the lowest salaries, and quartile 4 contains the highest. This is the standard statistical convention where Q1 is the bottom 25%.

Quartile Summary Statistics

You can combine NTILE() with aggregate functions to create a quartile summary report:

WITH quartiled AS (
    SELECT
        name,
        salary,
        NTILE(4) OVER (ORDER BY salary) AS quartile
    FROM employees
)
SELECT
    quartile,
    COUNT(*) AS employee_count,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary,
    ROUND(AVG(salary), 2) AS avg_salary
FROM quartiled
GROUP BY quartile
ORDER BY quartile;

Output:

quartile	employee_count	min_salary	max_salary	avg_salary
1	3	60000.00	65000.00	62333.33
2	3	66000.00	70000.00	68000.00
3	3	72000.00	78000.00	74666.67
4	3	82000.00	95000.00	88333.33

This gives a clean picture of salary distribution across the company. The gap between Q3 and Q4 (74666 to 88333) is much larger than the gap between Q1 and Q2 (62333 to 68000), indicating salary skew toward the top.

Labeling Quartiles for Reports

Instead of showing raw numbers, you can map quartile numbers to meaningful labels:

SELECT
    name,
    department,
    salary,
    NTILE(4) OVER (ORDER BY salary) AS quartile,
    CASE NTILE(4) OVER (ORDER BY salary)
        WHEN 1 THEN 'Bottom 25%'
        WHEN 2 THEN 'Lower-Middle 25%'
        WHEN 3 THEN 'Upper-Middle 25%'
        WHEN 4 THEN 'Top 25%'
    END AS salary_band
FROM employees;

Output:

name	department	salary	quartile	salary_band
Hank	Sales	60000.00	1	Bottom 25%
Leo	Marketing	62000.00	1	Bottom 25%
Grace	Sales	65000.00	1	Bottom 25%
Karen	Marketing	66000.00	2	Lower-Middle 25%
Frank	Sales	68000.00	2	Lower-Middle 25%
Jack	Marketing	70000.00	2	Lower-Middle 25%
Eve	Sales	72000.00	3	Upper-Middle 25%
Ivy	Marketing	74000.00	3	Upper-Middle 25%
Diana	Engineering	78000.00	3	Upper-Middle 25%
Charlie	Engineering	82000.00	4	Top 25%
Bob	Engineering	88000.00	4	Top 25%
Alice	Engineering	95000.00	4	Top 25%

Other Common Bucket Sizes

NTILE() is not limited to quartiles. You can divide into any number of buckets:

Bucket Count	Statistical Term	Use Case
`NTILE(2)`	Median split	Above/below median
`NTILE(3)`	Terciles	Low/medium/high segments
`NTILE(4)`	Quartiles	Standard quartile analysis
`NTILE(5)`	Quintiles	Customer value tiers
`NTILE(10)`	Deciles	Detailed distribution analysis
`NTILE(100)`	Percentiles	Fine-grained percentile ranking

-- Decile analysis
SELECT
    name,
    salary,
    NTILE(10) OVER (ORDER BY salary) AS decile
FROM employees;

Output:

name	salary	decile
Hank	60000.00	1
Leo	62000.00	1
Grace	65000.00	2
Karen	66000.00	2
Frank	68000.00	3
Jack	70000.00	4
Eve	72000.00	5
Ivy	74000.00	6
Diana	78000.00	7
Charlie	82000.00	8
Bob	88000.00	9
Alice	95000.00	10

With 12 rows and 10 buckets, the first 2 buckets get 2 rows each (12 / 10 = 1 remainder 2), and the remaining 8 buckets get 1 row each. Deciles 10 has 1 row only in a perfect distribution, but here the remainder logic pushes some extra rows into the early buckets, and the last few end up sharing. As you can see, decile 10 ended up with more rows. This is the inherent limitation of NTILE() with small datasets.

tip

NTILE() works best with large datasets where the number of rows significantly exceeds the number of buckets. With 12 rows and 10 buckets, the distribution is awkward. With 10000 rows and 10 buckets, each bucket gets exactly 1000 rows, which is clean and statistically meaningful.

PERCENT_RANK(): Relative Standing as a Percentage

PERCENT_RANK() calculates a row's relative rank as a value between 0 and 1 (or 0% to 100%). It answers the question: "What percentage of rows have a value less than this row's value?"

Syntax

PERCENT_RANK() OVER (
    [PARTITION BY column1, ...]
    ORDER BY column2 [ASC|DESC]
)

PERCENT_RANK() takes no arguments. It relies entirely on the ORDER BY inside OVER().

Formula

PERCENT_RANK = (rank - 1) / (total_rows - 1)

Where rank is the row's RANK() value and total_rows is the number of rows in the partition.

The first row (rank 1) always gets a PERCENT_RANK of 0.
The last row always gets a PERCENT_RANK of 1.

Basic Example

SELECT
    name,
    salary,
    RANK() OVER (ORDER BY salary) AS rnk,
    ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank
FROM employees;

Output:

name	salary	rnk	pct_rank
Hank	60000.00	1	0
Leo	62000.00	2	0.0909
Grace	65000.00	3	0.1818
Karen	66000.00	4	0.2727
Frank	68000.00	5	0.3636
Jack	70000.00	6	0.4545
Eve	72000.00	7	0.5455
Ivy	74000.00	8	0.6364
Diana	78000.00	9	0.7273
Charlie	82000.00	10	0.8182
Bob	88000.00	11	0.9091
Alice	95000.00	12	1

Let's verify the formula for Leo (rank 2):

PERCENT_RANK = (2 - 1) / (12 - 1) = 1 / 11 = 0.0909

Alice (rank 12) gets exactly 1.0, and Hank (rank 1) gets exactly 0.0.

Interpreting PERCENT_RANK

A PERCENT_RANK of 0.7273 for Diana means that approximately 72.73% of employees earn less than her. Conversely, she earns more than roughly 72.73% of the workforce.

PERCENT_RANK with PARTITION BY

SELECT
    name,
    department,
    salary,
    ROUND(PERCENT_RANK() OVER (
        PARTITION BY department
        ORDER BY salary
    ), 4) AS dept_pct_rank
FROM employees;

Output:

name	department	salary	dept_pct_rank
Diana	Engineering	78000.00	0
Charlie	Engineering	82000.00	0.3333
Bob	Engineering	88000.00	0.6667
Alice	Engineering	95000.00	1
Leo	Marketing	62000.00	0
Karen	Marketing	66000.00	0.3333
Jack	Marketing	70000.00	0.6667
Ivy	Marketing	74000.00	1
Hank	Sales	60000.00	0
Grace	Sales	65000.00	0.3333
Frank	Sales	68000.00	0.6667
Eve	Sales	72000.00	1

Each department is evaluated independently. Within Engineering, Diana is at the bottom (0.0) and Alice is at the top (1.0). The formula uses each partition's own row count (4 in each department): (rank - 1) / (4 - 1).

How PERCENT_RANK Handles Ties

When rows share the same value, they receive the same RANK(), which means they also receive the same PERCENT_RANK(). Let's add a tie to see:

-- Simulating a tie with a modified dataset
SELECT
    name,
    salary,
    RANK() OVER (ORDER BY salary) AS rnk,
    ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank
FROM (
    VALUES ('Hank', 60000), ('Leo', 62000), ('Grace', 65000),
           ('Karen', 65000), ('Frank', 68000)
) AS t(name, salary);

Output:

name	salary	rnk	pct_rank
Hank	60000	1	0.0000
Leo	62000	2	0.2500
Karen	65000	3	0.5000
Grace	65000	3	0.5000
Frank	68000	5	1.0000

Grace and Karen are tied at rank 3, so they both get the same PERCENT_RANK of 0.5. Rank 4 is skipped (standard RANK() behavior), and Frank jumps to rank 5 with a PERCENT_RANK of 1.0.

CUME_DIST(): Cumulative Distribution

CUME_DIST() (cumulative distribution) calculates the proportion of rows that have a value less than or equal to the current row's value. It also returns a value between 0 and 1, but its formula and behavior differ from PERCENT_RANK().

Syntax

CUME_DIST() OVER (
    [PARTITION BY column1, ...]
    ORDER BY column2 [ASC|DESC]
)

Like PERCENT_RANK(), CUME_DIST() takes no arguments.

Formula

CUME_DIST = (number of rows with value <= current row's value) / (total rows)

Key differences from PERCENT_RANK():

CUME_DIST() is always greater than 0 (never exactly 0)
The last row (or rows tied for last) always gets 1.0
CUME_DIST() answers: "What fraction of all rows are at or below this value?"

Basic Example

SELECT
    name,
    salary,
    ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank,
    ROUND(CUME_DIST() OVER (ORDER BY salary), 4) AS cume_dist
FROM employees;

Output:

name	salary	pct_rank	cume_dist
Hank	60000	0.0000	0.0833
Leo	62000	0.0909	0.1667
Grace	65000	0.1818	0.2500
Karen	66000	0.2727	0.3333
Frank	68000	0.3636	0.4167
Jack	70000	0.4545	0.5000
Eve	72000	0.5455	0.5833
Ivy	74000	0.6364	0.6667
Diana	78000	0.7273	0.7500
Charlie	82000	0.8182	0.8333
Bob	88000	0.9091	0.9167
Alice	95000	1.0000	1.0000

Let's verify for Hank: 1 row has a salary <= 60000 (just Hank himself), and there are 12 total rows. CUME_DIST = 1 / 12 = 0.0833.

For Grace: 3 rows have salaries <= 65000 (Hank, Leo, Grace). CUME_DIST = 3 / 12 = 0.2500.

PERCENT_RANK vs CUME_DIST: When to Use Which

Aspect	PERCENT_RANK	CUME_DIST
Formula	`(rank - 1) / (total - 1)`	`rows <= current / total`
Range	0 to 1	Greater than 0, up to 1
First row value	Always 0	Always > 0
Last row value	Always 1	Always 1
Interpretation	% of rows below this one	% of rows at or below this one
Analogy	"I scored better than X% of people"	"X% of people scored at or below me"

When to use which?

Use PERCENT_RANK() when you want to know a row's relative position within the group (like a percentile rank on a standardized test: "You scored in the 85th percentile").
Use CUME_DIST() when you want to know what fraction of the data falls at or below a certain value (like "75% of employees earn this amount or less").

CUME_DIST with Ties

When rows are tied, CUME_DIST() includes all tied rows in the numerator:

SELECT
    name,
    salary,
    ROUND(CUME_DIST() OVER (ORDER BY salary), 4) AS cume_dist
FROM (
    VALUES ('Hank', 60000), ('Leo', 62000), ('Grace', 65000),
           ('Karen', 65000), ('Frank', 68000)
) AS t(name, salary);

Output:

name	salary	cume_dist
Hank	60000	0.2000
Leo	62000	0.4000
Grace	65000	0.8000
Karen	65000	0.8000
Frank	68000	1.0000

Grace and Karen both show 0.8000 because 4 out of 5 rows have salaries <= 65000. All tied rows receive the same CUME_DIST() value.

Practical Use Cases

Use Case 1: Customer Segmentation into Value Tiers

Imagine an orders table where you want to segment customers into tiers based on their total spending:

WITH customer_totals AS (
    SELECT
        customer_id,
        SUM(order_amount) AS total_spent
    FROM orders
    GROUP BY customer_id
)
SELECT
    customer_id,
    total_spent,
    NTILE(5) OVER (ORDER BY total_spent DESC) AS value_tier,
    CASE NTILE(5) OVER (ORDER BY total_spent DESC)
        WHEN 1 THEN 'Platinum'
        WHEN 2 THEN 'Gold'
        WHEN 3 THEN 'Silver'
        WHEN 4 THEN 'Bronze'
        WHEN 5 THEN 'Basic'
    END AS tier_label
FROM customer_totals;

This creates five customer tiers based on total spending. Platinum customers (top 20%) might receive premium perks, while Basic customers (bottom 20%) might receive promotional offers to increase engagement.

Use Case 2: Identifying Outliers with PERCENT_RANK

Employees in the top 10% or bottom 10% of salary might warrant special attention:

WITH ranked AS (
    SELECT
        name,
        department,
        salary,
        ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank
    FROM employees
)
SELECT
    name,
    department,
    salary,
    pct_rank,
    CASE
        WHEN pct_rank >= 0.90 THEN 'Top 10%'
        WHEN pct_rank <= 0.10 THEN 'Bottom 10%'
        ELSE 'Middle 80%'
    END AS salary_category
FROM ranked;

Output:

name	department	salary	pct_rank	salary_category
Hank	Sales	60000	0.0000	Bottom 10%
Leo	Marketing	62000	0.0909	Bottom 10%
Grace	Sales	65000	0.1818	Middle 80%
Karen	Marketing	66000	0.2727	Middle 80%
Frank	Sales	68000	0.3636	Middle 80%
Jack	Marketing	70000	0.4545	Middle 80%
Eve	Sales	72000	0.5455	Middle 80%
Ivy	Marketing	74000	0.6364	Middle 80%
Diana	Engineering	78000	0.7273	Middle 80%
Charlie	Engineering	82000	0.8182	Middle 80%
Bob	Engineering	88000	0.9091	Top 10%
Alice	Engineering	95000	1.0000	Top 10%

Use Case 3: Grade Curve with CUME_DIST

In educational contexts, CUME_DIST() can assign letter grades based on a curve:

WITH scored AS (
    SELECT
        student_name,
        test_score,
        ROUND(CUME_DIST() OVER (ORDER BY test_score), 4) AS cume_dist
    FROM test_results
)
SELECT
    student_name,
    test_score,
    cume_dist,
    CASE
        WHEN cume_dist <= 0.10 THEN 'F'
        WHEN cume_dist <= 0.30 THEN 'D'
        WHEN cume_dist <= 0.60 THEN 'C'
        WHEN cume_dist <= 0.85 THEN 'B'
        ELSE 'A'
    END AS curved_grade
FROM scored;

The bottom 10% receive an F, the next 20% a D, the next 30% a C, the next 25% a B, and the top 15% an A.

Use Case 4: Median Approximation with NTILE

While SQL does not have a universal MEDIAN() function (some databases offer one as an extension), you can approximate it with NTILE(2):

WITH halved AS (
    SELECT
        salary,
        NTILE(2) OVER (ORDER BY salary) AS half
    FROM employees
)
SELECT
    MAX(CASE WHEN half = 1 THEN salary END) AS lower_median_boundary,
    MIN(CASE WHEN half = 2 THEN salary END) AS upper_median_boundary,
    (MAX(CASE WHEN half = 1 THEN salary END) +
     MIN(CASE WHEN half = 2 THEN salary END)) / 2.0 AS approx_median
FROM halved;

Output:

lower_median_boundary	upper_median_boundary	approx_median
70000	72000	71000.00

This finds the boundary between the lower and upper halves and averages them to approximate the median salary.

All Three Functions Side by Side

This comprehensive query lets you compare all three percentile-related functions at once:

SELECT
    name,
    salary,
    NTILE(4) OVER (ORDER BY salary) AS quartile,
    ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank,
    ROUND(CUME_DIST() OVER (ORDER BY salary), 4) AS cume_dist
FROM employees;

Output:

name	salary	quartile	pct_rank	cume_dist
Hank	60000	1	0.0000	0.0833
Leo	62000	1	0.0909	0.1667
Grace	65000	1	0.1818	0.2500
Karen	66000	2	0.2727	0.3333
Frank	68000	2	0.3636	0.4167
Jack	70000	2	0.4545	0.5000
Eve	72000	3	0.5455	0.5833
Ivy	74000	3	0.6364	0.6667
Diana	78000	3	0.7273	0.7500
Charlie	82000	4	0.8182	0.8333
Bob	88000	4	0.9091	0.9167
Alice	95000	4	1.0000	1.0000

Notice the relationships:

NTILE(4) creates hard group boundaries (1, 2, 3, 4).
PERCENT_RANK starts at 0 and ends at 1, evenly spaced for unique values.
CUME_DIST starts above 0 (0.0833) and ends at 1, always slightly higher than PERCENT_RANK for the same row.

Common Mistakes to Avoid

Mistake 1: Assuming NTILE Creates Value-Based Buckets

-- NTILE does NOT look at actual values, only row positions
SELECT
    name,
    salary,
    NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;

If one quartile spans salaries from 60000 to 65000 and another spans from 82000 to 95000, NTILE() does not care. It distributes rows by count, not by value range. If you need value-based ranges, use a CASE expression:

-- Value-based bucketing
SELECT
    name,
    salary,
    CASE
        WHEN salary < 65000 THEN 1
        WHEN salary < 75000 THEN 2
        WHEN salary < 85000 THEN 3
        ELSE 4
    END AS salary_bracket
FROM employees;

Mistake 2: Confusing PERCENT_RANK and CUME_DIST

These two functions look similar but answer different questions:

-- "What percentage scored BELOW me?"
PERCENT_RANK() OVER (ORDER BY salary)   -- 0.0000 for the lowest

-- "What percentage scored AT OR BELOW me?"
CUME_DIST() OVER (ORDER BY salary)      -- 0.0833 for the lowest

If you use CUME_DIST() when you meant PERCENT_RANK(), your thresholds for filtering (like "top 10%") will be slightly off. With large datasets the difference is negligible, but with small datasets it can matter.

Mistake 3: Using NTILE for Exact Percentile Values

NTILE(100) does not give you the same result as PERCENT_RANK() multiplied by 100:

SELECT
    name,
    salary,
    NTILE(100) OVER (ORDER BY salary) AS ntile_100,
    ROUND(PERCENT_RANK() OVER (ORDER BY salary) * 100, 2) AS pct_rank_100
FROM employees;

Output:

name	salary	ntile_100	pct_rank_100
Hank	60000	1	0.00
Leo	62000	2	9.09
Grace	65000	3	18.18
Karen	66000	4	27.27
Frank	68000	5	36.36
Jack	70000	6	45.45
Eve	72000	7	54.55
Ivy	74000	8	63.64
Diana	78000	9	72.73
Charlie	82000	10	81.82
Bob	88000	11	90.91
Alice	95000	12	100.00

With only 12 rows and 100 buckets, NTILE(100) can only use values 1 through 12. It cannot produce values like 50 or 75. NTILE(100) only approximates true percentiles when you have hundreds or thousands of rows.

Mistake 4: Filtering Directly on Window Functions in WHERE

Like all window functions, NTILE(), PERCENT_RANK(), and CUME_DIST() cannot appear in WHERE:

-- WRONG: Will fail
SELECT name, salary
FROM employees
WHERE NTILE(4) OVER (ORDER BY salary) = 4;

Error: window functions are not allowed in WHERE

Fix: Use a CTE or subquery:

WITH quartiled AS (
    SELECT
        name,
        salary,
        NTILE(4) OVER (ORDER BY salary) AS quartile
    FROM employees
)
SELECT name, salary
FROM quartiled
WHERE quartile = 4;

Output:

name	salary
Charlie	82000
Bob	88000
Alice	95000

Performance Considerations

All three functions require sorting. The database must sort the data by the ORDER BY column within each partition. Indexing the ORDER BY column (and PARTITION BY columns in a composite index) improves performance on large tables.
NTILE is lightweight. Once the data is sorted, assigning bucket numbers is a simple division operation. There is minimal overhead beyond the sort itself.
PERCENT_RANK and CUME_DIST rely on RANK internally. They compute the row's rank first, then apply their formula. The overhead is comparable to RANK().
Combine window definitions when possible. If you use multiple percentile functions with the same OVER() clause, the database can often share a single sort pass:

SELECT
    name,
    salary,
    NTILE(4)       OVER w AS quartile,
    PERCENT_RANK() OVER w AS pct_rank,
    CUME_DIST()    OVER w AS cume_dist
FROM employees
WINDOW w AS (ORDER BY salary);

Summary

SQL NTILE, PERCENT_RANK(), and CUME_DIST() are three window functions purpose-built for distributional analysis and percentile calculations:

NTILE(N) divides ordered rows into N roughly equal buckets, assigning each row a bucket number from 1 to N. Best for quartile analysis, customer segmentation, and tiered classification. Remember that it distributes by row count, not by value range.
PERCENT_RANK() calculates a row's relative position as a value from 0 to 1, representing the percentage of rows that rank below it. Best for percentile rankings and identifying rows in the top or bottom X%.
CUME_DIST() calculates the cumulative distribution, representing the fraction of rows with values at or below the current row's value. Best for distribution analysis and curved grading.

All three require ORDER BY inside OVER(), support PARTITION BY for group-level analysis, and cannot be used in WHERE directly. For filtering, always wrap them in a CTE or subquery.

Quick Decision Guide

Ask yourself what you need:

"Put these rows into N equal groups" → NTILE(N)
"What percentile is this row at?" → PERCENT_RANK()
"What fraction of the data is at or below this value?" → CUME_DIST()

Prerequisites​

The Sample Data​

NTILE(): Dividing Results into Buckets​

Syntax​

Basic Example: Split All Employees into 4 Groups​

How NTILE Handles Uneven Distribution​

NTILE with PARTITION BY​

Quartile Analysis​

Assigning Quartiles​

Quartile Summary Statistics​

Labeling Quartiles for Reports​

Other Common Bucket Sizes​

PERCENT_RANK(): Relative Standing as a Percentage​

Syntax​

Formula​

Basic Example​

Interpreting PERCENT_RANK​

PERCENT_RANK with PARTITION BY​

How PERCENT_RANK Handles Ties​

CUME_DIST(): Cumulative Distribution​

Syntax​

Formula​

Basic Example​

PERCENT_RANK vs CUME_DIST: When to Use Which​

CUME_DIST with Ties​

Practical Use Cases​

Use Case 1: Customer Segmentation into Value Tiers​

Use Case 2: Identifying Outliers with PERCENT_RANK​

Use Case 3: Grade Curve with CUME_DIST​

Use Case 4: Median Approximation with NTILE​

All Three Functions Side by Side​

Common Mistakes to Avoid​

Mistake 1: Assuming NTILE Creates Value-Based Buckets​

Mistake 2: Confusing PERCENT_RANK and CUME_DIST​

Mistake 3: Using NTILE for Exact Percentile Values​

Mistake 4: Filtering Directly on Window Functions in WHERE​

Performance Considerations​

Summary​

Table of Contents

Prerequisites

The Sample Data

NTILE(): Dividing Results into Buckets

Syntax

Basic Example: Split All Employees into 4 Groups

How NTILE Handles Uneven Distribution

NTILE with PARTITION BY

Quartile Analysis

Assigning Quartiles

Quartile Summary Statistics

Labeling Quartiles for Reports

Other Common Bucket Sizes

PERCENT_RANK(): Relative Standing as a Percentage

Syntax

Formula

Basic Example

Interpreting PERCENT_RANK

PERCENT_RANK with PARTITION BY

How PERCENT_RANK Handles Ties

CUME_DIST(): Cumulative Distribution

Syntax

Formula

Basic Example

PERCENT_RANK vs CUME_DIST: When to Use Which

CUME_DIST with Ties

Practical Use Cases

Use Case 1: Customer Segmentation into Value Tiers

Use Case 2: Identifying Outliers with PERCENT_RANK

Use Case 3: Grade Curve with CUME_DIST

Use Case 4: Median Approximation with NTILE

All Three Functions Side by Side

Common Mistakes to Avoid

Mistake 1: Assuming NTILE Creates Value-Based Buckets

Mistake 2: Confusing PERCENT_RANK and CUME_DIST

Mistake 3: Using NTILE for Exact Percentile Values

Mistake 4: Filtering Directly on Window Functions in WHERE

Performance Considerations

Summary