Skip to main content

SQL NTILE() and Percentiles

When analyzing data, you frequently need to divide rows into equal groups, classify values into quartiles, or determine where a specific value falls relative to all others. These are the kinds of questions that power salary band analysis, student grading curves, customer segmentation, and performance benchmarking. In SQL, the tools for this are NTILE(), PERCENT_RANK(), and CUME_DIST().

This guide walks you through each function in detail. You will learn how SQL NTILE splits rows into evenly sized buckets, how PERCENT_RANK() tells you a row's relative standing as a percentage, and how CUME_DIST() calculates cumulative distribution. Every concept is paired with practical examples, clear outputs, and real-world use cases so you can apply them immediately.

Prerequisites

You should be comfortable with SQL window function basics, including the OVER() clause, PARTITION BY, and ORDER BY inside OVER(). All three functions covered here are dedicated window functions. They can only be used with the OVER() clause and have no standalone aggregate equivalent.

The Sample Data

We will use the following employees table throughout this guide:

idnamedepartmentsalary
1AliceEngineering95000
2BobEngineering88000
3CharlieEngineering82000
4DianaEngineering78000
5EveSales72000
6FrankSales68000
7GraceSales65000
8HankSales60000
9IvyMarketing74000
10JackMarketing70000
11KarenMarketing66000
12LeoMarketing62000
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50) NOT NULL,
department VARCHAR(50) NOT NULL,
salary DECIMAL(10,2) NOT NULL
);
INSERT INTO employees (id, name, department, salary) VALUES
(1, 'Alice', 'Engineering', 95000),
(2, 'Bob', 'Engineering', 88000),
(3, 'Charlie', 'Engineering', 82000),
(4, 'Diana', 'Engineering', 78000),
(5, 'Eve', 'Sales', 72000),
(6, 'Frank', 'Sales', 68000),
(7, 'Grace', 'Sales', 65000),
(8, 'Hank', 'Sales', 60000),
(9, 'Ivy', 'Marketing', 74000),
(10, 'Jack', 'Marketing', 70000),
(11, 'Karen', 'Marketing', 66000),
(12, 'Leo', 'Marketing', 62000);

Twelve employees spread across three departments, each with four employees. The salaries are deliberately unique to make the examples easier to follow.

NTILE(): Dividing Results into Buckets

NTILE() distributes rows in an ordered partition into a specified number of roughly equal groups (called buckets or tiles). Each row is assigned a bucket number from 1 to N, where N is the number you pass to the function.

Syntax

NTILE(number_of_buckets) OVER (
[PARTITION BY column1, ...]
ORDER BY column2 [ASC|DESC]
)
ParameterRequiredDescription
number_of_bucketsYesThe number of groups to divide the rows into
PARTITION BYNoDivides rows into independent groups before bucketing
ORDER BYYesDetermines the order in which rows are assigned to buckets
info

ORDER BY inside OVER() is required for NTILE() to produce meaningful results. Without it, the assignment of rows to buckets is arbitrary and non-deterministic.

Basic Example: Split All Employees into 4 Groups

SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;

Output:

namesalaryquartile
Alice95000.001
Bob88000.001
Charlie82000.001
Diana78000.002
Ivy74000.002
Eve72000.002
Jack70000.003
Frank68000.003
Karen66000.003
Grace65000.004
Leo62000.004
Hank60000.004

We have 12 rows and requested 4 buckets, so each bucket gets exactly 3 rows (12 / 4 = 3). The employees are sorted by salary descending, so bucket 1 contains the top earners and bucket 4 contains the lowest earners.

How NTILE Handles Uneven Distribution

What happens when the number of rows does not divide evenly into buckets? SQL NTILE distributes the remainder rows across the first buckets, giving them one extra row each.

Let's use only the first 10 employees and split them into 3 buckets:

SELECT
name,
salary,
NTILE(3) OVER (ORDER BY salary DESC) AS bucket
FROM employees
WHERE id <= 10;

Output:

namesalarybucket
Alice95000.001
Bob88000.001
Charlie82000.001
Diana78000.001
Ivy74000.002
Eve72000.002
Jack70000.002
Frank68000.003
Grace65000.003
Hank60000.003

With 10 rows and 3 buckets: 10 / 3 = 3 with a remainder of 1. The first bucket gets an extra row, resulting in bucket sizes of 4, 3, 3.

If the remainder were 2, the first two buckets would each get an extra row. The rule is simple: larger buckets always come first.

Total RowsBucketsDistribution
1243, 3, 3, 3
1034, 3, 3
1143, 3, 3, 2
733, 2, 2
542, 1, 1, 1
caution

NTILE() distributes rows as evenly as possible but it does not consider the actual values. If you have 100 employees and 95 of them earn between 50000 and 55000, while 5 earn over 200000, NTILE(4) will still put 25 employees in each bucket. The top bucket will mix vastly different salary ranges. For value-based bucketing, use CASE expressions with explicit thresholds or WIDTH_BUCKET() (where supported).

NTILE with PARTITION BY

When you add PARTITION BY, SQL NTILE restarts the bucketing independently within each partition:

SELECT
name,
department,
salary,
NTILE(2) OVER (
PARTITION BY department
ORDER BY salary DESC
) AS salary_half
FROM employees;

Output:

namedepartmentsalarysalary_half
AliceEngineering95000.001
BobEngineering88000.001
CharlieEngineering82000.002
DianaEngineering78000.002
IvyMarketing74000.001
JackMarketing70000.001
KarenMarketing66000.002
LeoMarketing62000.002
EveSales72000.001
FrankSales68000.001
GraceSales65000.002
HankSales60000.002

Each department is independently split into two halves: the upper half (salary_half = 1) and the lower half (salary_half = 2). This is useful for comparing the top performers against the bottom performers within each team.

Quartile Analysis

A quartile divides data into four equal groups. This is one of the most common applications of NTILE() in real-world analytics, used extensively in salary analysis, student performance, and customer segmentation.

Assigning Quartiles

SELECT
name,
department,
salary,
NTILE(4) OVER (ORDER BY salary) AS salary_quartile
FROM employees;

Output:

namedepartmentsalarysalary_quartile
HankSales60000.001
LeoMarketing62000.001
GraceSales65000.001
KarenMarketing66000.002
FrankSales68000.002
JackMarketing70000.002
EveSales72000.003
IvyMarketing74000.003
DianaEngineering78000.003
CharlieEngineering82000.004
BobEngineering88000.004
AliceEngineering95000.004

Notice the ORDER BY salary is ascending here (the default). Quartile 1 contains the lowest salaries, and quartile 4 contains the highest. This is the standard statistical convention where Q1 is the bottom 25%.

Quartile Summary Statistics

You can combine NTILE() with aggregate functions to create a quartile summary report:

WITH quartiled AS (
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees
)
SELECT
quartile,
COUNT(*) AS employee_count,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary,
ROUND(AVG(salary), 2) AS avg_salary
FROM quartiled
GROUP BY quartile
ORDER BY quartile;

Output:

quartileemployee_countmin_salarymax_salaryavg_salary
1360000.0065000.0062333.33
2366000.0070000.0068000.00
3372000.0078000.0074666.67
4382000.0095000.0088333.33

This gives a clean picture of salary distribution across the company. The gap between Q3 and Q4 (74666 to 88333) is much larger than the gap between Q1 and Q2 (62333 to 68000), indicating salary skew toward the top.

Labeling Quartiles for Reports

Instead of showing raw numbers, you can map quartile numbers to meaningful labels:

SELECT
name,
department,
salary,
NTILE(4) OVER (ORDER BY salary) AS quartile,
CASE NTILE(4) OVER (ORDER BY salary)
WHEN 1 THEN 'Bottom 25%'
WHEN 2 THEN 'Lower-Middle 25%'
WHEN 3 THEN 'Upper-Middle 25%'
WHEN 4 THEN 'Top 25%'
END AS salary_band
FROM employees;

Output:

namedepartmentsalaryquartilesalary_band
HankSales60000.001Bottom 25%
LeoMarketing62000.001Bottom 25%
GraceSales65000.001Bottom 25%
KarenMarketing66000.002Lower-Middle 25%
FrankSales68000.002Lower-Middle 25%
JackMarketing70000.002Lower-Middle 25%
EveSales72000.003Upper-Middle 25%
IvyMarketing74000.003Upper-Middle 25%
DianaEngineering78000.003Upper-Middle 25%
CharlieEngineering82000.004Top 25%
BobEngineering88000.004Top 25%
AliceEngineering95000.004Top 25%

Other Common Bucket Sizes

NTILE() is not limited to quartiles. You can divide into any number of buckets:

Bucket CountStatistical TermUse Case
NTILE(2)Median splitAbove/below median
NTILE(3)TercilesLow/medium/high segments
NTILE(4)QuartilesStandard quartile analysis
NTILE(5)QuintilesCustomer value tiers
NTILE(10)DecilesDetailed distribution analysis
NTILE(100)PercentilesFine-grained percentile ranking
-- Decile analysis
SELECT
name,
salary,
NTILE(10) OVER (ORDER BY salary) AS decile
FROM employees;

Output:

namesalarydecile
Hank60000.001
Leo62000.001
Grace65000.002
Karen66000.002
Frank68000.003
Jack70000.004
Eve72000.005
Ivy74000.006
Diana78000.007
Charlie82000.008
Bob88000.009
Alice95000.0010

With 12 rows and 10 buckets, the first 2 buckets get 2 rows each (12 / 10 = 1 remainder 2), and the remaining 8 buckets get 1 row each. Deciles 10 has 1 row only in a perfect distribution, but here the remainder logic pushes some extra rows into the early buckets, and the last few end up sharing. As you can see, decile 10 ended up with more rows. This is the inherent limitation of NTILE() with small datasets.

tip

NTILE() works best with large datasets where the number of rows significantly exceeds the number of buckets. With 12 rows and 10 buckets, the distribution is awkward. With 10000 rows and 10 buckets, each bucket gets exactly 1000 rows, which is clean and statistically meaningful.

PERCENT_RANK(): Relative Standing as a Percentage

PERCENT_RANK() calculates a row's relative rank as a value between 0 and 1 (or 0% to 100%). It answers the question: "What percentage of rows have a value less than this row's value?"

Syntax

PERCENT_RANK() OVER (
[PARTITION BY column1, ...]
ORDER BY column2 [ASC|DESC]
)

PERCENT_RANK() takes no arguments. It relies entirely on the ORDER BY inside OVER().

Formula

PERCENT_RANK = (rank - 1) / (total_rows - 1)

Where rank is the row's RANK() value and total_rows is the number of rows in the partition.

  • The first row (rank 1) always gets a PERCENT_RANK of 0.
  • The last row always gets a PERCENT_RANK of 1.

Basic Example

SELECT
name,
salary,
RANK() OVER (ORDER BY salary) AS rnk,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank
FROM employees;

Output:

namesalaryrnkpct_rank
Hank60000.0010
Leo62000.0020.0909
Grace65000.0030.1818
Karen66000.0040.2727
Frank68000.0050.3636
Jack70000.0060.4545
Eve72000.0070.5455
Ivy74000.0080.6364
Diana78000.0090.7273
Charlie82000.00100.8182
Bob88000.00110.9091
Alice95000.00121

Let's verify the formula for Leo (rank 2):

PERCENT_RANK = (2 - 1) / (12 - 1) = 1 / 11 = 0.0909

Alice (rank 12) gets exactly 1.0, and Hank (rank 1) gets exactly 0.0.

Interpreting PERCENT_RANK

A PERCENT_RANK of 0.7273 for Diana means that approximately 72.73% of employees earn less than her. Conversely, she earns more than roughly 72.73% of the workforce.

PERCENT_RANK with PARTITION BY

SELECT
name,
department,
salary,
ROUND(PERCENT_RANK() OVER (
PARTITION BY department
ORDER BY salary
), 4) AS dept_pct_rank
FROM employees;

Output:

namedepartmentsalarydept_pct_rank
DianaEngineering78000.000
CharlieEngineering82000.000.3333
BobEngineering88000.000.6667
AliceEngineering95000.001
LeoMarketing62000.000
KarenMarketing66000.000.3333
JackMarketing70000.000.6667
IvyMarketing74000.001
HankSales60000.000
GraceSales65000.000.3333
FrankSales68000.000.6667
EveSales72000.001

Each department is evaluated independently. Within Engineering, Diana is at the bottom (0.0) and Alice is at the top (1.0). The formula uses each partition's own row count (4 in each department): (rank - 1) / (4 - 1).

How PERCENT_RANK Handles Ties

When rows share the same value, they receive the same RANK(), which means they also receive the same PERCENT_RANK(). Let's add a tie to see:

-- Simulating a tie with a modified dataset
SELECT
name,
salary,
RANK() OVER (ORDER BY salary) AS rnk,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank
FROM (
VALUES ('Hank', 60000), ('Leo', 62000), ('Grace', 65000),
('Karen', 65000), ('Frank', 68000)
) AS t(name, salary);

Output:

namesalaryrnkpct_rank
Hank6000010.0000
Leo6200020.2500
Karen6500030.5000
Grace6500030.5000
Frank6800051.0000

Grace and Karen are tied at rank 3, so they both get the same PERCENT_RANK of 0.5. Rank 4 is skipped (standard RANK() behavior), and Frank jumps to rank 5 with a PERCENT_RANK of 1.0.

CUME_DIST(): Cumulative Distribution

CUME_DIST() (cumulative distribution) calculates the proportion of rows that have a value less than or equal to the current row's value. It also returns a value between 0 and 1, but its formula and behavior differ from PERCENT_RANK().

Syntax

CUME_DIST() OVER (
[PARTITION BY column1, ...]
ORDER BY column2 [ASC|DESC]
)

Like PERCENT_RANK(), CUME_DIST() takes no arguments.

Formula

CUME_DIST = (number of rows with value <= current row's value) / (total rows)

Key differences from PERCENT_RANK():

  • CUME_DIST() is always greater than 0 (never exactly 0)
  • The last row (or rows tied for last) always gets 1.0
  • CUME_DIST() answers: "What fraction of all rows are at or below this value?"

Basic Example

SELECT
name,
salary,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank,
ROUND(CUME_DIST() OVER (ORDER BY salary), 4) AS cume_dist
FROM employees;

Output:

namesalarypct_rankcume_dist
Hank600000.00000.0833
Leo620000.09090.1667
Grace650000.18180.2500
Karen660000.27270.3333
Frank680000.36360.4167
Jack700000.45450.5000
Eve720000.54550.5833
Ivy740000.63640.6667
Diana780000.72730.7500
Charlie820000.81820.8333
Bob880000.90910.9167
Alice950001.00001.0000

Let's verify for Hank: 1 row has a salary <= 60000 (just Hank himself), and there are 12 total rows. CUME_DIST = 1 / 12 = 0.0833.

For Grace: 3 rows have salaries <= 65000 (Hank, Leo, Grace). CUME_DIST = 3 / 12 = 0.2500.

PERCENT_RANK vs CUME_DIST: When to Use Which

AspectPERCENT_RANKCUME_DIST
Formula(rank - 1) / (total - 1)rows <= current / total
Range0 to 1Greater than 0, up to 1
First row valueAlways 0Always > 0
Last row valueAlways 1Always 1
Interpretation% of rows below this one% of rows at or below this one
Analogy"I scored better than X% of people""X% of people scored at or below me"
When to use which?
  • Use PERCENT_RANK() when you want to know a row's relative position within the group (like a percentile rank on a standardized test: "You scored in the 85th percentile").
  • Use CUME_DIST() when you want to know what fraction of the data falls at or below a certain value (like "75% of employees earn this amount or less").

CUME_DIST with Ties

When rows are tied, CUME_DIST() includes all tied rows in the numerator:

SELECT
name,
salary,
ROUND(CUME_DIST() OVER (ORDER BY salary), 4) AS cume_dist
FROM (
VALUES ('Hank', 60000), ('Leo', 62000), ('Grace', 65000),
('Karen', 65000), ('Frank', 68000)
) AS t(name, salary);

Output:

namesalarycume_dist
Hank600000.2000
Leo620000.4000
Grace650000.8000
Karen650000.8000
Frank680001.0000

Grace and Karen both show 0.8000 because 4 out of 5 rows have salaries <= 65000. All tied rows receive the same CUME_DIST() value.

Practical Use Cases

Use Case 1: Customer Segmentation into Value Tiers

Imagine an orders table where you want to segment customers into tiers based on their total spending:

WITH customer_totals AS (
SELECT
customer_id,
SUM(order_amount) AS total_spent
FROM orders
GROUP BY customer_id
)
SELECT
customer_id,
total_spent,
NTILE(5) OVER (ORDER BY total_spent DESC) AS value_tier,
CASE NTILE(5) OVER (ORDER BY total_spent DESC)
WHEN 1 THEN 'Platinum'
WHEN 2 THEN 'Gold'
WHEN 3 THEN 'Silver'
WHEN 4 THEN 'Bronze'
WHEN 5 THEN 'Basic'
END AS tier_label
FROM customer_totals;

This creates five customer tiers based on total spending. Platinum customers (top 20%) might receive premium perks, while Basic customers (bottom 20%) might receive promotional offers to increase engagement.

Use Case 2: Identifying Outliers with PERCENT_RANK

Employees in the top 10% or bottom 10% of salary might warrant special attention:

WITH ranked AS (
SELECT
name,
department,
salary,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank
FROM employees
)
SELECT
name,
department,
salary,
pct_rank,
CASE
WHEN pct_rank >= 0.90 THEN 'Top 10%'
WHEN pct_rank <= 0.10 THEN 'Bottom 10%'
ELSE 'Middle 80%'
END AS salary_category
FROM ranked;

Output:

namedepartmentsalarypct_ranksalary_category
HankSales600000.0000Bottom 10%
LeoMarketing620000.0909Bottom 10%
GraceSales650000.1818Middle 80%
KarenMarketing660000.2727Middle 80%
FrankSales680000.3636Middle 80%
JackMarketing700000.4545Middle 80%
EveSales720000.5455Middle 80%
IvyMarketing740000.6364Middle 80%
DianaEngineering780000.7273Middle 80%
CharlieEngineering820000.8182Middle 80%
BobEngineering880000.9091Top 10%
AliceEngineering950001.0000Top 10%

Use Case 3: Grade Curve with CUME_DIST

In educational contexts, CUME_DIST() can assign letter grades based on a curve:

WITH scored AS (
SELECT
student_name,
test_score,
ROUND(CUME_DIST() OVER (ORDER BY test_score), 4) AS cume_dist
FROM test_results
)
SELECT
student_name,
test_score,
cume_dist,
CASE
WHEN cume_dist <= 0.10 THEN 'F'
WHEN cume_dist <= 0.30 THEN 'D'
WHEN cume_dist <= 0.60 THEN 'C'
WHEN cume_dist <= 0.85 THEN 'B'
ELSE 'A'
END AS curved_grade
FROM scored;

The bottom 10% receive an F, the next 20% a D, the next 30% a C, the next 25% a B, and the top 15% an A.

Use Case 4: Median Approximation with NTILE

While SQL does not have a universal MEDIAN() function (some databases offer one as an extension), you can approximate it with NTILE(2):

WITH halved AS (
SELECT
salary,
NTILE(2) OVER (ORDER BY salary) AS half
FROM employees
)
SELECT
MAX(CASE WHEN half = 1 THEN salary END) AS lower_median_boundary,
MIN(CASE WHEN half = 2 THEN salary END) AS upper_median_boundary,
(MAX(CASE WHEN half = 1 THEN salary END) +
MIN(CASE WHEN half = 2 THEN salary END)) / 2.0 AS approx_median
FROM halved;

Output:

lower_median_boundaryupper_median_boundaryapprox_median
700007200071000.00

This finds the boundary between the lower and upper halves and averages them to approximate the median salary.

All Three Functions Side by Side

This comprehensive query lets you compare all three percentile-related functions at once:

SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary) AS quartile,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank,
ROUND(CUME_DIST() OVER (ORDER BY salary), 4) AS cume_dist
FROM employees;

Output:

namesalaryquartilepct_rankcume_dist
Hank6000010.00000.0833
Leo6200010.09090.1667
Grace6500010.18180.2500
Karen6600020.27270.3333
Frank6800020.36360.4167
Jack7000020.45450.5000
Eve7200030.54550.5833
Ivy7400030.63640.6667
Diana7800030.72730.7500
Charlie8200040.81820.8333
Bob8800040.90910.9167
Alice9500041.00001.0000

Notice the relationships:

  • NTILE(4) creates hard group boundaries (1, 2, 3, 4).
  • PERCENT_RANK starts at 0 and ends at 1, evenly spaced for unique values.
  • CUME_DIST starts above 0 (0.0833) and ends at 1, always slightly higher than PERCENT_RANK for the same row.

Common Mistakes to Avoid

Mistake 1: Assuming NTILE Creates Value-Based Buckets

-- NTILE does NOT look at actual values, only row positions
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;

If one quartile spans salaries from 60000 to 65000 and another spans from 82000 to 95000, NTILE() does not care. It distributes rows by count, not by value range. If you need value-based ranges, use a CASE expression:

-- Value-based bucketing
SELECT
name,
salary,
CASE
WHEN salary < 65000 THEN 1
WHEN salary < 75000 THEN 2
WHEN salary < 85000 THEN 3
ELSE 4
END AS salary_bracket
FROM employees;

Mistake 2: Confusing PERCENT_RANK and CUME_DIST

These two functions look similar but answer different questions:

-- "What percentage scored BELOW me?"
PERCENT_RANK() OVER (ORDER BY salary) -- 0.0000 for the lowest

-- "What percentage scored AT OR BELOW me?"
CUME_DIST() OVER (ORDER BY salary) -- 0.0833 for the lowest

If you use CUME_DIST() when you meant PERCENT_RANK(), your thresholds for filtering (like "top 10%") will be slightly off. With large datasets the difference is negligible, but with small datasets it can matter.

Mistake 3: Using NTILE for Exact Percentile Values

NTILE(100) does not give you the same result as PERCENT_RANK() multiplied by 100:

SELECT
name,
salary,
NTILE(100) OVER (ORDER BY salary) AS ntile_100,
ROUND(PERCENT_RANK() OVER (ORDER BY salary) * 100, 2) AS pct_rank_100
FROM employees;

Output:

namesalaryntile_100pct_rank_100
Hank6000010.00
Leo6200029.09
Grace65000318.18
Karen66000427.27
Frank68000536.36
Jack70000645.45
Eve72000754.55
Ivy74000863.64
Diana78000972.73
Charlie820001081.82
Bob880001190.91
Alice9500012100.00

With only 12 rows and 100 buckets, NTILE(100) can only use values 1 through 12. It cannot produce values like 50 or 75. NTILE(100) only approximates true percentiles when you have hundreds or thousands of rows.

Mistake 4: Filtering Directly on Window Functions in WHERE

Like all window functions, NTILE(), PERCENT_RANK(), and CUME_DIST() cannot appear in WHERE:

-- WRONG: Will fail
SELECT name, salary
FROM employees
WHERE NTILE(4) OVER (ORDER BY salary) = 4;

Error: window functions are not allowed in WHERE

Fix: Use a CTE or subquery:

WITH quartiled AS (
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees
)
SELECT name, salary
FROM quartiled
WHERE quartile = 4;

Output:

namesalary
Charlie82000
Bob88000
Alice95000

Performance Considerations

  • All three functions require sorting. The database must sort the data by the ORDER BY column within each partition. Indexing the ORDER BY column (and PARTITION BY columns in a composite index) improves performance on large tables.

  • NTILE is lightweight. Once the data is sorted, assigning bucket numbers is a simple division operation. There is minimal overhead beyond the sort itself.

  • PERCENT_RANK and CUME_DIST rely on RANK internally. They compute the row's rank first, then apply their formula. The overhead is comparable to RANK().

  • Combine window definitions when possible. If you use multiple percentile functions with the same OVER() clause, the database can often share a single sort pass:

SELECT
name,
salary,
NTILE(4) OVER w AS quartile,
PERCENT_RANK() OVER w AS pct_rank,
CUME_DIST() OVER w AS cume_dist
FROM employees
WINDOW w AS (ORDER BY salary);

Summary

SQL NTILE, PERCENT_RANK(), and CUME_DIST() are three window functions purpose-built for distributional analysis and percentile calculations:

  • NTILE(N) divides ordered rows into N roughly equal buckets, assigning each row a bucket number from 1 to N. Best for quartile analysis, customer segmentation, and tiered classification. Remember that it distributes by row count, not by value range.

  • PERCENT_RANK() calculates a row's relative position as a value from 0 to 1, representing the percentage of rows that rank below it. Best for percentile rankings and identifying rows in the top or bottom X%.

  • CUME_DIST() calculates the cumulative distribution, representing the fraction of rows with values at or below the current row's value. Best for distribution analysis and curved grading.

All three require ORDER BY inside OVER(), support PARTITION BY for group-level analysis, and cannot be used in WHERE directly. For filtering, always wrap them in a CTE or subquery.

Quick Decision Guide

Ask yourself what you need:

  • "Put these rows into N equal groups"NTILE(N)
  • "What percentile is this row at?"PERCENT_RANK()
  • "What fraction of the data is at or below this value?"CUME_DIST()