SQL NTILE() and Percentiles
When analyzing data, you frequently need to divide rows into equal groups, classify values into quartiles, or determine where a specific value falls relative to all others. These are the kinds of questions that power salary band analysis, student grading curves, customer segmentation, and performance benchmarking. In SQL, the tools for this are NTILE(), PERCENT_RANK(), and CUME_DIST().
This guide walks you through each function in detail. You will learn how SQL NTILE splits rows into evenly sized buckets, how PERCENT_RANK() tells you a row's relative standing as a percentage, and how CUME_DIST() calculates cumulative distribution. Every concept is paired with practical examples, clear outputs, and real-world use cases so you can apply them immediately.
Prerequisites
You should be comfortable with SQL window function basics, including the OVER() clause, PARTITION BY, and ORDER BY inside OVER(). All three functions covered here are dedicated window functions. They can only be used with the OVER() clause and have no standalone aggregate equivalent.
The Sample Data
We will use the following employees table throughout this guide:
| id | name | department | salary |
|---|---|---|---|
| 1 | Alice | Engineering | 95000 |
| 2 | Bob | Engineering | 88000 |
| 3 | Charlie | Engineering | 82000 |
| 4 | Diana | Engineering | 78000 |
| 5 | Eve | Sales | 72000 |
| 6 | Frank | Sales | 68000 |
| 7 | Grace | Sales | 65000 |
| 8 | Hank | Sales | 60000 |
| 9 | Ivy | Marketing | 74000 |
| 10 | Jack | Marketing | 70000 |
| 11 | Karen | Marketing | 66000 |
| 12 | Leo | Marketing | 62000 |
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50) NOT NULL,
department VARCHAR(50) NOT NULL,
salary DECIMAL(10,2) NOT NULL
);
INSERT INTO employees (id, name, department, salary) VALUES
(1, 'Alice', 'Engineering', 95000),
(2, 'Bob', 'Engineering', 88000),
(3, 'Charlie', 'Engineering', 82000),
(4, 'Diana', 'Engineering', 78000),
(5, 'Eve', 'Sales', 72000),
(6, 'Frank', 'Sales', 68000),
(7, 'Grace', 'Sales', 65000),
(8, 'Hank', 'Sales', 60000),
(9, 'Ivy', 'Marketing', 74000),
(10, 'Jack', 'Marketing', 70000),
(11, 'Karen', 'Marketing', 66000),
(12, 'Leo', 'Marketing', 62000);
Twelve employees spread across three departments, each with four employees. The salaries are deliberately unique to make the examples easier to follow.
NTILE(): Dividing Results into Buckets
NTILE() distributes rows in an ordered partition into a specified number of roughly equal groups (called buckets or tiles). Each row is assigned a bucket number from 1 to N, where N is the number you pass to the function.
Syntax
NTILE(number_of_buckets) OVER (
[PARTITION BY column1, ...]
ORDER BY column2 [ASC|DESC]
)
| Parameter | Required | Description |
|---|---|---|
number_of_buckets | Yes | The number of groups to divide the rows into |
PARTITION BY | No | Divides rows into independent groups before bucketing |
ORDER BY | Yes | Determines the order in which rows are assigned to buckets |
ORDER BY inside OVER() is required for NTILE() to produce meaningful results. Without it, the assignment of rows to buckets is arbitrary and non-deterministic.
Basic Example: Split All Employees into 4 Groups
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;
Output:
| name | salary | quartile |
|---|---|---|
| Alice | 95000.00 | 1 |
| Bob | 88000.00 | 1 |
| Charlie | 82000.00 | 1 |
| Diana | 78000.00 | 2 |
| Ivy | 74000.00 | 2 |
| Eve | 72000.00 | 2 |
| Jack | 70000.00 | 3 |
| Frank | 68000.00 | 3 |
| Karen | 66000.00 | 3 |
| Grace | 65000.00 | 4 |
| Leo | 62000.00 | 4 |
| Hank | 60000.00 | 4 |
We have 12 rows and requested 4 buckets, so each bucket gets exactly 3 rows (12 / 4 = 3). The employees are sorted by salary descending, so bucket 1 contains the top earners and bucket 4 contains the lowest earners.
How NTILE Handles Uneven Distribution
What happens when the number of rows does not divide evenly into buckets? SQL NTILE distributes the remainder rows across the first buckets, giving them one extra row each.
Let's use only the first 10 employees and split them into 3 buckets:
SELECT
name,
salary,
NTILE(3) OVER (ORDER BY salary DESC) AS bucket
FROM employees
WHERE id <= 10;
Output:
| name | salary | bucket |
|---|---|---|
| Alice | 95000.00 | 1 |
| Bob | 88000.00 | 1 |
| Charlie | 82000.00 | 1 |
| Diana | 78000.00 | 1 |
| Ivy | 74000.00 | 2 |
| Eve | 72000.00 | 2 |
| Jack | 70000.00 | 2 |
| Frank | 68000.00 | 3 |
| Grace | 65000.00 | 3 |
| Hank | 60000.00 | 3 |
With 10 rows and 3 buckets: 10 / 3 = 3 with a remainder of 1. The first bucket gets an extra row, resulting in bucket sizes of 4, 3, 3.
If the remainder were 2, the first two buckets would each get an extra row. The rule is simple: larger buckets always come first.
| Total Rows | Buckets | Distribution |
|---|---|---|
| 12 | 4 | 3, 3, 3, 3 |
| 10 | 3 | 4, 3, 3 |
| 11 | 4 | 3, 3, 3, 2 |
| 7 | 3 | 3, 2, 2 |
| 5 | 4 | 2, 1, 1, 1 |
NTILE() distributes rows as evenly as possible but it does not consider the actual values. If you have 100 employees and 95 of them earn between 50000 and 55000, while 5 earn over 200000, NTILE(4) will still put 25 employees in each bucket. The top bucket will mix vastly different salary ranges. For value-based bucketing, use CASE expressions with explicit thresholds or WIDTH_BUCKET() (where supported).
NTILE with PARTITION BY
When you add PARTITION BY, SQL NTILE restarts the bucketing independently within each partition:
SELECT
name,
department,
salary,
NTILE(2) OVER (
PARTITION BY department
ORDER BY salary DESC
) AS salary_half
FROM employees;
Output:
| name | department | salary | salary_half |
|---|---|---|---|
| Alice | Engineering | 95000.00 | 1 |
| Bob | Engineering | 88000.00 | 1 |
| Charlie | Engineering | 82000.00 | 2 |
| Diana | Engineering | 78000.00 | 2 |
| Ivy | Marketing | 74000.00 | 1 |
| Jack | Marketing | 70000.00 | 1 |
| Karen | Marketing | 66000.00 | 2 |
| Leo | Marketing | 62000.00 | 2 |
| Eve | Sales | 72000.00 | 1 |
| Frank | Sales | 68000.00 | 1 |
| Grace | Sales | 65000.00 | 2 |
| Hank | Sales | 60000.00 | 2 |
Each department is independently split into two halves: the upper half (salary_half = 1) and the lower half (salary_half = 2). This is useful for comparing the top performers against the bottom performers within each team.
Quartile Analysis
A quartile divides data into four equal groups. This is one of the most common applications of NTILE() in real-world analytics, used extensively in salary analysis, student performance, and customer segmentation.
Assigning Quartiles
SELECT
name,
department,
salary,
NTILE(4) OVER (ORDER BY salary) AS salary_quartile
FROM employees;
Output:
| name | department | salary | salary_quartile |
|---|---|---|---|
| Hank | Sales | 60000.00 | 1 |
| Leo | Marketing | 62000.00 | 1 |
| Grace | Sales | 65000.00 | 1 |
| Karen | Marketing | 66000.00 | 2 |
| Frank | Sales | 68000.00 | 2 |
| Jack | Marketing | 70000.00 | 2 |
| Eve | Sales | 72000.00 | 3 |
| Ivy | Marketing | 74000.00 | 3 |
| Diana | Engineering | 78000.00 | 3 |
| Charlie | Engineering | 82000.00 | 4 |
| Bob | Engineering | 88000.00 | 4 |
| Alice | Engineering | 95000.00 | 4 |
Notice the ORDER BY salary is ascending here (the default). Quartile 1 contains the lowest salaries, and quartile 4 contains the highest. This is the standard statistical convention where Q1 is the bottom 25%.
Quartile Summary Statistics
You can combine NTILE() with aggregate functions to create a quartile summary report:
WITH quartiled AS (
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees
)
SELECT
quartile,
COUNT(*) AS employee_count,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary,
ROUND(AVG(salary), 2) AS avg_salary
FROM quartiled
GROUP BY quartile
ORDER BY quartile;
Output:
| quartile | employee_count | min_salary | max_salary | avg_salary |
|---|---|---|---|---|
| 1 | 3 | 60000.00 | 65000.00 | 62333.33 |
| 2 | 3 | 66000.00 | 70000.00 | 68000.00 |
| 3 | 3 | 72000.00 | 78000.00 | 74666.67 |
| 4 | 3 | 82000.00 | 95000.00 | 88333.33 |
This gives a clean picture of salary distribution across the company. The gap between Q3 and Q4 (74666 to 88333) is much larger than the gap between Q1 and Q2 (62333 to 68000), indicating salary skew toward the top.
Labeling Quartiles for Reports
Instead of showing raw numbers, you can map quartile numbers to meaningful labels:
SELECT
name,
department,
salary,
NTILE(4) OVER (ORDER BY salary) AS quartile,
CASE NTILE(4) OVER (ORDER BY salary)
WHEN 1 THEN 'Bottom 25%'
WHEN 2 THEN 'Lower-Middle 25%'
WHEN 3 THEN 'Upper-Middle 25%'
WHEN 4 THEN 'Top 25%'
END AS salary_band
FROM employees;
Output:
| name | department | salary | quartile | salary_band |
|---|---|---|---|---|
| Hank | Sales | 60000.00 | 1 | Bottom 25% |
| Leo | Marketing | 62000.00 | 1 | Bottom 25% |
| Grace | Sales | 65000.00 | 1 | Bottom 25% |
| Karen | Marketing | 66000.00 | 2 | Lower-Middle 25% |
| Frank | Sales | 68000.00 | 2 | Lower-Middle 25% |
| Jack | Marketing | 70000.00 | 2 | Lower-Middle 25% |
| Eve | Sales | 72000.00 | 3 | Upper-Middle 25% |
| Ivy | Marketing | 74000.00 | 3 | Upper-Middle 25% |
| Diana | Engineering | 78000.00 | 3 | Upper-Middle 25% |
| Charlie | Engineering | 82000.00 | 4 | Top 25% |
| Bob | Engineering | 88000.00 | 4 | Top 25% |
| Alice | Engineering | 95000.00 | 4 | Top 25% |
Other Common Bucket Sizes
NTILE() is not limited to quartiles. You can divide into any number of buckets:
| Bucket Count | Statistical Term | Use Case |
|---|---|---|
NTILE(2) | Median split | Above/below median |
NTILE(3) | Terciles | Low/medium/high segments |
NTILE(4) | Quartiles | Standard quartile analysis |
NTILE(5) | Quintiles | Customer value tiers |
NTILE(10) | Deciles | Detailed distribution analysis |
NTILE(100) | Percentiles | Fine-grained percentile ranking |
-- Decile analysis
SELECT
name,
salary,
NTILE(10) OVER (ORDER BY salary) AS decile
FROM employees;
Output:
| name | salary | decile |
|---|---|---|
| Hank | 60000.00 | 1 |
| Leo | 62000.00 | 1 |
| Grace | 65000.00 | 2 |
| Karen | 66000.00 | 2 |
| Frank | 68000.00 | 3 |
| Jack | 70000.00 | 4 |
| Eve | 72000.00 | 5 |
| Ivy | 74000.00 | 6 |
| Diana | 78000.00 | 7 |
| Charlie | 82000.00 | 8 |
| Bob | 88000.00 | 9 |
| Alice | 95000.00 | 10 |
With 12 rows and 10 buckets, the first 2 buckets get 2 rows each (12 / 10 = 1 remainder 2), and the remaining 8 buckets get 1 row each. Deciles 10 has 1 row only in a perfect distribution, but here the remainder logic pushes some extra rows into the early buckets, and the last few end up sharing. As you can see, decile 10 ended up with more rows. This is the inherent limitation of NTILE() with small datasets.
NTILE() works best with large datasets where the number of rows significantly exceeds the number of buckets. With 12 rows and 10 buckets, the distribution is awkward. With 10000 rows and 10 buckets, each bucket gets exactly 1000 rows, which is clean and statistically meaningful.
PERCENT_RANK(): Relative Standing as a Percentage
PERCENT_RANK() calculates a row's relative rank as a value between 0 and 1 (or 0% to 100%). It answers the question: "What percentage of rows have a value less than this row's value?"
Syntax
PERCENT_RANK() OVER (
[PARTITION BY column1, ...]
ORDER BY column2 [ASC|DESC]
)
PERCENT_RANK() takes no arguments. It relies entirely on the ORDER BY inside OVER().
Formula
PERCENT_RANK = (rank - 1) / (total_rows - 1)
Where rank is the row's RANK() value and total_rows is the number of rows in the partition.
- The first row (rank 1) always gets a
PERCENT_RANKof 0. - The last row always gets a
PERCENT_RANKof 1.
Basic Example
SELECT
name,
salary,
RANK() OVER (ORDER BY salary) AS rnk,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank
FROM employees;
Output:
| name | salary | rnk | pct_rank |
|---|---|---|---|
| Hank | 60000.00 | 1 | 0 |
| Leo | 62000.00 | 2 | 0.0909 |
| Grace | 65000.00 | 3 | 0.1818 |
| Karen | 66000.00 | 4 | 0.2727 |
| Frank | 68000.00 | 5 | 0.3636 |
| Jack | 70000.00 | 6 | 0.4545 |
| Eve | 72000.00 | 7 | 0.5455 |
| Ivy | 74000.00 | 8 | 0.6364 |
| Diana | 78000.00 | 9 | 0.7273 |
| Charlie | 82000.00 | 10 | 0.8182 |
| Bob | 88000.00 | 11 | 0.9091 |
| Alice | 95000.00 | 12 | 1 |
Let's verify the formula for Leo (rank 2):
PERCENT_RANK = (2 - 1) / (12 - 1) = 1 / 11 = 0.0909
Alice (rank 12) gets exactly 1.0, and Hank (rank 1) gets exactly 0.0.
Interpreting PERCENT_RANK
A PERCENT_RANK of 0.7273 for Diana means that approximately 72.73% of employees earn less than her. Conversely, she earns more than roughly 72.73% of the workforce.
PERCENT_RANK with PARTITION BY
SELECT
name,
department,
salary,
ROUND(PERCENT_RANK() OVER (
PARTITION BY department
ORDER BY salary
), 4) AS dept_pct_rank
FROM employees;
Output:
| name | department | salary | dept_pct_rank |
|---|---|---|---|
| Diana | Engineering | 78000.00 | 0 |
| Charlie | Engineering | 82000.00 | 0.3333 |
| Bob | Engineering | 88000.00 | 0.6667 |
| Alice | Engineering | 95000.00 | 1 |
| Leo | Marketing | 62000.00 | 0 |
| Karen | Marketing | 66000.00 | 0.3333 |
| Jack | Marketing | 70000.00 | 0.6667 |
| Ivy | Marketing | 74000.00 | 1 |
| Hank | Sales | 60000.00 | 0 |
| Grace | Sales | 65000.00 | 0.3333 |
| Frank | Sales | 68000.00 | 0.6667 |
| Eve | Sales | 72000.00 | 1 |
Each department is evaluated independently. Within Engineering, Diana is at the bottom (0.0) and Alice is at the top (1.0). The formula uses each partition's own row count (4 in each department): (rank - 1) / (4 - 1).
How PERCENT_RANK Handles Ties
When rows share the same value, they receive the same RANK(), which means they also receive the same PERCENT_RANK(). Let's add a tie to see:
-- Simulating a tie with a modified dataset
SELECT
name,
salary,
RANK() OVER (ORDER BY salary) AS rnk,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank
FROM (
VALUES ('Hank', 60000), ('Leo', 62000), ('Grace', 65000),
('Karen', 65000), ('Frank', 68000)
) AS t(name, salary);
Output:
| name | salary | rnk | pct_rank |
|---|---|---|---|
| Hank | 60000 | 1 | 0.0000 |
| Leo | 62000 | 2 | 0.2500 |
| Karen | 65000 | 3 | 0.5000 |
| Grace | 65000 | 3 | 0.5000 |
| Frank | 68000 | 5 | 1.0000 |
Grace and Karen are tied at rank 3, so they both get the same PERCENT_RANK of 0.5. Rank 4 is skipped (standard RANK() behavior), and Frank jumps to rank 5 with a PERCENT_RANK of 1.0.
CUME_DIST(): Cumulative Distribution
CUME_DIST() (cumulative distribution) calculates the proportion of rows that have a value less than or equal to the current row's value. It also returns a value between 0 and 1, but its formula and behavior differ from PERCENT_RANK().
Syntax
CUME_DIST() OVER (
[PARTITION BY column1, ...]
ORDER BY column2 [ASC|DESC]
)
Like PERCENT_RANK(), CUME_DIST() takes no arguments.
Formula
CUME_DIST = (number of rows with value <= current row's value) / (total rows)
Key differences from PERCENT_RANK():
CUME_DIST()is always greater than 0 (never exactly 0)- The last row (or rows tied for last) always gets 1.0
CUME_DIST()answers: "What fraction of all rows are at or below this value?"
Basic Example
SELECT
name,
salary,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank,
ROUND(CUME_DIST() OVER (ORDER BY salary), 4) AS cume_dist
FROM employees;
Output:
| name | salary | pct_rank | cume_dist |
|---|---|---|---|
| Hank | 60000 | 0.0000 | 0.0833 |
| Leo | 62000 | 0.0909 | 0.1667 |
| Grace | 65000 | 0.1818 | 0.2500 |
| Karen | 66000 | 0.2727 | 0.3333 |
| Frank | 68000 | 0.3636 | 0.4167 |
| Jack | 70000 | 0.4545 | 0.5000 |
| Eve | 72000 | 0.5455 | 0.5833 |
| Ivy | 74000 | 0.6364 | 0.6667 |
| Diana | 78000 | 0.7273 | 0.7500 |
| Charlie | 82000 | 0.8182 | 0.8333 |
| Bob | 88000 | 0.9091 | 0.9167 |
| Alice | 95000 | 1.0000 | 1.0000 |
Let's verify for Hank: 1 row has a salary <= 60000 (just Hank himself), and there are 12 total rows. CUME_DIST = 1 / 12 = 0.0833.
For Grace: 3 rows have salaries <= 65000 (Hank, Leo, Grace). CUME_DIST = 3 / 12 = 0.2500.
PERCENT_RANK vs CUME_DIST: When to Use Which
| Aspect | PERCENT_RANK | CUME_DIST |
|---|---|---|
| Formula | (rank - 1) / (total - 1) | rows <= current / total |
| Range | 0 to 1 | Greater than 0, up to 1 |
| First row value | Always 0 | Always > 0 |
| Last row value | Always 1 | Always 1 |
| Interpretation | % of rows below this one | % of rows at or below this one |
| Analogy | "I scored better than X% of people" | "X% of people scored at or below me" |
- Use
PERCENT_RANK()when you want to know a row's relative position within the group (like a percentile rank on a standardized test: "You scored in the 85th percentile"). - Use
CUME_DIST()when you want to know what fraction of the data falls at or below a certain value (like "75% of employees earn this amount or less").
CUME_DIST with Ties
When rows are tied, CUME_DIST() includes all tied rows in the numerator:
SELECT
name,
salary,
ROUND(CUME_DIST() OVER (ORDER BY salary), 4) AS cume_dist
FROM (
VALUES ('Hank', 60000), ('Leo', 62000), ('Grace', 65000),
('Karen', 65000), ('Frank', 68000)
) AS t(name, salary);
Output:
| name | salary | cume_dist |
|---|---|---|
| Hank | 60000 | 0.2000 |
| Leo | 62000 | 0.4000 |
| Grace | 65000 | 0.8000 |
| Karen | 65000 | 0.8000 |
| Frank | 68000 | 1.0000 |
Grace and Karen both show 0.8000 because 4 out of 5 rows have salaries <= 65000. All tied rows receive the same CUME_DIST() value.
Practical Use Cases
Use Case 1: Customer Segmentation into Value Tiers
Imagine an orders table where you want to segment customers into tiers based on their total spending:
WITH customer_totals AS (
SELECT
customer_id,
SUM(order_amount) AS total_spent
FROM orders
GROUP BY customer_id
)
SELECT
customer_id,
total_spent,
NTILE(5) OVER (ORDER BY total_spent DESC) AS value_tier,
CASE NTILE(5) OVER (ORDER BY total_spent DESC)
WHEN 1 THEN 'Platinum'
WHEN 2 THEN 'Gold'
WHEN 3 THEN 'Silver'
WHEN 4 THEN 'Bronze'
WHEN 5 THEN 'Basic'
END AS tier_label
FROM customer_totals;
This creates five customer tiers based on total spending. Platinum customers (top 20%) might receive premium perks, while Basic customers (bottom 20%) might receive promotional offers to increase engagement.
Use Case 2: Identifying Outliers with PERCENT_RANK
Employees in the top 10% or bottom 10% of salary might warrant special attention:
WITH ranked AS (
SELECT
name,
department,
salary,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank
FROM employees
)
SELECT
name,
department,
salary,
pct_rank,
CASE
WHEN pct_rank >= 0.90 THEN 'Top 10%'
WHEN pct_rank <= 0.10 THEN 'Bottom 10%'
ELSE 'Middle 80%'
END AS salary_category
FROM ranked;
Output:
| name | department | salary | pct_rank | salary_category |
|---|---|---|---|---|
| Hank | Sales | 60000 | 0.0000 | Bottom 10% |
| Leo | Marketing | 62000 | 0.0909 | Bottom 10% |
| Grace | Sales | 65000 | 0.1818 | Middle 80% |
| Karen | Marketing | 66000 | 0.2727 | Middle 80% |
| Frank | Sales | 68000 | 0.3636 | Middle 80% |
| Jack | Marketing | 70000 | 0.4545 | Middle 80% |
| Eve | Sales | 72000 | 0.5455 | Middle 80% |
| Ivy | Marketing | 74000 | 0.6364 | Middle 80% |
| Diana | Engineering | 78000 | 0.7273 | Middle 80% |
| Charlie | Engineering | 82000 | 0.8182 | Middle 80% |
| Bob | Engineering | 88000 | 0.9091 | Top 10% |
| Alice | Engineering | 95000 | 1.0000 | Top 10% |
Use Case 3: Grade Curve with CUME_DIST
In educational contexts, CUME_DIST() can assign letter grades based on a curve:
WITH scored AS (
SELECT
student_name,
test_score,
ROUND(CUME_DIST() OVER (ORDER BY test_score), 4) AS cume_dist
FROM test_results
)
SELECT
student_name,
test_score,
cume_dist,
CASE
WHEN cume_dist <= 0.10 THEN 'F'
WHEN cume_dist <= 0.30 THEN 'D'
WHEN cume_dist <= 0.60 THEN 'C'
WHEN cume_dist <= 0.85 THEN 'B'
ELSE 'A'
END AS curved_grade
FROM scored;
The bottom 10% receive an F, the next 20% a D, the next 30% a C, the next 25% a B, and the top 15% an A.
Use Case 4: Median Approximation with NTILE
While SQL does not have a universal MEDIAN() function (some databases offer one as an extension), you can approximate it with NTILE(2):
WITH halved AS (
SELECT
salary,
NTILE(2) OVER (ORDER BY salary) AS half
FROM employees
)
SELECT
MAX(CASE WHEN half = 1 THEN salary END) AS lower_median_boundary,
MIN(CASE WHEN half = 2 THEN salary END) AS upper_median_boundary,
(MAX(CASE WHEN half = 1 THEN salary END) +
MIN(CASE WHEN half = 2 THEN salary END)) / 2.0 AS approx_median
FROM halved;
Output:
| lower_median_boundary | upper_median_boundary | approx_median |
|---|---|---|
| 70000 | 72000 | 71000.00 |
This finds the boundary between the lower and upper halves and averages them to approximate the median salary.
All Three Functions Side by Side
This comprehensive query lets you compare all three percentile-related functions at once:
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary) AS quartile,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 4) AS pct_rank,
ROUND(CUME_DIST() OVER (ORDER BY salary), 4) AS cume_dist
FROM employees;
Output:
| name | salary | quartile | pct_rank | cume_dist |
|---|---|---|---|---|
| Hank | 60000 | 1 | 0.0000 | 0.0833 |
| Leo | 62000 | 1 | 0.0909 | 0.1667 |
| Grace | 65000 | 1 | 0.1818 | 0.2500 |
| Karen | 66000 | 2 | 0.2727 | 0.3333 |
| Frank | 68000 | 2 | 0.3636 | 0.4167 |
| Jack | 70000 | 2 | 0.4545 | 0.5000 |
| Eve | 72000 | 3 | 0.5455 | 0.5833 |
| Ivy | 74000 | 3 | 0.6364 | 0.6667 |
| Diana | 78000 | 3 | 0.7273 | 0.7500 |
| Charlie | 82000 | 4 | 0.8182 | 0.8333 |
| Bob | 88000 | 4 | 0.9091 | 0.9167 |
| Alice | 95000 | 4 | 1.0000 | 1.0000 |
Notice the relationships:
- NTILE(4) creates hard group boundaries (1, 2, 3, 4).
- PERCENT_RANK starts at 0 and ends at 1, evenly spaced for unique values.
- CUME_DIST starts above 0 (0.0833) and ends at 1, always slightly higher than
PERCENT_RANKfor the same row.
Common Mistakes to Avoid
Mistake 1: Assuming NTILE Creates Value-Based Buckets
-- NTILE does NOT look at actual values, only row positions
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;
If one quartile spans salaries from 60000 to 65000 and another spans from 82000 to 95000, NTILE() does not care. It distributes rows by count, not by value range. If you need value-based ranges, use a CASE expression:
-- Value-based bucketing
SELECT
name,
salary,
CASE
WHEN salary < 65000 THEN 1
WHEN salary < 75000 THEN 2
WHEN salary < 85000 THEN 3
ELSE 4
END AS salary_bracket
FROM employees;
Mistake 2: Confusing PERCENT_RANK and CUME_DIST
These two functions look similar but answer different questions:
-- "What percentage scored BELOW me?"
PERCENT_RANK() OVER (ORDER BY salary) -- 0.0000 for the lowest
-- "What percentage scored AT OR BELOW me?"
CUME_DIST() OVER (ORDER BY salary) -- 0.0833 for the lowest
If you use CUME_DIST() when you meant PERCENT_RANK(), your thresholds for filtering (like "top 10%") will be slightly off. With large datasets the difference is negligible, but with small datasets it can matter.
Mistake 3: Using NTILE for Exact Percentile Values
NTILE(100) does not give you the same result as PERCENT_RANK() multiplied by 100:
SELECT
name,
salary,
NTILE(100) OVER (ORDER BY salary) AS ntile_100,
ROUND(PERCENT_RANK() OVER (ORDER BY salary) * 100, 2) AS pct_rank_100
FROM employees;
Output:
| name | salary | ntile_100 | pct_rank_100 |
|---|---|---|---|
| Hank | 60000 | 1 | 0.00 |
| Leo | 62000 | 2 | 9.09 |
| Grace | 65000 | 3 | 18.18 |
| Karen | 66000 | 4 | 27.27 |
| Frank | 68000 | 5 | 36.36 |
| Jack | 70000 | 6 | 45.45 |
| Eve | 72000 | 7 | 54.55 |
| Ivy | 74000 | 8 | 63.64 |
| Diana | 78000 | 9 | 72.73 |
| Charlie | 82000 | 10 | 81.82 |
| Bob | 88000 | 11 | 90.91 |
| Alice | 95000 | 12 | 100.00 |
With only 12 rows and 100 buckets, NTILE(100) can only use values 1 through 12. It cannot produce values like 50 or 75. NTILE(100) only approximates true percentiles when you have hundreds or thousands of rows.
Mistake 4: Filtering Directly on Window Functions in WHERE
Like all window functions, NTILE(), PERCENT_RANK(), and CUME_DIST() cannot appear in WHERE:
-- WRONG: Will fail
SELECT name, salary
FROM employees
WHERE NTILE(4) OVER (ORDER BY salary) = 4;
Error: window functions are not allowed in WHERE
Fix: Use a CTE or subquery:
WITH quartiled AS (
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees
)
SELECT name, salary
FROM quartiled
WHERE quartile = 4;
Output:
| name | salary |
|---|---|
| Charlie | 82000 |
| Bob | 88000 |
| Alice | 95000 |
Performance Considerations
-
All three functions require sorting. The database must sort the data by the
ORDER BYcolumn within each partition. Indexing theORDER BYcolumn (andPARTITION BYcolumns in a composite index) improves performance on large tables. -
NTILE is lightweight. Once the data is sorted, assigning bucket numbers is a simple division operation. There is minimal overhead beyond the sort itself.
-
PERCENT_RANK and CUME_DIST rely on RANK internally. They compute the row's rank first, then apply their formula. The overhead is comparable to
RANK(). -
Combine window definitions when possible. If you use multiple percentile functions with the same
OVER()clause, the database can often share a single sort pass:
SELECT
name,
salary,
NTILE(4) OVER w AS quartile,
PERCENT_RANK() OVER w AS pct_rank,
CUME_DIST() OVER w AS cume_dist
FROM employees
WINDOW w AS (ORDER BY salary);
Summary
SQL NTILE, PERCENT_RANK(), and CUME_DIST() are three window functions purpose-built for distributional analysis and percentile calculations:
-
NTILE(N)divides ordered rows into N roughly equal buckets, assigning each row a bucket number from 1 to N. Best for quartile analysis, customer segmentation, and tiered classification. Remember that it distributes by row count, not by value range. -
PERCENT_RANK()calculates a row's relative position as a value from 0 to 1, representing the percentage of rows that rank below it. Best for percentile rankings and identifying rows in the top or bottom X%. -
CUME_DIST()calculates the cumulative distribution, representing the fraction of rows with values at or below the current row's value. Best for distribution analysis and curved grading.
All three require ORDER BY inside OVER(), support PARTITION BY for group-level analysis, and cannot be used in WHERE directly. For filtering, always wrap them in a CTE or subquery.
Ask yourself what you need:
- "Put these rows into N equal groups" →
NTILE(N) - "What percentile is this row at?" →
PERCENT_RANK() - "What fraction of the data is at or below this value?" →
CUME_DIST()