Python Pandas: How to Classify and Grade Data with Pandas

Assigning categories to rows based on numerical ranges or complex conditions is a common data transformation. Pandas and NumPy provide efficient, vectorized methods that outperform row-by-row iteration.

Using pd.cut(): Numeric Binning

Best for converting continuous numbers into discrete categories:

import pandas as pd

df = pd.DataFrame({'score': [55, 92, 78, 40, 85, 100]})

# Define bin edges and labels
bins = [0, 60, 80, 100]
labels = ['Fail', 'Pass', 'Excellent']

df['grade'] = pd.cut(df['score'], bins=bins, labels=labels)
print(df)

Output:

   score      grade
0     55       Fail
1     92  Excellent
2     78       Pass
3     40       Fail
4     85  Excellent
5    100  Excellent
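
One thing worth checking before relying on pd.cut() is what happens to values outside the bin edges: they are not assigned any label. The sketch below uses a made-up `scores` Series (not part of the example above) to show that anything outside (0, 100] comes back as NaN, including 0 itself under the default settings.

```python
import pandas as pd

# Hypothetical scores chosen to sit on or outside the bin edges
scores = pd.Series([-5, 0, 50, 105])

grades = pd.cut(scores, bins=[0, 60, 80, 100], labels=['Fail', 'Pass', 'Excellent'])

# Only 50 lands inside (0, 60]; the other three values get NaN
print(grades)
print(grades.isna().tolist())
```

If NaN grades show up unexpectedly, comparing the column's minimum and maximum against the bin edges is usually the quickest check.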

Including Edge Values

import pandas as pd

df = pd.DataFrame({'score': [0, 60, 80, 100]})

# Default: (lower, upper], excludes left, includes right
# Use include_lowest=True to include the leftmost edge
df['grade'] = pd.cut(
    df['score'],
    bins=[0, 60, 80, 100],
    labels=['Fail', 'Pass', 'Excellent'],
    include_lowest=True
)
print(df)

Output:

   score      grade
0      0       Fail
1     60       Fail
2     80       Pass
3    100  Excellent
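
Note that with right-closed intervals a score of exactly 60 is graded Fail and 80 is graded Pass. If the intended rule is that 60 already counts as a pass and 80 as excellent, the intervals need to be closed on the left instead. A minimal sketch, assuming that grading rule (the text above does not state it): pass right=False and extend the last edge so that 100 still falls inside a bin.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [0, 60, 80, 100]})

# right=False makes the intervals [0, 60), [60, 80), [80, inf)
df['grade'] = pd.cut(
    df['score'],
    bins=[0, 60, 80, np.inf],
    labels=['Fail', 'Pass', 'Excellent'],
    right=False
)
print(df)
```

Using np.inf as the top edge is one option; any value above the maximum possible score would work equally well.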

Using np.select(): Multiple Conditions

Ideal for complex logic involving multiple columns:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'age': [15, 22, 45, 70, 19],
    'is_student': [True, True, False, False, False]
})

# Define conditions (checked in order; the first match wins)
conditions = [
    df['age'] < 18,
    df['is_student'],  # already boolean, so no == True needed
    df['age'] >= 65
]

# Corresponding labels
choices = ['Minor', 'Student', 'Senior']

# Assign with default for unmatched rows
df['category'] = np.select(conditions, choices, default='Adult')
print(df)

Output:

   age  is_student category
0   15        True    Minor
1   22        True  Student
2   45       False    Adult
3   70       False   Senior
4   19       False    Adult

Tip: Conditions in np.select() are evaluated in order. The first matching condition wins, so place more specific conditions before general ones.
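
To see why the order matters, the sketch below runs the same two conditions in both orders on a small made-up frame. The variable names (`is_minor`, `minor_first`, and so on) are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [15, 22], 'is_student': [True, True]})

is_minor = df['age'] < 18
is_student = df['is_student']

# Minor checked first: the 15-year-old student is labelled 'Minor'
minor_first = np.select([is_minor, is_student], ['Minor', 'Student'], default='Adult')

# Student checked first: the same row is now labelled 'Student'
student_first = np.select([is_student, is_minor], ['Student', 'Minor'], default='Adult')

print(minor_first)
print(student_first)
```

The first row satisfies both conditions, so its label depends entirely on which condition appears earlier in the list.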

Using np.where(): Binary Classification

For simple either/or conditions:

import pandas as pd
import numpy as np

df = pd.DataFrame({'score': [55, 92, 78, 40, 85]})

df['passed'] = np.where(df['score'] >= 60, 'Yes', 'No')
print(df)

Output:

   score passed
0     55     No
1     92    Yes
2     78    Yes
3     40     No
4     85    Yes
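
A caveat with this pattern is missing data: a comparison against NaN evaluates to False, so a missing score silently ends up in the "else" branch. The sketch below, using made-up data and a placeholder 'Unknown' label that is not part of the example above, shows the effect and one possible way to handle it with np.select().

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score
df = pd.DataFrame({'score': [55, np.nan, 92]})

# NaN >= 60 evaluates to False, so the missing score is labelled 'No'
df['passed'] = np.where(df['score'] >= 60, 'Yes', 'No')

# Checking for missing values first keeps them separate
df['passed_checked'] = np.select(
    [df['score'].isna(), df['score'] >= 60],
    ['Unknown', 'Yes'],
    default='No'
)
print(df)
```

Whether a missing score should count as a failure or be flagged separately is a data-quality decision; the point is only that np.where() will not raise it for you.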

Nested np.where() for Multiple Categories

import pandas as pd
import numpy as np

df = pd.DataFrame({'score': [55, 92, 78, 40]})

df['grade'] = np.where(
    df['score'] >= 80, 'Excellent',
    np.where(df['score'] >= 60, 'Pass', 'Fail')
)
print(df)

Output:

   score      grade
0     55       Fail
1     92  Excellent
2     78       Pass
3     40       Fail

Using map(): Dictionary Lookup

For direct value mapping:

import pandas as pd

df = pd.DataFrame({'status_code': [1, 2, 3, 1, 2]})

status_map = {
    1: 'Pending',
    2: 'Approved',
    3: 'Rejected'
}

df['status'] = df['status_code'].map(status_map)
print(df)

Output:

   status_code    status
0            1   Pending
1            2  Approved
2            3  Rejected
3            1   Pending
4            2  Approved
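
Keys that are absent from the dictionary map to NaN rather than raising an error. A short sketch, assuming a made-up code 9 and a placeholder 'Unknown' label, shows the behavior and a fillna() fallback:

```python
import pandas as pd

# Code 9 is deliberately missing from the dictionary
df = pd.DataFrame({'status_code': [1, 2, 9]})
status_map = {1: 'Pending', 2: 'Approved', 3: 'Rejected'}

# The unmapped code 9 becomes NaN
mapped = df['status_code'].map(status_map)

# 'Unknown' is a placeholder label for anything the dictionary does not cover
df['status'] = mapped.fillna('Unknown')
print(df)
```

Counting the NaN values in `mapped` before filling them is a cheap way to notice codes that were never expected.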

Using pd.qcut(): Quantile-Based Binning

Creates bins that each hold roughly the same number of records:

import pandas as pd

df = pd.DataFrame({'income': [20000, 35000, 50000, 75000, 150000]})

# Split into 3 groups of roughly equal size (5 rows cannot divide evenly)
df['income_tier'] = pd.qcut(df['income'], q=3, labels=['Low', 'Medium', 'High'])
print(df)

Output:

   income income_tier
0   20000         Low
1   35000         Low
2   50000      Medium
3   75000        High
4  150000        High
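
Because pd.qcut() derives the bin edges from the data, it can help to look at the edges it chose. The sketch below reuses the income data above and passes retbins=True, which returns the computed edges alongside the labelled Series; the names `tiers` and `edges` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'income': [20000, 35000, 50000, 75000, 150000]})

# retbins=True returns the computed edges alongside the labels
tiers, edges = pd.qcut(
    df['income'], q=3, labels=['Low', 'Medium', 'High'], retbins=True
)

print(edges)                 # the quantile-based cut points
print(tiers.value_counts())  # number of rows in each tier
```

For this data the inner edges work out to 40,000 and roughly 66,667, which explains why only 50,000 lands in the Medium tier.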

Performance Comparison

import pandas as pd
import numpy as np

df = pd.DataFrame({'score': range(100000)})

# ❌ Slow: apply with lambda
df['grade'] = df['score'].apply(
    lambda x: 'Excellent' if x >= 80 else ('Pass' if x >= 60 else 'Fail')
)

# ✅ Fast: pd.cut (vectorized)
# right=False with an open-ended top bin gives [0, 60), [60, 80), [80, inf),
# which matches the >= checks used by the other two approaches
df['grade'] = pd.cut(
    df['score'],
    bins=[0, 60, 80, np.inf],
    labels=['Fail', 'Pass', 'Excellent'],
    right=False
)

# ✅ Fast: np.select (vectorized)
conditions = [df['score'] >= 80, df['score'] >= 60]
df['grade'] = np.select(conditions, ['Excellent', 'Pass'], default='Fail')
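
To measure the difference on your own machine, something like the following can be used. It is only a rough sketch: the helper names `with_apply` and `with_select` are made up here, the standard-library timeit module is one of several ways to time code, and absolute numbers will vary with hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

scores = pd.Series(range(100_000))

def with_apply():
    # Row-by-row Python function call
    return scores.apply(
        lambda x: 'Excellent' if x >= 80 else ('Pass' if x >= 60 else 'Fail')
    )

def with_select():
    # Vectorized: conditions are evaluated on whole arrays
    return np.select([scores >= 80, scores >= 60], ['Excellent', 'Pass'], default='Fail')

# Confirm both approaches produce identical labels before comparing speed
same = with_apply().tolist() == with_select().tolist()
print('identical results:', same)

# Time a few repetitions of each
t_apply = timeit.timeit(with_apply, number=3)
t_select = timeit.timeit(with_select, number=3)
print(f'apply: {t_apply:.3f}s   np.select: {t_select:.3f}s')
```

Checking that the outputs match first matters: a faster method is only a fair replacement if it produces the same labels.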

Quick Reference

Method	Best For	Performance
`pd.cut()`	Numeric ranges/bins	⚡ Fast
`pd.qcut()`	Equal-frequency bins	⚡ Fast
`np.select()`	Multiple complex conditions	⚡ Fast
`np.where()`	Binary conditions	⚡ Fast
`.map()`	Direct value lookup	⚡ Fast
`.apply()`	Complex custom logic	🐢 Slow

Summary

Use pd.cut() for straightforward numeric binning like grades or age groups.
Use np.select() for multi-condition classification involving multiple columns.
Use np.where() for simple binary categories.

Reserve .apply() for logic that is too complex to vectorize; it is typically 10-100x slower than the vectorized alternatives.
