Python Pandas: How to Get Descriptive Statistics for a Pandas DataFrame in Python

Descriptive statistics provide a quick summary of your data's central tendency, dispersion, and distribution. Pandas makes it easy to compute these statistics with a single method call using .describe().

In this guide, we'll cover how to use describe() effectively, explain its parameters, and show additional methods for computing individual statistics.

Quick Answer: Use .describe()

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

print(df.describe())

Output:

              price    rating
count      5.000000  5.000000
mean   26800.000000  4.240000
std    10756.393448  0.364692
min    19000.000000  3.800000
25%    20000.000000  4.000000
50%    22000.000000  4.200000
75%    28000.000000  4.500000
max    45000.000000  4.700000

By default, describe() only summarizes numeric columns. Let's explore how to include other column types and customize the output.

Understanding describe() Output

For numeric columns, describe() returns:

| Statistic | Description |
| --- | --- |
| count | Number of non-null values |
| mean | Average value |
| std | Standard deviation |
| min | Minimum value |
| 25% | 25th percentile (first quartile) |
| 50% | 50th percentile (median) |
| 75% | 75th percentile (third quartile) |
| max | Maximum value |
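
As a rough cross-check, each row of the numeric summary can be reproduced with the matching Series method. The sketch below uses a small illustrative Series with one missing value (the data is an assumption, not taken from the examples in this guide) to show that count skips nulls and that std is the sample standard deviation (ddof=1).

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value (hypothetical, for demonstration only)
s = pd.Series([10.0, 20.0, np.nan, 40.0], name='value')

summary = s.describe()

# count ignores the NaN entry, so it matches Series.count()
print(summary['count'], s.count())

# std is the sample standard deviation (ddof=1), the same default as Series.std()
print(summary['std'], s.std(ddof=1))

# the 50% row is the median of the non-null values
print(summary['50%'], s.median())
```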

For categorical/string columns, it returns:

| Statistic | Description |
| --- | --- |
| count | Number of non-null values |
| unique | Number of distinct values |
| top | Most frequent value |
| freq | Frequency of the most common value |
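
To make top and freq concrete, here is a minimal sketch that compares them with value_counts() on the product values used throughout this guide. Building a standalone Series named products is just a convenience for the illustration.

```python
import pandas as pd

# Same product values as the examples in this guide, as a standalone Series
products = pd.Series(['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'], name='product')

summary = products.describe()
counts = products.value_counts()  # sorted from most to least frequent

# top/freq correspond to the first entry of value_counts()
print(summary['top'], summary['freq'])
print(counts.index[0], counts.iloc[0])

# unique corresponds to nunique()
print(summary['unique'], products.nunique())
```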

Describing Specific Columns

A Single Column

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

print(df['price'].describe())

Output:

count        5.000000
mean     26800.000000
std      10756.393448
min      19000.000000
25%      20000.000000
50%      22000.000000
75%      28000.000000
max      45000.000000
Name: price, dtype: float64

Multiple Selected Columns

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

print(df[['price', 'rating']].describe())

Output:

              price    rating
count      5.000000  5.000000
mean   26800.000000  4.240000
std    10756.393448  0.364692
min    19000.000000  3.800000
25%    20000.000000  4.000000
50%    22000.000000  4.200000
75%    28000.000000  4.500000
max    45000.000000  4.700000

Including All Column Types

By default, describe() only includes numeric columns. Use the include parameter to change this behavior.

Include Everything

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

print(df.describe(include='all'))

Output:

       product         price    rating
count        5      5.000000  5.000000
unique       4           NaN       NaN
top     Mobile           NaN       NaN
freq         2           NaN       NaN
mean       NaN  26800.000000  4.240000
std        NaN  10756.393448  0.364692
min        NaN  19000.000000  3.800000
25%        NaN  20000.000000  4.000000
50%        NaN  22000.000000  4.200000
75%        NaN  28000.000000  4.500000
max        NaN  45000.000000  4.700000

Note: NaN values appear where a statistic doesn't apply to that column type (e.g., mean for strings).
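
Because the combined summary is itself a DataFrame, one way to trim the NaN padding is to pull out a single column and call dropna() on it. The following is a small sketch of that idea. The variable names product_stats and price_stats are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

summary = df.describe(include='all')

# Keep only the statistics that apply to each column
product_stats = summary['product'].dropna()  # count, unique, top, freq
price_stats = summary['price'].dropna()      # count, mean, std, min, quartiles, max

print(product_stats)
print(price_stats)
```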

Include Only Specific Types

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# Only numeric columns
print(df.describe(include='number'))

# Only string/object columns
print(df.describe(include='object'))

# Only specific types
print(df.describe(include=[int, float]))

Output:

              price    rating
count      5.000000  5.000000
mean   26800.000000  4.240000
std    10756.393448  0.364692
min    19000.000000  3.800000
25%    20000.000000  4.000000
50%    22000.000000  4.200000
75%    28000.000000  4.500000
max    45000.000000  4.700000
       product
count        5
unique       4
top     Mobile
freq         2
              price    rating
count      5.000000  5.000000
mean   26800.000000  4.240000
std    10756.393448  0.364692
min    19000.000000  3.800000
25%    20000.000000  4.000000
50%    22000.000000  4.200000
75%    28000.000000  4.500000
max    45000.000000  4.700000

Exclude Specific Types

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# Exclude numeric columns
print(df.describe(exclude='number'))

Output:

       product
count        5
unique       4
top     Mobile
freq         2

Custom Percentiles

By default, describe() shows the 25th, 50th, and 75th percentiles. You can customize this:

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# Show 10th, 50th, and 90th percentiles
print(df['price'].describe(percentiles=[0.1, 0.5, 0.9]))

Output:

count        5.000000
mean     26800.000000
std      10756.393448
min      19000.000000
10%      19400.000000
50%      22000.000000
90%      38200.000000
max      45000.000000
Name: price, dtype: float64

You can pass any number of percentiles, for example every 20th:

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# Show every 20th percentile
print(df['price'].describe(percentiles=[0.2, 0.4, 0.6, 0.8]))

Output:

count        5.000000
mean     26800.000000
std      10756.393448
min      19000.000000
20%      19800.000000
40%      21200.000000
50%      22000.000000
60%      24400.000000
80%      31400.000000
max      45000.000000
Name: price, dtype: float64
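
Info: In pandas 1.x and 2.x, the 50th percentile is always included, even if you don't specify it in the percentiles parameter. Newer releases may list only the percentiles you pass, so include 0.5 explicitly if you rely on the median row.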
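
The percentile rows can be cross-checked against quantile(), which uses the same linear interpolation by default. This is a minimal sketch on the price values from the examples above. The standalone Series named prices is only for illustration.

```python
import pandas as pd

# Price values from the examples above, as a standalone Series
prices = pd.Series([20000, 28000, 22000, 19000, 45000], name='price')

summary = prices.describe(percentiles=[0.1, 0.9])
quantiles = prices.quantile([0.1, 0.9])

# The '10%' and '90%' rows of describe() agree with quantile()
print(summary['10%'], quantiles.loc[0.1])
print(summary['90%'], quantiles.loc[0.9])
```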

Getting Individual Statistics

If you need specific statistics rather than the full summary, Pandas provides dedicated methods for each:

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

print(f"Count: {df['price'].count()}")
print(f"Mean: {df['price'].mean()}")
print(f"Median: {df['price'].median()}")
print(f"Std Dev: {df['price'].std():.2f}")
print(f"Variance: {df['price'].var():.2f}")
print(f"Min: {df['price'].min()}")
print(f"Max: {df['price'].max()}")
print(f"Sum: {df['price'].sum()}")
print(f"Mode: {df['product'].mode()[0]}")

Output:

Count: 5
Mean: 26800.0
Median: 22000.0
Std Dev: 10756.39
Variance: 115700000.00
Min: 19000
Max: 45000
Sum: 134000
Mode: Mobile

All Methods for Individual Statistics

| Statistic | Method | Works On |
| --- | --- | --- |
| Count (non-null) | .count() | All types |
| Mean | .mean() | Numeric |
| Median | .median() | Numeric |
| Mode | .mode() | All types |
| Standard deviation | .std() | Numeric |
| Variance | .var() | Numeric |
| Minimum | .min() | Numeric, datetime |
| Maximum | .max() | Numeric, datetime |
| Sum | .sum() | Numeric |
| Skewness | .skew() | Numeric |
| Kurtosis | .kurt() | Numeric |
| Unique count | .nunique() | All types |
| Quantile | .quantile(0.75) | Numeric |
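
If you want several of these statistics at once, including ones describe() does not report such as skewness and kurtosis, one option is agg() with a list of method names. The sketch below is only one way to combine them. The choice of statistics is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# Each string names a Series method; the result has one row per statistic
stats = df.agg(['mean', 'median', 'skew', 'kurt'])

print(stats)
```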

Describing Grouped Data

Combine describe() with groupby() to get statistics per group:

import pandas as pd

df = pd.DataFrame({
    'category': ['Electronics', 'Electronics', 'Furniture', 'Furniture', 'Electronics'],
    'product': ['Mobile', 'Laptop', 'Sofa', 'Table', 'AC'],
    'price': [20000, 45000, 19000, 12000, 28000]
})

print(df.groupby('category')['price'].describe())

Output:

             count     mean           std  ...      50%      75%      max
category                                   ...
Electronics    3.0  31000.0  12767.145335  ...  28000.0  36500.0  45000.0
Furniture      2.0  15500.0   4949.747468  ...  15500.0  17250.0  19000.0

[2 rows x 8 columns]
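
When only a few statistics per group are needed, a narrower alternative to the full grouped summary is groupby() with agg(). This is a minimal sketch. The statistics chosen here are an arbitrary selection.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['Electronics', 'Electronics', 'Furniture', 'Furniture', 'Electronics'],
    'product': ['Mobile', 'Laptop', 'Sofa', 'Table', 'AC'],
    'price': [20000, 45000, 19000, 12000, 28000]
})

# One row per category, one column per requested statistic
per_category = df.groupby('category')['price'].agg(['count', 'mean', 'max'])

print(per_category)
```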

Transposing for Better Readability

When you have many columns, transposing the output makes it easier to read:

import pandas as pd

df = pd.DataFrame({
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7],
    'weight': [0.2, 35.0, 0.2, 40.0, 2.5],
    'warranty_years': [1, 2, 1, 1, 3]
})

print(df.describe().T)  # Transpose: columns become rows

Output:

                count      mean           std  ...      50%      75%      max
price             5.0  26800.00  10756.393448  ...  22000.0  28000.0  45000.0
rating            5.0      4.24      0.364692  ...      4.2      4.5      4.7
weight            5.0     15.58     20.109998  ...      2.5     35.0     40.0
warranty_years    5.0      1.60      0.894427  ...      1.0      2.0      3.0

[4 rows x 8 columns]
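
The "..." in a printed summary means pandas truncated the columns to fit the display. One way to see every column is to relax the display options temporarily with pd.option_context. The width of 200 below is an arbitrary value chosen for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7],
    'weight': [0.2, 35.0, 0.2, 40.0, 2.5],
    'warranty_years': [1, 2, 1, 1, 3]
})

# Temporarily lift the column limit and widen the display (200 is arbitrary)
with pd.option_context('display.max_columns', None, 'display.width', 200):
    full_text = repr(df.describe().T)

print(full_text)
```

The options revert to their previous values as soon as the with block exits, so other output in the session is unaffected.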

Saving Descriptive Statistics

To a CSV File

stats = df.describe()
stats.to_csv("descriptive_stats.csv")

To a Dictionary

stats_dict = df.describe().to_dict()
print(stats_dict['price'])
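
To load a saved summary back, the statistic names need to be restored as the index. The sketch below writes to a temporary directory (an assumption made so the example cleans up after itself) and reads the file back with index_col=0.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

stats = df.describe()

# Write to a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'descriptive_stats.csv')
    stats.to_csv(path)
    # index_col=0 turns the first CSV column (count, mean, ...) back into the index
    restored = pd.read_csv(path, index_col=0)

print(restored.loc['mean', 'price'])
```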

Quick Reference

| Task | Code |
| --- | --- |
| Describe all numeric columns | df.describe() |
| Describe a single column | df['col'].describe() |
| Include all column types | df.describe(include='all') |
| Only categorical columns | df.describe(include='object') |
| Custom percentiles | df.describe(percentiles=[0.1, 0.9]) |
| Grouped statistics | df.groupby('col1')['col2'].describe() |
| Transpose for readability | df.describe().T |
| Individual stat (mean) | df['col'].mean() |
| Individual stat (median) | df['col'].median() |

Conclusion

The describe() method gives you a broad statistical summary of a DataFrame in a single call.

By default, it summarizes numeric columns with count, mean, standard deviation, min, max, and quartile values.

Use include='all' to include categorical columns, customize percentiles with the percentiles parameter, and combine with groupby() for per-group statistics.

For individual statistics, Pandas offers dedicated methods like .mean(), .std(), .median(), and .quantile() that provide more targeted results.