Python Pandas: How to Get Descriptive Statistics for a DataFrame

Descriptive statistics provide a quick summary of your data's central tendency, dispersion, and distribution. Pandas computes the most common ones in a single method call: .describe().

In this guide, we'll cover how to use describe() effectively, explain its parameters, and show additional methods for computing individual statistics.

Quick Answer: Use `.describe()`

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

print(df.describe())

Output:

              price    rating
count      5.000000  5.000000
mean   26800.000000  4.240000
std    10756.393448  0.364692
min    19000.000000  3.800000
25%    20000.000000  4.000000
50%    22000.000000  4.200000
75%    28000.000000  4.500000
max    45000.000000  4.700000

By default, describe() only summarizes numeric columns. Let's explore how to include other column types and customize the output.

Understanding `describe()` Output

For numeric columns, describe() returns:

Statistic	Description
`count`	Number of non-null values
`mean`	Average value
`std`	Sample standard deviation (ddof=1, the pandas default)
`min`	Minimum value
`25%`	25th percentile (first quartile)
`50%`	50th percentile (median)
`75%`	75th percentile (third quartile)
`max`	Maximum value

For categorical/string columns, it returns:

Statistic	Description
`count`	Number of non-null values
`unique`	Number of distinct values
`top`	Most frequent value
`freq`	Frequency of the most common value
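To make the `count` row concrete, here is a minimal sketch using a hypothetical DataFrame (`df_missing`, not part of the examples in this guide) that contains missing values. It also shows that the result of describe() is an ordinary DataFrame, so individual statistics can be looked up with .loc.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing product name and one missing price
df_missing = pd.DataFrame({
    'product': ['Mobile', 'AC', None, 'Sofa'],
    'price': [20000, np.nan, 22000, 19000],
})

summary = df_missing.describe(include='all')

# count reflects non-null values only, so each column reports 3 rather than 4
price_count = summary.loc['count', 'price']
product_count = summary.loc['count', 'product']

# mean is computed over the non-null prices: (20000 + 22000 + 19000) / 3
price_mean = summary.loc['mean', 'price']

print(price_count, product_count, price_mean)
```

If a column's count is lower than the number of rows, that is usually the first hint that it contains missing data.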

Describing Specific Columns

A Single Column

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

print(df['price'].describe())

Output:

count        5.000000
mean     26800.000000
std      10756.393448
min      19000.000000
25%      20000.000000
50%      22000.000000
75%      28000.000000
max      45000.000000
Name: price, dtype: float64

Multiple Selected Columns

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

print(df[['price', 'rating']].describe())

Output:

              price    rating
count      5.000000  5.000000
mean   26800.000000  4.240000
std    10756.393448  0.364692
min    19000.000000  3.800000
25%    20000.000000  4.000000
50%    22000.000000  4.200000
75%    28000.000000  4.500000
max    45000.000000  4.700000

Including All Column Types

By default, describe() only includes numeric columns. Use the include parameter to change this behavior.

Include Everything

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

print(df.describe(include='all'))

Output:

       product         price    rating
count        5      5.000000  5.000000
unique       4           NaN       NaN
top     Mobile           NaN       NaN
freq         2           NaN       NaN
mean       NaN  26800.000000  4.240000
std        NaN  10756.393448  0.364692
min        NaN  19000.000000  3.800000
25%        NaN  20000.000000  4.000000
50%        NaN  22000.000000  4.200000
75%        NaN  28000.000000  4.500000
max        NaN  45000.000000  4.700000

Note: NaN values appear where a statistic doesn't apply to that column type (e.g., mean for strings).
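If the NaN cells get in the way, one option is to pull out a single column of the summary and drop the rows that don't apply to it. The following is a small sketch of that idea, not the only way to tidy the output:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

summary = df.describe(include='all')

# Each column of the summary is a Series; dropna() removes the
# statistics that are not defined for that column's type
product_stats = summary['product'].dropna()
price_stats = summary['price'].dropna()

print(product_stats)
print(price_stats)
```

Here `product_stats` keeps only count, unique, top, and freq, while `price_stats` keeps the eight numeric statistics.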

Include Only Specific Types

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# Only numeric columns
print(df.describe(include='number'))

# Only string/object columns
print(df.describe(include='object'))

# Only integer and float columns (Python types are mapped to NumPy dtypes)
print(df.describe(include=[int, float]))

Output:

              price    rating
count      5.000000  5.000000
mean   26800.000000  4.240000
std    10756.393448  0.364692
min    19000.000000  3.800000
25%    20000.000000  4.000000
50%    22000.000000  4.200000
75%    28000.000000  4.500000
max    45000.000000  4.700000
       product
count        5
unique       4
top     Mobile
freq         2
              price    rating
count      5.000000  5.000000
mean   26800.000000  4.240000
std    10756.393448  0.364692
min    19000.000000  3.800000
25%    20000.000000  4.000000
50%    22000.000000  4.200000
75%    28000.000000  4.500000
max    45000.000000  4.700000

Exclude Specific Types

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# Exclude numeric columns
print(df.describe(exclude='number'))

Output:

       product
count        5
unique       4
top     Mobile
freq         2
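The include and exclude parameters accept the same kind of type selectors as DataFrame.select_dtypes(). As a rough illustration, filtering the columns first and then calling describe() should give the same summary for this data:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# Filter inside describe() ...
via_describe = df.describe(include='number')

# ... or filter the columns first, then describe what is left
via_select = df.select_dtypes(include='number').describe()

same = via_describe.equals(via_select)
print(same)

# The same approach works for exclusion
non_numeric = df.select_dtypes(exclude='number').describe()
print(non_numeric)
```

Filtering with select_dtypes() first can be handy when you want to reuse the filtered DataFrame for other steps as well.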

Custom Percentiles

By default, describe() shows the 25th, 50th, and 75th percentiles. You can change this by passing a list of values between 0 and 1 to the percentiles parameter:

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# Show 10th, 50th, and 90th percentiles
print(df['price'].describe(percentiles=[0.1, 0.5, 0.9]))

Output:

count        5.000000
mean     26800.000000
std      10756.393448
min      19000.000000
10%      19400.000000
50%      22000.000000
90%      38200.000000
max      45000.000000
Name: price, dtype: float64

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# Show the 20th, 40th, 60th, and 80th percentiles
print(df['price'].describe(percentiles=[0.2, 0.4, 0.6, 0.8]))

Output:

count        5.000000
mean     26800.000000
std      10756.393448
min      19000.000000
20%      19800.000000
40%      21200.000000
50%      22000.000000
60%      24400.000000
80%      31400.000000
max      45000.000000
Name: price, dtype: float64

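Info: In the output above, the 50th percentile (median) appears even though it was not listed in the percentiles parameter. This has long been the default behavior of describe(), but it may differ in newer pandas releases, so check the documentation for your version.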
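The percentile rows can be cross-checked with .quantile(), which by default uses linear interpolation between neighboring values. A short sketch using the same prices as above:

```python
import pandas as pd

prices = pd.Series([20000, 28000, 22000, 19000, 45000], name='price')

# quantile() accepts a list of fractions and returns one value per fraction
q = prices.quantile([0.1, 0.5, 0.9])

# describe() labels the same values as '10%', '50%', and '90%'
desc = prices.describe(percentiles=[0.1, 0.5, 0.9])

print(q)
print(desc['10%'], q.loc[0.1])
```

With five sorted values, the 10th percentile falls 40% of the way between 19000 and 20000, which is where the 19400 in the earlier output comes from.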

Getting Individual Statistics

If you need specific statistics rather than the full summary, Pandas provides dedicated methods for each:

import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

print(f"Count:    {df['price'].count()}")
print(f"Mean:     {df['price'].mean()}")
print(f"Median:   {df['price'].median()}")
print(f"Std Dev:  {df['price'].std():.2f}")
print(f"Variance: {df['price'].var():.2f}")
print(f"Min:      {df['price'].min()}")
print(f"Max:      {df['price'].max()}")
print(f"Sum:      {df['price'].sum()}")
print(f"Mode:     {df['product'].mode()[0]}")

Output:

Count:    5
Mean:     26800.0
Median:   22000.0
Std Dev:  10756.39
Variance: 115700000.00
Min:      19000
Max:      45000
Sum:      134000
Mode:     Mobile

All Methods for Individual Statistics

Statistic	Method	Works On
Count (non-null)	`.count()`	All types
Mean	`.mean()`	Numeric
Median	`.median()`	Numeric
Mode	`.mode()`	All types
Standard deviation	`.std()`	Numeric
Variance	`.var()`	Numeric
Minimum	`.min()`	Numeric, datetime
Maximum	`.max()`	Numeric, datetime
Sum	`.sum()`	Numeric
Skewness	`.skew()`	Numeric
Kurtosis	`.kurt()`	Numeric
Unique count	`.nunique()`	All types
Quantile	`.quantile(0.75)`	Numeric
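When you need a handful of these statistics at once, calling each method separately gets repetitive. One possible shortcut is .agg(), which accepts a list of method names; the selection of statistics below is just an example:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# On a Series, agg() with a list returns a Series indexed by statistic name
price_stats = df['price'].agg(['mean', 'median', 'std', 'skew'])
print(price_stats)

# On a DataFrame, it returns one row per statistic and one column per input column
table = df[['price', 'rating']].agg(['mean', 'median', 'max'])
print(table)
```

This produces a describe()-like table containing only the statistics you asked for, including ones such as skewness that describe() does not report.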

Describing Grouped Data

Combine describe() with groupby() to get statistics per group:

import pandas as pd

df = pd.DataFrame({
    'category': ['Electronics', 'Electronics', 'Furniture', 'Furniture', 'Electronics'],
    'product': ['Mobile', 'Laptop', 'Sofa', 'Table', 'AC'],
    'price': [20000, 45000, 19000, 12000, 28000]
})

print(df.groupby('category')['price'].describe())

Output:

             count     mean           std  ...      50%      75%      max
category                                   ...                           
Electronics    3.0  31000.0  12767.145335  ...  28000.0  36500.0  45000.0
Furniture      2.0  15500.0   4949.747468  ...  15500.0  17250.0  19000.0

[2 rows x 8 columns]
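The `...` in the printed output means pandas has hidden some columns to fit the display width; the underlying result still has all eight. Two ways to see the hidden values, sketched here with the same data, are to select the columns you care about or to lift the display limit temporarily:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['Electronics', 'Electronics', 'Furniture', 'Furniture', 'Electronics'],
    'product': ['Mobile', 'Laptop', 'Sofa', 'Table', 'AC'],
    'price': [20000, 45000, 19000, 12000, 28000]
})

grouped = df.groupby('category')['price'].describe()

# Option 1: keep only the statistics of interest
print(grouped[['count', 'mean', '50%']])

# Option 2: temporarily show every column (the width of 200 is an arbitrary choice)
with pd.option_context('display.max_columns', None, 'display.width', 200):
    print(grouped)
```

pd.option_context() restores the previous display settings when the with block exits, so the rest of your session is unaffected.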

Transposing for Better Readability

When you have many columns, transposing the output makes it easier to read:

import pandas as pd

df = pd.DataFrame({
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7],
    'weight': [0.2, 35.0, 0.2, 40.0, 2.5],
    'warranty_years': [1, 2, 1, 1, 3]
})

print(df.describe().T)  # Transpose: columns become rows

Output:

                count      mean           std  ...      50%      75%      max
price             5.0  26800.00  10756.393448  ...  22000.0  28000.0  45000.0
rating            5.0      4.24      0.364692  ...      4.2      4.5      4.7
weight            5.0     15.58     20.109998  ...      2.5     35.0     40.0
warranty_years    5.0      1.60      0.894427  ...      1.0      2.0      3.0

[4 rows x 8 columns]
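Because each original column becomes a row after transposing, the summary can be sorted and rounded like any other DataFrame. As an illustrative sketch, this orders the columns by their standard deviation:

```python
import pandas as pd

df = pd.DataFrame({
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7],
    'weight': [0.2, 35.0, 0.2, 40.0, 2.5],
    'warranty_years': [1, 2, 1, 1, 3]
})

summary = df.describe().T

# Sort the columns-as-rows by spread and round for display
by_spread = summary.sort_values('std', ascending=False).round(2)
print(by_spread[['mean', 'std', 'min', 'max']])
```

Keep in mind that standard deviations are not comparable across columns with different units, so the ordering here is only a display convenience.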

Saving Descriptive Statistics

To a CSV File

# Assumes df is one of the DataFrames defined above
stats = df.describe()
stats.to_csv("descriptive_stats.csv")  # statistic names become the first column

To a Dictionary

# to_dict() returns a nested dict: {column: {statistic: value}}
stats_dict = df.describe().to_dict()
print(stats_dict['price'])
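For completeness, here is a self-contained sketch of both approaches that also reads the CSV back in. The temporary directory is used only so the example cleans up after itself; in practice you would write to a path of your choice.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

stats = df.describe()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'descriptive_stats.csv')
    stats.to_csv(path)
    # index_col=0 restores the statistic names (count, mean, ...) as the index
    restored = pd.read_csv(path, index_col=0)

print(restored)

# The dictionary form is keyed by column first, then by statistic
stats_dict = stats.to_dict()
price_mean = stats_dict['price']['mean']
print(price_mean)
```

Without index_col=0, the statistic names would be read back as an ordinary unnamed column rather than as the index.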

Quick Reference

Task	Code
Describe all numeric columns	`df.describe()`
Describe a single column	`df['col'].describe()`
Include all column types	`df.describe(include='all')`
Only string/object columns	`df.describe(include='object')`
Custom percentiles	`df.describe(percentiles=[0.1, 0.9])`
Grouped statistics	`df.groupby('col1')['col2'].describe()`
Transpose for readability	`df.describe().T`
Individual stat (mean)	`df['col'].mean()`
Individual stat (median)	`df['col'].median()`

Conclusion

The describe() method gives you a comprehensive statistical summary of a DataFrame in a single call.

By default, it summarizes numeric columns with count, mean, standard deviation, min, max, and quartile values.

Use include='all' to include categorical columns, customize percentiles with the percentiles parameter, and combine with groupby() for per-group statistics.

For individual statistics, Pandas offers dedicated methods like .mean(), .std(), .median(), and .quantile() that provide more targeted results.
