Python Pandas: How to Get Descriptive Statistics for a DataFrame
Descriptive statistics provide a quick summary of your data's central tendency, dispersion, and distribution. Pandas makes it easy to compute these statistics with a single method call using .describe().
In this guide, we'll cover how to use describe() effectively, explain its parameters, and show additional methods for computing individual statistics.
Quick Answer: Use .describe()
import pandas as pd
df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})
print(df.describe())
Output:
price rating
count 5.000000 5.000000
mean 26800.000000 4.240000
std 10756.393448 0.364692
min 19000.000000 3.800000
25% 20000.000000 4.000000
50% 22000.000000 4.200000
75% 28000.000000 4.500000
max 45000.000000 4.700000
By default, describe() summarizes only numeric columns, which is why the product column is missing from the output above. Let's explore how to include other column types and customize the output.
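Because describe() returns an ordinary DataFrame, with the statistic names as the row index and the summarized columns as the columns, you can pull out a single value with .loc. A minimal sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

summary = df.describe()

# Rows are statistic names, columns are the summarized (numeric) columns
print(list(summary.index))    # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(list(summary.columns))  # ['price', 'rating']

# Look up one statistic for one column
print(summary.loc['mean', 'price'])  # 26800.0
```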
Understanding describe() Output
For numeric columns, describe() returns:
| Statistic | Description |
|---|---|
| count | Number of non-null values |
| mean | Average value (arithmetic mean) |
| std | Sample standard deviation (divides by n - 1) |
| min | Minimum value |
| 25% | 25th percentile (first quartile) |
| 50% | 50th percentile (median) |
| 75% | 75th percentile (third quartile) |
| max | Maximum value |
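Two details in this table are worth making concrete: std is the sample standard deviation (divisor n - 1, pandas' default ddof=1), and the percentiles use linear interpolation between the sorted values. The sketch below recomputes both by hand for the price data using only the standard library and compares them with describe(); the helper linear_percentile is written here for illustration and is not part of pandas.

```python
import math

import pandas as pd

prices = [20000, 28000, 22000, 19000, 45000]
summary = pd.Series(prices).describe()

# Sample standard deviation: sum of squared deviations divided by n - 1
n = len(prices)
mean = sum(prices) / n
sample_std = math.sqrt(sum((p - mean) ** 2 for p in prices) / (n - 1))

def linear_percentile(values, q):
    """Hypothetical helper: interpolate at position q * (n - 1) in the sorted data."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

print(math.isclose(sample_std, summary['std']))          # True
print(linear_percentile(prices, 0.25) == summary['25%'])  # True
print(linear_percentile(prices, 0.75) == summary['75%'])  # True
```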
For categorical/string columns, it returns:
| Statistic | Description |
|---|---|
| count | Number of non-null values |
| unique | Number of distinct values |
| top | Most frequent value |
| freq | Frequency of the most common value |
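These categorical statistics line up with value_counts(): unique is the number of distinct values, top is the most common value, and freq is how often it occurs. If several values tie for most frequent, only one of them is reported as top. A quick check on the product column from the sample data:

```python
import pandas as pd

products = pd.Series(['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'], name='product')

summary = products.describe()
counts = products.value_counts()  # sorted from most to least frequent

print(summary['unique'] == products.nunique())  # True
print(summary['top'] == counts.index[0])        # True
print(summary['freq'] == counts.iloc[0])        # True
```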
Describing Specific Columns
A Single Column
import pandas as pd
df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})
print(df['price'].describe())
Output:
count 5.000000
mean 26800.000000
std 10756.393448
min 19000.000000
25% 20000.000000
50% 22000.000000
75% 28000.000000
max 45000.000000
Name: price, dtype: float64
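Because count reports only non-null values and the other statistics skip missing values, a column with gaps is summarized using just the values that are present. A small sketch, assuming a hypothetical sixth price that is missing:

```python
import numpy as np
import pandas as pd

# Same prices as above, plus one missing entry (hypothetical)
prices = pd.Series([20000, 28000, 22000, 19000, 45000, np.nan], name='price')

summary = prices.describe()
print(summary['count'])  # 5.0 (the NaN is not counted)
print(summary['mean'])   # 26800.0 (mean of the five present values)
print(len(prices))       # 6
```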
Multiple Selected Columns
import pandas as pd
df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})
print(df[['price', 'rating']].describe())
Output:
price rating
count 5.000000 5.000000
mean 26800.000000 4.240000
std 10756.393448 0.364692
min 19000.000000 3.800000
25% 20000.000000 4.000000
50% 22000.000000 4.200000
75% 28000.000000 4.500000
max 45000.000000 4.700000
Including All Column Types
By default, describe() only includes numeric columns. Use the include parameter to change this behavior.
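The pandas documentation says include and exclude accept dtype selectors in the style of DataFrame.select_dtypes(), so select_dtypes() is a convenient way to preview which columns a selector will match before you call describe(). A sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# Preview the columns that the 'number' selector matches
numeric_cols = list(df.select_dtypes(include='number').columns)
print(numeric_cols)  # ['price', 'rating']

# describe() with the same selector summarizes exactly those columns
print(list(df.describe(include='number').columns) == numeric_cols)  # True
```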
Include Everything
import pandas as pd
df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})
print(df.describe(include='all'))
Output:
product price rating
count 5 5.000000 5.000000
unique 4 NaN NaN
top Mobile NaN NaN
freq 2 NaN NaN
mean NaN 26800.000000 4.240000
std NaN 10756.393448 0.364692
min NaN 19000.000000 3.800000
25% NaN 20000.000000 4.000000
50% NaN 22000.000000 4.200000
75% NaN 28000.000000 4.500000
max NaN 45000.000000 4.700000
NaN values appear where a statistic doesn't apply to that column type (e.g., mean for strings).
Include Only Specific Types
import pandas as pd
df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})
# Only numeric columns
print(df.describe(include='number'))
# Only string/object columns
print(df.describe(include='object'))
# Only specific types
print(df.describe(include=[int, float]))
Output:
price rating
count 5.000000 5.000000
mean 26800.000000 4.240000
std 10756.393448 0.364692
min 19000.000000 3.800000
25% 20000.000000 4.000000
50% 22000.000000 4.200000
75% 28000.000000 4.500000
max 45000.000000 4.700000
product
count 5
unique 4
top Mobile
freq 2
price rating
count 5.000000 5.000000
mean 26800.000000 4.240000
std 10756.393448 0.364692
min 19000.000000 3.800000
25% 20000.000000 4.000000
50% 22000.000000 4.200000
75% 28000.000000 4.500000
max 45000.000000 4.700000
Exclude Specific Types
import pandas as pd
df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})
# Exclude numeric columns
print(df.describe(exclude='number'))
Output:
product
count 5
unique 4
top Mobile
freq 2
Custom Percentiles
By default, describe() shows the 25th, 50th, and 75th percentiles. You can customize this:
import pandas as pd
df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})
# Show 10th, 50th, and 90th percentiles
print(df['price'].describe(percentiles=[0.1, 0.5, 0.9]))
Output:
count 5.000000
mean 26800.000000
std 10756.393448
min 19000.000000
10% 19400.000000
50% 22000.000000
90% 38200.000000
max 45000.000000
Name: price, dtype: float64
Any percentile values between 0 and 1 are accepted:
import pandas as pd
df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})
# Show every 20th percentile
print(df['price'].describe(percentiles=[0.2, 0.4, 0.6, 0.8]))
Output:
count 5.000000
mean 26800.000000
std 10756.393448
min 19000.000000
20% 19800.000000
40% 21200.000000
50% 22000.000000
60% 24400.000000
80% 31400.000000
max 45000.000000
Name: price, dtype: float64
The 50th percentile is always included, even if you don't specify it in the percentiles parameter.
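Each percentile row should match what Series.quantile() returns with its default linear interpolation, which is handy when you need one percentile without the full summary. A sketch on the price column:

```python
import pandas as pd

price = pd.Series([20000, 28000, 22000, 19000, 45000], name='price')

summary = price.describe(percentiles=[0.1, 0.9])

# quantile() accepts a single value or a list of values between 0 and 1
print(price.quantile(0.1), summary['10%'])  # 19400.0 19400.0
print(price.quantile(0.9), summary['90%'])  # 38200.0 38200.0
print(price.quantile([0.1, 0.9]).tolist())  # [19400.0, 38200.0]
```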
Getting Individual Statistics
If you need specific statistics rather than the full summary, Pandas provides dedicated methods for each:
import pandas as pd
df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})
print(f"Count: {df['price'].count()}")
print(f"Mean: {df['price'].mean()}")
print(f"Median: {df['price'].median()}")
print(f"Std Dev: {df['price'].std():.2f}")
print(f"Variance: {df['price'].var():.2f}")
print(f"Min: {df['price'].min()}")
print(f"Max: {df['price'].max()}")
print(f"Sum: {df['price'].sum()}")
print(f"Mode: {df['product'].mode()[0]}")
Output:
Count: 5
Mean: 26800.0
Median: 22000.0
Std Dev: 10756.39
Variance: 115700000.00
Min: 19000
Max: 45000
Sum: 134000
Mode: Mobile
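When you want several of these statistics side by side for more than one column, agg() accepts a list of method names and returns them as a small table, much like a trimmed-down describe(). A sketch on the numeric columns of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# One row per statistic, one column per selected column
stats = df[['price', 'rating']].agg(['mean', 'median', 'var'])
print(stats)
print(stats.loc['median', 'price'])  # 22000.0
```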
All Methods for Individual Statistics
| Statistic | Method | Works On |
|---|---|---|
| Count (non-null) | .count() | All types |
| Mean | .mean() | Numeric |
| Median | .median() | Numeric |
| Mode | .mode() | All types |
| Standard deviation | .std() | Numeric |
| Variance | .var() | Numeric |
| Minimum | .min() | Numeric, datetime, strings (lexicographic) |
| Maximum | .max() | Numeric, datetime, strings (lexicographic) |
| Sum | .sum() | Numeric |
| Skewness | .skew() | Numeric |
| Kurtosis | .kurt() | Numeric |
| Unique count | .nunique() | All types |
| Quantile | .quantile(0.75) | Numeric |
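When you call methods like mean() or median() on a whole DataFrame, string columns get in the way. Since pandas 2.0, numeric_only defaults to False, so df.mean() on the sample data raises a TypeError because of the product column; passing numeric_only=True restricts the calculation to numeric columns. A sketch, assuming pandas 2.x or later:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})

# Skip non-numeric columns such as 'product'
means = df.mean(numeric_only=True)
medians = df.median(numeric_only=True)

print(means['price'], medians['price'])  # 26800.0 22000.0
print(list(means.index))                 # ['price', 'rating']
```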
Describing Grouped Data
Combine describe() with groupby() to get statistics per group:
import pandas as pd
df = pd.DataFrame({
    'category': ['Electronics', 'Electronics', 'Furniture', 'Furniture', 'Electronics'],
    'product': ['Mobile', 'Laptop', 'Sofa', 'Table', 'AC'],
    'price': [20000, 45000, 19000, 12000, 28000]
})
print(df.groupby('category')['price'].describe())
Output:
count mean std ... 50% 75% max
category ...
Electronics 3.0 31000.0 12767.145335 ... 28000.0 36500.0 45000.0
Furniture 2.0 15500.0 4949.747468 ... 15500.0 17250.0 19000.0
[2 rows x 8 columns]
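The grouped result is a regular DataFrame with one row per group and one column per statistic, so you can select only the statistics you need, which also avoids the "..." truncation seen in the printed output above. Alternatively, pd.option_context can temporarily lift the column limit so all eight columns are printed (wrapped if the console is narrow). A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['Electronics', 'Electronics', 'Furniture', 'Furniture', 'Electronics'],
    'product': ['Mobile', 'Laptop', 'Sofa', 'Table', 'AC'],
    'price': [20000, 45000, 19000, 12000, 28000]
})

grouped = df.groupby('category')['price'].describe()

# Keep only a few statistic columns
print(grouped[['count', 'mean', '50%', 'max']])

# Show every column for this print only; the global setting is restored afterwards
with pd.option_context('display.max_columns', None):
    print(grouped)
```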
Transposing for Better Readability
When you have many columns, transposing the output makes it easier to read:
import pandas as pd
df = pd.DataFrame({
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7],
    'weight': [0.2, 35.0, 0.2, 40.0, 2.5],
    'warranty_years': [1, 2, 1, 1, 3]
})
print(df.describe().T) # Transpose: columns become rows
Output:
count mean std ... 50% 75% max
price 5.0 26800.00 10756.393448 ... 22000.0 28000.0 45000.0
rating 5.0 4.24 0.364692 ... 4.2 4.5 4.7
weight 5.0 15.58 20.109998 ... 2.5 35.0 40.0
warranty_years 5.0 1.60 0.894427 ... 1.0 2.0 3.0
[4 rows x 8 columns]
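Because describe() reports every value at full floating-point precision, rounding the transposed summary is another easy way to improve readability. A sketch using DataFrame.round() on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7],
    'weight': [0.2, 35.0, 0.2, 40.0, 2.5],
    'warranty_years': [1, 2, 1, 1, 3]
})

# Transpose, then round every statistic to two decimal places
summary = df.describe().T.round(2)
print(summary[['mean', 'std', 'min', 'max']])
```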
Saving Descriptive Statistics
To a CSV File
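import pandas as pd
df = pd.DataFrame({'price': [20000, 28000, 22000, 19000, 45000], 'rating': [4.2, 3.8, 4.5, 4.0, 4.7]})
stats = df.describe()
stats.to_csv("descriptive_stats.csv")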
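The statistic names are written as the first column of the CSV, so reading the file back with index_col=0 should restore the original layout. A sketch that writes to a temporary directory; the temporary location is only for illustration and keeps the example from leaving files behind:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})
stats = df.describe()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "descriptive_stats.csv")
    stats.to_csv(path)
    # index_col=0 turns the first column (count, mean, ...) back into the index
    loaded = pd.read_csv(path, index_col=0)

print(list(loaded.index))           # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(loaded.loc['mean', 'price'])  # 26800.0
```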
To a Dictionary
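import pandas as pd
df = pd.DataFrame({'price': [20000, 28000, 22000, 19000, 45000], 'rating': [4.2, 3.8, 4.5, 4.0, 4.7]})
stats_dict = df.describe().to_dict()
print(stats_dict['price'])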
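By default, to_dict() nests the statistics under each column name. Passing orient='index' flips the nesting so that the outer keys are the statistic names, which can be more convenient when you want one statistic across all columns:

```python
import pandas as pd

df = pd.DataFrame({
    'price': [20000, 28000, 22000, 19000, 45000],
    'rating': [4.2, 3.8, 4.5, 4.0, 4.7]
})
stats = df.describe()

by_column = stats.to_dict()                   # {'price': {'count': ..., ...}, 'rating': {...}}
by_statistic = stats.to_dict(orient='index')  # {'count': {'price': ..., 'rating': ...}, ...}

print(by_column['price']['max'])      # 45000.0
print(by_statistic['max']['rating'])  # 4.7
```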
Quick Reference
| Task | Code |
|---|---|
| Describe all numeric columns | df.describe() |
| Describe a single column | df['col'].describe() |
| Include all column types | df.describe(include='all') |
| Only categorical columns | df.describe(include='object') |
| Custom percentiles | df.describe(percentiles=[0.1, 0.9]) |
| Grouped statistics | df.groupby('col1')['col2'].describe() |
| Transpose for readability | df.describe().T |
| Individual stat (mean) | df['col'].mean() |
| Individual stat (median) | df['col'].median() |
Conclusion
The describe() method gives you a comprehensive statistical summary of your DataFrame in a single call.
By default, it summarizes numeric columns with count, mean, standard deviation, min, max, and quartile values.
Use include='all' to include categorical columns, customize percentiles with the percentiles parameter, and combine with groupby() for per-group statistics.
For individual statistics, Pandas offers dedicated methods like .mean(), .std(), .median(), and .quantile() for when you need just one value rather than the full summary.