Skip to main content

Python NumPy: How to Generate Chi-Square Distributions in NumPy

The chi-square (χ²) distribution is fundamental in statistical testing, particularly for goodness-of-fit tests, independence tests, and confidence interval estimation. NumPy provides efficient methods to generate random samples from this distribution.

Using the Modern Generator API

The recommended approach uses NumPy's modern random generator:

import numpy as np

rng = np.random.default_rng(seed=42) # Optional seed for reproducibility

# Generate 5 samples with 3 degrees of freedom
samples = rng.chisquare(df=3, size=5)
print(samples)

Output:

[3.05543949 4.35892866 2.6205389  2.29722655 4.79524961]

Understanding Degrees of Freedom

The df parameter (degrees of freedom) controls the distribution's shape:

import numpy as np

rng = np.random.default_rng()

# Low df: heavily right-skewed
low_df = rng.chisquare(df=1, size=5)
print(f"df=1: mean={low_df.mean():.2f}")

# Medium df: moderately skewed
mid_df = rng.chisquare(df=5, size=5)
print(f"df=5: mean={mid_df.mean():.2f}")

# High df: approaches normal distribution
high_df = rng.chisquare(df=30, size=5)
print(f"df=30: mean={high_df.mean():.2f}")

Output:

df=1: mean=0.42
df=5: mean=5.07
df=30: mean=28.63
note

The mean of a chi-square distribution equals its degrees of freedom (df), and the variance equals 2×df. As df increases, the distribution becomes more symmetric.

Multi-dimensional Output

Generate matrices or higher-dimensional arrays of chi-square samples:

import numpy as np

rng = np.random.default_rng()

# 3x3 matrix
matrix = rng.chisquare(df=5, size=(3, 3))
print(matrix.shape) # (3, 3)
print(matrix)

# 3D array: 2 layers, 3 rows, 4 columns
tensor = rng.chisquare(df=10, size=(2, 3, 4))
print(tensor.shape) # (2, 3, 4)

Output:

(3, 3)
[[ 2.68470397 4.50259988 3.39465421]
[ 5.0783715 1.16429817 3.38975043]
[ 5.70560665 1.39477394 14.24344514]]
(2, 3, 4)

Visualizing Different Degrees of Freedom

Compare how the distribution shape changes with different df values:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=42)

fig, ax = plt.subplots(figsize=(10, 6))

degrees = [1, 2, 5, 10, 20]
colors = ['red', 'blue', 'green', 'orange', 'purple']

for df, color in zip(degrees, colors):
samples = rng.chisquare(df=df, size=10000)
ax.hist(samples, bins=100, density=True, alpha=0.5,
histtype='stepfilled', label=f'df={df}', color=color)

ax.set_xlabel('Value')
ax.set_ylabel('Density')
ax.set_title('Chi-Square Distribution for Various Degrees of Freedom')
ax.legend()
ax.set_xlim(0, 40)
plt.show()

Statistical Properties

Verify theoretical properties with samples:

import numpy as np

rng = np.random.default_rng()

df = 10
samples = rng.chisquare(df=df, size=100000)

print(f"Degrees of freedom: {df}")
print(f"Theoretical mean: {df}")
print(f"Sample mean: {samples.mean():.2f}")
print(f"Theoretical variance: {2 * df}")
print(f"Sample variance: {samples.var():.2f}")

Output:

Degrees of freedom: 10
Theoretical mean: 10
Sample mean: 9.99
Theoretical variance: 20
Sample variance: 19.82

Practical Example: Chi-Square Test Simulation

Simulate the chi-square test statistic under the null hypothesis:

import numpy as np

def simulate_chi_square_test(observed, expected, n_simulations=10000):
"""Simulate chi-square test statistic distribution."""
rng = np.random.default_rng()

# Calculate observed test statistic
observed_stat = np.sum((observed - expected)**2 / expected)

# Degrees of freedom
df = len(observed) - 1

# Simulate null distribution
null_distribution = rng.chisquare(df=df, size=n_simulations)

# Calculate p-value
p_value = np.mean(null_distribution >= observed_stat)

return observed_stat, p_value

# Example: Testing if a die is fair
observed = np.array([18, 22, 15, 20, 25, 20]) # Observed counts
expected = np.array([20, 20, 20, 20, 20, 20]) # Expected if fair

stat, p_val = simulate_chi_square_test(observed, expected)
print(f"Chi-square statistic: {stat:.2f}")
print(f"Simulated p-value: {p_val:.4f}")

Output:

Chi-square statistic: 2.90
Simulated p-value: 0.7184

Legacy API (For Reference)

The older API still works but is not recommended for new code:

import numpy as np

# Legacy method (avoid in new code)
np.random.seed(42)
samples = np.random.chisquare(df=5, size=10)

# Modern equivalent (preferred)
rng = np.random.default_rng(42)
samples = rng.chisquare(df=5, size=10)
tip

The modern default_rng() API provides better statistical properties, is faster, and supports independent random streams for parallel computing.

Combining with SciPy for Full Analysis

For complete statistical analysis, combine NumPy sampling with SciPy:

import numpy as np
from scipy import stats

rng = np.random.default_rng()

df = 5
samples = rng.chisquare(df=df, size=10000)

# Compare with theoretical distribution
x = np.linspace(0, 20, 100)
theoretical_pdf = stats.chi2.pdf(x, df)

# Calculate percentiles
p95 = stats.chi2.ppf(0.95, df)
print(f"95th percentile (theoretical): {p95:.2f}")
print(f"95th percentile (sample): {np.percentile(samples, 95):.2f}")

Output:

95th percentile (theoretical): 11.07
95th percentile (sample): 11.06

Parameters Reference

ParameterDescriptionValid Values
dfDegrees of freedomPositive number (> 0)
sizeOutput shapeint, tuple, or None

Key Properties

PropertyFormulaExample (df=5)
Meandf5
Variance2 × df10
Modemax(df - 2, 0)3
Skewness√(8/df)1.26

Summary

Use np.random.default_rng().chisquare(df, size) to generate chi-square distributed random samples.

  • The degrees of freedom parameter controls the distribution shape-lower values produce heavily right-skewed distributions, while higher values approach normality.
  • This distribution is essential for hypothesis testing, particularly when comparing observed frequencies to expected frequencies.