Python NumPy: How to Generate Chi-Square Distributions in NumPy

The chi-square (χ²) distribution is fundamental in statistical testing, particularly for goodness-of-fit tests, independence tests, and confidence interval estimation. NumPy provides efficient methods to generate random samples from this distribution.

Using the Modern Generator API

The recommended approach uses NumPy's modern random generator:

import numpy as np

rng = np.random.default_rng(seed=42)  # Optional seed for reproducibility

# Generate 5 samples with 3 degrees of freedom
samples = rng.chisquare(df=3, size=5)
print(samples)

Output:

[3.05543949 4.35892866 2.6205389  2.29722655 4.79524961]

Understanding Degrees of Freedom

The df parameter (degrees of freedom) controls the distribution's shape:

import numpy as np

rng = np.random.default_rng()

# Low df: heavily right-skewed
low_df = rng.chisquare(df=1, size=5)
print(f"df=1: mean={low_df.mean():.2f}")

# Medium df: moderately skewed
mid_df = rng.chisquare(df=5, size=5)
print(f"df=5: mean={mid_df.mean():.2f}")

# High df: approaches normal distribution
high_df = rng.chisquare(df=30, size=5)
print(f"df=30: mean={high_df.mean():.2f}")

Output:

df=1: mean=0.42
df=5: mean=5.07
df=30: mean=28.63

note

The mean of a chi-square distribution equals its degrees of freedom (df), and the variance equals 2×df. As df increases, the distribution becomes more symmetric.

Multi-dimensional Output

Generate matrices or higher-dimensional arrays of chi-square samples:

import numpy as np

rng = np.random.default_rng()

# 3x3 matrix
matrix = rng.chisquare(df=5, size=(3, 3))
print(matrix.shape)  # (3, 3)
print(matrix)

# 3D array: 2 layers, 3 rows, 4 columns
tensor = rng.chisquare(df=10, size=(2, 3, 4))
print(tensor.shape)  # (2, 3, 4)

Output:

(3, 3)
[[ 2.68470397  4.50259988  3.39465421]
 [ 5.0783715   1.16429817  3.38975043]
 [ 5.70560665  1.39477394 14.24344514]]
(2, 3, 4)

Visualizing Different Degrees of Freedom

Compare how the distribution shape changes with different df values:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=42)

fig, ax = plt.subplots(figsize=(10, 6))

degrees = [1, 2, 5, 10, 20]
colors = ['red', 'blue', 'green', 'orange', 'purple']

for df, color in zip(degrees, colors):
    samples = rng.chisquare(df=df, size=10000)
    ax.hist(samples, bins=100, density=True, alpha=0.5,
            histtype='stepfilled', label=f'df={df}', color=color)

ax.set_xlabel('Value')
ax.set_ylabel('Density')
ax.set_title('Chi-Square Distribution for Various Degrees of Freedom')
ax.legend()
ax.set_xlim(0, 40)
plt.show()

Statistical Properties

Verify theoretical properties with samples:

import numpy as np

rng = np.random.default_rng()

df = 10
samples = rng.chisquare(df=df, size=100000)

print(f"Degrees of freedom: {df}")
print(f"Theoretical mean: {df}")
print(f"Sample mean: {samples.mean():.2f}")
print(f"Theoretical variance: {2 * df}")
print(f"Sample variance: {samples.var():.2f}")

Output:

Degrees of freedom: 10
Theoretical mean: 10
Sample mean: 9.99
Theoretical variance: 20
Sample variance: 19.82

Practical Example: Chi-Square Test Simulation

Simulate the chi-square test statistic under the null hypothesis:

import numpy as np

def simulate_chi_square_test(observed, expected, n_simulations=10000):
    """Simulate chi-square test statistic distribution."""
    rng = np.random.default_rng()
    
    # Calculate observed test statistic
    observed_stat = np.sum((observed - expected)**2 / expected)
    
    # Degrees of freedom
    df = len(observed) - 1
    
    # Simulate null distribution
    null_distribution = rng.chisquare(df=df, size=n_simulations)
    
    # Calculate p-value
    p_value = np.mean(null_distribution >= observed_stat)
    
    return observed_stat, p_value

# Example: Testing if a die is fair
observed = np.array([18, 22, 15, 20, 25, 20])  # Observed counts
expected = np.array([20, 20, 20, 20, 20, 20])  # Expected if fair

stat, p_val = simulate_chi_square_test(observed, expected)
print(f"Chi-square statistic: {stat:.2f}")
print(f"Simulated p-value: {p_val:.4f}")

Output:

Chi-square statistic: 2.90
Simulated p-value: 0.7184

Legacy API (For Reference)

The older API still works but is not recommended for new code:

import numpy as np

# Legacy method (avoid in new code)
np.random.seed(42)
samples = np.random.chisquare(df=5, size=10)

# Modern equivalent (preferred)
rng = np.random.default_rng(42)
samples = rng.chisquare(df=5, size=10)

tip

The modern default_rng() API provides better statistical properties, is faster, and supports independent random streams for parallel computing.

Combining with SciPy for Full Analysis

For complete statistical analysis, combine NumPy sampling with SciPy:

import numpy as np
from scipy import stats

rng = np.random.default_rng()

df = 5
samples = rng.chisquare(df=df, size=10000)

# Compare with theoretical distribution
x = np.linspace(0, 20, 100)
theoretical_pdf = stats.chi2.pdf(x, df)

# Calculate percentiles
p95 = stats.chi2.ppf(0.95, df)
print(f"95th percentile (theoretical): {p95:.2f}")
print(f"95th percentile (sample): {np.percentile(samples, 95):.2f}")

Output:

95th percentile (theoretical): 11.07
95th percentile (sample): 11.06

Parameters Reference

Parameter	Description	Valid Values
`df`	Degrees of freedom	Positive number (> 0)
`size`	Output shape	int, tuple, or None

Key Properties

Property	Formula	Example (df=5)
Mean	df	5
Variance	2 × df	10
Mode	max(df - 2, 0)	3
Skewness	√(8/df)	1.26

Summary

Use np.random.default_rng().chisquare(df, size) to generate chi-square distributed random samples.

The degrees of freedom parameter controls the distribution shape-lower values produce heavily right-skewed distributions, while higher values approach normality.
This distribution is essential for hypothesis testing, particularly when comparing observed frequencies to expected frequencies.

Using the Modern Generator API​

Understanding Degrees of Freedom​

Multi-dimensional Output​

Visualizing Different Degrees of Freedom​

Statistical Properties​

Practical Example: Chi-Square Test Simulation​

Legacy API (For Reference)​

Combining with SciPy for Full Analysis​

Parameters Reference​

Key Properties​

Summary​

Table of Contents

Using the Modern Generator API

Understanding Degrees of Freedom

Multi-dimensional Output

Visualizing Different Degrees of Freedom

Statistical Properties

Practical Example: Chi-Square Test Simulation

Legacy API (For Reference)

Combining with SciPy for Full Analysis

Parameters Reference

Key Properties

Summary