Python NumPy: How to Generate Random Numbers from the Normal Distribution
The normal distribution (also called the Gaussian distribution or bell curve) is one of the most important probability distributions in statistics and data science. It describes data that clusters around a mean value, with values becoming less frequent the further they are from the center, creating the characteristic bell-shaped curve.
NumPy's numpy.random.normal() function makes it easy to generate random numbers that follow this distribution, which is essential for simulations, statistical testing, machine learning model initialization, and synthetic data generation.
Basic Usage
The simplest way to generate random numbers from a normal distribution is to call np.random.normal() with the desired size:
import numpy as np
# Generate 5 random numbers from the standard normal distribution
values = np.random.normal(size=5)
print(values)
Output (varies each run):
[ 0.86153799 1.87815094 -0.49538872 0.21429525 -0.11574017]
By default, this generates numbers from the standard normal distribution - a normal distribution with a mean of 0 and a standard deviation of 1.
Syntax and Parameters
numpy.random.normal(loc=0.0, scale=1.0, size=None)
| Parameter | Type | Default | Description |
|---|---|---|---|
| loc | float or array-like | 0.0 | Mean (center) of the distribution |
| scale | float or array-like | 1.0 | Standard deviation (spread) of the distribution; must be non-negative |
| size | int or tuple of ints | None | Output shape. None returns a single value when loc and scale are scalars |
Returns: A float (if size=None) or a NumPy array of random samples drawn from the specified normal distribution.
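As a quick check of the return types described above, the following minimal sketch calls the function with three different size values. The variable names (single, vector, grid) are illustrative only.

```python
import numpy as np

# size=None (the default) with scalar loc and scale returns a single float
single = np.random.normal()
print(isinstance(single, float))

# An integer size returns a 1-D array of that length
vector = np.random.normal(loc=0.0, scale=1.0, size=5)
print(vector.shape)

# A tuple size returns an array with exactly that shape
grid = np.random.normal(loc=0.0, scale=1.0, size=(2, 3))
print(grid.shape)
```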
Specifying Mean and Standard Deviation
Customize the distribution by setting loc (mean) and scale (standard deviation):
import numpy as np
# Mean = 50, Standard deviation = 10
values = np.random.normal(loc=50, scale=10, size=5)
print("Random values:", np.round(values, 2))
Output:
Random values: [49.84 33.51 58.29 39.35 45.46]
The generated numbers cluster around 50 (the mean). In a large sample, about 68% of values fall within 10 units (one standard deviation) of the mean.
loc and scale
- loc (mean): shifts the center of the distribution. Changing it from 0 to 100 shifts all values to center around 100.
- scale (standard deviation): controls the spread. A larger scale produces values that are more spread out; a smaller scale keeps values tighter around the mean.
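One way to see what loc and scale do is to draw from the standard normal distribution and apply the shift and stretch by hand. The sketch below compares that with a direct call, using two generators created with the same seed. It assumes the Generator API (np.random.default_rng); the values 100 and 15 are arbitrary. That the two results match draw for draw reflects how current NumPy versions behave, so treat it as an illustration rather than a documented guarantee.

```python
import numpy as np

# Two generators with the same seed start from the same state
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)

# Draw directly from a normal distribution with mean 100 and std 15
direct = rng_a.normal(loc=100, scale=15, size=5)

# Draw from the standard normal, then scale and shift manually
z = rng_b.standard_normal(size=5)
manual = 100 + 15 * z

print(np.round(direct, 2))
print(np.round(manual, 2))
print("Same values:", np.allclose(direct, manual))
```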
In a normal distribution:
- ~68% of values fall within 1 standard deviation of the mean
- ~95% fall within 2 standard deviations
- ~99.7% fall within 3 standard deviations
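These percentages come from the normal cumulative distribution function: the probability of landing within k standard deviations of the mean is erf(k / sqrt(2)). A small sketch using only the standard library (the helper name prob_within is hypothetical) reproduces them:

```python
import math

def prob_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Within {k} std: {prob_within(k) * 100:.2f}%")
```

This prints 68.27%, 95.45%, and 99.73%, which round to the figures in the list above.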
Generating Multi-Dimensional Arrays
Pass a tuple to size to generate 2D or higher-dimensional arrays:
2D Array (Matrix)
import numpy as np
# 3 rows × 4 columns of normally distributed values
matrix = np.random.normal(loc=0, scale=1, size=(3, 4))
print(np.round(matrix, 3))
Output:
[[-0.344 -0.155 0.924 -0.905]
[ 1.576 1.588 0.464 0.309]
[ 0.403 0.172 1.531 1.002]]
3D Array
import numpy as np
# 2 layers × 3 rows × 4 columns
arr_3d = np.random.normal(loc=0, scale=1, size=(2, 3, 4))
print("Shape:", arr_3d.shape)
Output:
Shape: (2, 3, 4)
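loc and scale can also be arrays, in which case they are broadcast against size. The sketch below uses this to give each column of a 2D array its own mean and standard deviation. The column parameters are made-up values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# One mean and one standard deviation per column (hypothetical values)
col_means = np.array([0.0, 50.0, 1000.0])
col_stds = np.array([1.0, 10.0, 200.0])

# Arrays of shape (3,) broadcast against the requested shape (10000, 3)
data = rng.normal(loc=col_means, scale=col_stds, size=(10000, 3))

print("Shape:", data.shape)
print("Column means:", np.round(data.mean(axis=0), 1))
print("Column stds: ", np.round(data.std(axis=0), 1))
```

The last dimension of size has to match the length of the parameter arrays (3 here); otherwise NumPy raises a broadcasting error.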
Reproducible Results with Random Seeds
For consistent, reproducible results (important in testing and research), set a random seed:
import numpy as np
np.random.seed(42)
values = np.random.normal(loc=0, scale=1, size=5)
print("Run 1:", np.round(values, 4))
# Same seed produces identical output
np.random.seed(42)
values = np.random.normal(loc=0, scale=1, size=5)
print("Run 2:", np.round(values, 4))
Output:
Run 1: [ 0.4967 -0.1383 0.6477 1.523 -0.2342]
Run 2: [ 0.4967 -0.1383 0.6477 1.523 -0.2342]
NumPy's newer Generator API (NumPy 1.17+) is recommended over the legacy np.random functions:
import numpy as np
rng = np.random.default_rng(seed=42)
values = rng.normal(loc=0, scale=1, size=5)
print(np.round(values, 4))
Output:
[ 0.3047 -1.04 0.7505 0.9406 -1.951 ]
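The Generator API uses the PCG64 bit generator by default, which has better statistical properties than the Mersenne Twister behind the legacy functions. Each Generator object also carries its own state instead of relying on the global state that np.random.seed() modifies, which makes it easier to keep separate streams for separate threads or components.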
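Because every Generator object holds its own state, a common pattern for parallel or multi-component code is to derive several independent generators from a single root seed. The following is a minimal sketch using np.random.SeedSequence; the choice of three streams is arbitrary.

```python
import numpy as np

# Derive three child seeds from one root seed
root = np.random.SeedSequence(42)
child_seeds = root.spawn(3)

# Each generator has its own state and produces its own stream
streams = [np.random.default_rng(s) for s in child_seeds]

for i, stream in enumerate(streams):
    print(f"Stream {i}:", np.round(stream.normal(size=3), 4))
```

Rerunning the script with the same root seed reproduces every stream, while the streams themselves differ from one another.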
Verifying the Distribution
You can sanity-check that the generated numbers behave like a normal distribution by comparing sample statistics with their expected values:
import numpy as np
# Generate a large sample
rng = np.random.default_rng(42)
samples = rng.normal(loc=50, scale=10, size=100000)
print(f"Expected mean: 50, Sample mean: {samples.mean():.2f}")
print(f"Expected std: 10, Sample std: {samples.std():.2f}")
print(f"Min: {samples.min():.2f}")
print(f"Max: {samples.max():.2f}")
# Verify the 68-95-99.7 rule
within_1_std = np.sum(np.abs(samples - 50) <= 10) / len(samples) * 100
within_2_std = np.sum(np.abs(samples - 50) <= 20) / len(samples) * 100
within_3_std = np.sum(np.abs(samples - 50) <= 30) / len(samples) * 100
print(f"\nWithin 1 std: {within_1_std:.1f}% (expected ~68.3%)")
print(f"Within 2 std: {within_2_std:.1f}% (expected ~95.4%)")
print(f"Within 3 std: {within_3_std:.1f}% (expected ~99.7%)")
Output:
Expected mean: 50, Sample mean: 49.96
Expected std: 10, Sample std: 10.04
Min: 6.11
Max: 100.07
Within 1 std: 68.2% (expected ~68.3%)
Within 2 std: 95.4% (expected ~95.4%)
Within 3 std: 99.7% (expected ~99.7%)
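Mean, standard deviation, and the 68-95-99.7 rule say little about the symmetry or tails of the sample. As a further check, one can look at skewness and excess kurtosis, both of which are 0 for a normal distribution. The sketch below computes them with plain NumPy as standardized moments; it is an informal check, not a formal normality test.

```python
import numpy as np

rng = np.random.default_rng(42)
samples = rng.normal(loc=50, scale=10, size=100000)

# Standardize: subtract the sample mean, divide by the sample std
z = (samples - samples.mean()) / samples.std()

# Skewness is the third standardized moment (0 for a normal distribution)
skewness = np.mean(z ** 3)

# Excess kurtosis is the fourth standardized moment minus 3 (also 0)
excess_kurtosis = np.mean(z ** 4) - 3

print(f"Skewness: {skewness:.3f} (expected ~0)")
print(f"Excess kurtosis: {excess_kurtosis:.3f} (expected ~0)")
```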
Practical Examples
Simulating Test Scores
import numpy as np
rng = np.random.default_rng(42)
# Simulate test scores: mean = 75, std = 12
scores = rng.normal(loc=75, scale=12, size=10)
# Clip to valid range [0, 100]
scores = np.clip(scores, 0, 100)
print("Simulated test scores:", np.round(scores, 1))
print(f"Class average: {scores.mean():.1f}")
Output:
Simulated test scores: [78.7 62.5 84. 86.3 51.6 59.4 76.5 71.2 74.8 64.8]
Class average: 71.0
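Note that np.clip() moves every out-of-range value onto the boundary, so a large sample can pile up at exactly 0 or 100. An alternative, sketched below, is to redraw out-of-range values until all of them fall inside the range. The helper name sample_in_range is hypothetical, and the result follows a truncated normal distribution, whose mean and standard deviation differ slightly from loc and scale.

```python
import numpy as np

def sample_in_range(rng, loc, scale, low, high, size):
    """Draw `size` normal values, redrawing any that fall outside [low, high]."""
    values = rng.normal(loc=loc, scale=scale, size=size)
    out_of_range = (values < low) | (values > high)
    while out_of_range.any():
        # Replace only the out-of-range entries with fresh draws
        values[out_of_range] = rng.normal(loc=loc, scale=scale, size=out_of_range.sum())
        out_of_range = (values < low) | (values > high)
    return values

rng = np.random.default_rng(42)
scores = sample_in_range(rng, loc=75, scale=12, low=0, high=100, size=1000)

print(f"Min: {scores.min():.1f}, Max: {scores.max():.1f}")
print("Scores at exactly 100:", np.sum(scores == 100))
```

This simple loop works well when only a small fraction of draws fall outside the range, as here; it becomes slow if the range covers very little of the distribution.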
Generating Synthetic Height Data
import numpy as np
rng = np.random.default_rng(42)
# Adult male heights in cm: mean = 175.3, std = 7.1
heights = rng.normal(loc=175.3, scale=7.1, size=5)
for i, h in enumerate(heights, 1):
    print(f"Person {i}: {h:.1f} cm")
Output:
Person 1: 177.5 cm
Person 2: 167.9 cm
Person 3: 180.6 cm
Person 4: 182.0 cm
Person 5: 161.4 cm
Normal Distribution vs Other Distributions
| Function | Distribution | Shape | Use Case |
|---|---|---|---|
| np.random.normal() | Normal (Gaussian) | Bell curve | Many natural measurements (heights, measurement error) |
| np.random.uniform() | Uniform | Flat | Equal probability across range |
| np.random.exponential() | Exponential | Right-skewed | Time between events |
| np.random.poisson() | Poisson | Discrete | Count of events in fixed interval |
| np.random.binomial() | Binomial | Discrete | Success/failure trials |
Conclusion
numpy.random.normal() is a versatile function for generating random numbers that follow the bell curve distribution.
Use the loc parameter to set the mean, scale to control the spread, and size to define the output shape.
For reproducible results, always set a random seed, preferably using the modern np.random.default_rng() API.
Whether you're running Monte Carlo simulations, initializing neural network weights, or generating synthetic datasets, understanding how to generate normally distributed data is a fundamental skill in scientific Python programming.