How to Calculate Standard Deviation of a Matrix in Python
Understanding data variability is fundamental to statistical analysis and machine learning. Calculating the standard deviation of a matrix tells you how far values spread from the mean, which supports feature engineering, outlier detection, and data normalization. In this guide, you'll learn how to compute the standard deviation across an entire matrix, along its rows, and along its columns using NumPy, Python's core numerical computing library.
Global Standard Deviation
By default, the np.std() function treats the entire matrix as a single list of numbers and returns a single "global" standard deviation value.
```python
import numpy as np

# Define a 3x3 matrix
matrix = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90]
])

# Calculate SD for the entire dataset
global_sd = np.std(matrix)
print(f"Global Standard Deviation: {global_sd:.2f}")
```

Output:

```
Global Standard Deviation: 25.82
```
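To make the global value less of a black box, the sketch below reproduces it by hand, assuming the population formula (the square root of the mean squared deviation from the mean). The variable names `mean`, `squared_dev`, and `manual_sd` are illustrative only.

```python
import numpy as np

matrix = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90]
])

# Population standard deviation, computed step by step
mean = matrix.mean()                      # mean of all nine values
squared_dev = (matrix - mean) ** 2        # squared distance of each value from the mean
manual_sd = np.sqrt(squared_dev.mean())   # square root of the average squared distance

print(f"Manual: {manual_sd:.2f}")
print(f"np.std: {np.std(matrix):.2f}")
```

Both lines should print the same value, since the manual steps follow the same formula np.std() uses by default.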
Row-wise and Column-wise Calculations
For advanced feature engineering, you often need to analyze the variance across specific dimensions. You can control this using the axis parameter:
- axis=0: Calculates the standard deviation of each column
- axis=1: Calculates the standard deviation of each row
```python
import numpy as np

# Define a 3x3 matrix
matrix = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90]
])

# Calculate dispersion for each vertical column
col_sd = np.std(matrix, axis=0)

# Calculate dispersion for each horizontal row
row_sd = np.std(matrix, axis=1)

print(f"Column Standard Deviations: {col_sd}")
print(f"Row Standard Deviations: {row_sd}")
```

Output:

```
Column Standard Deviations: [24.49489743 24.49489743 24.49489743]
Row Standard Deviations: [8.16496581 8.16496581 8.16496581]
```
NumPy defaults to the population standard deviation formula, which divides by N. If your matrix represents a sample (subset) of a larger dataset, set ddof=1 (Delta Degrees of Freedom) to divide by N - 1 instead. This is Bessel's correction, which gives an unbiased estimate of the variance: np.std(matrix, ddof=1).
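As a quick illustration of the difference, the following sketch compares the default population result with the ddof=1 sample result on the same 3x3 matrix. The variable names are chosen for this example.

```python
import numpy as np

matrix = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90]
])

population_sd = np.std(matrix)        # divides by N (ddof=0, the default)
sample_sd = np.std(matrix, ddof=1)    # divides by N - 1

print(f"Population SD: {population_sd:.2f}")
print(f"Sample SD:     {sample_sd:.2f}")

# ddof combines with axis: each column here has N = 3 values
sample_col_sd = np.std(matrix, axis=0, ddof=1)
print(f"Sample SD per column: {sample_col_sd}")
```

The sample value is always at least as large as the population value, and the gap widens as N shrinks, so the choice matters most for small matrices or short rows and columns.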
Understanding the Axis Parameter
Thinking about axes can be confusing at first. Use this mental model:
- Axis 0: The operation runs down the rows, collapsing them so that one result remains per column
- Axis 1: The operation runs across the columns, collapsing them so that one result remains per row
| Goal | Command | Output Shape |
|---|---|---|
| All elements | np.std(matrix) | Scalar |
| Per column | np.std(matrix, axis=0) | 1D array (length = number of columns) |
| Per row | np.std(matrix, axis=1) | 1D array (length = number of rows) |
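The output shapes in the table are easier to check on a non-square matrix, where rows and columns cannot be confused. The sketch below uses a hypothetical 2x4 array named `data`. It also shows NumPy's keepdims=True option, which is not covered above and is included only to show how the collapsed axis can be kept with length 1.

```python
import numpy as np

# A 2x4 matrix (2 rows, 4 columns) makes the two axes easy to tell apart
data = np.arange(8).reshape(2, 4)

print(np.std(data).shape)            # () -> scalar
print(np.std(data, axis=0).shape)    # (4,) -> one value per column
print(np.std(data, axis=1).shape)    # (2,) -> one value per row

# keepdims=True keeps the reduced axis with length 1,
# which is convenient for broadcasting against the original matrix
row_sd_2d = np.std(data, axis=1, keepdims=True)
print(row_sd_2d.shape)               # (2, 1)
```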
If your matrix contains missing values (NaN), the standard np.std() function will return NaN. Use np.nanstd() instead to calculate the standard deviation while ignoring missing values:
```python
import numpy as np
matrix_with_nan = np.array([[1, 2, np.nan], [4, 5, 6]])
clean_sd = np.nanstd(matrix_with_nan)
print(f"Standard Deviation (ignoring NaN): {clean_sd:.2f}")
```
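np.nanstd() accepts the same axis argument as np.std(), so the NaN-aware version also works per column or per row. A minimal sketch, reusing the same example matrix:

```python
import numpy as np

matrix_with_nan = np.array([[1, 2, np.nan], [4, 5, 6]])

# Plain np.std propagates the missing value
plain_sd = np.std(matrix_with_nan)
print(plain_sd)

# np.nanstd skips NaN entries within each column
col_sd = np.nanstd(matrix_with_nan, axis=0)
print(col_sd)
```

The third column keeps only one valid value after the NaN is skipped, so its standard deviation is 0. A slice that holds nothing but NaN would still yield NaN, along with a runtime warning.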
By mastering axis-aware calculations in NumPy, you can extract precise variability metrics from any multi-dimensional dataset with minimal code.