
Maths-as-Code for Machine Learning

TL;DR

• ML papers use mathematical notation that maps directly to NumPy/Python operations

• Σ → np.sum(), E[X] → np.mean(), matrix multiplication → @ operator

• This cheat sheet translates common symbols into code you can copy-paste and experiment with

Turn the maths you see in ML papers into programmer-friendly code patterns.


Why “Maths-as-Code”?

ML papers speak in symbols. Engineers speak in loops, arrays, and functions. This guide acts as a Rosetta Stone between the two: each section pairs a piece of notation with the NumPy code that implements it.

Keep this page open when reading a paper; copy the snippets into a notebook and try them on a tiny dataset.


🧩 1. Scalars, Vectors, Matrices, and Tensors

Mental model: A tensor is just an n-dimensional array. Every ML data structure is one of these.

| Math | Meaning | Python / NumPy | ML Use |
| --- | --- | --- | --- |
| x | Scalar — single number | x = 3.14 | Learning rate, bias |
| 𝐱 = [x₁, …, xₙ] | Vector — ordered list | x = np.array([1, 2, 3]) | Features, weights |
| 𝐗 = [[xᵢⱼ]] | Matrix — 2D grid | X = np.array([[1, 2], [3, 4]]) | Weight matrices |
| 𝓧 | Tensor — n-D array | np.array([...]) | Images, embeddings |
| 𝐗ᵀ | Transpose — swap rows/cols | X.T | Switching dimensions |

import numpy as np
X = np.array([[1, 2], [3, 4]])
print("Matrix:", X)
print("Transpose:", X.T)
// JavaScript equivalent (using standard arrays)
const X = [[1, 2], [3, 4]];
const XT = X[0].map((_, i) => X.map(row => row[i])); // transpose
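
To make the scalar-to-tensor progression concrete, here is a small sketch (shapes picked arbitrarily for illustration) that inspects ndim and shape at each level:

import numpy as np

scalar = np.array(3.14)                # 0-D: a single number
vector = np.array([1, 2, 3])           # 1-D: ordered list
matrix = np.array([[1, 2], [3, 4]])    # 2-D: rows x columns
tensor = np.zeros((2, 3, 4))           # 3-D: e.g. batch x height x width

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, "ndim:", arr.ndim, "shape:", arr.shape)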

🔁 2. Summation, Product, and Averages (Σ, ∏, E[·])

Mental model: Σ means loop and add. E[X] means average (expected value).

| Math | Meaning | Code Equivalent | ML Use |
| --- | --- | --- | --- |
| Σᵢ₌₁ⁿ xᵢ | Sum elements | np.sum(x) | Total loss |
| E[X] = (1/n) Σ xᵢ | Expectation / mean | np.mean(x) | Batch average |
| ∏ᵢ₌₁ⁿ xᵢ | Product | np.prod(x) | Likelihood |
| Var(X) | Variance | np.var(x) | Feature scaling |

import numpy as np
x = np.array([1, 2, 3, 4])
np.sum(x), np.mean(x), np.var(x)
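
If it helps to see the "loop and add" mental model spelled out, here is a sketch comparing an explicit loop with the vectorized calls above, using the same toy values:

import numpy as np

x = np.array([1, 2, 3, 4])

# Σᵢ xᵢ as an explicit loop ...
total = 0.0
for xi in x:
    total += xi

# ... and as vectorized NumPy calls
assert total == np.sum(x)
assert total / len(x) == np.mean(x)   # E[X] is just the sum divided by n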

⚙️ 3. Linear Algebra Operations

Mental model: Matrices transform vectors. Neural networks are chains of transforms + non-linearities.

| Math | Meaning | Code Equivalent | ML Use |
| --- | --- | --- | --- |
| 𝐱·𝐲 = Σ xᵢyᵢ | Dot product | np.dot(x, y) | Similarity, regression |
| 𝐗𝐘 | Matrix multiply | X @ Y or np.matmul(X, Y) | Forward pass |
| ‖𝐱‖₂ = √Σ xᵢ² | L2 norm | np.linalg.norm(x) | Normalization |
| 𝐈 | Identity matrix | np.eye(n) | Initialization |
| 𝐗⁻¹ | Inverse | np.linalg.inv(X) | Solving linear systems |

import numpy as np
X = np.array([[1, 2], [3, 4]])
W = np.array([[0.5], [0.2]])
y = X @ W  # matrix multiplication
np.linalg.norm(y)  # L2 norm
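
As a sketch of the "chains of transforms + non-linearities" idea, here is a hypothetical two-layer forward pass; the layer sizes, random weights, and choice of ReLU are illustrative, not prescribed by this guide:

import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(4, 3))        # batch of 4 examples, 3 features each
W1 = rng.normal(size=(3, 5))       # first transform: 3 -> 5
W2 = rng.normal(size=(5, 1))       # second transform: 5 -> 1

h = np.maximum(0, X @ W1)          # linear transform + ReLU non-linearity
y_hat = h @ W2                     # final linear transform
print(y_hat.shape)                 # (4, 1): one prediction per example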

📐 4. Calculus: Gradients and Derivatives

Mental model: Gradients point in the direction of steepest increase. We descend in the opposite direction.

| Math | Meaning | Code Equivalent | ML Use |
| --- | --- | --- | --- |
| ∂f/∂x | Partial derivative | grad[i] | How one parameter affects loss |
| ∇f or ∇L | Gradient (vector of partials) | np.gradient(f) or autograd | Direction to update weights |
| ∇²f or H | Hessian (second derivatives) | np.gradient(np.gradient(f)) | Curvature, optimization |

# With autograd libraries (PyTorch example)
import torch
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2  # y = x²
y.backward()  # compute gradient
print(x.grad)  # dy/dx = 2x = 4.0
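
To connect this to the "descend in the opposite direction" mental model, here is a minimal gradient-descent sketch on f(x) = x², with the derivative 2x written by hand; the starting point, learning rate, and step count are arbitrary:

x = 5.0          # arbitrary starting point
alpha = 0.1      # learning rate (step size)

for step in range(50):
    grad = 2 * x             # df/dx for f(x) = x**2
    x = x - alpha * grad     # move opposite the gradient

print(x)  # approaches 0, the minimum of f(x) = x**2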

🎲 5. Probability Notation

Mental model: P(A) is a number between 0 and 1. P(A|B) is “probability of A given that B happened.”

| Math | Meaning | Code Equivalent | ML Use |
| --- | --- | --- | --- |
| P(A) | Probability of A | count_A / total | Prior probability |
| P(A\|B) | Probability of A given B | Bayes’ rule | Posterior, classification |
| P(A,B) | Joint probability | P(A) * P(B\|A) | Likelihood |
| 𝔼[X] | Expected value | np.mean(x) | Average outcome |
| argmax P(y\|x) | Most likely class | np.argmax(probs) | Prediction |

import numpy as np
# Softmax: convert logits to probabilities
def softmax(logits):
    exp_logits = np.exp(logits - np.max(logits))  # numerical stability
    return exp_logits / np.sum(exp_logits)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
prediction = np.argmax(probs)  # argmax
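
Since the table maps P(A|B) to Bayes’ rule, here is a small numeric sketch; the prior, likelihood, and false-positive rate are made-up numbers for a hypothetical diagnostic-test example:

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_a = 0.01              # P(A): prior probability of the condition
p_b_given_a = 0.95      # P(B|A): test is positive given the condition
p_b_given_not_a = 0.05  # P(B|not A): false-positive rate

# P(B) by the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = p_b_given_a * p_a / p_b   # posterior P(A|B)
print(round(p_a_given_b, 3))            # ~0.161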

🎯 6. Common ML Symbols

Mental model: These symbols appear constantly — knowing them speeds up paper reading.

| Symbol | Meaning | Typical Use |
| --- | --- | --- |
| θ (theta) | Model parameters | Weights and biases collectively |
| L or J | Loss function | What we minimize during training |
| α or η | Learning rate | Step size for gradient descent |
| λ | Regularization strength | Controls overfitting penalty |
| ŷ (y-hat) | Prediction | Model’s output |
| y | True label | Ground truth |
| ε (epsilon) | Small constant | Numerical stability (e.g., 1e-8) |

# Gradient descent in one line
theta = theta - alpha * gradient  # θ ← θ - α∇L
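
To see several of these symbols in one place, here is a sketch of an L2-regularized mean-squared-error loss and a single gradient step; the data, λ, and α values are invented for illustration:

import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])    # inputs
y = np.array([1.0, 2.0])                  # y: true labels
theta = np.array([0.1, -0.2])             # θ: model parameters
alpha, lam = 0.01, 0.1                    # α: learning rate, λ: regularization

y_hat = X @ theta                                             # ŷ: predictions
loss = np.mean((y - y_hat) ** 2) + lam * np.sum(theta ** 2)   # L = MSE + λ‖θ‖²

grad = -2 * X.T @ (y - y_hat) / len(y) + 2 * lam * theta      # ∇L
theta = theta - alpha * grad                                  # θ ← θ - α∇L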

🧪 Quick Reference Card

Notation      →  NumPy/Python
─────────────────────────────────────
Σ xᵢ          →  np.sum(x)
∏ xᵢ          →  np.prod(x)
E[X]          →  np.mean(x)
Var(X)        →  np.var(x)
𝐗ᵀ            →  X.T
𝐗𝐘            →  X @ Y
‖𝐱‖           →  np.linalg.norm(x)
argmax        →  np.argmax(x)
∇L            →  loss.backward() (autograd)
θ ← θ - α∇L   →  theta -= alpha * grad

What’s Next?

This cheat sheet covers the most common notation you’ll meet when reading ML papers.

Keep this page open when reading papers — copy the snippets into a notebook and experiment with small examples.

