Gradient Descent & Its Variants
Published: March 19, 2026 • Tags: Deep Learning, Optimization, Calculus
Gradient Descent is a first-order iterative optimization algorithm used to find a local minimum of a differentiable function. In the context of Machine Learning and Deep Learning, that "function" is the Loss Function (e.g., Mean Squared Error or Cross-Entropy Loss).
Imagine standing blindfolded on a rugged mountain, trying to reach the very bottom. You feel the slope of the ground beneath your feet and take a step straight downhill. Gradient descent does exactly this with calculus: it computes the gradient (the slope) of the loss function with respect to the model's weights and updates the weights in the opposite direction of the gradient.
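The update rule above can be sketched in a few lines of Python on a toy one-dimensional loss, L(w) = (w - 3)^2, whose derivative is 2(w - 3). The function name, starting point, and learning rate here are all illustrative:

```python
# Toy loss L(w) = (w - 3)**2, minimized at w = 3.
# Its derivative (the "slope") is dL/dw = 2 * (w - 3).
def gradient(w):
    return 2.0 * (w - 3.0)

w = 0.0    # initial weight (arbitrary starting point)
lr = 0.1   # learning rate (step size)

for _ in range(100):
    w -= lr * gradient(w)   # step in the OPPOSITE direction of the gradient

# w is now very close to the minimum at 3.0
```

Every variant below uses this same rule; they differ only in how much data is used to compute the gradient for each step.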
1. Batch Gradient Descent (BGD)
Also known as standard Gradient Descent. In BGD, the model calculates the error over the entire training dataset before making a single update to the weights. It takes slow, carefully calculated, stable steps toward the minimum.
- Pros: Stable updates; with a suitably small learning rate, it is guaranteed to converge to the global minimum on convex error surfaces.
- Cons: Extremely slow and memory-intensive for large datasets.
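A minimal NumPy sketch of BGD on a toy linear-regression problem (the dataset and hyperparameters are made up for illustration). The key point is that the gradient is averaged over the whole dataset before each single update:

```python
import numpy as np

# Illustrative toy dataset: y ≈ 4x + 1 with a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=200)
y = 4.0 * X + 1.0 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0
lr = 0.1

for _ in range(500):
    err = (w * X + b) - y
    # MSE gradients averaged over the ENTIRE dataset -> one update per pass
    grad_w = 2.0 * np.mean(err * X)
    grad_b = 2.0 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

# w ≈ 4, b ≈ 1
```

Each of the 500 updates touches all 200 samples; on a dataset of millions of rows, that one pass per step is exactly what makes BGD slow and memory-hungry.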
2. Stochastic Gradient Descent (SGD)
In stark contrast to BGD, Stochastic Gradient Descent takes a single, randomly chosen data point from the dataset, calculates the error, and immediately updates the weights. It repeats this for every single data point.
- Pros: Much faster iterations. The erratic updates can actually help the algorithm "jump out" of bad local minima traps.
- Cons: A highly noisy, erratic path to convergence. With a fixed learning rate it never settles precisely at the minimum, bouncing around it instead.
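A sketch of the same style of toy regression trained with SGD (data and hyperparameters are again illustrative). The difference from BGD is that the weights are updated after every single sample:

```python
import numpy as np

# Illustrative toy dataset: y = 4x + 1 (noiseless for simplicity).
rng = np.random.default_rng(1)
X = rng.normal(size=200)
y = 4.0 * X + 1.0

w, b = 0.0, 0.0
lr = 0.05

for epoch in range(10):
    for i in rng.permutation(len(X)):   # shuffle, then one update PER SAMPLE
        err = (w * X[i] + b) - y[i]
        w -= lr * 2.0 * err * X[i]
        b -= lr * 2.0 * err

# w ≈ 4, b ≈ 1, reached via a noisy zig-zag path
```

Note that one "epoch" here performs 200 weight updates, versus the single update BGD makes per pass over the data.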
3. Mini-Batch Gradient Descent
This is the industry-standard approach in deep learning. Mini-batch strikes a compromise: it takes a small batch of data points (e.g., 32, 64, or 256 samples), averages the gradients over that batch, and updates the weights.
- Pros: Reaps the hardware benefits of matrix-matrix multiplications (GPUs love batches of 32/64). It is faster than BGD and more stable than SGD.
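The same toy problem with mini-batches (data and hyperparameters are illustrative); each update averages gradients over a batch of 32 samples, sitting between BGD's full-dataset average and SGD's single sample:

```python
import numpy as np

# Illustrative toy dataset: y ≈ 4x + 1 with a little noise.
rng = np.random.default_rng(2)
X = rng.normal(size=512)
y = 4.0 * X + 1.0 + 0.1 * rng.normal(size=512)

w, b = 0.0, 0.0
lr, batch_size = 0.1, 32

for epoch in range(50):
    idx = rng.permutation(len(X))           # shuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        err = (w * X[batch] + b) - y[batch]
        # gradients averaged over ONE mini-batch of 32 samples
        w -= lr * 2.0 * np.mean(err * X[batch])
        b -= lr * 2.0 * np.mean(err)

# w ≈ 4, b ≈ 1
```

The vectorized `X[batch]` slice is why this maps so well onto GPUs: each update is one small matrix operation rather than 32 scalar ones.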