Concept:
Gradient Descent is a widely used optimization algorithm in machine learning and deep learning. It iteratively adjusts model parameters (weights and biases) to minimize a loss function, thereby improving prediction accuracy.
Step 1: {\color{red}What is Gradient Descent?}
Gradient Descent is an iterative optimization method that:
- Computes the gradient (slope) of the loss function with respect to the parameters
- Updates the parameters in the opposite direction of the gradient
Repeating these two steps moves the parameters toward a (local) minimum of the loss.
Step 2: {\color{red}Basic Update Rule}
The parameter update is given by:
\[
\theta \leftarrow \theta - \alpha \, \nabla J(\theta)
\]
where:
- $\theta$ = model parameters
- $\alpha$ = learning rate (step size)
- $\nabla J(\theta)$ = gradient of the loss function with respect to $\theta$
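As a minimal sketch of the update rule, consider the toy loss $J(\theta) = \theta^2$, whose gradient is $2\theta$; the starting point and learning rate below are illustrative assumptions:

```python
def grad_J(theta):
    # Gradient of the toy loss J(theta) = theta^2
    return 2.0 * theta

theta = 5.0   # arbitrary starting point (assumed)
alpha = 0.1   # learning rate (assumed)

# Repeated application of: theta <- theta - alpha * grad J(theta)
for _ in range(100):
    theta = theta - alpha * grad_J(theta)

print(theta)  # very close to 0, the minimizer of J
```

Each iteration multiplies $\theta$ by $(1 - 2\alpha)$, so the parameter shrinks geometrically toward the minimum at $\theta = 0$.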
Step 3: {\color{red}Role in Model Optimization}
Gradient Descent helps by:
- Reducing prediction errors
- Finding parameter values that minimize loss
- Improving model accuracy over iterations
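To see the loss fall over iterations, here is a hedged sketch fitting a 1-D linear regression with gradient descent; the data points (generated from $y = 2x + 1$) and settings are made up for illustration:

```python
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # assumed data, from y = 2x + 1

w, b = 0.0, 0.0
alpha = 0.05                 # assumed learning rate
losses = []

for _ in range(500):
    n = len(xs)
    # Gradients of the mean squared error w.r.t. w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w -= alpha * grad_w
    b -= alpha * grad_b
    # Record the current mean squared error
    losses.append(sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n)

print(losses[0], losses[-1])  # the loss shrinks steadily toward 0
```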
Step 4: {\color{red}Learning Rate Importance}
The learning rate controls step size:
- Too large → overshooting the minimum, or even divergence
- Too small → slow convergence
Choosing the right value is critical for optimization.
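A small sketch on the toy loss $J(\theta) = \theta^2$ (gradient $2\theta$) makes both failure modes concrete; the specific learning-rate values are illustrative assumptions:

```python
def run(alpha, steps=50, theta=5.0):
    # Plain gradient descent on J(theta) = theta^2
    for _ in range(steps):
        theta -= alpha * 2.0 * theta
    return theta

print(run(0.1))    # moderate step size: converges quickly toward 0
print(run(1.1))    # too large: each step overshoots and |theta| grows
print(run(0.001))  # too small: barely moves after 50 steps
```

With this loss the update multiplies $\theta$ by $(1 - 2\alpha)$ each step, so any $\alpha > 1$ flips the sign and grows the magnitude, while a tiny $\alpha$ gives a factor barely below 1.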
Step 5: {\color{red}Variants of Gradient Descent}
- Batch Gradient Descent — computes the gradient over the entire dataset for each update
- Stochastic Gradient Descent (SGD) — updates after each individual sample
- Mini-batch Gradient Descent — uses small batches, trading SGD's speed against batch GD's stability
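The mini-batch variant can be sketched as follows, again on a noiseless toy regression generated from $y = 2x + 1$; the batch size, learning rate, and data are all illustrative assumptions:

```python
import random

random.seed(0)
data = [(x, 2.0 * x + 1.0) for x in [i * 0.1 for i in range(40)]]

w, b = 0.0, 0.0
alpha = 0.05        # assumed learning rate
batch_size = 8      # assumed batch size

for epoch in range(200):
    random.shuffle(data)                  # new sample order each epoch
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        n = len(batch)
        # Gradients of the mean squared error over this batch only
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
        w -= alpha * grad_w
        b -= alpha * grad_b

print(w, b)  # near the true values 2.0 and 1.0
```

Setting `batch_size = len(data)` recovers batch gradient descent, and `batch_size = 1` recovers SGD, which is why mini-batch is often described as the middle ground.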