Concept:
Data normalization is a feature scaling technique that brings numerical variables onto a similar scale. Because real-world datasets often contain features with different units and magnitudes, normalization ensures that each feature contributes fairly during model training.
Step 1: {\color{red}What is Data Normalization?}
Normalization rescales values to a fixed range, typically:
- Between 0 and 1 (Min-Max normalization)
- Sometimes between -1 and 1
It preserves the relative ordering and spacing of values while changing their scale.
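Both target ranges come from the same affine rescaling. As a sketch, mapping a value $x$ with observed minimum $x_{\min}$ and maximum $x_{\max}$ into $[-1, 1]$ simply stretches and shifts the $[0, 1]$ result:
\[
x' = 2 \cdot \frac{x - x_{\min}}{x_{\max} - x_{\min}} - 1
\]
With this mapping, $x_{\min}$ lands on $-1$ and $x_{\max}$ lands on $1$.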
Step 2: {\color{red}Common Normalization Formula}
Min-Max normalization is given by:
\[
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
\]
where:
- $x$ = original value
- $x'$ = normalized value
- $x_{\min}$, $x_{\max}$ = smallest and largest observed values of the feature
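The formula above can be sketched in Python using only the standard library (the function name and sample data here are illustrative, not from any particular library):

```python
def min_max_normalize(values):
    """Rescale a list of numbers into [0, 1] using min-max normalization."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # All values are identical; map them all to 0.0 by convention.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]

prices = [100, 150, 200, 250, 300]
print(min_max_normalize(prices))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Note the guard for a constant feature: when $x_{\max} = x_{\min}$ the denominator is zero, and a real implementation must pick a convention rather than divide by zero.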
Step 3: {\color{red}Why Feature Scaling is Needed}
Different feature scales can cause problems:
- Features with large numeric ranges dominate those with small ranges
- Optimization algorithms such as gradient descent converge more slowly
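The dominance problem is easy to demonstrate with a Euclidean distance. In this minimal sketch (the feature names, values, and the per-feature mins/maxes are hypothetical), the large-range feature swamps the small-range one until both are normalized:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two samples: feature 1 is an income in dollars, feature 2 a rating in [0, 5].
p = [50000, 1.0]
q = [52000, 5.0]

# Raw distance is driven almost entirely by the income feature.
print(euclidean(p, q))  # ~2000.004

# After min-max scaling each feature (assumed mins/maxes shown inline),
# both features contribute comparably to the distance.
p_scaled = [(50000 - 30000) / (80000 - 30000), (1.0 - 0.0) / 5.0]
q_scaled = [(52000 - 30000) / (80000 - 30000), (5.0 - 0.0) / 5.0]
print(euclidean(p_scaled, q_scaled))  # ~0.80
```

Before scaling, the 4-point rating difference is invisible next to the $2{,}000 income difference; after scaling, it is actually the larger contributor.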
Step 4: {\color{red}Benefits of Normalization}
Normalization helps:
- Improve training speed
- Ensure equal feature contribution
- Enhance performance of distance-based algorithms (e.g., KNN, clustering)
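In practice, each feature (column) of a dataset is normalized independently. A minimal sketch, assuming a small row-major dataset with made-up age and salary values:

```python
def normalize_columns(rows):
    """Min-max normalize each column of a row-major dataset to [0, 1]."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [
        [
            (v - lo) / (hi - lo) if hi != lo else 0.0
            for v, lo, hi in zip(row, mins, maxs)
        ]
        for row in rows
    ]

# Ages in years and salaries in dollars land on the same [0, 1] scale.
data = [[25, 40000], [35, 60000], [45, 80000]]
print(normalize_columns(data))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

After this step, a distance-based method such as KNN sees age and salary as equally weighted dimensions.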
Step 5: {\color{red}When to Use Normalization}
It is especially useful for:
- Gradient descent-based models
- Neural networks
- Algorithms sensitive to scale
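One practical detail when applying normalization in a model pipeline: the minimum and maximum should be learned from the training data only and then reused for unseen data. A minimal sketch of this idea (the class and method names are illustrative, not the scikit-learn API):

```python
class MinMaxScaler:
    """Sketch of a scaler that learns min/max from training data
    and reuses them for unseen data (illustrative, not sklearn)."""

    def fit(self, values):
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        span = self.hi - self.lo
        # Constant feature: map everything to 0.0 by convention.
        return [(v - self.lo) / span if span else 0.0 for v in values]

train = [10, 20, 30, 40]
scaler = MinMaxScaler().fit(train)
print(scaler.transform(train))     # [0.0, 0.333..., 0.666..., 1.0]
print(scaler.transform([25, 50]))  # new values; may fall outside [0, 1]
```

Note that a new value larger than the training maximum maps above 1; whether to clip such values back into range is a design choice for the pipeline.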