Consider designing a linear classifier
\[
y = \text{sign}(f(x; w, b)), \quad f(x; w, b) = w^T x + b
\]
on a dataset \( D = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\} \),
where \( x_i \in \mathbb{R}^d \), \( y_i \in \{+1, -1\} \), for \( i = 1, 2, \dots, N \).
Recall that the sign function outputs \( +1 \) if its argument is positive and \( -1 \)
if its argument is non-positive. The parameters \( w \) and \( b \) are updated according to the following rule:
\[
w_{\text{new}} = w_{\text{old}} + y_n x_n, \quad b_{\text{new}} = b_{\text{old}} + y_n
\]
whenever \( \text{sign}(f(x_n; w_{\text{old}}, b_{\text{old}})) \neq y_n \).
In other words, whenever the classifier misclassifies a sample \( (x_n, y_n) \) from the dataset, \( w_{\text{old}} \) is updated to \( w_{\text{new}} \),
and likewise \( b_{\text{old}} \) is updated to \( b_{\text{new}} \).
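The update rule above can be sketched in a few lines of Python. This is only a minimal illustration, assuming NumPy is available; the helper names `sign`, `f`, and `update`, as well as the numerical values of `w_old`, `b_old`, `x_n`, and `y_n`, are hypothetical and chosen purely for demonstration.

```python
import numpy as np

def sign(z):
    """Return +1 for a positive argument and -1 otherwise (non-positive)."""
    return 1 if z > 0 else -1

def f(x, w, b):
    """Linear score f(x; w, b) = w^T x + b."""
    return float(np.dot(w, x) + b)

def update(w, b, x_n, y_n):
    """Apply one update if (x_n, y_n) is misclassified; otherwise leave (w, b) unchanged."""
    if sign(f(x_n, w, b)) != y_n:
        return w + y_n * x_n, b + y_n
    return w, b

# Hypothetical values, chosen so that the sample is misclassified.
w_old = np.array([-1.0, 0.5])
b_old = -0.5
x_n = np.array([2.0, 1.0])
y_n = 1

f_old = f(x_n, w_old, b_old)                    # score before the update
w_new, b_new = update(w_old, b_old, x_n, y_n)   # one update step
f_new = f(x_n, w_new, b_new)                    # score after the update

print("f before:", f_old)
print("f after: ", f_new)
print("w_new:", w_new, "b_new:", b_new)
```

Printing the score before and after the step makes it easy to compare how a single update changes the classifier's output on the sample that triggered it.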
Consider the case \( (x_n, +1) \), where \( f(x_n; w_{\text{old}}, b_{\text{old}}) < 0 \). Then: