For \( Y \in \mathbb{R}^n \), \( X \in \mathbb{R}^{n \times p} \), and \( \beta \in \mathbb{R}^p \), consider a regression model \[ Y = X \beta + \epsilon, \] where \( \epsilon \) has an \( n \)-dimensional multivariate normal distribution with zero mean vector and identity covariance matrix. Let \( I_p \) denote the identity matrix of order \( p \). For \( \lambda>0 \), let \[ \hat{\beta}_n = (X^T X + \lambda I_p)^{-1} X^T Y, \] be an estimator of \( \beta \). Then which of the following options is/are correct?
Step 2: Unbiasedness of \( \hat{\beta}_n \)
Since \( \mathbb{E}[Y] = X\beta \), we have \[ \mathbb{E}[\hat{\beta}_n] = (X^T X + \lambda I_p)^{-1} X^T X \beta, \] and the factor \( (X^T X + \lambda I_p)^{-1} X^T X \) is not the identity for any \( \lambda > 0 \). The regularization term \( \lambda I_p \) therefore introduces bias, and \( \hat{\beta}_n \) is not an unbiased estimator of \( \beta \).
Thus, Option (A) is incorrect.
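As a small numerical sketch (hypothetical dimensions and a made-up \( \beta \)), we can evaluate \( \mathbb{E}[\hat{\beta}_n] = (X^T X + \lambda I_p)^{-1} X^T X \beta \) directly and see that it differs from \( \beta \):

```python
import numpy as np

# Hypothetical example: n = 50 observations, p = 3 coefficients, lambda = 2.
rng = np.random.default_rng(0)
n, p, lam = 50, 3, 2.0
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])  # made-up true coefficient vector

# E[beta_hat] = (X^T X + lam I)^{-1} X^T X beta, since E[Y] = X beta.
A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))
mean_beta_hat = A_inv @ (X.T @ X) @ beta

print(mean_beta_hat)                      # shrunk toward zero relative to beta
print(np.allclose(mean_beta_hat, beta))   # False: the estimator is biased
```

The ridge factor shrinks every coordinate of \( \beta \) toward zero, which is exactly the bias-variance trade-off that motivates the estimator.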
Step 2: Positive Definiteness of \( X^T X + \lambda I_p \)
Since \( X^T X \) is positive semi-definite and \( \lambda I_p \) is a positive definite matrix for \( \lambda>0 \), the matrix \( X^T X + \lambda I_p \) is positive definite.
Thus, Option (B) is correct.
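This is easy to illustrate numerically (hypothetical sizes, chosen with \( n < p \) so that \( X^T X \) is necessarily singular and only positive semi-definite):

```python
import numpy as np

# Hypothetical sizes with n < p: X^T X is p x p but has rank at most n < p,
# so it is positive semi-definite with at least one zero eigenvalue.
rng = np.random.default_rng(1)
n, p, lam = 5, 10, 0.5
X = rng.normal(size=(n, p))
gram = X.T @ X

eigs_gram = np.linalg.eigvalsh(gram)
eigs_ridge = np.linalg.eigvalsh(gram + lam * np.eye(p))

print(eigs_gram.min())   # numerically ~0: X^T X is singular here
print(eigs_ridge.min())  # >= lam > 0: X^T X + lam*I is positive definite
```

Since adding \( \lambda I_p \) shifts every eigenvalue of \( X^T X \) up by \( \lambda \), the smallest eigenvalue of the sum is at least \( \lambda > 0 \), which is why the inverse in the definition of \( \hat{\beta}_n \) always exists.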
Step 3: Distribution of \( \hat{\beta}_n \)
Since \( \epsilon \sim \mathcal{N}(0, I_n) \), the estimator \( \hat{\beta}_n \) is a linear function of the normally distributed vector \( Y \), and hence \( \hat{\beta}_n \) follows a multivariate normal distribution.
Thus, Option (C) is correct.
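Concretely, writing \( A = X^T X + \lambda I_p \) and using \( Y \sim \mathcal{N}(X\beta, I_n) \), the linear map \( \hat{\beta}_n = A^{-1} X^T Y \) yields the exact distribution \[ \hat{\beta}_n \sim \mathcal{N}\left( A^{-1} X^T X \beta,\; A^{-1} X^T X A^{-1} \right), \] since a linear transform \( MY \) of a multivariate normal \( Y \) is normal with mean \( M \mathbb{E}[Y] \) and covariance \( M \, \mathrm{Var}(Y) \, M^T \), here with \( M = A^{-1} X^T \) and \( \mathrm{Var}(Y) = I_n \).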
Step 4: Variance of \( \hat{\beta}_n \)
Since \( \hat{\beta}_n = (X^T X + \lambda I_p)^{-1} X^T Y \) with \( \mathrm{Var}(Y) = I_n \), the variance of \( \hat{\beta}_n \) is \[ \mathrm{Var}(\hat{\beta}_n) = (X^T X + \lambda I_p)^{-1} X^T X \, (X^T X + \lambda I_p)^{-1}, \] which is not equal to \( (X^T X + \lambda I_p)^{-1} \) for any \( \lambda > 0 \).
Thus, Option (D) is incorrect. Final Answer:
The correct answers are \( \boxed{B, C} \).
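As a numerical sanity check (hypothetical dimensions), a Monte Carlo simulation of \( Y = X\beta + \epsilon \) with \( \epsilon \sim \mathcal{N}(0, I_n) \) reproduces the sandwich covariance \( A^{-1} X^T X A^{-1} \) with \( A = X^T X + \lambda I_p \), and not \( A^{-1} \) itself:

```python
import numpy as np

# Hypothetical example: simulate Y = X beta + eps, eps ~ N(0, I_n), and
# compare the empirical covariance of beta_hat against both candidates.
rng = np.random.default_rng(2)
n, p, lam = 40, 2, 20.0
X = rng.normal(size=(n, p))
beta = np.array([1.0, 2.0])
A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))
H = A_inv @ X.T                        # beta_hat = H @ Y is linear in Y

sandwich = A_inv @ (X.T @ X) @ A_inv   # = H @ H.T since Var(Y) = I_n

mu = X @ beta
eps = rng.normal(size=(100_000, n))
draws = (mu + eps) @ H.T               # 100k independent draws of beta_hat
empirical = np.cov(draws.T)

print(np.abs(empirical - sandwich).max())  # small: matches the sandwich form
print(np.abs(empirical - A_inv).max())     # clearly larger: not A^{-1}
```

A deliberately large \( \lambda \) is used here so that the two candidate covariance matrices are well separated relative to Monte Carlo noise.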