List of Top Questions Asked in GATE Data Science and Artificial Intelligence

Let \( \{ x_1, x_2, \dots, x_n \} \) be a set of linearly independent vectors in \( \mathbb{R}^n \). Let the \( (i,j) \)-th element of matrix \( A \in \mathbb{R}^{n \times n} \) be given by \( A_{ij} = x_i^T x_j \), where \( 1 \leq i, j \leq n \). Which one of the following statements is correct?
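
One way to explore this question is to note that \( v^T A v = \| \sum_i v_i x_i \|^2 \), which is strictly positive for every nonzero \( v \) when the \( x_i \) are linearly independent, so \( A \) is symmetric positive definite. The sketch below checks this on a small hypothetical example in \( \mathbb{R}^2 \); the vectors and helper names are illustrative assumptions, not part of the question.

```python
def gram_matrix(vectors):
    """Return the matrix with entries A[i][j] = x_i^T x_j."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def quadratic_form(A, v):
    """Compute v^T A v for a square matrix A given as nested lists."""
    n = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))


# Hypothetical linearly independent vectors in R^2 (not from the question).
vectors = [(1, 0), (1, 1)]
A = gram_matrix(vectors)

# A is symmetric, and v^T A v stays positive for these nonzero test vectors.
is_symmetric = all(A[i][j] == A[j][i] for i in range(2) for j in range(2))
test_vectors = [(1, 0), (0, 1), (1, -1), (-2, 3)]
forms = [quadratic_form(A, v) for v in test_vectors]
print(A, is_symmetric, forms)
```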

Let \( A \in \mathbb{R}^{n \times n} \) be such that \( A^3 = A \). Which one of the following statements is ALWAYS correct?

The naive Bayes classifier is used to solve a two-class classification problem with class-labels \( y_1, y_2 \). Suppose the prior probabilities are \( P(y_1) = \frac{1}{3} \) and \( P(y_2) = \frac{2}{3} \). Assuming a discrete feature space with

\[ P(x | y_1) = \frac{3}{4} \quad \text{and} \quad P(x | y_2) = \frac{1}{4} \]

for a specific feature vector \( x \). The probability of misclassifying \( x \) is: (Round off to two decimal places)
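
As a rough check of the arithmetic, the sketch below applies Bayes' rule to the stated priors and likelihoods, picks the class with the larger posterior, and reports the posterior mass of the other class as the misclassification probability. The variable names are illustrative.

```python
from fractions import Fraction

# Priors and class-conditional probabilities as stated in the question.
priors = {"y1": Fraction(1, 3), "y2": Fraction(2, 3)}
likelihoods = {"y1": Fraction(3, 4), "y2": Fraction(1, 4)}

# Bayes' rule: posterior is proportional to prior times likelihood.
joint = {label: priors[label] * likelihoods[label] for label in priors}
evidence = sum(joint.values())
posterior = {label: joint[label] / evidence for label in joint}

# The classifier predicts the class with the larger posterior;
# it is wrong with the remaining posterior probability.
predicted = max(posterior, key=posterior.get)
p_error = 1 - posterior[predicted]
print(predicted, round(float(p_error), 2))
```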

Given data \( \{(-1, 1), (2, -5), (3, 5)\} \) of the form \( (x, y) \), we fit a model \( y = wx \) using linear least-squares regression. The optimal value of \( w \) is: (Round off to three decimal places)
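
For a model with no intercept, setting the derivative of \( \sum_i (y_i - w x_i)^2 \) to zero gives \( w = \sum_i x_i y_i / \sum_i x_i^2 \). The following sketch evaluates that closed form on the three data points; the function name `fit_slope` is an illustrative assumption.

```python
from fractions import Fraction


def fit_slope(points):
    """Least-squares slope for y = w * x (no intercept term)."""
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    return Fraction(sum_xy, sum_xx)


data = [(-1, 1), (2, -5), (3, 5)]
w = fit_slope(data)
print(w, round(float(w), 3))
```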

On a relation named Loan of a bank:

There are three boxes containing white balls and black balls.
Box-1 contains 2 black and 1 white ball.
Box-2 contains 1 black and 2 white balls.
Box-3 contains 3 black and 3 white balls.
In a random experiment, one of these boxes is selected, where the probability of choosing Box-1 is \( \frac{1}{2} \), Box-2 is \( \frac{1}{3} \), and Box-3 is \( \frac{1}{6} \). A ball is drawn at random from the selected box. Given that the ball drawn is white, the probability that it is drawn from Box-2 is:
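
The computation is a direct application of Bayes' rule: weight each box's chance of producing a white ball by the probability of selecting that box, then normalise. A minimal sketch with exact fractions follows; the dictionary layout is an illustrative choice.

```python
from fractions import Fraction

# Selection probability and contents of each box, as stated in the question.
boxes = {
    "Box-1": {"prior": Fraction(1, 2), "black": 2, "white": 1},
    "Box-2": {"prior": Fraction(1, 3), "black": 1, "white": 2},
    "Box-3": {"prior": Fraction(1, 6), "black": 3, "white": 3},
}


def p_white_given(box):
    """Probability of drawing a white ball from the given box."""
    return Fraction(box["white"], box["white"] + box["black"])


# Joint probability of (box selected, white ball drawn).
joint = {name: box["prior"] * p_white_given(box) for name, box in boxes.items()}
p_white = sum(joint.values())
posterior_box2 = joint["Box-2"] / p_white
print(p_white, posterior_box2)
```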

Let \( C_1 \) and \( C_2 \) be two sets of objects. Let \( D(x, y) \) be a measure of dissimilarity between two objects \( x \) and \( y \). Consider the following definitions of dissimilarity between \( C_1 \) and \( C_2 \):

\[ \text{DIS-1}(C_1, C_2) = \max_{x \in C_1, y \in C_2} D(x, y) \]

\[ \text{DIS-2}(C_1, C_2) = \min_{x \in C_1, y \in C_2} D(x, y) \]

Which of the following statements is/are correct?
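
DIS-1 corresponds to what is usually called complete linkage (farthest pair) and DIS-2 to single linkage (closest pair) in hierarchical clustering. The sketch below evaluates both on a small made-up example; the point sets and the absolute-difference dissimilarity are assumptions chosen only for illustration.

```python
def dis_1(c1, c2, d):
    """Largest pairwise dissimilarity between the two sets (complete linkage)."""
    return max(d(x, y) for x in c1 for y in c2)


def dis_2(c1, c2, d):
    """Smallest pairwise dissimilarity between the two sets (single linkage)."""
    return min(d(x, y) for x in c1 for y in c2)


def abs_diff(x, y):
    """Hypothetical dissimilarity measure on real numbers."""
    return abs(x - y)


# Hypothetical one-dimensional clusters (not from the question).
C1 = [0, 1]
C2 = [3, 5]
complete = dis_1(C1, C2, abs_diff)
single = dis_2(C1, C2, abs_diff)
print(complete, single)
```

Because both quantities range over the same set of pairs, DIS-2 can never exceed DIS-1.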

Suppose that insertion sort is applied to the array \( [1, 3, 5, 7, 9, 11, x, 15, 13] \) and it takes exactly two swaps to sort the array. Select all possible values of \( x \).
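
Assuming the variant of insertion sort that moves each element left by repeated adjacent swaps, the number of swaps equals the number of inversions in the input. The sketch below counts swaps directly and scans a hypothetical range of integer candidates for \( x \); the answer options themselves are not reproduced here, so the range is an assumption.

```python
def insertion_sort_swaps(values):
    """Sort a copy with adjacent-swap insertion sort and return the swap count."""
    arr = list(values)
    swaps = 0
    for i in range(1, len(arr)):
        j = i
        # Move arr[j] left while it is strictly smaller than its left neighbour.
        while j > 0 and arr[j - 1] > arr[j]:
            arr[j - 1], arr[j] = arr[j], arr[j - 1]
            swaps += 1
            j -= 1
    return swaps


def array_with(x):
    """Build the array from the question with a concrete value for x."""
    return [1, 3, 5, 7, 9, 11, x, 15, 13]


# Hypothetical candidate range; the pair (15, 13) always costs one swap.
candidates = range(0, 20)
two_swap_values = [x for x in candidates if insertion_sort_swaps(array_with(x)) == 2]
print(two_swap_values)
```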

Let \( A = I_n + xx^T \), where \( I_n \) is the \( n \times n \) identity matrix and \( x \in \mathbb{R}^n, x^T x = 1 \). Which of the following options is/are correct?
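
Since \( x^T x = 1 \), we have \( Ax = x + x(x^T x) = 2x \), while any \( y \) orthogonal to \( x \) satisfies \( Ay = y \). The eigenvalues are therefore 2 (once) and 1 (with multiplicity \( n - 1 \)), so \( \det A = 2 \) and \( A \) is invertible with \( A^{-1} = I_n - \tfrac{1}{2} xx^T \). The sketch below verifies these facts for a hypothetical unit vector in \( \mathbb{R}^2 \), using exact fractions.

```python
from fractions import Fraction

# Hypothetical unit vector: (3/5)^2 + (4/5)^2 = 1.
x = [Fraction(3, 5), Fraction(4, 5)]
n = len(x)


def identity_entry(i, j):
    return 1 if i == j else 0


def matmul(P, Q):
    """Multiply two n x n matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


A = [[identity_entry(i, j) + x[i] * x[j] for j in range(n)] for i in range(n)]

# x is an eigenvector with eigenvalue 2.
Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

# Determinant of the 2 x 2 matrix.
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Candidate inverse I - (1/2) x x^T.
A_inv = [[identity_entry(i, j) - x[i] * x[j] / 2 for j in range(n)] for i in range(n)]
product = matmul(A, A_inv)
print(Ax, det_A, product)
```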

For which of the following inputs does binary search take time \( O(\log n) \) in the worst case?

Which of the following statements is/are correct in a Bayesian network?

Which of the following statements is/are correct?

Consider two functions \( f: \mathbb{R} \to \mathbb{R} \) and \( g: \mathbb{R} \to (1, \infty) \). Both functions are differentiable at a point \( c \). Which of the following functions is/are ALWAYS differentiable at \( c \)? The symbol \( \cdot \) denotes product and the symbol \( \circ \) denotes composition of functions.

Consider the following Python declarations of two lists.
\[ A = [1, 2, 3] \quad \text{and} \quad B = [4, 5, 6]. \] Which one of the following statements results in \( A = [1, 2, 3, 4, 5, 6] \)?
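
The answer options are not reproduced here, but the distinction usually at stake is between `extend`, which adds each element of `B` to `A`, and `append`, which adds `B` itself as a single nested element. A small sketch of the standard list behaviour (copies are used so each variant starts from the same `A`):

```python
A = [1, 2, 3]
B = [4, 5, 6]

# extend() adds the elements of B one by one.
extended = list(A)
extended.extend(B)

# append() adds B as a single nested element.
appended = list(A)
appended.append(B)

# The + operator builds a new concatenated list and leaves A unchanged.
concatenated = A + B

print(extended)
print(appended)
print(concatenated)
```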

Consider designing a linear classifier

\[ y = \text{sign}(f(x; w, b)), \quad f(x; w, b) = w^T x + b \]

on a dataset \( D = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\} \), where \( x_i \in \mathbb{R}^d \), \( y_i \in \{+1, -1\} \), for \( i = 1, 2, \dots, N \).

Recall that the sign function outputs \( +1 \) if the argument is positive, and \( -1 \) if the argument is non-positive. The parameters \( w \) and \( b \) are updated as per the following training algorithm:

\[ w_{\text{new}} = w_{\text{old}} + y_n x_n, \quad b_{\text{new}} = b_{\text{old}} + y_n \]

whenever \( \text{sign}(f(x_n; w_{\text{old}}, b_{\text{old}})) \neq y_n \).

In other words, whenever the classifier wrongly predicts a sample \( (x_n, y_n) \) from the dataset, \( w_{\text{old}} \) gets updated to \( w_{\text{new}} \), and likewise \( b_{\text{old}} \) gets updated to \( b_{\text{new}} \).

Consider the case \( (x_n, +1) \), where \( f(x_n; w_{\text{old}}, b_{\text{old}}) < 0 \). Then:
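
Substituting the update with \( y_n = +1 \) gives \( f(x_n; w_{\text{new}}, b_{\text{new}}) = f(x_n; w_{\text{old}}, b_{\text{old}}) + \|x_n\|^2 + 1 \), so the score on \( x_n \) strictly increases, although it is not guaranteed to become positive after a single update. The sketch below checks this identity on hypothetical numbers; the values of `w_old`, `b_old`, and `x_n` are assumptions chosen only so that the sample is misclassified.

```python
def f(x, w, b):
    """Linear score w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical parameters and sample with label +1 and a negative score.
w_old, b_old = [1, -2], 0
x_n, y_n = [1, 2], +1
f_old = f(x_n, w_old, b_old)

# Update rule from the question, applied because the prediction is wrong.
w_new = [wi + y_n * xi for wi, xi in zip(w_old, x_n)]
b_new = b_old + y_n
f_new = f(x_n, w_new, b_new)

# The score changes by exactly ||x_n||^2 + 1.
squared_norm = sum(xi * xi for xi in x_n)
gain = f_new - f_old
print(f_old, f_new, gain)
```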

Let \( X = aZ + b \), where \( Z \) is a standard normal random variable, and \( a, b \) are two unknown constants. It is given that \[ E[X] = 1, \quad E[(X - E[X]) Z] = -2, \quad E[(X - E[X])^2] = 4, \] where \( E[X] \) denotes the expectation of random variable \( X \). The values of \( a, b \) are:
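
Reading the second condition as the expectation of the product \( (X - E[X]) \cdot Z \) (an assumption about the intended statement), the constants follow from the standard normal moments \( E[Z] = 0 \) and \( E[Z^2] = 1 \): \( E[X] = b \), \( E[(X - E[X]) Z] = a \), and \( E[(X - E[X])^2] = a^2 \). A short sketch of that arithmetic, with illustrative variable names:

```python
from fractions import Fraction

# Moments of a standard normal variable.
E_Z, E_Z2 = 0, 1
var_Z = E_Z2 - E_Z ** 2

# Given quantities (second one read as the expectation of a product).
mean_X = 1
product_moment = -2   # E[(X - E[X]) * Z]
var_X = 4             # E[(X - E[X])^2]

# X - E[X] = a * (Z - E[Z]), so E[(X - E[X]) * Z] = a * Var(Z).
a = Fraction(product_moment, var_Z)
# E[X] = a * E[Z] + b.
b = mean_X - a * E_Z

# The variance condition a^2 * Var(Z) = 4 serves as a consistency check.
consistent = a ** 2 * var_Z == var_X
print(a, b, consistent)
```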

Let \( X \) be a continuous random variable whose cumulative distribution function (CDF) \( F_X(x) \), for some \( t \), is given as follows:
\[ F_X(x) = \begin{cases} 0 & \text{if } x \leq t \\ \frac{x - t}{4 - t} & \text{if } t \leq x \leq 4 \\ 1 & \text{if } x \geq 4 \end{cases} \]
If the median of \( X \) is 3, then what is the value of \( t \)?
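
The median \( m \) satisfies \( F_X(m) = \tfrac{1}{2} \). Assuming \( t < m < 4 \), so that the median falls on the linear piece, \( \frac{m - t}{4 - t} = \frac{1}{2} \) rearranges to \( t = 2m - 4 \). The sketch below evaluates this and confirms it against the CDF; the function name `cdf` is illustrative.

```python
from fractions import Fraction


def cdf(x, t):
    """CDF from the question: linear between t and 4."""
    if x <= t:
        return Fraction(0)
    if x >= 4:
        return Fraction(1)
    return Fraction(x - t, 4 - t)


median = 3
# Solve (median - t) / (4 - t) = 1/2 for t.
t = 2 * median - 4
print(t, cdf(median, t))
```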

Consider a hash table of size 10 with indices \( \{0, 1, \dots, 9\} \), with the hash function \[ h(x) = 3x \, (\text{mod} \, 10), \] where linear probing is used to handle collisions. The hash table is initially empty and then the following sequence of keys is inserted into the hash table: 1, 4, 5, 6, 14, 15. The indices where the keys 14 and 15 are stored are, respectively:
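
The insertions can be traced with a short simulation. The sketch below assumes the usual form of linear probing, where a collision at index \( i \) moves to \( (i + 1) \bmod 10 \) until an empty slot is found; the function name is illustrative.

```python
def insert_with_linear_probing(keys, size=10):
    """Insert keys using h(x) = 3x mod size and linear probing."""
    table = [None] * size
    positions = {}
    for key in keys:
        index = (3 * key) % size
        # Step to the next slot (wrapping around) until a free one is found.
        while table[index] is not None:
            index = (index + 1) % size
        table[index] = key
        positions[key] = index
    return table, positions


table, positions = insert_with_linear_probing([1, 4, 5, 6, 14, 15])
print(table)
print(positions[14], positions[15])
```

Key 14 hashes to index 2, which is taken by key 4, and index 3 is taken by key 1, so it settles at index 4. Key 15 hashes to index 5, which is taken by key 5, so it settles at index 6.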