What is the difference between stemming and lemmatisation?

Stemming replaces each word with its stem by stripping suffixes such as “es”, “ies”, and “s”. For example, “cats” => “cat” and “computers” => “computer”. It is a heuristic approach that does not use any grammar or dictionary. Lemmatisation has the same purpose but does it properly…
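To make the contrast concrete, here is a minimal sketch. The suffix rules and the tiny lemma table are hypothetical and chosen only for illustration; real stemmers and lemmatisers use far larger rule sets and vocabularies.

```python
# Toy illustration only: the suffix rules and the lemma table are hypothetical.

def naive_stem(word: str) -> str:
    """Strip a plural-looking suffix using fixed rules, with no dictionary."""
    if word.endswith("ies"):
        return word[:-3] + "y"      # "ponies" -> "pony"
    if word.endswith("es"):
        return word[:-2]            # "boxes" -> "box", but "houses" -> "hous"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]            # "cats" -> "cat"
    return word


# A stand-in for the vocabulary a lemmatiser would consult.
TOY_LEMMAS = {"houses": "house", "mice": "mouse", "cats": "cat"}


def toy_lemmatise(word: str) -> str:
    """Look the word up in a small vocabulary; fall back to the word itself."""
    return TOY_LEMMAS.get(word, word)


for w in ["cats", "computers", "houses", "mice"]:
    print(w, "->", naive_stem(w), "|", toy_lemmatise(w))
```

The rule-based function gets “cats” and “computers” right but mangles “houses” and cannot handle “mice”, which is the kind of case where a dictionary lookup helps.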

What are the optimization algorithms typically used to train a neural network?

Gradient descent is the most commonly used training algorithm. Momentum is a common way to augment gradient descent: the gradient at each step is accumulated over past steps, which lets the algorithm move more smoothly towards the minimum. RMSProp attempts to adjust the learning rate for each iteration in an automated…
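The three update rules can be compared on a toy problem. The sketch below assumes a one-dimensional loss f(w) = (w - 3)^2 and hand-picked hyperparameters, both of which are illustrative choices rather than recommendations.

```python
import math


def grad(w: float) -> float:
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def gradient_descent(w=0.0, lr=0.1, steps=500):
    for _ in range(steps):
        w -= lr * grad(w)               # step against the current gradient
    return w


def momentum(w=0.0, lr=0.1, beta=0.9, steps=500):
    v = 0.0                             # running accumulation of past gradients
    for _ in range(steps):
        v = beta * v + grad(w)
        w -= lr * v
    return w


def rmsprop(w=0.0, lr=0.01, rho=0.9, eps=1e-8, steps=2000):
    s = 0.0                             # running average of squared gradients
    for _ in range(steps):
        g = grad(w)
        s = rho * s + (1.0 - rho) * g * g
        w -= lr * g / (math.sqrt(s) + eps)  # step scaled by recent gradient size
    return w


print(gradient_descent(), momentum(), rmsprop())
```

All three end up near w = 3. With a fixed learning rate, the RMSProp variant shown here settles in a small neighbourhood of the minimum rather than converging exactly, since its normalised step does not shrink with the gradient.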

I have used a 4-layer fully connected network to learn a complex classifier boundary. I have used tanh activations throughout, except in the last layer, where I used a sigmoid activation for binary classification. I train for 10K iterations with 100K examples (my data points are 3-dimensional and I initialized my weights to 0). I see that my network is unable to fit the training data, leading to a high training error. What is the first thing I should try?

(1) Increase the number of training iterations
(2) Make a more complex network (increase hidden layer size)
(3) Initialize weights to a random small value instead of zeros
(4) Change tanh activations to ReLU

Ans: (3). I will initialize weights to a non-zero value since changing all the weights in the same…
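The effect of zero initialization can be checked numerically. The sketch below simplifies the setup to a single tanh hidden layer with a sigmoid output and no biases (these simplifications, and the sample input, are assumptions made to keep the example short). With all-zero weights every weight gradient is exactly zero, so the hidden units never differentiate from each other.

```python
import math
import random


def weight_gradients(w_hidden, w_out, x, y):
    """Log-loss gradients for a 1-hidden-layer tanh net with a sigmoid output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    logit = sum(w * hj for w, hj in zip(w_out, h))
    p = 1.0 / (1.0 + math.exp(-logit))
    delta = p - y                        # dLoss/dlogit for sigmoid + log loss
    grad_out = [delta * hj for hj in h]
    grad_hidden = [
        [delta * w_out[j] * (1.0 - h[j] ** 2) * xi for xi in x]
        for j in range(len(h))
    ]
    return grad_hidden, grad_out


x, y = [0.5, -1.0, 2.0], 1.0             # one hypothetical 3-dimensional example

# Zero initialization: 4 hidden units, all weights 0.
zero_hidden = [[0.0] * 3 for _ in range(4)]
zero_out = [0.0] * 4
gh_zero, go_zero = weight_gradients(zero_hidden, zero_out, x, y)

# Small random initialization.
rng = random.Random(0)
rand_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(4)]
rand_out = [rng.uniform(-0.1, 0.1) for _ in range(4)]
gh_rand, go_rand = weight_gradients(rand_hidden, rand_out, x, y)

print("zero init gradients:", gh_zero, go_zero)
print("random init gradients:", gh_rand, go_rand)
```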

What are the different ways of preventing overfitting in a deep neural network? Explain the intuition behind each.

L2 norm regularization: Pushes the weights closer to zero to prevent overfitting.
L1 norm regularization: Pushes the weights closer to zero and also induces sparsity in the weights. A less common form of regularization.
Dropout regularization: Drops some of the hidden units at random so that the network does not overfit by…
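As a rough sketch of what these three techniques compute: the penalty terms below would be added to the training loss, and the dropout function would be applied to hidden activations during training. The function names, the strength parameter `lam`, and the use of inverted dropout (rescaling the surviving units) are assumptions for illustration.

```python
import random


def l2_penalty(weights, lam):
    """Sum of squared weights, scaled by the regularization strength."""
    return lam * sum(w * w for w in weights)


def l1_penalty(weights, lam):
    """Sum of absolute weights; its gradient pushes small weights to exactly 0."""
    return lam * sum(abs(w) for w in weights)


def dropout(activations, drop_prob, rng):
    """Inverted dropout: zero each unit with probability drop_prob, rescale the rest."""
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


weights = [1.0, -2.0]
print(l2_penalty(weights, 0.5), l1_penalty(weights, 0.5))
print(dropout([1.0, 1.0, 1.0, 1.0], 0.5, random.Random(0)))
```

Rescaling by `1 / keep` keeps the expected activation unchanged, so no adjustment is needed at inference time when dropout is switched off.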

I have designed a 2-layer deep neural network for a classifier, with 2 units in the hidden layer. I use linear activation functions, with a sigmoid at the final layer. I use a data visualization tool and see that the decision boundary is in the shape of a sine curve. I have trained with 200 data points with known class labels and see that the training error is too high. What do I do?

(a) Increase number of units in the hidden layer
(b) Increase number of hidden layers
(c) Increase data set size
(d) Change activation function to tanh
(e) Try all of the above

The answer is (d). When I use a linear activation function, the deep neural network realizes a linear combination of linear functions, which leads to modeling only…
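The collapse of stacked linear layers can be verified directly. The sketch below uses hypothetical weight matrices and omits biases; it shows that applying two linear layers in sequence gives the same logit as applying their product once, so the value fed to the final sigmoid stays linear in the input.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


W1 = [[1.0, 2.0], [0.5, -1.0]]   # hidden layer: 2 inputs -> 2 linear units
W2 = [[3.0, -2.0]]               # output layer: 2 units -> 1 logit
x = [0.25, 4.0]

two_layer = matvec(W2, matvec(W1, x))    # layer by layer
collapsed = matvec(matmul(W2, W1), x)    # one equivalent linear map

print(two_layer, collapsed)
```

Adding more linear units or layers only changes the entries of the single equivalent matrix, which is why a nonlinear activation such as tanh is needed to bend the boundary.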

How does the KNN algorithm work? What are the advantages and disadvantages of KNN?

The KNN algorithm is commonly used in many ML applications, from supervised settings such as classification and regression to simply retrieving similar items in applications such as recommendation systems, search, question answering and so on.

What is the KNN Algorithm?

KNN for Nearest Neighbour Search: The KNN algorithm involves retrieving the K datapoints that are…
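A minimal sketch of both uses, retrieval and classification, is given below. It assumes Euclidean distance, a brute-force scan over all points, and a majority vote for the label; the data set is made up for illustration.

```python
import math
from collections import Counter


def k_nearest(points, query, k):
    """Return the k (point, label) pairs closest to query by Euclidean distance."""
    return sorted(points, key=lambda pl: math.dist(pl[0], query))[:k]


def knn_classify(points, query, k):
    """Predict the most common label among the k nearest neighbours."""
    votes = Counter(label for _, label in k_nearest(points, query, k))
    return votes.most_common(1)[0][0]


# Hypothetical labelled 2-D points.
data = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((6.0, 5.0), "b"),
]

print(k_nearest(data, (5.0, 6.0), 2))
print(knn_classify(data, (0.5, 0.5), 3), knn_classify(data, (5.0, 6.0), 3))
```

The brute-force scan costs one distance computation per stored point for every query, which is simple but becomes slow on large data sets.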