Machine Learning – Page 12 – Machine Learning Interviews

What are the commonly used activation functions? When are they used?

Posted on February 14, 2019 by MLNerds

Ans. The commonly used activation functions are: Linear: g(x) = x. This is the simplest activation function. However, it cannot model complex decision boundaries: a deep network with only linear activations collapses to a single linear map, so it cannot represent non-linear decision boundaries. Sigmoid: This is a common activation function in the last layer of the neural…
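As a minimal sketch of the two functions named in the excerpt above, assuming plain Python and scalar inputs (the function names `linear` and `sigmoid` are illustrative and do not come from the original post):

```python
import math

def linear(x):
    """Identity activation: g(x) = x."""
    return x

def sigmoid(x):
    """Logistic sigmoid: maps any real input into the open interval (0, 1).

    Not numerically hardened: very large negative inputs can overflow math.exp.
    """
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, 0.0, 2.0):
    print(x, linear(x), round(sigmoid(x), 4))
```

Because the sigmoid output lies between 0 and 1, it can be read as a class probability, which is one reason it is often placed in the last layer of a binary classifier.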

I have used a 4-layer fully connected network to learn a complex classifier boundary. I have used tanh activations throughout except the last layer, where I used a sigmoid activation for binary classification. I train for 10K iterations with 100K examples (my data points are 3-dimensional and I initialized my weights to 0 to begin with). I see that my network is unable to fit the training data and is leading to a high training error. What is the first thing I try?

Posted on February 14, 2019 by MLNerds

(1) Increase the number of training iterations
(2) Make a more complex network – increase hidden layer size
(3) Initialize weights to a random small value instead of zeros
(4) Change tanh activations to ReLU

Ans: (3). I will initialize weights to a non-zero value since changing all the weights in the same…
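The effect of zero initialization can be checked by hand on a tiny network. The sketch below assumes one tanh hidden layer with two units, a sigmoid output, a log loss, and no biases; the example point and the small weight values are made up for illustration. With all-zero weights every hidden activation is tanh(0) = 0, so every weight gradient is exactly zero and gradient descent cannot move the weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradients(x, y, W1, w2):
    """Log-loss gradients for a one-hidden-layer tanh network (biases omitted)."""
    # Forward pass: tanh hidden units, then a sigmoid output unit.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    p = sigmoid(sum(w * hi for w, hi in zip(w2, h)))
    # Backward pass: for log loss with a sigmoid output, dL/dz_out = p - y.
    d_out = p - y
    grad_w2 = [d_out * hi for hi in h]
    grad_W1 = [[d_out * w2[j] * (1.0 - h[j] ** 2) * xi for xi in x]
               for j in range(len(W1))]
    return grad_W1, grad_w2

x, y = [0.5, -1.2, 2.0], 1.0  # one hypothetical 3-dimensional example

# All-zero initialization, as in the question.
zero_grads = gradients(x, y, [[0.0] * 3 for _ in range(2)], [0.0, 0.0])

# Small non-zero values standing in for a random initialization.
small_grads = gradients(
    x, y, [[0.01, -0.02, 0.03], [-0.01, 0.02, 0.01]], [0.02, -0.03]
)

print(zero_grads)
print(small_grads)
```

Fixed small values are used instead of a random draw only to keep the output reproducible; in practice the weights would be sampled randomly.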

What are the different ways of preventing overfitting in a deep neural network? Explain the intuition behind each.

Posted on February 14, 2019 by MLNerds

L2 norm regularization: Make the weights closer to zero to prevent overfitting.
L1 norm regularization: Make the weights closer to zero and also induce sparsity in the weights. This is a less common form of regularization.
Dropout regularization: Ensure some of the hidden units are dropped out at random to ensure the network does not overfit by…
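The three ideas in this excerpt can be sketched in a few lines of plain Python. The function names, the strength parameter `lam`, and the "inverted dropout" rescaling by `1 / keep_prob` are assumptions made for illustration; the original post does not specify them.

```python
import random

def l2_penalty(weights, lam):
    """L2 term added to the loss: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

def l1_penalty(weights, lam):
    """L1 term added to the loss: lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob
    and rescale the surviving units by 1 / keep_prob."""
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

weights = [3.0, -4.0, 0.0]
print(l2_penalty(weights, 0.1), l1_penalty(weights, 0.1))

rng = random.Random(0)  # seeded so the mask is reproducible
hidden = [1.0] * 8
dropped = dropout(hidden, keep_prob=0.5, rng=rng)
print(dropped)
```

Both penalties grow with the size of the weights, so minimizing the penalized loss pulls the weights toward zero. Dropout is applied only during training; at test time all units are kept.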

I have designed a 2-layer deep neural network for a classifier with 2 units in the hidden layer. I use linear activation functions with a sigmoid at the final layer. I use a data visualization tool and see that the decision boundary is in the shape of a sine curve. I have tried to train with 200 data points with known class labels and see that the training error is too high. What do I do?

Posted on February 14, 2019 (updated February 22, 2019) by MLNerds

a. Increase number of units in the hidden layer
b. Increase number of hidden layers
c. Increase data set size
d. Change activation function to tanh
e. Try all of the above

The answer is d. When I use a linear activation function, the deep neural network is realizing a linear combination of linear functions which leads to modeling only…
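The collapse of stacked linear layers can be verified numerically. This is a minimal sketch with made-up weights for a network of 2 inputs, 2 linear hidden units and 1 output pre-activation, with biases omitted; the helper names `matvec` and `matmul` are illustrative.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in A]

# Hypothetical weights: 2 inputs -> 2 linear hidden units -> 1 output.
W1 = [[2.0, 1.0], [-1.0, 0.5]]
W2 = [[0.5, 3.0]]
x = [4.0, 2.0]

two_layer = matvec(W2, matvec(W1, x))   # hidden layer, then output layer
one_layer = matvec(matmul(W2, W1), x)   # one equivalent linear layer
print(two_layer, one_layer)
```

Both computations give the same value for any input, so the value fed to the final sigmoid is a linear function of the input and the decision boundary stays a straight line regardless of how many linear layers or units are added.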

Can you give an example of a classifier with high bias and high variance?

Posted on February 14, 2019 (updated February 21, 2019) by MLNerds

High bias means the data is being underfit: the decision boundary is usually not complex enough. High variance happens due to overfitting: the decision boundary is more complex than it should be. High bias and high variance together happen when you fit a complex decision boundary that is also not fitting the training set…

How does the KNN algorithm work? What are the advantages and disadvantages of KNN?

Posted on February 14, 2019 (updated October 26, 2020) by MLNerds

The KNN algorithm is commonly used in many ML applications – right from supervised settings such as classification and regression, to just retrieving similar items in applications such as recommendation systems, search, question answering and so on. What is the KNN Algorithm? KNN for Nearest Neighbour Search: KNN algorithm involves retrieving the K datapoints that are…
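A minimal sketch of KNN retrieval and classification, assuming Euclidean distance and a simple majority vote (both common choices, though the excerpt does not spell them out). The toy data and the function names are made up for illustration.

```python
import math
from collections import Counter

def k_nearest(points, query, k):
    """Return the k (point, label) pairs closest to query by Euclidean distance."""
    return sorted(points, key=lambda pl: math.dist(pl[0], query))[:k]

def knn_classify(points, query, k):
    """Predict the most common label among the k nearest neighbours."""
    neighbours = k_nearest(points, query, k)
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy labelled dataset: two small clusters in 2-D.
data = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"),
]

print(knn_classify(data, (0.15, 0.15), k=3))
```

This brute-force version sorts every stored point for each query, which is fine for small datasets; larger collections typically rely on an index structure or approximate search.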

You are given some documents and asked to find prevalent topics in the documents. How do you go about it?

Posted on February 14, 2019 by MLNerds

This is typically called topic modeling. Topic modeling is a type of statistical modeling for discovering the abstract “topics” that occur in a collection of documents. For instance, two statements, one about meals and one about food, can probably be characterized by the same topic even though they do not necessarily use the same vocabulary. Topic models typically…

What is speaker segmentation in speech recognition? How do you use it?

Posted on February 14, 2019 by MLNerds

Speaker diarization or speaker segmentation is the process of automatically assigning a speaker identity to each segment of the audio file. Segmenting by speaker is very useful in several applications to understand who said what in a conversation. Typically speaker information is crucial for applications such as emotion detection, behavioural analysis or topic analysis of…

What are some common tools available for NER (Named Entity Recognition)?

Posted on February 14, 2019 by MLNerds

Notable NER platforms include:
GATE supports NER across many languages and domains out of the box, usable via a graphical interface and a Java API.
OpenNLP includes rule-based and statistical named-entity recognition.
spaCy features fast statistical NER as well as an open-source named-entity visualizer.

What is the difference between word2vec and GloVe?

Posted on February 14, 2019 (updated October 26, 2020) by MLNerds

Both word2vec and GloVe enable us to represent a word in the form of a vector (often called an embedding). They are two of the most popular algorithms for word embeddings, which bring out the semantic similarity of words and capture different facets of the meaning of a word. They are used in many NLP applications such as sentiment…
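To make "semantic similarity" between word vectors concrete, here is a minimal sketch using cosine similarity. The 3-dimensional vectors below are made up for illustration and are not real word2vec or GloVe output; real embeddings usually have tens to hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, chosen by hand for this example.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "banana": [0.1, 0.2, 0.9],
}

sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["banana"])
print(round(sim_royal, 3), round(sim_fruit, 3))
```

With trained embeddings, related words tend to score closer to 1 than unrelated ones, which is the property downstream NLP applications rely on.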
