Machine Learning Interview Questions - Ace the Data Science Interview!

Bayesian Neural Networks
How will you build an auto-suggestion feature for a messaging app or Google search?
What is speaker segmentation in speech recognition? How do you use it?
How to find the Optimal Number of Clusters in K-means? Elbow and Silhouette Methods
What are Isolation Forests? How to use them for Anomaly Detection?
Learning Feature Importance from Decision Trees and Random Forests
Suppose you are modeling text with an HMM. What is the complexity of finding the most probable sequence of tags or states from a sequence of text using a brute-force algorithm?
You are given some documents and asked to find prevalent topics in the documents – how do you go about it ?
What is the PageRank Algorithm?
Detecting and Removing Gender Bias in Word Embeddings
With the maximum likelihood estimate, are we guaranteed to find a global optimum?
When are deep learning algorithms more appropriate compared to traditional machine learning algorithms?
What is the difference between supervised and unsupervised learning ?
Is the run-time of an ML algorithm important? How do I evaluate whether the run-time is OK?
What is Bayesian Logistic Regression?
Berkson’s Paradox
What is Stacking ? Ensembling Multiple Dissimilar Models
What is negative sampling when training the skip-gram model ?
How can you increase the recall of a search query result (on a search engine or e-commerce site) without changing the algorithm?
How many parameters are there in an HMM?
What are knowledge graphs? When would you need a knowledge graph over, say, a database to store information?
How does KNN algorithm work ? What are the advantages and disadvantages of KNN ?
I have used a 4-layer fully connected network to learn a complex classifier boundary. I have used tanh activations throughout except the last layer, where I used a sigmoid activation for binary classification. I train for 10K iterations with 100K examples (my data points are 3-dimensional and I initialized my weights to 0 to begin with). I see that my network is unable to fit the training data and is leading to a high training error. What is the first thing I should try?
What are some common tools available for Named Entity Recognition (NER)?
What are the evaluation metrics for a multi-class classification problem (like positive/negative/neutral sentiment analysis)?
Recursive Feature Elimination for Feature Selection
What is PMI ?
How do you deal with dataset imbalance in a problem like spam filtering ?
How do you generate text using a Hidden Markov Model (HMM) ?
You have come up with a Spam classifier. How do you measure accuracy ?
How do you measure the quality of machine translation?
Do we need to learn Linear Algebra for Machine Learning ?
What is the difference between deep learning and machine learning?
Suppose you build word vectors (embeddings) with each word vector having as many dimensions as the vocabulary size (V) and feature values given by the PPMI between corresponding words. What are the problems with this approach and how can you resolve them?
What would you care more about – precision or recall for spam filtering problem?
I have designed a 2-layer deep neural network for a classifier with 2 units in the hidden layer. I use linear activation functions with a sigmoid at the final layer. I use a data visualization tool and see that the decision boundary is in the shape of a sine curve. I have tried to train with 200 data points with known class labels and see that the training error is too high. What do I do?
Can you give an example of a classifier with high bias and high variance?
Can we use the AUC Metric for an SVM Classifier?
What is the complexity of the Viterbi algorithm?
Why do you typically see overflow and underflow when implementing ML algorithms?
What are the different ways of representing documents ?
How do you handle missing data in an ML algorithm ?
What are some knowledge graphs you know? What is the difference between them?
Bias in Machine Learning : How to measure Fairness based on Confusion Matrix ?
What is Elastic Net Regularization for Regression?
What is stratified sampling and why is it important ?
What are the different independence assumptions in HMM and Naive Bayes?
What is the difference between word2vec and GloVe?
Dartboard Paradox: Probability Density Function vs Probability
How do you measure the performance of a language model?
What is Bayesian Modeling?
What are the advantages and disadvantages of using Naive Bayes for spam detection?
What is the Maximum Likelihood Estimate (MLE)?
How do you design a system that reads a natural language question and retrieves the closest FAQ answer?
What is Simpson’s Paradox?
Covariance and Correlation
What are popular ways of dimensionality reduction in NLP tasks ? Do you think this is even important ?
You are building a natural language search box for a website. How do you accommodate spelling errors?
You want to find food-related topics on Twitter – how do you go about it?
If the average length of a sentence is 100 in all documents, should we build a 100-gram language model?
What are the commonly used activation functions? When are they used?
What is One-Class SVM ? How to use it for anomaly detection?
What is overfitting and underfitting ? Give examples. How do you overcome them?
Explain Locality Sensitive Hashing for Nearest Neighbour Search.
What are the different ways of preventing over-fitting in a deep neural network? Explain the intuition behind each.
How do you train an HMM in practice?