- Suppose you build word vectors (embeddings) where each vector has dimension equal to the vocabulary size (V) and its feature values are the PPMI between the corresponding words. What are the problems with this approach, and how can you resolve them?
- Covariance and Correlation
- When are deep learning algorithms more appropriate compared to traditional machine learning algorithms?
- Target Encoding for Categorical Features
- How will you build an auto suggestion feature for a messaging app or google search?
- What is the PageRank Algorithm?
- Bayesian Neural Networks
- You are given some documents and asked to find the prevalent topics in them – how do you go about it?
- Evaluation Metrics for Recommendation Systems
- What is Stacking? Ensembling Multiple Dissimilar Models
- What is One-Class SVM? How do you use it for anomaly detection?
- How many parameters does an HMM have?
- What are the evaluation metrics for a multi-class classification problem (like positive/negative/neutral sentiment analysis)?
- Dartboard Paradox: Probability Density Function vs Probability
- Can you give an example of a classifier with high bias and high variance?
- How do you design a system that reads a natural language question and retrieves the closest FAQ answer?
- Is the run-time of an ML algorithm important? How do I evaluate whether the run-time is OK?
- What are the advantages and disadvantages of using Naive Bayes for spam detection?
- How do you train an HMM in practice?
- What is the complexity of the Viterbi algorithm?
- What are overfitting and underfitting? Give examples. How do you overcome them?
- Recursive Feature Elimination for Feature Selection
- Bias in Machine Learning: How to Measure Fairness Based on the Confusion Matrix?
- How can you increase the recall of search query results (on a search engine or e-commerce site) without changing the algorithm?
- What is PMI?
- What is the Maximum Likelihood Estimate (MLE)?
- What would you care more about – precision or recall for a spam filtering problem?
- What are the different independence assumptions in HMM and Naive Bayes?
- What are some common tools available for NER (Named Entity Recognition)?
- Bias in Machine Learning: Types of Data Biases
- What is stratified sampling and why is it important?
- What is the difference between word2vec and GloVe?
- I have designed a 2-layer deep neural network for a classifier, with 2 units in the hidden layer. I use linear activation functions with a sigmoid at the final layer. I use a data visualization tool and see that the decision boundary is in the shape of a sine curve. I have tried to train with 200 data points with known class labels and see that the training error is too high. What do I do?
- You want to find food-related topics on Twitter – how do you go about it?
- Detecting and Removing Gender Bias in Word Embeddings
- I have used a 4-layer fully connected network to learn a complex classifier boundary. I have used tanh activations throughout, except in the last layer, where I used a sigmoid activation for binary classification. I train for 10K iterations with 100K examples (my data points are 3-dimensional and I initialized my weights to 0 to begin with). I see that my network is unable to fit the training data, leading to a high training error. What is the first thing I should try?
- What are Isolation Forests? How to use them for Anomaly Detection?
- How do you measure the performance of a language model?
- What are knowledge graphs? When would you need a knowledge graph over say a database to store information?
- What is the difference between supervised and unsupervised learning?
- What are the different ways of preventing overfitting in a deep neural network? Explain the intuition behind each.
- Can we use the AUC metric for an SVM classifier?
- What is Simpson's Paradox?
- Explain Locality Sensitive Hashing for Nearest Neighbour Search.
- What is Bayesian Logistic Regression?
- How do you generate text using a Hidden Markov Model (HMM)?
- If the average length of a sentence is 100 in all documents, should we build a 100-gram language model?
- Suppose you are modeling text with an HMM. What is the complexity of finding the most probable sequence of tags or states for a sequence of text using a brute-force algorithm?
- What are some knowledge graphs you know? What are the differences between them?
- Do we need to learn Linear Algebra for Machine Learning?
- What are the different ways of representing documents?
- What is speaker segmentation in speech recognition? How do you use it?
- With the maximum likelihood estimate, are we guaranteed to find a global optimum?
- How to find the Optimal Number of Clusters in K-means? Elbow and Silhouette Methods
- What is AUC: Area Under the Curve?
- Naive Bayes Classifier: Advantages and Disadvantages
- Learning Feature Importance from Decision Trees and Random Forests
- What are the commonly used activation functions? When are they used?
- You have come up with a spam classifier. How do you measure accuracy?
- What is negative sampling when training the skip-gram model?
- What is Elastic Net Regularization for Regression?
- Why do you typically see overflow and underflow when implementing an ML algorithm?
- What is Bayesian Modeling?
- What are popular ways of dimensionality reduction in NLP tasks? Do you think this is even important?
- How do you handle missing data in an ML algorithm?
- What is the difference between deep learning and machine learning?
- You are building a natural language search box for a website. How do you accommodate spelling errors?
- How do you deal with dataset imbalance in a problem like spam filtering?
- How do you measure the quality of machine translation?
- Berkson’s Paradox
- How does the KNN algorithm work? What are the advantages and disadvantages of KNN?