Whether you are just starting your interview preparation or wrapping it up and looking for final touches, here are over 50 must-see questions to prepare for a data science interview. We have put them in five categories for convenience. (Note: There are several more questions, along with answers, under “Interview Questions and Answers” in the main menu.)
Basic Data Science Questions
How do you evaluate a Machine Learning algorithm ?
Why do you need a training set, a validation set and a test set ?
What is the bias-variance trade-off in Machine Learning ?
What is the difference between supervised and unsupervised learning ?
When are deep learning algorithms more appropriate compared to traditional machine learning algorithms?
What are popular clustering algorithms? How do you fix the number of clusters in a clustering algorithm ?
Why do you need dimensionality reduction ? What are some ways to do this ?
What is regularization ? What types of regularizers do you know ?
What are the various steps in the typical Machine Learning pipeline ?
What data cleanup and normalization is required in a typical Machine Learning pipeline ?
What is overfitting and underfitting ? Give examples. How do you overcome them?
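As a starting point for the training/validation/test question above, here is a minimal sketch of a random three-way split using only the standard library. The function name, the 70/15/15 proportions and the fixed seed are illustrative assumptions, not a recommendation; time-ordered or grouped data usually needs a different splitting strategy.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle rows once, then cut them into train, validation and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Toy data: the integers 0..99 stand in for real examples.
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The model is fit on the training portion, tuned on the validation portion, and scored once on the test portion, so the final estimate is not influenced by tuning decisions.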
Basic Data Cleanup/Wrangling Questions
How do you deal with missing data ?
How do you detect outliers in data ? How do you deal with them ?
What pre-processing can you do when you have an imbalanced dataset ?
When and how do you do feature scaling ?
When do you need to normalize your data to have zero mean and unit variance ?
What do you do when you have very little training data ?
You realize you have duplicates in your data – what do you do ?
You have 10,000 features. How do you figure out if you need all of them ?
You have a column that contains colors such as “red”, “blue”,… how do you handle this column ?
You have person ID, eye color, ethnicity, height and weight in one file and person ID, salary and family size in another file. How do you make a combined file from these in pandas ?
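For the last question above, one possible approach is a join on the shared person ID with `DataFrame.merge`. The sketch below builds two small in-memory frames as stand-ins for the two files; the column names and values are made up for illustration, and real data would normally be loaded with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical stand-ins for the two files (column names and values are assumptions).
traits = pd.DataFrame({
    "person_id": [1, 2, 3],
    "eye_color": ["brown", "blue", "green"],
    "height_cm": [170, 182, 165],
})
income = pd.DataFrame({
    "person_id": [1, 2, 4],
    "salary": [52000, 61000, 48000],
    "family_size": [3, 2, 5],
})

# An outer join keeps people who appear in only one file; indicator=True adds
# a "_merge" column recording which side each row came from.
combined = traits.merge(income, on="person_id", how="outer", indicator=True)
print(combined)
```

Whether to use `how="outer"`, `"inner"` or `"left"` depends on what should happen to people missing from one file, which is worth stating explicitly in an interview answer. The result can then be written out with `combined.to_csv(...)`.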
Basic Deep Learning Questions
What is machine learning and where does deep learning fit in ?
What are the different loss functions you can use in Deep Learning ? How do you pick one ?
What is dropout ?
What are the different forms of regularizers used in deep learning ?
What are the learning algorithms you are aware of ?
How do you typically initialize weights in a deep neural network ?
How do you fix the number of layers & hidden units in a deep neural network ?
What are the differences between Keras, TensorFlow and PyTorch ?
What is the purpose of an LSTM ? Why do you need a bi-directional LSTM ?
What is the difference between a GRU and a bi-directional LSTM ?
What is an attention mechanism ? What are some examples where it is used ?
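To make the attention question above concrete, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python so the arithmetic is visible. It is one common form of attention, not the only one, and the toy keys, values and query are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy inputs (assumed for illustration only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attention([1.0, 0.0], keys, values)
print(weights, output)
```

Keys that align with the query receive larger weights, so the output leans toward their values; in a real network the queries, keys and values are learned projections of the input.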
Basic NLP Questions
What are stop words ? How do we remove them ?
What are various ways of finding word embeddings ?
Explain the skip-gram model and word2vec embeddings.
How do you determine if two sentences are similar ?
Explain how you can approach sentiment analysis of Twitter data.
I want to find topics in a set of documents, what models will I use ?
What is perplexity ?
How do you measure the effectiveness of a model for spam filtering ? Note that this is a highly imbalanced problem.
What are popular Python libraries you will use for NLP ?
What is stemming and lemmatization ?
What are the different ways in which you can represent a document ?
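As a simple baseline for the document representation and sentence similarity questions above, the sketch below represents each text as a bag of words and compares two texts with cosine similarity. The tokenizer is a deliberately crude assumption (lowercase letters and apostrophes only); embedding-based representations are a common alternative and usually capture meaning better.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Represent a text as a mapping from token to count."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)

s1 = bag_of_words("The cat sat on the mat")
s2 = bag_of_words("The cat lay on the mat")
s3 = bag_of_words("Stock prices fell sharply")
print(cosine_similarity(s1, s2), cosine_similarity(s1, s3))
```

The first pair shares most of its words and scores close to 1, while the second pair shares none and scores 0, which also shows the main weakness of this representation: synonyms and word order are ignored.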
Basic Math Questions for Data Science
What is a valid probability distribution ?
Explain Bayes' rule.
What is a maximum likelihood estimate ?
What is joint probability and what is conditional probability ?
What are eigenvalues and eigenvectors ? Why do we care about them ?
What is the central limit theorem ?
You are given a function and a data point. How will you determine whether this point is a maximizer, a minimizer, or neither ?
What is the difference between MLE and MAP estimates ?
What is the difference between global optima and local optima ?
What is a convex function ? Why do we care about convexity ?
What is bias ? How do you know if an estimator is biased ?
What is a Cumulative Distribution Function ?
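For the Bayes' rule and conditional probability questions above, a small worked example can help. The sketch below computes a posterior probability for a binary event; the function name and all numbers (1% prior, 90% true positive rate, 5% false positive rate) are hypothetical and chosen only to show the arithmetic.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(event | positive signal) via Bayes' rule for a binary event."""
    # P(signal) = P(signal | event) P(event) + P(signal | no event) P(no event)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: a rare event and a fairly accurate detector.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 3))
```

With these assumed numbers the posterior is roughly 0.154, so even a positive signal from a fairly accurate detector leaves the event unlikely when the prior is small.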