Whether you are just starting your interview preparation or adding the final touches, here are over 50 must-see questions to prepare for a data science interview. We have grouped them into five categories for convenience. (Note: There are several more questions, along with answers, in the main menu under “Interview Questions and Answers”.)

### Basic Data Science Questions

How do you evaluate a Machine Learning algorithm ?

Why do you need a training set, a validation set and a test set ?

What is the bias-variance trade-off in Machine Learning ?

What is the difference between supervised and unsupervised learning ?

When are deep learning algorithms more appropriate compared to traditional machine learning algorithms?

What are popular clustering algorithms? How do you fix the number of clusters in a clustering algorithm ?

Why do you need dimensionality reduction ? What are some ways to do this ?

What is regularization ? What types of regularizers do you know ?

What are the various steps in the typical Machine Learning pipeline ?

What data cleanup and normalization is required in a typical Machine Learning pipeline ?

What is overfitting and underfitting ? Give examples. How do you overcome them?

### Basic Data Cleanup/Wrangling Questions

How do you deal with missing data ?

How do you detect outliers in data ? How do you deal with them ?

What pre-processing can you do when you have an imbalanced dataset ?

When and how do you do feature scaling ?

When do you need to normalize your data to have zero mean and unit variance ?

What do you do when you have very little training data ?

You realize you have duplicates in your data – what do you do ?

You have 10,000 features. How do you figure out if you need all of them ?

You have a column that contains colors such as “red”, “blue”, … How do you handle this column ?

You have one file with person ID, eye color, ethnicity, height and weight, and another file with person ID, salary and family size. How do you combine them into a single file using pandas ?
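As a starting point for the last question, here is a minimal sketch of joining two such tables on the person ID with `DataFrame.merge`. The column names and sample rows are hypothetical and built in memory for illustration; in practice each table would be loaded with `pd.read_csv` from its own file.

```python
import pandas as pd

# Hypothetical stand-ins for the two files (normally loaded with pd.read_csv).
traits = pd.DataFrame({
    "person_id": [1, 2, 3],
    "eye_color": ["brown", "blue", "green"],
    "ethnicity": ["A", "B", "C"],
    "height_cm": [170, 165, 180],
    "weight_kg": [70, 60, 82],
})
finances = pd.DataFrame({
    "person_id": [2, 3, 4],
    "salary": [50000, 62000, 48000],
    "family_size": [3, 1, 4],
})

# Outer join keeps every person found in either table; indicator=True adds a
# "_merge" column showing whether a row came from the left, right, or both.
combined = traits.merge(finances, on="person_id", how="outer", indicator=True)

# Inner join keeps only people present in both tables.
matched = traits.merge(finances, on="person_id", how="inner")

print(combined)
# To produce the combined file: combined.to_csv("combined.csv", index=False)
```

Which `how` to choose (`inner`, `left`, `right` or `outer`) depends on whether people missing from one file should be kept, which is worth discussing explicitly in an interview.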

### Basic Deep Learning Questions

What is machine learning and where does deep learning fit in ?

What are the different loss functions you can use in Deep Learning ? How do you pick one ?

What is dropout ?

What are the different forms of regularizers used in deep learning ?

What are the learning algorithms you are aware of ?

How do you typically initialize weights in a deep neural network ?

How do you fix the number of layers & hidden units in a deep neural network ?

What are the differences between Keras, TensorFlow and PyTorch ?

What is the purpose of an LSTM ? Why do you need a bi-directional LSTM ?

What is the difference between a GRU and a bi-directional LSTM ?

What is an attention mechanism ? What are some examples where it is used ?

### Basic NLP Questions

What are stop words ? How do we remove them ?

What are various ways of finding word embeddings ?

Explain the skip-gram model and word2vec embeddings.

How do you determine if two sentences are similar ?

Explain how you would approach sentiment analysis on Twitter data.

I want to find topics in a set of documents. What models can I use ?

What is perplexity ?

How do you measure the effectiveness of a model for spam filtering? Note that this is a highly imbalanced problem.

What are some popular Python libraries you would use for NLP ?

What is stemming and lemmatization ?

What are the different ways in which you can represent a document ?

### Basic Math Questions for Data Science

What is a valid probability distribution ?

Explain Bayes' rule.

What is a maximum likelihood estimate (MLE) ?

What is joint probability and what is conditional probability ?

What are eigenvalues and eigenvectors ? Why do we care about them ?

What is the central limit theorem ?

You are given a function and a data point. How do you determine whether this point is a maximizer, a minimizer, or neither ?

What is the difference between MLE and MAP estimates ?

What is the difference between global optima and local optima ?

What is a convex function ? Why do we care about convexity ?

What is bias ? How do you know if an estimator is biased ?

What is a Cumulative Distribution Function ?