## What is the difference between supervised and unsupervised learning?

In supervised learning, the algorithm learns from labeled training data. In other words, each data point is tagged with the answer, or label, that the algorithm should produce. Using such labeled data, the goal is to predict labels for new data points. The two common forms of supervised learning are classification and regression…

## When are deep learning algorithms more appropriate compared to traditional machine learning algorithms?

Deep learning algorithms are capable of learning arbitrarily complex non-linear functions, given a network that is deep enough and wide enough and uses an appropriate non-linear activation function. Traditional ML algorithms often require feature engineering, that is, finding the subset of meaningful features to use. Deep learning algorithms often avoid the need for the feature engineering step…

## Why do you typically see overflow and underflow when implementing ML algorithms?

A common pre-processing step is to normalize or rescale inputs so that they are not too large or too small. However, even on normalized inputs, overflows and underflows can occur. Underflow: a joint probability distribution often involves multiplying small individual probabilities. Many probabilistic algorithms multiply the probabilities of individual data points, which leads to underflow. Example: Suppose you…
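The underflow described above can be reproduced in a few lines. This is a minimal sketch under assumed numbers: the 2000 data points and the per-point probability of 0.01 are made up for illustration and are not taken from the original example. It contrasts the naive product with the usual workaround of summing log-probabilities.

```python
import math

# Hypothetical setup: 2000 independent data points, each with probability 0.01.
probs = [0.01] * 2000

# Naive joint probability: the running product drops below the smallest
# representable double and collapses to exactly 0.0.
naive_joint = 1.0
for p in probs:
    naive_joint *= p

# Working in log space turns the product into a sum, which stays representable.
log_joint = sum(math.log(p) for p in probs)

print(naive_joint)  # underflows to zero
print(log_joint)    # a large negative, but finite, number
```

Comparing models by `log_joint` instead of `naive_joint` preserves the ordering, since the logarithm is monotonic.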

## Is the run-time of an ML algorithm important? How do I evaluate whether the run-time is OK?

Runtime considerations are important for many applications. Typically you should look at both the training time and the prediction time of an ML algorithm. Some common questions to ask include: Training: Do you want to train the algorithm in batch mode? How often do you need to train? If you need to retrain your algorithm every…
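One simple way to start answering these questions is to measure training time and prediction time separately. The sketch below assumes a stand-in "model" (the mean of the training targets); `train`, `predict`, and `time_call` are hypothetical helpers for illustration, not part of any library.

```python
import time

def time_call(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical stand-in model: "training" computes the mean of the targets,
# "prediction" returns that mean for every requested point.
def train(targets):
    return sum(targets) / len(targets)

def predict(model, n_points):
    return [model] * n_points

model, train_seconds = time_call(train, list(range(100000)))
preds, predict_seconds = time_call(predict, model, 1000)

print(f"training:   {train_seconds:.6f} s")
print(f"prediction: {predict_seconds:.6f} s for {len(preds)} points")
```

Whether the measured times are acceptable depends on how often retraining is needed and on the latency budget for predictions.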

## How do you handle missing data in an ML algorithm?

There is no fixed rule for dealing with missing data, but one could use any of the heuristics mentioned below. The most common approach is to remove all rows with missing data, provided there are not too many of them. If more than 50-60% of rows of a…
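The row-removal heuristic can be sketched as follows. The toy rows and the `MAX_MISSING_FRACTION` cutoff are assumptions made for illustration; the text does not specify what counts as "too many" rows.

```python
# Toy data, made up for illustration; None marks a missing value.
rows = [
    {"age": 34, "income": 72000},
    {"age": None, "income": 58000},
    {"age": 45, "income": 91000},
    {"age": 29, "income": 64000},
]

def has_missing(row):
    """True if any field in the row is missing."""
    return any(value is None for value in row.values())

missing_fraction = sum(has_missing(r) for r in rows) / len(rows)

# Hypothetical cutoff for "not too many rows with missing data".
MAX_MISSING_FRACTION = 0.3

if missing_fraction <= MAX_MISSING_FRACTION:
    # Few enough incomplete rows: drop them.
    cleaned = [r for r in rows if not has_missing(r)]
else:
    # Too many incomplete rows: dropping would discard too much data,
    # so a different heuristic would be needed here.
    cleaned = rows

print(missing_fraction, len(cleaned))
```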

## With the maximum likelihood estimate, are we guaranteed to find a global optimum?

The maximum likelihood estimate is the value of the parameters that maximizes the likelihood. If the likelihood is strictly concave (or, equivalently, the negative of the likelihood is strictly convex), any maximum we find is the unique global optimum. This is usually not the case, and we end up finding a local optimum. Hence, the maximum likelihood estimate usually finds a local…
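For reference, the estimate can be written out as follows, assuming independent and identically distributed observations $x_1, \dots, x_n$ and a model $p(x \mid \theta)$ (this notation is introduced here for illustration and is not from the original text):

$$
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \prod_{i=1}^{n} p(x_i \mid \theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i \mid \theta)
$$

The two forms share the same maximizer because the logarithm is strictly increasing. In practice the concavity condition is usually checked on the log-likelihood, which is the quantity most optimizers work with.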

## What are evaluation metrics for a multi-class classification problem (like positive/negative/neutral sentiment analysis)?

For multiclass classification (MCC) problems, metrics can be derived from the confusion matrix. Let $tp_i, tn_i, fp_i, fn_i$ denote the true positives, true negatives, false positives, and false negatives for class $i$, respectively. For MCC problems, macro and micro metrics are usually computed: → Micro metrics (with subscript $\mu$ in table below) are computed by summing up individual tp, tn, fp and fn to…
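To make the micro/macro distinction concrete, here is a small sketch for precision only. The per-class counts are made up for illustration. Micro precision pools the counts across classes before dividing, while macro precision (the standard definition, assumed here) averages the per-class precisions.

```python
# Hypothetical per-class counts for a three-class sentiment problem.
counts = {
    "positive": {"tp": 50, "fp": 10},
    "negative": {"tp": 30, "fp": 5},
    "neutral": {"tp": 10, "fp": 10},
}

# Micro: sum tp and fp over all classes, then compute one precision.
total_tp = sum(c["tp"] for c in counts.values())
total_fp = sum(c["fp"] for c in counts.values())
micro_precision = total_tp / (total_tp + total_fp)

# Macro: compute precision per class, then take the unweighted mean.
per_class = [c["tp"] / (c["tp"] + c["fp"]) for c in counts.values()]
macro_precision = sum(per_class) / len(per_class)

print(f"micro precision: {micro_precision:.4f}")
print(f"macro precision: {macro_precision:.4f}")
```

With these counts the macro value is pulled down by the small "neutral" class, whereas the micro value is dominated by the larger classes.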

## What is the PageRank algorithm?

Quote from Wikipedia: A PageRank results from a mathematical algorithm based on the webgraph, created by all World Wide Web pages as nodes and hyperlinks as edges, taking into consideration authority hubs such as cnn.com or usa.gov. The rank value indicates an importance of a particular page. A hyperlink to a page counts as a…
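The idea of pages as nodes and hyperlinks as edges can be illustrated with a small power-iteration sketch. This is a simplified version under stated assumptions: the four-page graph is made up, the damping factor of 0.85 and the fixed iteration count are conventional choices not given in the quote, and pages without outgoing links spread their rank evenly over all pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A dangling page (no outlinks) spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

Page `c` collects links from three pages and ends up with the highest rank, while `d`, which nothing links to, keeps only the random-jump share.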

## You want to find food-related topics on Twitter – how do you go about it?

One can use any of the topic models above to get topics. To steer the topics toward food-related information, specialized topic modeling algorithms are available. However, one simple way to direct the topics toward food is: Filter tweets by a limited set of food-related keywords (food, meal, dinner,…
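The keyword-filtering step might look like the sketch below. It uses only the three keywords the text names; the example tweets and the helper `is_food_related` are hypothetical. Only the filtering step is shown.

```python
import re

# Only the keywords named in the text; a real list would be longer.
FOOD_KEYWORDS = {"food", "meal", "dinner"}

def is_food_related(tweet):
    """True if the tweet contains at least one food keyword as a whole word."""
    words = set(re.findall(r"[a-z']+", tweet.lower()))
    return bool(words & FOOD_KEYWORDS)

# Hypothetical tweets for illustration.
tweets = [
    "Cooking dinner for friends tonight",
    "Traffic on the bridge is terrible",
    "Best meal I've had all year",
]

food_tweets = [t for t in tweets if is_food_related(t)]
print(food_tweets)
```

Matching whole words avoids false hits on substrings, at the cost of missing plurals and hashtags unless they are added to the keyword set.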