Missing data arises either from issues in data collection or because the data model allows for missing values (for instance, the field ‘maximum credit limit on any of your cards’ might not make sense for someone who has no credit cards…). With missing data, the ML algorithm implementation might typically fail with…
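A minimal sketch of detecting missing fields and filling them in. Mean imputation is used here only as one simple, common option; the excerpt does not say which strategy it recommends, and for fields that genuinely do not apply (no credit cards) an explicit "missing" indicator may be more appropriate. The records and field names are hypothetical.

```python
# Hypothetical records: None marks a value that was never collected or that
# does not apply (e.g. no credit cards, so no maximum credit limit).
rows = [
    {"income": 52000.0, "max_credit_limit": 8000.0},
    {"income": 61000.0, "max_credit_limit": None},
    {"income": None, "max_credit_limit": 12000.0},
]


def missing_counts(records):
    """Count the missing (None) values in each field."""
    counts = {}
    for record in records:
        for field, value in record.items():
            counts[field] = counts.get(field, 0) + (1 if value is None else 0)
    return counts


def mean_impute(records):
    """Replace each None with the mean of the observed values of that field."""
    fields = list(records[0].keys())
    means = {}
    for field in fields:
        observed = [r[field] for r in records if r[field] is not None]
        means[field] = sum(observed) / len(observed)
    return [
        {f: (r[f] if r[f] is not None else means[f]) for f in fields}
        for r in records
    ]


# A naive computation such as sum(r["income"] for r in rows) raises a
# TypeError because of the None, which is one way an implementation "fails".
print(missing_counts(rows))
print(mean_impute(rows))
```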
With maximum likelihood estimation, are we guaranteed to find a global optimum?
Maximum likelihood estimation finds the parameter values that maximize the likelihood. If the likelihood is strictly concave (or, equivalently, the negative of the likelihood is strictly convex), any maximum we find is guaranteed to be the unique global optimum. This is usually not the case, and iterative optimisation typically ends up at a local optimum. Hence, maximum likelihood estimation usually finds a local…
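To see how the starting point decides which optimum we reach, here is a small sketch of gradient ascent on a likelihood with two local maxima. The Cauchy location model (scale fixed to 1) and the three data points are assumptions chosen only because this log-likelihood is not concave; they are not taken from the text.

```python
import math

# Hypothetical data; the Cauchy location model is used only because its
# log-likelihood can have several local maxima.
DATA = [-5.0, 5.0, 6.0]


def log_likelihood(theta, data=DATA):
    """Cauchy(theta, 1) log-likelihood: -sum log(pi * (1 + (x - theta)^2))."""
    return -sum(math.log(math.pi * (1.0 + (x - theta) ** 2)) for x in data)


def grad(theta, data=DATA):
    """Derivative of the log-likelihood with respect to theta."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in data)


def gradient_ascent(theta0, lr=0.1, steps=5000):
    """Plain gradient ascent: it climbs to whichever maximum is nearest uphill."""
    theta = theta0
    for _ in range(steps):
        theta += lr * grad(theta)
    return theta


left = gradient_ascent(-3.0)   # converges to the local maximum near -4.8
right = gradient_ascent(3.0)   # converges to the global maximum near 5.4
print(f"start -3 -> theta={left:.3f}, loglik={log_likelihood(left):.3f}")
print(f"start  3 -> theta={right:.3f}, loglik={log_likelihood(right):.3f}")
```

Both runs stop where the gradient is zero, but only the second one reaches the higher likelihood; nothing in the procedure itself tells the first run that it is stuck at a local optimum.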
What is the difference between deep learning and machine learning?
Deep learning is a subset of machine learning. Machine learning is the ability to build “models” that learn automatically from data, without explicitly programmed rules. Machine learning models typically generalize to new, unseen data. Deep learning is the field within machine learning in which we build multi-layered artificial neural network models to…
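A tiny sketch of what "multi-layered" means: each layer is a linear map followed by a non-linearity, and layers are composed. The weights below are set by hand (an assumption for illustration) so the two-layer network computes XOR, which no single linear layer can; in deep learning these weights would instead be learned from data.

```python
def relu(values):
    """Element-wise ReLU non-linearity."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: weights[j] holds the input weights of unit j."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def mlp(x):
    """Two stacked layers with hand-set (not learned) weights computing XOR."""
    hidden = relu(dense(x, [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]))
    return dense(hidden, [[1.0, -2.0]], [0.0])[0]


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, mlp(x))
```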
What are the evaluation metrics for a multi-class classification problem (like positive/negative/neutral sentiment analysis)?
For multi-class classification (MCC) problems, metrics can be derived from the confusion matrix. Let $tp_i, tn_i, fp_i, fn_i$ denote the true positives, true negatives, false positives and false negatives for class $i$, respectively. For MCC problems, macro and micro metrics are usually computed: → Micro metrics (with subscript $\mu$ in the table below) are computed by summing up the individual tp, tn, fp and fn to…
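A short sketch of deriving $tp_i, tn_i, fp_i, fn_i$ from a confusion matrix and computing micro and macro precision and recall. The 3-class matrix (rows are actual classes, columns are predicted classes) is made-up data, and the code assumes every class is predicted and occurs at least once, so no per-class denominator is zero.

```python
# Hypothetical confusion matrix for negative / neutral / positive sentiment.
# Rows = actual class, columns = predicted class.
cm = [
    [5, 2, 1],
    [1, 3, 1],
    [0, 2, 6],
]


def per_class_counts(matrix):
    """Return (tp, tn, fp, fn) for each class i."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    counts = []
    for i in range(k):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(k)) - tp  # predicted i, actually not i
        fn = sum(matrix[i]) - tp                       # actually i, predicted not i
        tn = total - tp - fp - fn
        counts.append((tp, tn, fp, fn))
    return counts


def micro_macro(matrix):
    counts = per_class_counts(matrix)
    tp = sum(c[0] for c in counts)
    fp = sum(c[2] for c in counts)
    fn = sum(c[3] for c in counts)
    return {
        # Micro: pool the counts over all classes, then compute the ratio once.
        "micro_precision": tp / (tp + fp),
        "micro_recall": tp / (tp + fn),
        # Macro: compute the ratio per class, then take the unweighted mean.
        "macro_precision": sum(c[0] / (c[0] + c[2]) for c in counts) / len(counts),
        "macro_recall": sum(c[0] / (c[0] + c[3]) for c in counts) / len(counts),
    }


print(micro_macro(cm))
```

Micro averaging weights every instance equally, so frequent classes dominate; macro averaging weights every class equally, so a poorly handled rare class pulls the score down.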
What is the PageRank algorithm?
How do search engines find what you want? When we search on the internet, we want to see the most relevant pages. The PageRank algorithm determines which pages on the internet are more authoritative based on their popularity, measured by the links pointing to them, to ensure users see the pages that are most likely to be of use…
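A minimal power-iteration sketch of PageRank on a four-page toy web. The link graph is hypothetical, and the damping factor of 0.85 is the commonly quoted default rather than something stated in the text. Pages with no outgoing links spread their rank evenly over all pages.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """links maps each page to the list of pages it links to."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Every page gets the "random jump" share (1 - d) / n.
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = links[u]
            if out:
                # A page passes its rank equally to the pages it links to.
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:
                # Dangling page: spread its rank over all pages.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        diff = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if diff < tol:
            break
    return rank


# Hypothetical toy web: C is linked to by three pages, D by none.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
print(sorted(rank.items(), key=lambda kv: -kv[1]))
```

Page C ends up most authoritative because three pages link to it, while D, which nothing links to, keeps only the random-jump share.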
You want to find food-related topics on Twitter – how do you go about it?
One can use any of the topic models above to get topics. To steer the topics toward food-related content, specialized topic modeling algorithms are available. However, one simple way to steer the topics toward food is to filter tweets by a limited set of food-related keywords (food, meal, dinner,…
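A sketch of the keyword filter step, run before topic modeling. The keyword set contains only the three words the text names (its list is truncated), and the tweets are made up; simple lower-case word matching is an assumption, and it also matches hashtags such as "#dinner".

```python
import re

# Partial keyword list taken from the text; extend it for real use.
FOOD_KEYWORDS = {"food", "meal", "dinner"}


def is_food_tweet(text, keywords=FOOD_KEYWORDS):
    """True if any lower-cased word token of the tweet is a food keyword."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return not tokens.isdisjoint(keywords)


# Hypothetical tweets.
tweets = [
    "Best pasta dinner ever #foodie",
    "Traffic is terrible this morning",
    "Skipped a meal again, back to back meetings",
    "What a match last night!",
]
food_tweets = [t for t in tweets if is_food_tweet(t)]
print(food_tweets)
```

The filtered tweets would then be fed to whichever topic model is used, so the discovered topics are drawn from food-related text only.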
What is stratified sampling and why is it important?
Stratified sampling is a sampling method in which the population is divided into homogeneous subgroups called strata, and the right number of instances is sampled from each stratum (typically in proportion to the stratum's share of the population). This sampling is important to ensure that the sampled dataset is representative of the entire population. To realise this point, consider an example of predicting…
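A minimal sketch of proportional stratified sampling: group the population by stratum, then draw the same fraction from each group. The 90/10 label split and the `stratified_sample` helper are hypothetical; with a plain random sample of 10 items from such data, the rare class could easily be missed entirely.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(items, stratum_of, fraction, seed=0):
    """Sample `fraction` of each stratum (proportional allocation)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in items:
        groups[stratum_of(item)].append(item)
    sample = []
    for key in sorted(groups):
        members = groups[key]
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical imbalanced population: 90 negatives, 10 positives.
population = [(i, "pos" if i < 10 else "neg") for i in range(100)]
sample = stratified_sample(population, stratum_of=lambda r: r[1], fraction=0.1)
print(Counter(label for _, label in sample))
```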
How do you measure the quality of machine translation?
The BLEU (Bilingual Evaluation Understudy) score is the most common metric for evaluating machine translation. Typically, it measures a candidate translation against a set of reference translations available as ground truth. The BLEU score is based on precision: how many of the words in the candidate sentence appear in the reference sentences…
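The precision in BLEU is "modified" (clipped): each candidate n-gram is counted at most as many times as it appears in any single reference, so repeating a common word earns no extra credit. The sketch below computes only this component. Full BLEU additionally takes a geometric mean of these precisions for n = 1 to 4 and applies a brevity penalty, which this sketch omits.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate against several references."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    # For each n-gram, the maximum number of times it occurs in any one reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
cand = "the the the the the the the".split()
print(modified_precision(cand, refs))        # unclipped precision would be 7/7
print(modified_precision(cand, refs, n=2))
```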
Suppose you build word vectors (embeddings) in which each word vector has as many dimensions as the vocabulary size (V), and the feature values are the PPMI (positive pointwise mutual information) between the corresponding words. What are the problems with this approach, and how can you resolve them?
Problems: As the vocabulary size (V) is large, these vectors will be very high-dimensional. They will also be sparse, as a word will not have co-occurred with most of the other words in the vocabulary.
Resolution: Dimensionality reduction using approaches like Singular Value Decomposition (SVD) of the word-context PPMI matrix to get a K-dimensional approximation. Other matrix factorisation techniques…
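A small NumPy sketch of the pipeline: raw co-occurrence counts, then PPMI, then a truncated SVD that keeps K dimensions. The 4-word vocabulary and its counts are made up, and using $U_K \Sigma_K$ as the embedding (rather than $U_K$ alone or $U_K \Sigma_K^{1/2}$) is one common choice, assumed here.

```python
import numpy as np


def ppmi(counts):
    """Positive PMI of a word-context co-occurrence count matrix."""
    total = counts.sum()
    p_wc = counts / total
    p_w = p_wc.sum(axis=1, keepdims=True)   # row marginals P(w)
    p_c = p_wc.sum(axis=0, keepdims=True)   # column marginals P(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0            # zero counts give -inf (or nan)
    return np.maximum(pmi, 0.0)


def svd_embeddings(matrix, k):
    """Dense K-dimensional word vectors from a truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


# Hypothetical symmetric co-occurrence counts.
vocab = ["coffee", "tea", "cup", "drink"]
counts = np.array(
    [[0, 1, 4, 3],
     [1, 0, 3, 4],
     [4, 3, 0, 1],
     [3, 4, 1, 0]],
    dtype=float,
)
emb = svd_embeddings(ppmi(counts), k=2)
print(dict(zip(vocab, emb.round(3).tolist())))
```

Each word goes from a sparse V-dimensional PPMI row to a dense K-dimensional vector, with K chosen much smaller than V.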
What is negative sampling when training the skip-gram model?
Recap: The skip-gram model is a popular algorithm for training word embeddings (it is one of the two architectures in word2vec). It represents each word in a large text as a lower-dimensional vector in a K-dimensional space, such that similar words are closer to each other. This is achieved by training a feed-forward network where we try…
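The excerpt stops before defining negative sampling, so this sketch follows the commonly used word2vec formulation: for a (center, context) pair, maximise $\log\sigma(u_o \cdot v_c)$ for the true context word and $\log\sigma(-u_k \cdot v_c)$ for k "negative" words drawn from the unigram distribution raised to the 3/4 power. The toy vectors and word counts are hypothetical, and no gradient update is shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def negative_sampling_loss(center, context, negatives):
    """-log sigma(u_o . v_c) - sum_k log sigma(-u_k . v_c) for one training pair."""
    loss = -math.log(sigmoid(dot(context, center)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(neg, center)))
    return loss


def noise_distribution(word_counts, power=0.75):
    """Unigram distribution raised to the 3/4 power, then renormalised."""
    weights = {w: c ** power for w, c in word_counts.items()}
    total = sum(weights.values())
    return {w: wt / total for w, wt in weights.items()}


def draw_negatives(dist, k, rng):
    """Draw k negative words (with replacement) from the noise distribution."""
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=k)


# Toy vectors: in the "good" case the true context points the same way as the
# center word and the negative points the opposite way; "bad" is the reverse.
center = [1.0, 0.0]
good = negative_sampling_loss(center, context=[2.0, 0.0], negatives=[[-2.0, 0.0]])
bad = negative_sampling_loss(center, context=[-2.0, 0.0], negatives=[[2.0, 0.0]])
print(good, bad)

dist = noise_distribution({"the": 1000, "cat": 10, "sat": 5})
print(draw_negatives(dist, k=5, rng=random.Random(0)))
```

Only 1 + k dot products are needed per pair instead of a softmax over the whole vocabulary, and the 3/4 power makes rare words somewhat more likely to be drawn as negatives than their raw frequency would suggest.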