MLNerds

Explain Locality Sensitive Hashing for Nearest Neighbour Search

Posted on November 11, 2020 (updated November 15, 2020) by MLNerds

What is Locality Sensitive Hashing (LSH)? Locality sensitive hashing is a technique for hashing items into buckets such that similar items land in the same bucket (same hash) with high probability, while dissimilar items land in the same bucket only with low probability. Where…
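The excerpt does not say which hash family the post uses. As a rough illustration only, here is a minimal sketch that assumes cosine similarity and random-hyperplane hashing; the helper names (`make_hyperplanes`, `lsh_signature`, `build_buckets`) are hypothetical and not taken from the post.

```python
import random

def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random Gaussian hyperplanes (assumed hash family: sign of a random projection)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)

def build_buckets(items, hyperplanes):
    """Group item names by signature; each distinct signature is one bucket."""
    buckets = {}
    for name, vector in items.items():
        buckets.setdefault(lsh_signature(vector, hyperplanes), []).append(name)
    return buckets

planes = make_hyperplanes(num_planes=8, dim=3)
items = {
    "a": [1.0, 2.0, 3.0],
    "a_scaled": [2.0, 4.0, 6.0],      # same direction as "a"
    "a_flipped": [-1.0, -2.0, -3.0],  # opposite direction
}
buckets = build_buckets(items, planes)
print(buckets)
```

Vectors pointing in the same direction always share a signature here, and opposite vectors do not. For vectors that are merely close, sharing a bucket is only likely, not guaranteed, which is the "high probability" part of the definition.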

The Machine Learning Product Lifecycle – Challenges building ML products

Posted on November 9, 2020 (updated November 11, 2020) by MLNerds

Contrary to the popular notion that working on ML products is all about crunching math and stats, there are many steps involved in productionizing ML and creating real products. Here is a brief video that explores the machine learning product development lifecycle and how it differs from the traditional product development lifecycle.…

How to answer “Explain Linear Regression?”

Posted on August 7, 2020 by MLNerds

Over the last few months I interviewed 100+ people while helping with interview prep. Many got stuck answering a basic ML concept question. Most have the intuition and understand the basic concept, and have probably watched a detailed video on it in a data science course. But when it comes to articulating the concept concisely in…

Semantic Textual Similarity: Automatic Question Answering from FAQs

Posted on August 7, 2020 (updated February 15, 2021) by MLNerds

Semantic Textual Similarity is the task of determining how close two pieces of text are in meaning. It has many applications, such as question answering, information retrieval and recommendation systems. Here is a one-hour, beginner-level NLP code-along video tutorial on semantic textual similarity. The session covers the task of Automatic Question Answering from…

What is overfitting and underfitting? Give examples. How do you overcome them?

Posted on March 18, 2019 by MLNerds

ANSWER here

What is stratified sampling and why is it important?

Posted on February 21, 2019 (updated July 1, 2019) by MLNerds

Stratified sampling is a sampling method in which the population is divided into homogeneous subgroups called strata, and the right number of instances is sampled from each stratum. For further explanation visit here. This sampling is important to ensure that the sampled dataset is representative of the entire population. To see why, consider an example of predicting…
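The idea can be sketched in a few lines. This is a minimal illustration, assuming that "the right number of instances" means proportional allocation (each stratum contributes the same fraction of its members); the function name `stratified_sample` and the toy population are made up for the example.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Sample the same fraction from every stratum (assumed: proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)  # group records by stratum label
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # keep at least one per stratum
        sample.extend(rng.sample(group, k))       # sample without replacement
    return sample

# Toy population: 20% "pos", 80% "neg".
population = [("pos", i) for i in range(20)] + [("neg", i) for i in range(80)]
sample = stratified_sample(population, key=lambda r: r[0], fraction=0.1)
counts = {label: sum(1 for r in sample if r[0] == label) for label in ("pos", "neg")}
print(counts)
```

The sample keeps the 20/80 split of the population, which a small simple random sample would not guarantee.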

Suppose you build word vectors (embeddings) with each word vector having dimension equal to the vocabulary size (V) and feature values given by the pPMI between corresponding words. What are the problems with this approach and how can you resolve them?

Posted on February 17, 2019 (updated May 2, 2019) by MLNerds

Problems: As the vocabulary size (V) is large, these vectors will be large in size. They will also be sparse, as a word may not have co-occurred with all possible words. Resolution: Dimensionality reduction using approaches like Singular Value Decomposition (SVD) of the term document matrix to get a K dimensional approximation. Other matrix factorisation techniques…
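To make the two problems concrete, here is a small sketch that builds positive-PMI vectors from a toy corpus. It is an illustration under stated assumptions, not the post's own code: the window size, the base-2 logarithm, the clipping of negative PMI to zero, and the function name `ppmi_vectors` are all choices made for this example. The dimensionality-reduction step is not shown.

```python
import math
from collections import Counter

def ppmi_vectors(sentences, window=1):
    """Build one V-dimensional positive-PMI vector per word from co-occurrence counts."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1
    total = sum(pair_counts.values())
    row, col = Counter(), Counter()
    for (w, c), n in pair_counts.items():
        row[w] += n
        col[c] += n
    vocab = sorted(row)
    index = {w: k for k, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for (w, c), n in pair_counts.items():
        # PMI = log2( P(w, c) / (P(w) * P(c)) ); negative values are clipped to 0.
        pmi = math.log2(n * total / (row[w] * col[c]))
        vectors[w][index[c]] = max(0.0, pmi)
    return vocab, vectors

sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
vocab, vectors = ppmi_vectors(sentences)
zeros = sum(v == 0.0 for vec in vectors.values() for v in vec)
print(len(vocab), "dimensions per word;", zeros, "of", len(vocab) ** 2, "entries are zero")
```

Even on this tiny corpus every vector has as many dimensions as there are vocabulary words, and most entries are zero; both effects grow with a realistic vocabulary.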

What is negative sampling when training the skip-gram model?

Posted on February 17, 2019 (updated August 5, 2021) by MLNerds

Recap: Skip-Gram model is a popular algorithm to train word embeddings such as word2vec. It tries to represent each word in a large text as a lower dimensional vector in a space of K dimensions such that similar words are closer to each other. This is achieved by training a feed-forward network where we try…

What is PMI?