Often in data science, we want to understand how one variable is related to another. These variables could be features for an ML model, or sometimes we might want to see how important a feature is in determining the target we are trying to predict. Both covariance and correlation can be used to measure the direction…
Author: MLNerds
Decoding the Data Scientist Hiring Gap
The need for AI/ML is growing and more and more jobs are being created as data awareness is increasing and more data is being collected. However, hiring data scientists has not been an easy task – most of these roles are not yet filled. On the other hand, data science is a very popular discipline….
How to find the Optimal Number of Clusters in K-means? Elbow and Silhouette Methods
K-means Clustering Recap Clustering is the process of finding cohesive groups of items in the data. K-means clustering is the most popular clustering algorithm. It is simple to implement and easily available in Python and R libraries. Here is a quick recap of how K-means clustering works. Choose a value of K. Initialize K…
Detecting and Removing Gender Bias in Word Embeddings
What are Word Embeddings? Word embeddings are vector representations of words that can be used as input (features) to other downstream tasks and ML models. Here is an article that explains popular word embeddings in more detail. They are used in many NLP applications such as sentiment analysis, document clustering, question answering, paraphrase detection…
Dartboard Paradox: Probability Density Function vs Probability
What is the Dartboard Paradox? Assume you are throwing a dart at a dartboard such that it hits somewhere on the dartboard. The dartboard paradox: The probability of hitting any specific point on the dartboard is zero. However, the probability of hitting somewhere on the dartboard is 1. How can this be? (How…
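One way to write down the distinction the title points at, as a sketch that assumes the landing point (X, Y) has a continuous density f over the board D (the excerpt does not specify the distribution): probabilities come from integrating the density over a region, and a single point is a region of zero area.

```latex
P\big((X,Y) \in A\big) = \iint_{A} f(x,y)\,dx\,dy,
\qquad
P\big((X,Y) = (x_0,y_0)\big) = \iint_{\{(x_0,y_0)\}} f(x,y)\,dx\,dy = 0,
\qquad
P\big((X,Y) \in D\big) = \iint_{D} f(x,y)\,dx\,dy = 1.
```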
Explain Locality Sensitive Hashing for Nearest Neighbour Search?
What is Locality Sensitive Hashing (LSH)? Locality sensitive hashing is a technique for hashing items into buckets such that similar items are in the same bucket (same hash) with high probability, while dissimilar items are in different buckets, i.e. dissimilar items are in the same bucket with low probability. Where…
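The bucketing idea above can be sketched with random hyperplanes, one common LSH family for cosine similarity. This is an illustrative assumption, not necessarily the scheme the article describes. The names `make_hyperplanes` and `lsh_bucket` are hypothetical helpers introduced here.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_bucket(vector, hyperplanes):
    """Hash a vector to a bit string: one bit per hyperplane, set by which side it falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


# Hypothetical example vectors, chosen only for illustration.
planes = make_hyperplanes(num_bits=8, dim=3)
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]     # same direction as a
c = [-1.0, -2.0, -3.0]  # opposite direction to a

bucket_a = lsh_bucket(a, planes)
bucket_b = lsh_bucket(b, planes)
bucket_c = lsh_bucket(c, planes)
print(bucket_a, bucket_b, bucket_c)
```

Vectors pointing in the same direction fall on the same side of every hyperplane and so share a bucket, while vectors at a wide angle are likely to be separated by at least one hyperplane. Using more bits makes buckets more selective.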
The Machine Learning Product Lifecycle – Challenges building ML products
Contrary to the popular notion that working on ML products is all about crunching math and stats, there are a lot of steps involved in productionizing ML and creating real products. Here is a brief video that explores the machine learning product development lifecycle and also talks about how it is different from the traditional product development lifecycle….
Top 50 Machine Learning Interview Questions
Whether you are kickstarting your interview preparation, or wrapping up your preparation and are looking for final touches, here are over 50 must-see questions to prepare for a data science interview. We have put them in five categories for convenience. (Note: There are several more questions along with answers in the main menu “Interview…
How to answer “Explain Linear Regression?”
I interviewed 100+ folks in the last few months while helping with interview prep. Many were stuck on answering a basic ML concept question. Most have an intuition and understand the basic concept, and have probably watched a detailed video on it in a data science course. But when it comes to articulating the concept concisely in…
Semantic Textual Similarity: Automatic Question Answering from FAQs
Semantic Textual Similarity is the task of determining how close two pieces of text are in meaning. It has many applications such as question answering, information retrieval, recommendation systems and so on. Here is a 1-hour NLP code-along video tutorial for beginners on semantic textual similarity. The session covers the task of Automatic Question Answering from…