Often in data science, we want to understand how one variable is related to another. These variables could be features for an ML model, or sometimes we might want to see how important a feature is in determining the target we are trying to predict. Both covariance and correlation can be used to measure the direction…
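As a quick illustration of the two measures, here is a minimal NumPy sketch (the arrays below are made-up data, not taken from the article):

```python
# Minimal sketch (made-up data): covariance gives the direction of the
# relationship, Pearson correlation scales it to the range [-1, 1].
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

cov = np.cov(x, y)[0, 1]        # covariance between x and y
corr = np.corrcoef(x, y)[0, 1]  # Pearson correlation between x and y

print(f"covariance = {cov:.3f}, correlation = {corr:.3f}")
```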
Category: Machine Learning
How to find the Optimal Number of Clusters in K-means? Elbow and Silhouette Methods
K-means Clustering Recap Clustering is the process of finding cohesive groups of items in the data. K-means clustering is the most popular clustering algorithm. It is simple to implement and readily available in Python and R libraries. Here is a quick recap of how K-means clustering works. Choose a value of K. Initialize K…
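As a rough sketch of the two methods named in the title, here is a minimal scikit-learn example on synthetic blob data (the dataset and parameter values are illustrative assumptions, not from the article):

```python
# Minimal sketch (synthetic data): run K-means for a range of K and record the
# inertia (used by the elbow method) and the silhouette score for each K.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    sil = silhouette_score(X, km.labels_)
    print(f"K={k}  inertia={km.inertia_:.1f}  silhouette={sil:.3f}")

# Elbow method: pick the K after which inertia stops dropping sharply.
# Silhouette method: pick the K with the highest silhouette score.
```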
Detecting and Removing Gender Bias in Word Embeddings
What are Word Embeddings? Word embeddings are vector representations of words that can be used as input (features) to downstream tasks and ML models. Here is an article that explains popular word embeddings in more detail. They are used in many NLP applications such as sentiment analysis, document clustering, question answering, paraphrase detection…
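As a small illustration that embeddings are just vectors, word similarity can be computed directly on them (the 4-dimensional vectors below are invented for the example; real embeddings such as word2vec or GloVe typically have hundreds of dimensions):

```python
# Minimal sketch (made-up toy vectors): cosine similarity between word vectors.
import numpy as np

embeddings = {
    "king":  np.array([0.8, 0.1, 0.7, 0.2]),
    "queen": np.array([0.7, 0.2, 0.8, 0.3]),
    "apple": np.array([0.1, 0.9, 0.1, 0.8]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings["king"], embeddings["queen"]))  # high: related words
print(cosine(embeddings["king"], embeddings["apple"]))  # lower: unrelated words
```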
Dartboard Paradox: Probability Density Function vs Probability
What is the Dartboard Paradox? Assume you are throwing a dart at a dartboard such that it hits somewhere on the board. The dartboard paradox: The probability of hitting any specific point on the dartboard is zero. However, the probability of hitting somewhere on the dartboard is 1. How can this be? (How…
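A short worked sketch of the resolution, assuming hits are uniformly distributed over a board of area A (the uniform assumption is ours, for illustration): the density is constant, any single point has probability zero, yet the density integrates to 1 over the whole board.

```latex
f(x, y) = \frac{1}{A}, \qquad
P\big((X, Y) = (x_0, y_0)\big)
  = \lim_{\epsilon \to 0} \iint_{\|(x,y)-(x_0,y_0)\| \le \epsilon} \frac{1}{A}\, dx\, dy = 0,
\qquad
\iint_{\text{board}} \frac{1}{A}\, dx\, dy = \frac{A}{A} = 1.
```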
Explain Locality Sensitive Hashing for Nearest Neighbour Search?
What is Locality Sensitive Hashing (LSH)? Locality Sensitive Hashing is a technique for hashing items into buckets such that similar items land in the same bucket (same hash) with high probability, and dissimilar items land in different buckets – i.e. dissimilar items are in the same bucket with low probability. Where…
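A minimal sketch of one common LSH scheme, random-hyperplane hashing for cosine similarity (the dimensions, bit count, and vectors are illustrative assumptions):

```python
# Minimal sketch: each vector gets a short bit signature (its bucket); similar
# vectors tend to fall on the same side of each random hyperplane and so tend
# to share a bucket.
import numpy as np

rng = np.random.default_rng(0)
dim, n_bits = 8, 6
planes = rng.normal(size=(n_bits, dim))   # random hyperplanes

def lsh_bucket(v):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return tuple((planes @ v > 0).astype(int))

a = rng.normal(size=dim)
b = a + 0.05 * rng.normal(size=dim)   # nearly identical to a
c = rng.normal(size=dim)              # unrelated vector

print(lsh_bucket(a) == lsh_bucket(b))  # usually True: same bucket
print(lsh_bucket(a) == lsh_bucket(c))  # usually False: different bucket
```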
What is Simpson’s Paradox?
Simpson’s Paradox occurs when trends in the aggregate data are reversed when examining trends in subgroups. Data often has biases that might lead to unexpected trends, but digging deeper, deciphering these biases, and looking at appropriate subgroups leads to drawing the right insights. Why does Simpson’s paradox occur? Arithmetically, when (a1/A1) < (a2/A2)…
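A small worked example of the reversal (the subgroup names and counts below are made up purely for illustration):

```python
# Worked numeric sketch: treatment B has the higher success rate inside each
# subgroup, yet treatment A has the higher success rate overall.
groups = {
    # subgroup: (successes_A, trials_A, successes_B, trials_B)
    "mild cases":   (6, 8, 4, 5),   # A: 75% < B: 80%
    "severe cases": (1, 5, 2, 8),   # A: 20% < B: 25%
}

sa = ta = sb = tb = 0
for name, (a1, A1, a2, A2) in groups.items():
    print(f"{name}: A = {a1 / A1:.0%}, B = {a2 / A2:.0%}")
    sa, ta, sb, tb = sa + a1, ta + A1, sb + a2, tb + A2

# Aggregating both subgroups reverses the trend: A ~54% vs B ~46%.
print(f"overall: A = {sa / ta:.0%}, B = {sb / tb:.0%}")
```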
What is Elastic Net Regularization for Regression?
Most of us know that ML models often tend to overfit the training data for various reasons. This could be due to a lack of training data, or the training data not being representative of the data we expect to apply the model to. The result is that we end up building an overly…
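A minimal scikit-learn sketch of elastic net on synthetic data (the dataset and hyperparameter values are assumptions for illustration):

```python
# Minimal sketch: elastic net combines L1 and L2 penalties -- alpha sets the
# overall regularization strength and l1_ratio the mix between lasso (1.0)
# and ridge (0.0).
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X_tr, y_tr)
print("test R^2:", model.score(X_te, y_te))
print("non-zero coefficients:", (model.coef_ != 0).sum())  # L1 part drives sparsity
```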
What are Isolation Forests? How to use them for Anomaly Detection?
All of us know random forests, one of the most popular ML models. They are a supervised learning algorithm, used in a wide variety of applications for classification and regression. Can we use random forests in an unsupervised setting, where we have no labeled data? Isolation forests are a variation of random forests that can…
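A minimal scikit-learn sketch of anomaly detection with an isolation forest (the synthetic 2-D data and parameter values are illustrative assumptions):

```python
# Minimal sketch: fit an isolation forest on mostly "normal" points and check
# how a couple of obvious outliers are scored.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = np.array([[6.0, 6.0], [-7.0, 5.0]])
X = np.vstack([normal, outliers])

iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = iso.predict(X)   # +1 = inlier, -1 = anomaly
print(labels[-2:])        # the injected outliers should be flagged as -1
```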
What is One-Class SVM? How to use it for anomaly detection?
One-class SVM is a variation of the SVM that can be used in an unsupervised setting for anomaly detection. Let’s say we are analyzing credit card transactions to identify fraud. We are likely to have many normal transactions and very few fraudulent transactions. Also, the next fraudulent transaction might be completely different from all previous…
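A minimal scikit-learn sketch of that setup (the synthetic 2-D data and parameters are illustrative assumptions): train a one-class SVM on normal points only, then flag points that fall outside the learned boundary.

```python
# Minimal sketch: the model learns the region occupied by "normal" data and
# marks anything outside it as an anomaly.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_train = rng.normal(loc=0.0, scale=1.0, size=(300, 2))  # normal data only
new_points = np.array([[0.2, -0.1],    # looks like the training data
                       [8.0, 8.0]])    # looks nothing like the training data

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal_train)
print(ocsvm.predict(new_points))       # +1 = inlier, -1 = anomaly
```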
Can we use the AUC Metric for an SVM Classifier?
What is AUC? AUC is the area under the ROC curve. It is a popular classification metric. Classifiers such as logistic regression and naive Bayes predict class probabilities as the outcome instead of predicting the labels themselves. A new data point is classified as positive if the predicted probability of the positive class…
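A minimal scikit-learn sketch (synthetic data, not from the article) showing that an SVM’s decision_function scores can be ranked, which is all AUC needs:

```python
# Minimal sketch: an SVC does not output probabilities by default, but its
# signed margin distances can be passed directly to roc_auc_score.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)
scores = svm.decision_function(X_te)   # signed distance from the margin
print("AUC:", roc_auc_score(y_te, scores))
```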