Pandas is very popular for data science tasks, but pandas code can often be slow. This article looks at how to make pandas code faster. We will walk through a simple task, adding 1 to every element in the first column of a dataframe, and see how different ways of…
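A minimal sketch of the task described in this excerpt, adding 1 to every element in the first column of a dataframe. The three approaches shown (an explicit row loop, `Series.apply`, and a vectorized column operation) are assumptions for illustration; the excerpt is cut off before it says which approaches the article compares. The dataframe `df` and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; the first column is "a".
df = pd.DataFrame({"a": np.arange(5), "b": np.arange(5) * 10})

# Approach 1 (assumed): explicit Python loop over rows, usually the slowest.
looped = df.copy()
for i in range(len(looped)):
    looped.iloc[i, 0] = looped.iloc[i, 0] + 1

# Approach 2 (assumed): Series.apply, which still calls a Python function per element.
applied = df.copy()
applied["a"] = applied["a"].apply(lambda x: x + 1)

# Approach 3 (assumed): vectorized arithmetic on the whole column at once.
vectorized = df.copy()
vectorized["a"] = vectorized["a"] + 1

print(vectorized)
```

All three produce the same values; they differ in how much work happens in interpreted Python versus inside pandas and NumPy. Timing each one (for example with `timeit`) on a larger frame would show the gap.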
Author: MLNerds
Should I Transition to Data Science?
Data Science is popular and lots of folks are considering a career transition to data science. Does it make sense to transition to data science? How can one answer this question? The factors to consider might be different for different people. This video talks about a couple of factors to consider when deciding whether to…
Differences between Pandas and NumPy
Pandas and NumPy are two of the most popular Python libraries used for data science applications. What is NumPy? NumPy is a popular library used for scientific computing. It has support for multidimensional arrays and mathematical functions that can operate on these arrays. NumPy arrays are homogeneously typed – which means they hold elements of…
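A short sketch of the two NumPy properties mentioned in this excerpt: a single dtype per array, and mathematical functions that operate on multidimensional arrays. The variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

# An array built from integers gets an integer dtype.
ints = np.array([1, 2, 3])

# Mixing an int with a float upcasts every element to one common dtype.
mixed = np.array([1, 2.5, 3])

# A 2-D array, with a function applied along one axis.
grid = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
column_sums = grid.sum(axis=0)      # sum down each column

print(ints.dtype, mixed.dtype, column_sums)
```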
How to learn Math for Machine Learning
Becoming a data scientist is intrinsically linked to being up to date on statistics and the underlying math, along with other practical skills. But how much math do you need? And how do you actually pick up the math? Here is a brief video on learning the math for ML. What Math is required for ML The…
Monty Hall Problem
The Monty Hall problem is a puzzle based on the American game show ‘Let’s Make a Deal’. It is a popular probability riddle that comes up when one is learning probability and statistics, since the first-cut solution that comes to mind is often different from what we get by applying basic principles of probability…
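One way to check intuition against the probability calculation is a quick simulation. The sketch below assumes the standard formulation of the puzzle, which the excerpt does not spell out: three doors, one car, and a host who always opens a door that is neither the contestant's pick nor the car. The function names `play` and `win_rate` are hypothetical.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the win probability for a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}  switch: {switch_rate:.3f}")
```

Under these assumptions, staying should win about one time in three and switching about two times in three, which matches the result from conditional probability.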
Covariance and Correlation
Often in data science, we want to understand how one variable is related to another. These variables could be features for an ML model, or sometimes we might want to see how important a feature is in determining the target we are trying to predict. Both covariance and correlation can be used to measure the direction…
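A small sketch of how the two measures behave, using the sample covariance and the Pearson correlation written out by hand. The data (`hours`, `score`) is made up purely for illustration, and the helper names are hypothetical.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Made-up illustrative data.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 70.0]
score_scaled = [s * 100 for s in score]  # same data in different units

print(covariance(hours, score), covariance(hours, score_scaled))
print(correlation(hours, score), correlation(hours, score_scaled))
```

Both measures have the same sign here, indicating the direction of the relationship. Rescaling one variable changes the covariance by the same factor, while the correlation is unchanged and always lies between -1 and 1.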
Decoding the Data Scientist Hiring Gap
The need for AI/ML is growing, and more and more jobs are being created as data awareness increases and more data is collected. However, hiring data scientists has not been an easy task – most of these roles are not yet filled. On the other hand, data science is a very popular discipline…
How to find the Optimal Number of Clusters in K-means? Elbow and Silhouette Methods
K-means Clustering Recap Clustering is the process of finding cohesive groups of items in the data. K-means clustering is the most popular clustering algorithm. It is simple to implement and easily available in Python and R libraries. Here is a quick recap of how K-means clustering works. Choose a value of K Initialize K…
Detecting and Removing Gender Bias in Word Embeddings
What are Word Embeddings? Word embeddings are vector representations of words that can be used as input (features) to other downstream tasks and ML models. Here is an article that explains popular word embeddings in more detail. They are used in many NLP applications such as sentiment analysis, document clustering, question answering, paraphrase detection…
Dartboard Paradox: Probability Density Function vs Probability
What is the Dartboard Paradox? Assume you are throwing a dart at a dartboard such that it hits somewhere on the dartboard. The dartboard paradox: The probability of hitting any specific point on the dartboard is zero. However, the probability of hitting somewhere on the dartboard is 1. How can this be? (How…
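A minimal numerical sketch of the distinction in the title, under an assumption the excerpt does not state: the dart lands uniformly at random on a circular board of radius 1. The density is then a constant, while the probability of landing in a small disc around a point is density times area, which shrinks to zero as the disc shrinks. The names `BOARD_RADIUS`, `density`, and `prob_within` are hypothetical.

```python
import math

BOARD_RADIUS = 1.0  # assumed board size

# Assumed uniform density over the board: 1 / (board area).
density = 1.0 / (math.pi * BOARD_RADIUS ** 2)

def prob_within(r):
    """Probability of landing within distance r of the board's center (r <= BOARD_RADIUS)."""
    return density * math.pi * r ** 2

# The density stays fixed while the probability shrinks with the region.
for r in (1.0, 0.1, 0.001, 0.0):
    print(f"r={r}: probability={prob_within(r)}, density={density}")
```

The density at a point is positive, but a single point has zero area, so its probability is zero; integrating the density over the whole board gives 1.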