Naive Bayes Classifier: Advantages and Disadvantages

How does the Naive Bayes classifier work? What are the advantages and disadvantages of using it?

Recap: Naive Bayes Classifier

The Naive Bayes classifier is a popular classification model based on Bayes' rule.

P(class | datapoint) = P(datapoint | class) * P(class) / P(datapoint)

The classifier computes this posterior probability for every class and predicts the class with the highest value. Since P(datapoint) is the same for all classes, it can be dropped when comparing them.

Note that the classifier is called Naive because it makes the simplistic assumption that the features are conditionally independent given the class label. In other words:

Naive Assumption: 

P(datapoint | class) = P(feature_1 | class) * … * P(feature_n | class)

This assumption does not hold in many real-world use cases.

The probabilities used in the Naive Bayes classifier are typically computed using maximum likelihood estimation.

Let's take the example of spam detection.

P(spam | document) ∝ P(spam) * P(word_1 | spam) * … * P(word_n | spam)
The probabilities can be computed using simple ratios. P(spam) is just the ratio of the number of spam documents to the total number of documents. P(bumper | spam) is the number of times the word bumper appears in the spam documents divided by the total number of words in the spam documents. Look at the example below for more details on how the probabilities are evaluated.
[Figure: Naive Bayes classifier example with sample training data for spam detection]
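To make these ratios concrete, here is a minimal sketch of the maximum likelihood estimates in Python. The toy documents below are hypothetical, not the exact data from the figure:

from collections import Counter

# Hypothetical toy training set, made up for illustration.
spam_docs = [["bumper", "offer", "win"], ["win", "cash", "offer"]]
ham_docs = [["meeting", "schedule", "offer"]]

total_docs = len(spam_docs) + len(ham_docs)

# P(spam): fraction of training documents that are spam.
p_spam = len(spam_docs) / total_docs

# P(bumper | spam): occurrences of "bumper" in spam documents divided by
# the total number of words in spam documents (plain MLE, no smoothing).
spam_word_counts = Counter(word for doc in spam_docs for word in doc)
total_spam_words = sum(spam_word_counts.values())
p_bumper_given_spam = spam_word_counts["bumper"] / total_spam_words

print(p_spam)               # 2/3 ~= 0.667
print(p_bumper_given_spam)  # 1/6 ~= 0.167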

Advantages of Using Naive Bayes Classifier

  • Simple to implement: the conditional probabilities are easy to evaluate (see the sketch after this list).
  • Very fast: there are no iterations, since the probabilities can be computed directly from counts. This makes the technique useful where training speed is important.
  • If the conditional independence assumption holds, it can give very good results.
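As a rough illustration of how little code this takes, here is a minimal sketch using scikit-learn's MultinomialNB; the documents and labels are made up:

# Minimal sketch: training a Naive Bayes spam classifier with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["win a bumper cash offer", "claim your cash prize", "meeting agenda for monday"]
labels = ["spam", "spam", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)       # word-count features
model = MultinomialNB().fit(X, labels)   # a single counting pass, no iterative optimization

print(model.predict(vectorizer.transform(["bumper cash offer"])))  # expected: ['spam']

Note that fit() only tallies counts from the training data, which is why training is so fast compared to iterative methods.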

Disadvantages of Using Naive Bayes Classifier

  • The conditional independence assumption does not always hold. In most situations, the features show some form of dependency on each other.
  • Zero probability problem: when the test data contains words for a particular class that never appear in the training data, we can end up with zero class probabilities. See the example below for more details: P(bumper | ham) is 0 since bumper does not occur in any ham (non-spam) documents in the training data.
[Figure: Naive Bayes classifier example showing the zero frequency problem for spam detection]

The zero probability problem can be remedied through smoothing: we add a small smoothing factor to the numerator and denominator of every probability, so that no probability is zero even for unseen words. See the example below to understand smoothing.

[Figure: smoothing for the Naive Bayes classifier in spam detection]
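Here is a minimal sketch of add-one (Laplace) smoothing in Python; the ham documents and vocabulary are hypothetical:

from collections import Counter

# Hypothetical ham training data and vocabulary, made up for illustration.
ham_docs = [["meeting", "schedule", "offer"]]
vocab = {"bumper", "offer", "win", "cash", "meeting", "schedule"}

ham_word_counts = Counter(word for doc in ham_docs for word in doc)
total_ham_words = sum(ham_word_counts.values())

def p_word_given_ham(word, alpha=1):
    # Add alpha to the count and alpha * |vocab| to the denominator,
    # so even an unseen word gets a small nonzero probability.
    return (ham_word_counts[word] + alpha) / (total_ham_words + alpha * len(vocab))

print(p_word_given_ham("bumper"))  # (0 + 1) / (3 + 6) ~= 0.111, no longer zero
print(p_word_given_ham("offer"))   # (1 + 1) / (3 + 6) ~= 0.222

With alpha = 1 this is classic Laplace smoothing; smaller values of alpha shrink the correction.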

  • Bad binning of continuous variables: to use Multinomial Naive Bayes, continuous features must be discretized, and a poor binning can throw away information. Gaussian Naive Bayes avoids this by modeling each continuous feature with a normal distribution.
  • Not great for imbalanced data: standard Naive Bayes tends to favor the majority class. Complement Naive Bayes is a variant designed to handle imbalanced data better (both variants are sketched below).
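Both variants ship with scikit-learn; here is a small sketch with made-up data:

# Sketch of the two variants mentioned above, using scikit-learn.
import numpy as np
from sklearn.naive_bayes import ComplementNB, GaussianNB

y = np.array([0, 0, 1])

# GaussianNB models each continuous feature with a per-class normal
# distribution, so no binning is needed.
X_continuous = np.array([[1.2, 3.4], [0.9, 3.1], [5.6, 0.2]])
gnb = GaussianNB().fit(X_continuous, y)

# ComplementNB estimates parameters from each class's complement,
# which tends to be more robust on imbalanced class distributions.
X_counts = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2]])
cnb = ComplementNB().fit(X_counts, y)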

Take a look at the Wikipedia article on the Naive Bayes classifier to learn more.
