Anomaly detection is an important application used across various verticals like healthcare, finance, manufacturing and so on. Local Outlier Factor is a popular density based technique for anomaly detection that does not require prior examples of anomalies.

## What are Advantages of LOF for Anomaly Detection?

- One of the main challenges with Anomaly detection is the lack of training data specifying examples of anomalies to train models for predicting them. LOF is an unsupervised technique that does not require prior examples.
- Several of the simple distance based models for anomaly detection fail when the data has different density in different regions. i.e there are areas where points are close together and areas where points are spread out in the dataset. The LOF method can handle such situations
- Many of the anomaly detection algorithms fail for multimodal distrubutions. LOF can work well for such data as well.

## How does Local Outlier Factor Method Work?

The local outlier factor method works by comparing the density of a point with the relative densities of its neighbours. If a point is relatively less dense than its neighbours, it is a potential outlier. If the ratio of density of neighbours to the density of point is too high, we end up with a high LOF signifying an outlier.

Now **How to compute the density of a point?**

We first define the** K-distance** of a point. This is intuitively the distance of the Kth nearest neighbour to the point.

**K-distance(A)= Dist(A, Kth nearest neighbour)**

The **K neighbourhood **of a point is just the K closest points to a given point.

**N_{k}(A) = { P | Dist(A,P)<= K-distance(A)} **

The **reachability distance **defines how easy it is to reach a point from another point. If a point A is in the K-neighbourhood of B then it is easier to reach A and the reachability distance is lower. For a point A that is not in the K-neighbourhood of B, the reachability distance is higher.

**Reachability Distance_{k}(A,B)= max{ k-distance(B), dist(A,B) }**

The **local reachability density** is defined as the reciprocal of the average reachability distance of points around A to A:

Now that we know the density of a point, the **local reachability factor** is just the average ratio of densities of neighbouring points of A to A.

Note that by taking the ratio of density with neighbouring points and not the actual densities themselves, the LOF method can handle datasets with regions of different densities.

## Python Code for Local Outlier Factor Method

Here is a small toy example to show how LOF can be incorporated in your code. The array X has four points where one of the points 100.2 is a clear outlier. The LOF method can be called to identify outliers.

We see the output of outlier labels that clearly shows the third data point as an outlier while the other points are not. The next line shows the negative of LOF value and we again see that the third point has a significantly lower negative value while others have something close to -1. Points that significantly deviate from 1 are outliers.