| Local Outlier Factor | |
| --- | --- |
| Description | The Local Outlier Factor (LOF) algorithm is an unsupervised machine learning algorithm based on the concept of local density. It compares the density around each data point to the density around its neighboring data points; data points that have a significantly lower density than their neighbors are considered outliers. |
| Why to use | For anomaly detection. |
| When to use | When outliers must be judged against the local density of their neighborhood rather than against the global distribution. |
| When not to use | On purely categorical or textual data, where meaningful distances between data points cannot be computed. |
| Prerequisites | The input variables should be of numerical type. No dependent variable is required, because LOF is unsupervised. |
| Input | Any dataset that contains numerical data. |
| Output | Cluster plot with the outliers highlighted in the plot (an illustrative sketch follows the table). |
| Statistical Methods used | Local density estimation based on nearest-neighbor distances. |
| Limitations | It cannot be used on data other than numerical data. |
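As a concrete illustration of the input and output rows above, the sketch below applies scikit-learn's LocalOutlierFactor to a small numerical dataset and draws a scatter plot with the detected outliers highlighted. The synthetic data and the `n_neighbors` and `contamination` values are illustrative assumptions, not settings taken from this page.

```python
# Minimal sketch: LOF on a numerical dataset, outliers highlighted in a scatter plot.
# The data and the n_neighbors / contamination values are illustrative choices only.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)

# A dense cluster of inliers plus a few scattered points that should stand out locally.
inliers = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
outliers = rng.uniform(low=-4, high=4, size=(10, 2))
X = np.vstack([inliers, outliers])

# fit_predict returns +1 for inliers and -1 for points flagged as outliers.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
labels = lof.fit_predict(X)

# The LOF scores themselves are stored (negated) in negative_outlier_factor_.
scores = -lof.negative_outlier_factor_

# Scatter plot with outliers highlighted, similar to the cluster plot described above.
plt.scatter(X[labels == 1, 0], X[labels == 1, 1], c="steelblue", s=15, label="inliers")
plt.scatter(X[labels == -1, 0], X[labels == -1, 1], c="red", s=40, marker="x", label="outliers")
plt.legend()
plt.title("Local Outlier Factor: detected outliers")
plt.show()
```

Here `contamination` is set explicitly to 0.05; it controls what fraction of points `fit_predict` labels as outliers.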
Local Outlier Factor detects outliers by measuring how much the local density around a data point deviates from the density around its neighbors. It identifies local outliers, that is, data points that stand out within their own region of the dataset even though they would not be considered outliers in another region.
For example, consider a very dense cluster of data points. A data point that lies only a small distance outside this cluster is considered an outlier, because its local density is much lower than that of its neighbors. In the same dataset, a data point in a sparse cluster may look isolated, but it is not flagged as an outlier because it lies at roughly the same distance from its neighbors as they lie from one another.
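A hedged numerical sketch of this situation, again using scikit-learn's LocalOutlierFactor: the cluster locations, spreads, and the `n_neighbors` value below are assumptions chosen for the demonstration, not values from the text. The point placed just outside the tight cluster should receive a noticeably higher LOF score than members of either cluster.

```python
# Dense cluster vs. sparse cluster: only the point near the dense cluster is a local outlier.
# All coordinates, cluster sizes, and n_neighbors are illustrative assumptions.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

dense = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(30, 2))   # tight cluster
sparse = rng.normal(loc=[5.0, 5.0], scale=1.5, size=(30, 2))  # loose cluster
near_dense = np.array([[1.0, 1.0]])                           # just outside the tight cluster

X = np.vstack([dense, sparse, near_dense])

lof = LocalOutlierFactor(n_neighbors=10)
lof.fit(X)
scores = -lof.negative_outlier_factor_  # LOF scores; values near 1 mean "like its neighbors"

print("typical dense-cluster LOF :", scores[:30].mean().round(2))
print("typical sparse-cluster LOF:", scores[30:60].mean().round(2))
print("point near dense cluster  :", scores[-1].round(2))
# The last score should be markedly higher than the other two, even though the
# sparse-cluster points are farther apart from each other in absolute terms.
```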
A normal data point typically has an LOF between 1 and 1.5, while an outlier has a much higher LOF. An LOF of 10 means that, on average, the local density of the point's neighbors is ten times that of the point itself.
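This ratio reading matches the standard definition of the local outlier factor (Breunig et al., 2000), shown below for reference; here N_k(p) denotes the k nearest neighbors of point p and lrd_k denotes the local reachability density, neither of which is named in the original text.

$$
\mathrm{LOF}_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}
$$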
The Local Outlier Factor method is used, for example, to detect outliers in geographic data and video streams, and for network intrusion detection.