Local Outlier Factor

Description

The Local Outlier Factor (LOF) algorithm is an unsupervised machine learning algorithm based on the concept of local density. It compares the local density around each data point with the local density around that point's nearest neighbors. Data points whose density is significantly lower than that of their neighbors are considered outliers.

Why to use

For anomaly detection

When to use

  • When you want to compare the local density of a data point to the local densities of its neighbors.
  • When you want to identify regions of similar density.
  • When you want to identify data points that have a significantly lower density than their neighbors.

When not to use

On numerical, textual, and interval type data.

Prerequisites

The dependent variable should be of categorical type.

Input

Any dataset that contains categorical data.

Output

A cluster plot with the outliers highlighted.

Statistical Methods used

  • Minkowski distance
  • Cosine distance
  • Euclidean distance
  • Manhattan distance
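
The four distance measures listed above can be sketched in plain Python. This is a minimal illustration rather than the product's implementation: the function names are hypothetical, and cosine distance is assumed to mean one minus the cosine similarity. Manhattan and Euclidean distance are the Minkowski distance with p = 1 and p = 2 respectively.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def manhattan(a, b):
    """Manhattan distance: Minkowski distance with p = 1."""
    return minkowski(a, b, 1)

def euclidean(a, b):
    """Euclidean distance: Minkowski distance with p = 2."""
    return minkowski(a, b, 2)

def cosine_distance(a, b):
    """Cosine distance, assumed here to be 1 - cosine similarity.

    Undefined for all-zero vectors (division by zero).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

a, b = (0, 0), (3, 4)
print("Manhattan:", manhattan(a, b))
print("Euclidean:", euclidean(a, b))
print("Minkowski (p=3):", minkowski(a, b, 3))
print("Cosine:", cosine_distance((1, 0), (0, 1)))
```

The choice of distance changes which points count as neighbors, so the same dataset can produce different LOF scores under different measures.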

Limitations

It cannot be used on data other than categorical data.

Local Outlier Factor detects outliers by measuring how far the local density of each data point deviates from the density of its neighbors. Because the comparison is local, it can flag a point as an outlier in one region of the dataset even though a point with the same spacing would be normal in another, sparser region.

For example, consider a very dense cluster of data points in a dataset. A data point that lies only a small distance away from this cluster is still considered an outlier, because that distance is large compared with the spacing inside the cluster. In the same dataset, a data point in a sparse cluster might appear to be an outlier, but it is about as far from its neighbors as they are from each other, so it is not flagged.
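
The example above can be reproduced with a short plain-Python sketch. It assumes the commonly used LOF definitions (k-distance, reachability distance, local reachability density) with Euclidean distance. The function name `lof_scores`, the toy coordinates, and the choice `k = 3` are illustrative assumptions, not part of the product.

```python
import math

def lof_scores(points, k):
    """Return one LOF score per point.

    Illustrative sketch: assumes Euclidean distance and that no point
    has k or more exact duplicates (which would give zero distances).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist, neighbors = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # Every point within the k-distance counts as a neighbor (ties included).
        neighbors.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean neighbor density divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Dense 3x3 grid (spacing 1) and sparse 3x3 grid (spacing 20).
dense = [(x, y) for x in range(3) for y in range(3)]
sparse = [(100 + 20 * x, 100 + 20 * y) for x in range(3) for y in range(3)]
near_dense = (6, 1)  # only 4 units from the dense grid

points = dense + sparse + [near_dense]
scores = lof_scores(points, k=3)

near_score = scores[-1]
grid_max = max(scores[:-1])
print(f"point near dense cluster: LOF = {near_score:.2f}")
print(f"largest LOF among cluster points: {grid_max:.2f}")
```

In this toy data the point at (6, 1) is closer to its neighbors in absolute terms than any sparse-cluster point is to its own, yet only the point at (6, 1) receives a LOF well above 1.5. The sparse-cluster points score the same as the dense-cluster points because their spacing matches that of their neighbors.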

A normal data point typically has a LOF close to 1 (roughly between 1 and 1.5), while an outlier has a much higher LOF. If the LOF of a data point is 10, the average local density of its neighbors is ten times the local density of the data point itself.
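
This reading of the score follows from how LOF is usually defined. The notation below is an assumption taken from the standard formulation and does not appear in this article: N_k(p) is the set of k nearest neighbors of p, d(p, o) is the chosen distance, and lrd_k is the local reachability density.

```latex
\begin{aligned}
\text{reach-dist}_k(p, o) &= \max\bigl(k\text{-distance}(o),\; d(p, o)\bigr) \\
\mathrm{lrd}_k(p) &= \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \text{reach-dist}_k(p, o) \right)^{-1} \\
\mathrm{LOF}_k(p) &= \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}
\end{aligned}
```

Since LOF is the mean ratio of the neighbors' density to the point's own density, a value near 1 means the point is about as dense as its surroundings, and a value of 10 means its neighbors are on average ten times denser.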

The Local Outlier Factor method is used to detect outliers in areas such as geographic data, video streams, and network intrusion detection.

