Isolation Forest

Description

Isolation Forest is an unsupervised algorithm used for anomaly detection that isolates the anomalies rather than building a model of normal instances.
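The isolation idea can be illustrated with a small, simplified sketch. It is not the full algorithm: a real Isolation Forest builds each tree once on a random subsample and reuses it, whereas this sketch just traces random splits for a single point. The function names `isolation_depth` and `mean_depth`, the depth cap, and the synthetic data are assumptions made for illustration only.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count the random splits needed to separate `point` from `data`."""
    # Stop once the point is alone or the depth cap is reached.
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Pick a random feature and a random split value within its range.
    feature = rng.randrange(len(point))
    lo = min(row[feature] for row in data)
    hi = max(row[feature] for row in data)
    if lo == hi:
        return depth  # simplification: give up if this feature is constant
    split = rng.uniform(lo, hi)
    # Keep only the rows that fall on the same side of the split as `point`.
    same_side = [row for row in data
                 if (row[feature] < split) == (point[feature] < split)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)

def mean_depth(point, data, n_trees=200, seed=0):
    """Average isolation depth of `point` over many random split sequences."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees

# Synthetic data: 200 points around the origin plus one far-away point.
rng = random.Random(42)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
data = normal + [outlier]

print("outlier mean depth:", mean_depth(outlier, data))
print("normal mean depth: ", mean_depth(normal[0], data))
```

The far-away point is separated after noticeably fewer splits than a point inside the dense region, and that shorter path is what marks it as an anomaly.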

Why to use

Isolation Forest typically detects anomalies faster and needs less memory than many other anomaly detection algorithms, because each tree is built on a small random subsample of the data.

When to use

Use it when the input data is large and high-dimensional.

When not to use

Isolation Forest cannot be used when features have been extracted inappropriately, when normal and abnormal behaviour in the data is not well defined, or when variations within the abnormal data increase the dataset's complexity.

Prerequisites

Data should contain only numeric (continuous) variables and should not contain any missing values.
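A minimal sketch of how these prerequisites might be checked before running the algorithm, using only the Python standard library. The helper name `check_prerequisites`, the row-as-dictionary layout, and the sample column names are assumptions made for illustration; they are not part of any product API.

```python
import math
from numbers import Real

def check_prerequisites(rows):
    """Return (row_index, column, value) for every value that is
    non-numeric or missing (None or NaN)."""
    problems = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            # Booleans are technically numbers in Python, so exclude them.
            is_number = isinstance(value, Real) and not isinstance(value, bool)
            if not is_number or math.isnan(value):
                problems.append((i, column, value))
    return problems

# Hypothetical sample rows: one NaN, one text value, one None.
rows = [
    {"amount": 12.5, "duration": 3},
    {"amount": float("nan"), "duration": 4},
    {"amount": 7.0, "duration": "long"},
    {"amount": 9.1, "duration": None},
]

problems = check_prerequisites(rows)
for row_index, column, value in problems:
    print(f"row {row_index}, column {column!r}: {value!r}")
```

Rows flagged this way would need to be imputed, encoded, or dropped before the data meets the prerequisites.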

Input

Any classification dataset with numeric input variables


Output

  • Anomaly scores
  • Anomaly labels
  • Score samples cluster plot containing inliers and outliers (anomalies)
  • The input dataset is classified into two categories, 1 and -1:
    • -1 indicates outliers.
    • 1 indicates normal data points.
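The scores and labels listed above can be reproduced outside the platform with scikit-learn's `IsolationForest`, which uses the same 1 / -1 labelling. This is a sketch under assumptions: whether the platform uses scikit-learn internally is not stated here, and the synthetic data and parameter values are chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (assumed for illustration): 200 inliers around the
# origin and two points placed far away from them.
rng = np.random.RandomState(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5]])
X = np.vstack([inliers, outliers])

model = IsolationForest(n_estimators=100, random_state=0).fit(X)

labels = model.predict(X)        # anomaly labels: 1 = normal, -1 = outlier
scores = model.score_samples(X)  # anomaly scores: lower = more anomalous

print("labels of the two far points:", labels[-2:])
print("scores of the two far points:", scores[-2:])
print("mean score of the inliers:   ", scores[:200].mean())
```

In scikit-learn, `score_samples` returns lower values for more anomalous points, and `predict` applies a threshold to those scores to produce the 1 / -1 labels.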

Statistical Methods used

Isolation Forest works on the principle of the decision tree algorithm. Because a decision tree is a machine learning algorithm rather than a statistical method, no statistical method is listed for it.

Limitations

It can fail to detect local anomalies (points that are anomalous only relative to their immediate neighbourhood), which reduces the accuracy of the algorithm.
