SMOTE

SMOTE

The Synthetic Minority Oversampling Technique or SMOTE is a technique for balancing the classification datasets with an acute class imbalance. It is a data augmentation technique in which synthetic samples are generated for the minority class.

SMOTE

Description

SMOTE (Synthetic Minority Oversampling Technique) is an oversampling technique that generates synthetic samples for the minority class in an imbalanced classification dataset.

Why to use

  • To solve the imbalanced data problem.
  • To overcome the overfitting problem that the random oversampling poses.

When to use

To address imbalanced datasets by oversampling the minority class for developing predictive models on the classification datasets.

When not to use

On text data.

Prerequisites

  • The dependent variable should be of Categorical type.
  • The independent variables should be of Numerical type.

Input

Any dataset that contains Categorical and Numerical data.

Output

The count of the synthetic samples generated for the minority classes after oversampling.

Statistical Methods used

k nearest neighbor

Limitations

Since synthetic samples are created without considering the majority class, this technique can discard potentially useful data.


Class imbalance arises when a classification dataset has an unequal distribution of class. In other words, in a class imbalance, the number of data points in the majority class is more significant than in the minority class.

Most predictions of machine learning techniques correspond to the majority class and ignore the minority class as noise. This can result in bias in the predictive model.

SMOTE balances imbalanced data using the oversampling technique by creating synthetic samples for the minority class. It alleviates the problems faced while developing predictive models on imbalanced classification data. The SMOTE technique increases the features available to each class in the imbalanced dataset and makes the samples more general.

    • Related Articles

    • SMOTE

      SMOTE Description SMOTE (Synthetic Minority Oversampling Technique) is an oversampling technique that generates synthetic samples for the minority class in an imbalanced classification dataset. Why to use To solve the imbalanced data problem. To ...
    • Rubiscape Autumn '21

      New Features Platform & Studio Data Dictionary - Ability to create, edit, delete Data Dictionary JSON Dataset – Ability to create, edit, delete JSON file dataset Algorithms added: Count Vectorization TFIDF Algorithm SMOTE Algorithm – Detection and ...
    • Rubiscape Spring '24

      Published On: 18 June 2024 New Features Rubiscape Workspace Level Export/Import: Workspace export functionality available for tenant admin users. Rubiscape users can import required entities into any existing or new workspace. Rubiscape File Server ...