The Synthetic Minority Oversampling Technique (SMOTE) balances classification datasets that have a severe class imbalance. It is a data augmentation technique that generates synthetic samples for the minority class.
SMOTE | |||
Description | SMOTE (Synthetic Minority Oversampling Technique) is an oversampling technique that generates synthetic samples for the minority class in an imbalanced classification dataset. | ||
Why to use |
| ||
When to use | To address class imbalance by oversampling the minority class before developing predictive models on classification datasets. | When not to use | On text data. |
Prerequisites |
| ||
Input | Any dataset that contains Categorical and Numerical data. | Output | The count of the synthetic samples generated for the minority classes after oversampling. |
Statistical Methods used | k-nearest neighbors | Limitations | Since synthetic samples are created without considering the majority class, they can overlap with majority-class regions and introduce ambiguous or noisy samples. |
Class imbalance arises when the classes in a classification dataset are unequally distributed. In other words, the number of data points in the majority class is significantly larger than in the minority class.
Machine learning models trained on such data tend to predict the majority class for most inputs and treat the minority class as noise. This biases the predictive model toward the majority class.
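The effect of this bias can be illustrated with a small sketch. The label counts below (95 majority, 5 minority) are invented for illustration, and the "model" is a hypothetical baseline that always predicts the majority class. It scores high accuracy while never finding a minority sample.

```python
from collections import Counter

# Hypothetical labels: 95 majority-class (0) and 5 minority-class (1) samples.
labels = [0] * 95 + [1] * 5

counts = Counter(labels)
majority_class, majority_count = counts.most_common(1)[0]

# A baseline "model" that always predicts the majority class.
predictions = [majority_class] * len(labels)

# Overall accuracy: fraction of predictions that match the true label.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the minority class: fraction of true minority samples found.
minority_total = sum(1 for y in labels if y == 1)
minority_hits = sum(1 for p, y in zip(predictions, labels) if y == 1 and p == 1)
minority_recall = minority_hits / minority_total

print(accuracy)         # 0.95
print(minority_recall)  # 0.0
```

Accuracy alone hides the problem here, which is why metrics that focus on the minority class, such as recall, are usually checked on imbalanced data.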
SMOTE balances imbalanced data by oversampling: it creates synthetic samples for the minority class by interpolating between a minority sample and one of its k nearest minority-class neighbors. This alleviates the problems faced while developing predictive models on imbalanced classification data. SMOTE increases the number of samples available to the minority class and makes the region that class occupies in the feature space more general.
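The idea of generating synthetic minority samples can be sketched in a few lines of Python. This is a simplified illustration, not the product's implementation: the function name `smote_sketch`, its parameters, and the sample points are assumptions, and it handles only numerical features with Euclidean distance and a brute-force neighbor search.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Hypothetical helper for illustration: numerical features only,
    Euclidean distance, brute-force neighbor search.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than other points
    if k < 1:
        raise ValueError("need at least two minority samples")

    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]

        # Find its k nearest minority-class neighbors (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])

        # Interpolate: base + gap * (neighbor - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Invented minority-class points with two numerical features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote_sketch(minority, n_new=6, k=3)
print(len(new_points))  # 6
```

Each synthetic point lies on the line segment between two existing minority samples, so the new data stays inside the region the minority class already covers. The majority class is never consulted, which is the source of the limitation noted in the table above.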