Extreme Gradient Boost Classification

| | | | |
| --- | --- | --- | --- |
| Description | Extreme Gradient Boost (XGBoost) is a decision-tree-based ensemble algorithm that uses a gradient boosting framework. It builds trees sequentially but parallelizes the work within each tree-building step. | | |
| Why to use | XGBoost typically delivers strong predictive performance on both classification and regression tasks over structured (tabular) data and is relatively robust to overfitting. | | |
| When to use | To solve prediction problems for regression and classification. | When not to use | Large and unstructured datasets. |
| Prerequisites | | | |
| Input | Any dataset that contains categorical, numerical, and continuous attributes. | Output | Classification analysis characteristics: Key Performance Index, Confusion Matrix, ROC Chart, Lift Chart, and Classification Statistics. |
| Statistical Methods used | | Limitations | It is efficient only for small to medium structured or tabular datasets. |
The XGBoost algorithm falls under the category of supervised learning and can be used to solve both regression and classification problems.
XGBoost is a Decision Tree-based algorithm. A decision tree is used in classification when the predicted outcome is the class to which the data belongs. A decision tree builds a classification model in the form of a tree structure: it breaks a dataset into smaller and smaller subsets while an associated decision tree is incrementally developed. Greater tree depth lets the model capture finer distinctions in the data, though very deep trees can overfit. For more information about the Decision Tree algorithm, refer to Decision Tree.
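A toy illustration of how a classification tree routes a data point to a class label: each internal node tests one feature against a threshold, and each leaf holds a class. The tree, features, and thresholds below are invented for illustration; this is a hand-built sketch, not how XGBoost grows trees.

```python
# A tiny hand-built decision tree represented as nested dicts.
# Internal nodes: {"feature": index, "threshold": value, "left": ..., "right": ...}
# Leaves: {"leaf": class_label}
tree = {
    "feature": 0, "threshold": 5.0,
    "left": {"leaf": "A"},                      # taken when feature 0 <= 5.0
    "right": {                                  # taken when feature 0 > 5.0
        "feature": 1, "threshold": 2.0,
        "left": {"leaf": "B"},                  # feature 1 <= 2.0
        "right": {"leaf": "C"},                 # feature 1 > 2.0
    },
}

def classify(node, x):
    """Walk from the root to a leaf, applying one feature test per level."""
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]
```

For example, `classify(tree, [3.0, 0.0])` follows the left branch at the root and lands in leaf `"A"`; `classify(tree, [7.0, 4.0])` goes right twice and lands in `"C"`.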
XGBoost uses the ensemble learning method known as boosting. In this method, data is passed through a machine learning model to identify wrongly classified data points, and a new model is then built that concentrates on those wrongly classified points. Depending on the dataset size and the desired level of accuracy, the process continues for a fixed number of iterations. Each round reduces the number of wrongly classified data points and thus increases accuracy. The final output is obtained by aggregating the outcomes of the individual models.
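The iterate-and-reweight idea described above can be sketched in plain Python using one-split trees (decision stumps). Note that this is a simplified, AdaBoost-style illustration of boosting in general, not XGBoost's gradient-based method, and all data and names are invented.

```python
import math

def stump_fit(X, y, w):
    """Find the single feature/threshold split with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[f] <= t else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best

def adaboost(X, y, rounds):
    """Boost decision stumps: each round upweights the points the previous
    stump got wrong, so the next stump focuses on them."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform point weights
    ensemble = []
    for _ in range(rounds):
        err, f, t, sign = stump_fit(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)   # vote weight for this stump
        pred = [sign if x[f] <= t else -sign for x in X]
        # Increase weight of misclassified points, decrease the rest.
        w = [wi * math.exp(-alpha * p * yi) for wi, p, yi in zip(w, pred, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, f, t, sign))
    return ensemble

def predict(ensemble, x):
    """Aggregate the stumps' outcomes as a weighted vote."""
    score = sum(a * (s if x[f] <= t else -s) for a, f, t, s in ensemble)
    return 1 if score >= 0 else -1

# Invented toy data; labels are in {-1, +1}. No single stump separates
# these points, but a few boosting rounds classify them all correctly.
X = [[1, 2], [2, 7], [5, 8], [4, 1], [6, 3], [3, 2]]
y = [1, 1, 1, -1, -1, -1]
model = adaboost(X, y, rounds=3)
```

XGBoost replaces the reweighting step with gradients of a loss function and adds regularization, but the sequential "fix the previous model's mistakes, then aggregate" structure is the same.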