Extreme Gradient Boost Classification

Description

Extreme Gradient Boost (XGBoost) is a Decision Tree-based ensemble algorithm that uses a gradient boosting framework. Trees are built sequentially, but XGBoost uses a parallelized implementation of the work inside each tree-building step.
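
A minimal sketch of training such a classifier, assuming the Python xgboost and scikit-learn packages; the dataset and parameter values are illustrative assumptions, not fixed requirements.

    # Minimal sketch, assuming the xgboost and scikit-learn packages are installed.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    # Illustrative tabular dataset with numeric features and a binary class label.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # n_estimators is the number of trees built sequentially by boosting;
    # n_jobs=-1 lets XGBoost parallelize the work inside each tree-building step.
    model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1, n_jobs=-1)
    model.fit(X_train, y_train)

    print("Test accuracy:", model.score(X_test, y_test))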

Why to use

XGBoost performs well across a wide range of datasets for both prediction tasks, Classification and Regression, and its built-in regularization makes it robust to overfitting.

When to use

To solve prediction problems for both Regression and Classification.

When not to use

Large datasets or unstructured data such as images or free text.

Prerequisites

  • If the input dataset has categorical attributes, you need to use Label Encoder to convert them to numerical values (a sketch follows this list).
  • It can be used only on structured (tabular) datasets.
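
A minimal sketch of this preprocessing step, assuming pandas and scikit-learn; the column names and values are hypothetical.

    # Sketch: label-encode a categorical column before passing the data to XGBoost.
    import pandas as pd
    from sklearn.preprocessing import LabelEncoder

    # Hypothetical structured dataset with one categorical attribute ("city").
    df = pd.DataFrame({
        "age":   [25, 40, 33, 51],
        "city":  ["London", "Paris", "London", "Berlin"],
        "label": [0, 1, 0, 1],
    })

    encoder = LabelEncoder()
    df["city"] = encoder.fit_transform(df["city"])  # e.g. Berlin -> 0, London -> 1, Paris -> 2

    X = df[["age", "city"]]   # all features are now numeric
    y = df["label"]
    # X and y can now be used to fit an XGBoost classifier.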

Input

Any dataset that contains categorical, numerical, or continuous attributes.

Output

Classification Analysis characteristics - Key Performance Index, Confusion Matrix, ROC Chart, Lift Chart, and Classification Statistics.

Statistical Methods used

  • Accuracy
  • Sensitivity
  • Specificity
  • F Score
  • Confusion Matrix
  • ROC Chart
  • Lift Chart
  • Classification Statistics
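
As an informal illustration, assuming scikit-learn, the sketch below derives the first four of these statistics from a binary confusion matrix; the label vectors are made up for the example.

    # Sketch: deriving accuracy, sensitivity, specificity and F score
    # from a binary confusion matrix (illustrative labels only).
    from sklearn.metrics import confusion_matrix, f1_score

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    f_score     = f1_score(y_true, y_pred)

    print(accuracy, sensitivity, specificity, f_score)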

Limitations

It is efficient only for small to medium-sized structured or tabular data.


The XGBoost algorithm falls under the category of supervised learning. It can be used to solve both regression and classification problems.

XGBoost is a Decision Tree-based algorithm. A decision tree is used in classification when the predicted outcome is the class to which the data belongs. A decision tree builds a classification model in the form of a tree structure: it breaks the dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. Greater tree depth allows the model to capture finer distinctions in the data, although very deep trees can overfit. For more information about the Decision Tree algorithm, refer to Decision Tree.
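
A small sketch, assuming scikit-learn, of how the depth of a single decision tree affects how closely it fits the training data; the dataset is illustrative only.

    # Sketch: deeper trees fit the training data more closely (and can overfit).
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    for depth in (1, 3, 10):
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
        tree.fit(X, y)
        print(f"max_depth={depth:2d}  training accuracy={tree.score(X, y):.3f}")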

XGBoost uses the ensemble learning method. In this method, data is divided into subsets and passed through a machine learning model to identify wrongly classified data points. Using this outcome, a new model is built that focuses on those wrongly classified data points. Depending on the dataset size and the desired level of accuracy, the process continues for a fixed number of iterations. Each iteration reduces the number of wrongly classified data points, thereby increasing accuracy. The final output is obtained by aggregating the outcomes of the individual models.
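
To make the iterative error reduction concrete, here is an illustrative sketch that uses scikit-learn's GradientBoostingClassifier (a related boosting implementation, not XGBoost itself) because its staged_predict method exposes the ensemble's prediction after each boosting round; the dataset and parameters are assumptions for the example.

    # Sketch: watching the number of misclassified points shrink as boosting
    # rounds add more trees to the ensemble (scikit-learn used for illustration).
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )

    model = GradientBoostingClassifier(n_estimators=50, max_depth=3, learning_rate=0.1)
    model.fit(X_train, y_train)

    # staged_predict yields the ensemble's prediction after each boosting iteration.
    for i, y_pred in enumerate(model.staged_predict(X_test), start=1):
        if i % 10 == 0:
            errors = int(np.sum(y_pred != y_test))
            print(f"after {i:2d} trees: {errors} misclassified test points")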

