Random Forest |
Description | - Random Forest is a Supervised Machine Learning algorithm. It works on the Bagging (Bootstrap Aggregation) principle of the Ensemble technique, so it combines multiple models instead of relying on a single model to make predictions.
- It is extensively used to solve Classification and Regression problems.
- In the case of Classification, Random Forest builds multiple decision trees, trains each one on a bootstrap sample of the data, and generates an output based on a majority vote across the trees.
|
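The description above can be sketched in code. This is a minimal illustration using scikit-learn's `RandomForestClassifier` on the built-in Iris dataset (the dataset, `n_estimators` value, and split ratio are illustrative assumptions, not part of the original reference).

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical example data: the Iris dataset (numerical features, class labels)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Each of the 100 trees is trained on a bootstrap sample of the training data;
# the final class label is decided by a majority vote across the trees.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
```

The `predict` call returns one class label per test row, produced by aggregating the votes of the individual decision trees.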
Why to use | To predict a class label based on input data. In other words, to identify data points and separate them into categories. |
When to use | When you have numerical data | When not to use | When you have textual data or data without categorical variables. Note: Textual or categorical values can be converted to numerical codes by using the label encoding technique. |
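The label encoding technique mentioned in the note can be sketched as follows with scikit-learn's `LabelEncoder` (the color column is a hypothetical example, not from the original reference).

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column of textual values
colors = ["red", "green", "blue", "green", "red"]

# LabelEncoder maps each distinct category to an integer code
# (classes_ holds the learned categories in sorted order)
encoder = LabelEncoder()
codes = encoder.fit_transform(colors)
```

After encoding, the integer `codes` array can be used as a feature column alongside the other numerical data.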
Prerequisites | - The dataset should have at least one categorical variable.
|
Input | Numerical data containing at least one categorical variable | Output | Labelled or classified data |
Statistical Methods used | - Accuracy
- Sensitivity
- Specificity
- F-score
- ROC chart
- Lift Curve
- Confusion Matrix
| Limitations | - It is slower and more difficult to interpret than a single decision tree.
- Its prediction accuracy can be low for complex classification problems.
|
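The statistical methods listed above can be computed with scikit-learn's metrics module. This is a minimal sketch on hypothetical true and predicted labels (the label arrays are illustrative, not from the original reference).

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hypothetical binary classification results
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Confusion matrix: true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)   # (tp + tn) / total predictions
sensitivity = tp / (tp + fn)                # recall for the positive class
specificity = tn / (tn + fp)                # recall for the negative class
f_score = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
```

ROC charts and lift curves build on the same confusion-matrix quantities, evaluated across a range of decision thresholds rather than at a single one.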