Random Forest is located under Machine Learning, in Classification, in the left task pane. Use the drag-and-drop method (or double-click the node) to add the algorithm to the canvas. Click the algorithm to view and select different properties for analysis.
Refer to Properties of Random Forest.
The available properties of the Random Forest are as shown in the figure below.
The table given below describes the different fields present on the properties pane of Random Forest.
Field | Description | Remark |
---|---|---|
Run | It allows you to run the node. | - |
Explore | It allows you to explore the successfully executed node. | - |
Vertical Ellipses | It displays the available options for the node. | - |
Task Name | It is the name of the task selected on the workbook canvas. | You can click the text field to edit or modify the task's name. |
Dependent Variable | It allows you to select the dependent variable. | - |
Independent Variables | It allows you to select the experimental or predictor variable(s). | - |
Advanced | | |
Number of Estimators | It allows you to select the number of base estimators (decision trees) in the ensemble. | - |
Criterion | It allows you to select the decision-making criterion to be used for splits. | - |
Maximum Features | It allows you to select the maximum number of features to be considered for the best split. | - |
Random State | It allows you to select the seed that fixes the random combination of train and test data for the classifier. | - |
Maximum Depth | It allows you to set the maximum depth of each decision tree. | - |
Feature Selection Percentage | It is used to decide the feature importance of the selected variables. | - |
Dimensionality Reduction | It allows you to select the dimensionality reduction technique. | - |
Add Result as a Variable | It allows you to select any of the result parameters as a variable. | - |
Node Configuration | It allows you to select the instance of the AWS server to provide control on the execution of a task in a workbook or workflow. | For more details, refer to Worker Node Configuration. |
Hyperparameter Optimization | It allows you to select parameters for Hyperparameter Optimization. | For more details, refer to Hyperparameter Optimization. |
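For readers who want to relate these properties to code, the sketch below shows how they might map onto scikit-learn's RandomForestClassifier. This is a minimal sketch, assuming the node wraps an estimator equivalent to scikit-learn's; the platform's actual backend is not documented here.

```python
# A minimal sketch, assuming the node corresponds to scikit-learn's
# RandomForestClassifier (the platform's actual backend is an assumption).
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
    n_estimators=100,     # Number of Estimators: base trees in the ensemble
    criterion="gini",     # Criterion: impurity measure used for splits
    max_features="sqrt",  # Maximum Features considered at each split
                          # ("auto" in older scikit-learn meant sqrt for classifiers)
    max_depth=10,         # Maximum Depth of each tree
    random_state=3,       # Random State: fixes the randomness for reproducibility
)
```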
Consider an HR dataset with over 1400 rows and 30 columns. It contains features such as Attrition, BusinessTravel, DailyRate, PercentSalaryHike, and PerformanceRating. The dataset can be used to study the impact of multiple factors on employee attrition.
A snippet of the input data is shown in the figure given below.
The properties of Random Forest are configured as follows.
Property | Value |
---|---|
Dependent Variable | Attrition |
Independent Variables | Age, DailyRate, Education, JobSatisfaction, PercentSalaryHike, StockOptionLevel, WorkLifeBalance |
Number of Estimators | 100 |
Criterion | gini |
Maximum Features | auto |
Random State | 3 |
Maximum Depth | 10 |
Feature Selection Percentage | 60 |
Dimensionality Reduction | None |
Add Result as a Variable | Accuracy, Sensitivity, Specificity, FScore |
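To make the example concrete, here is a hedged sketch of an equivalent configuration in scikit-learn. The file name hr_attrition.csv and the 70/30 train-test split are illustrative assumptions, not the platform's documented behavior; the column names come from the Independent Variables listed above.

```python
# Hypothetical reproduction of the example configuration above; the file name
# and the train/test split are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("hr_attrition.csv")  # hypothetical path to the HR dataset
features = ["Age", "DailyRate", "Education", "JobSatisfaction",
            "PercentSalaryHike", "StockOptionLevel", "WorkLifeBalance"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["Attrition"], test_size=0.3, random_state=3)

clf = RandomForestClassifier(n_estimators=100, criterion="gini",
                             max_features="sqrt", max_depth=10, random_state=3)
clf.fit(X_train, y_train)

# Feature importances underpin options such as Feature Selection Percentage.
for name, importance in zip(features, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```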
Notes:
Since we select Accuracy, Sensitivity, Specificity, and FScore as the performance metrics, the corresponding variables are created for the two Events of Interest, "Yes" and "No". The value of Accuracy remains the same for both events. Thus, eight new variables are created in total.
For example, Random_Forest_Accuracy_No and Random_Forest_Accuracy_Yes are the variables created corresponding to the Accuracy metric for the events "No" and "Yes".
Note: As you can see, the Default and Current values for each variable are the same, and they are also the values for the performance metrics displayed on the Result page.
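The per-event behavior of these metrics can be illustrated with a small sketch. The confusion counts below are invented purely for illustration; they show why Accuracy is identical for both Events of Interest while Sensitivity and Specificity swap roles.

```python
# Illustrative sketch with invented counts: treating "Yes" and then "No" as the
# positive class shows why Accuracy is identical for both Events of Interest.
def metrics(tp, fn, fp, tn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall of the positive class
    specificity = tn / (tn + fp)   # recall of the negative class
    precision = tp / (tp + fp)
    fscore = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, fscore

# Hypothetical confusion counts for event "Yes": 40 TP, 30 FN, 20 FP, 350 TN.
print(metrics(tp=40, fn=30, fp=20, tn=350))   # Event of Interest = "Yes"
print(metrics(tp=350, fn=20, fp=30, tn=40))   # Event of Interest = "No"
```

Running this shows the same Accuracy in both calls, while the Sensitivity of one event equals the Specificity of the other.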
The Result Page for the Event of Interest "No" is shown below.
The Result page for the Event of Interest "Yes" is shown below.
The Result page displays the values of the selected performance metrics for the chosen Event of Interest.
The Data page displays the data used for the analysis.