Stratified sampling is located under Model Studio > Data Preparation > Sampling > Stratified. Drag and drop the node (or double-click it) to add the algorithm to the canvas. Click the algorithm on the canvas to view and set its properties for analysis.
The table below describes the properties of Stratified sampling.
Field | Description | Remark |
Run | It allows you to run the node. | - |
Explore | It allows you to explore the successfully executed node. | - |
Vertical Ellipses | The available options are | - |
Task Name | It is the name of the task selected on the workbook canvas. | You can click the text field to edit or modify the task name as required. |
Sampling variable | It allows you to select the categorical variable whose categories define the strata. | Only one categorical variable can be selected. |
Test percentage | It is the fraction of the input data assigned to the test data. The remaining data is the train data. | The default value is 0.2 (that is, 20% test data and 80% train data). |
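
To illustrate how the Sampling variable and Test percentage properties work together, here is a minimal Python sketch of a stratified train/test split. It is not the product's implementation; the function name `stratified_split`, the `group` column, and the sample rows are hypothetical and exist only for this illustration. The sketch assumes that the test fraction is applied separately within each category of the sampling variable, so that the test data keeps roughly the same category proportions as the input data.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_fraction=0.2, seed=0):
    """Split rows into (train, test), drawing test_fraction from each stratum."""
    rng = random.Random(seed)

    # Group the rows by the value of the sampling variable.
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)

    train, test = [], []
    for members in strata.values():
        shuffled = members[:]  # copy so the input list is not modified
        rng.shuffle(shuffled)
        # Number of test rows for this stratum, rounded to a whole row.
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Hypothetical input: 50 rows in category "A", 30 in "B", 20 in "C".
labels = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
rows = [{"id": i, "group": g} for i, g in enumerate(labels)]

train, test = stratified_split(rows, key="group", test_fraction=0.2, seed=42)
print(len(train), len(test))  # 80 20
```

With a test fraction of 0.2, each category contributes about one fifth of its rows to the test data (10, 6, and 4 rows in this hypothetical input), which is what distinguishes stratified sampling from a plain random split.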
The example uses the credit card balance dataset, which includes columns such as Age, balance, and cards.
A snippet of input data is shown below.
We select Ethnicity as the sampling variable and set the test percentage to 0.2.
The result page for stratified sampling is shown below.