The available properties of the Chi Square test are shown below.
The table below describes the different properties of the Chi Square Test.
Field | Description | Remark |
Run | It allows you to run the node. | – |
Explore | It allows you to explore the successfully executed node. | – |
Vertical Ellipses | The available options are:
- Run till node
- Run from node
- Publish as a model
- Publish code
| – |
Task Name | It is the name of the task selected on the workbook canvas. | - Click the text field to edit the task's name.
- Spaces are not allowed in the Task Name.
|
Response | It allows you to select one categorical variable from the dataset. | - Only categorical columns are displayed in this drop-down.
- Select one column from the drop-down.
- Make sure the selected categorical variable has only two categories.
|
Independent Variable | It allows you to select one variable from the dataset. | - All data columns are displayed in this drop-down.
- Select one column from the drop-down.
|
Level of Significance | It allows you to set the level of significance (Alpha). | - The default value is 0.05. You can modify this value.
- The Alpha value must be greater than 0 and less than 1.
|
Add result as a variable | It allows you to use the result as a variable. | For more details, refer to Adding Result as a Variable. |
Node Configuration | It allows you to select the instance of the AWS server to provide control over the execution of a task in a workbook or workflow. | For more details, refer to Worker Node Configuration. |
Example of Chi Square Test
Consider a company's employee data with department, gender, age, and other personal details. As an HR manager, you want to find out whether the gender distribution is the same in every department.
An input data snippet is displayed below.
We apply the Chi Square Test to the input data by selecting two categorical columns, one as the Response and one as the Independent Variable. The chosen values are given below.
Property | Value |
Task Name | Chi_Square_Test |
Response | Gender |
Independent Variable | Department |
Level of Significance (Alpha) | 0.05 |
The result page consists of the following sections.
Frequency Table
The frequency table displays the Response variable values in rows and the Independent Variable values in columns.
Observed Frequency and Expected Frequency are calculated for each combination of Response and Independent Variable values. Observed Frequency is the number of occurrences found in the sample. The Expected Frequency is calculated as:
Expected Frequency = ((Row Total) * (Column Total)) / Total Number of Observations
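As an illustration of the formula above, the following Python sketch computes the Expected Frequency for every cell of a small frequency table. The counts, the department names, and the function name `expected_frequencies` are hypothetical and are not taken from the example dataset or from the product.

```python
# Hypothetical observed counts (illustration only).
# Rows: Response (Gender); columns: Independent Variable (Department).
observed = {
    "Female": {"HR": 12, "IT": 8, "Sales": 10},
    "Male": {"HR": 8, "IT": 22, "Sales": 20},
}


def expected_frequencies(observed):
    """Return (row total * column total) / grand total for every cell."""
    rows = list(observed)
    cols = list(next(iter(observed.values())))
    row_totals = {r: sum(observed[r].values()) for r in rows}
    col_totals = {c: sum(observed[r][c] for r in rows) for c in cols}
    grand_total = sum(row_totals.values())
    return {
        r: {c: row_totals[r] * col_totals[c] / grand_total for c in cols}
        for r in rows
    }


expected = expected_frequencies(observed)
for gender, by_department in expected.items():
    print(gender, by_department)
```

With these hypothetical counts, the row totals are 30 and 50, the column totals are 20, 30, and 30, and the total number of observations is 80, so the expected count for Female employees in HR is (30 * 20) / 80 = 7.5.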
Computation Table
The Computation Table displays the test statistics in the following columns.
- Independent Variable – in this example, Department
- Response – in this example, Gender
- Observed Frequency (O) – the number of occurrences of each gender in each department
- Expected Frequency (E) – calculated as ((Row Total) * (Column Total)) / Total Number of Observations
- Observed Frequency – Expected Frequency (O – E)
- (O – E) ^2
- (O – E) ^2 / E
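The columns listed above can be reproduced with a short Python sketch. It assumes hypothetical counts and a hypothetical helper named `computation_table`. Summing the last column, (O – E)^2 / E, over all rows gives the standard Pearson chi-square statistic; whether the product reports it in exactly this layout is an assumption.

```python
# Hypothetical observed counts (illustration only).
row_labels = ["Female", "Male"]        # Response (Gender)
col_labels = ["HR", "IT", "Sales"]     # Independent Variable (Department)
observed = [
    [12, 8, 10],
    [8, 22, 20],
]


def computation_table(observed, row_labels, col_labels):
    """Build one row per (Department, Gender) cell with O, E, O-E, (O-E)^2, (O-E)^2/E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    table = []
    for i, response in enumerate(row_labels):
        for j, independent in enumerate(col_labels):
            o = observed[i][j]
            e = row_totals[i] * col_totals[j] / n
            diff = o - e
            table.append({
                "Independent Variable": independent,
                "Response": response,
                "O": o,
                "E": e,
                "O - E": diff,
                "(O - E)^2": diff ** 2,
                "(O - E)^2 / E": diff ** 2 / e,
            })
    return table


table = computation_table(observed, row_labels, col_labels)
# Pearson chi-square statistic: sum of the last column over all cells.
chi_square = sum(row["(O - E)^2 / E"] for row in table)
print(round(chi_square, 4))
```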
Hypothesis Interpretation
It displays the following:
- Null Hypothesis
- Alternative Hypothesis
- Result Value – it consists of the Alpha value, the p value, the critical value read from the Chi-square table, and the value calculated for the sample
- Interpretation – compare the p value with the Alpha value. If the p value is less than or equal to the Alpha value, the null hypothesis is rejected. In this example, the p value is equal to the Alpha value; hence the null hypothesis is rejected.
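The comparison of the p value with the Alpha value can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the test statistic is a hypothetical value, the function names `chi2_sf` and `decide` are invented for this sketch, and the degrees of freedom are assumed to follow the usual rule (rows – 1) * (columns – 1). The p value uses the standard closed-form expressions for the upper tail of the chi-square distribution with integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term = 1.0
        total = 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_r x^(r-1) / (1*3*...*(2r-1))
    term = 1.0
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def decide(chi_square, n_rows, n_cols, alpha=0.05):
    """Return (degrees of freedom, p value, True if the null hypothesis is rejected)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be greater than 0 and less than 1")
    df = (n_rows - 1) * (n_cols - 1)
    p_value = chi2_sf(chi_square, df)
    return df, p_value, p_value <= alpha


# Hypothetical statistic for a table with 2 Response categories and 3 departments.
df, p_value, reject_null = decide(6.0444, n_rows=2, n_cols=3, alpha=0.05)
print(df, round(p_value, 4), reject_null)
```

The sketch rejects the null hypothesis when the p value is less than or equal to Alpha, which matches the interpretation rule described above.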