Chi-square test

Description

A Chi-square test is a statistical hypothesis test applied to categorical variables drawn from a random sample. It is mainly used to decide whether to reject the null hypothesis. The null hypothesis states that there is no relationship between the two categorical variables, that is, that they are independent.

Why to use

To determine whether the difference between the observed values and the expected values is due to a relationship between the variables or merely to chance.
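As an illustration of observed versus expected values, here is a minimal sketch in plain Python. The function name `chi_square_statistic` and the counts in `observed` are hypothetical and chosen only for this example. Expected counts are computed under the assumption that the two variables are independent.

```python
def chi_square_statistic(observed):
    """Return (statistic, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, expected


# Hypothetical counts: rows are two groups, columns are two response categories
observed = [[30, 20], [20, 30]]
statistic, expected = chi_square_statistic(observed)
print(expected)   # every cell is 25.0 for this table
print(statistic)  # 4.0
```

The larger the statistic, the further the observed counts are from what independence would predict.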

When to use

  • When you want to analyze categorical data from a random sample.
  • When you want to decide whether to reject the null hypothesis by comparing the p-value with the significance level (alpha).
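The comparison of the p-value with alpha can be sketched as follows for a 2x2 table. This is an illustrative example, not the product's implementation: the function name `chi_square_2x2`, the counts, and the choice of alpha = 0.05 are assumptions. It uses the Pearson statistic without a continuity correction, and the p-value formula shown applies only when there is 1 degree of freedom.

```python
import math


def chi_square_2x2(observed):
    """Return (statistic, p_value) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = observed
    n = a + b + c + d

    # Shortcut form of the Pearson statistic for a 2x2 table (no continuity correction)
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


observed = [[30, 20], [20, 30]]  # hypothetical counts
alpha = 0.05                     # assumed significance level

statistic, p_value = chi_square_2x2(observed)
reject_null = p_value < alpha

print(f"statistic={statistic:.3f}, p-value={p_value:.4f}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```

For these counts the p-value is about 0.0455, which is below 0.05, so the null hypothesis of independence would be rejected at that level.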

When not to use

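When one or both variables are continuous (numerical), such as height or income. The test applies only when both variables are categorical, like Blood Group, Location, and Customer Satisfaction.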

Prerequisites

The input column you select must have only two categories to compute the Chi-square test results.

Input

Set of independent variables

Output

Relation between the two variables

Statistical Method Used

Chi-square statistic
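For reference, the Pearson chi-square statistic is usually written as below. The symbols are notation introduced here, not taken from this article: O_ij is the observed count in row i and column j, E_ij is the expected count under independence, R_i and C_j are the row and column totals, N is the grand total, and r and c are the numbers of rows and columns.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\text{df} = (r - 1)(c - 1)
```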

Limitations

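  • It shows whether a relationship is statistically significant, but it does not measure the strength of that relationship.
  • The test is sensitive to sample size: with very large samples, even trivial differences can be statistically significant, and with small expected counts (commonly, below 5 in a cell) the result is unreliable.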
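The limitations above can be illustrated with a small sketch. It multiplies every count in a hypothetical 2x2 table by 10, which keeps the proportions identical. The phi coefficient used here as a strength measure is a swapped-in technique for illustration only and is not part of the Chi-square test output; the function name and counts are assumptions.

```python
import math


def chi_square_and_phi(observed):
    """Return (statistic, phi) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = observed
    n = a + b + c + d
    # Pearson statistic for a 2x2 table (no continuity correction)
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Phi coefficient: an effect-size measure for 2x2 tables
    phi = math.sqrt(statistic / n)
    return statistic, phi


small = [[30, 20], [20, 30]]                              # hypothetical counts, N = 100
large = [[count * 10 for count in row] for row in small]  # same proportions, N = 1000

small_stat, small_phi = chi_square_and_phi(small)
large_stat, large_phi = chi_square_and_phi(large)

print(small_stat, round(small_phi, 3))  # 4.0 0.2
print(large_stat, round(large_phi, 3))  # 40.0 0.2
```

The statistic grows tenfold with the sample size while the strength of the association stays the same, which is why a significant result alone says little about how strong the relationship is.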
