Shapiro-Wilk Test

Description

The Shapiro-Wilk test is a statistical test of normality. It is used to assess whether a simple random sample of a variable’s values was drawn from a normal distribution.

Why to use

To test whether the values of a variable are consistent with a normal distribution.

When to use

To find out whether a random sample has been derived from a normal distribution.

When not to use

On data other than numerical data.

Prerequisites

  • The input variable should be of numerical type.
  • With a sufficiently large sample, the Shapiro-Wilk test can return a significant result even for small, practically unimportant departures from normality.

Input

Any dataset that contains numerical data.

Output

  • W Statistic
  • p-Value
  • alpha (α)

Statistical Methods used

NA

Limitations

  • It can be used only on numerical data.
  • Whether the data is treated as normally distributed depends on the significance level (α) that the user chooses to suit their requirements.
  • For sample sizes above 5000, the p-value may be unreliable, so the normality test result should be inferred from the W Statistic value alone.
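The sample-size limitation can be handled explicitly in code. The following is a minimal sketch, assuming SciPy's `scipy.stats.shapiro` as the implementation (this article does not name one). The helper `shapiro_summary` and the constant `P_VALUE_SIZE_LIMIT` are hypothetical names introduced only for illustration.

```python
import warnings

import numpy as np
from scipy import stats

# Threshold taken from the limitation stated above; the name is illustrative.
P_VALUE_SIZE_LIMIT = 5000


def shapiro_summary(values):
    """Return n, W, and the p-value (None when n exceeds the size limit)."""
    data = np.asarray(values, dtype=float)
    with warnings.catch_warnings():
        # SciPy may warn about p-value accuracy for large samples;
        # that case is handled explicitly below, so the warning is silenced.
        warnings.simplefilter("ignore")
        w_stat, p_value = stats.shapiro(data)
    p_value_usable = data.size <= P_VALUE_SIZE_LIMIT
    return {
        "n": int(data.size),
        "W": float(w_stat),
        "p_value": float(p_value) if p_value_usable else None,
    }


rng = np.random.default_rng(seed=1)
large_sample = rng.normal(loc=0.0, scale=1.0, size=6000)
small_sample = rng.normal(loc=0.0, scale=1.0, size=100)

print(shapiro_summary(large_sample))  # p_value is None: rely on W only
print(shapiro_summary(small_sample))  # both W and p_value are reported
```

Returning `None` instead of a questionable p-value forces the caller to decide how to interpret W for large samples rather than silently applying the usual p-value rule.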

The p-value is the probability of obtaining results at least as extreme as those observed in a statistical hypothesis test, assuming that the null hypothesis is true.

The null hypothesis of the Shapiro-Wilk test is that the input data comes from a normal distribution. The alternative hypothesis is that the input data does not come from a normal distribution.

The Shapiro-Wilk test rejects the null hypothesis of normality when the p-value is less than or equal to the significance level α, commonly 0.05. Failing the normality test means that, at the 5% significance level, the data shows a statistically significant departure from the normal distribution. Passing the normality test enables you to declare that no significant departure from normality was found; it does not prove that the data is normally distributed.
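The decision rule can be illustrated with a short example. This is a minimal sketch, assuming SciPy's `scipy.stats.shapiro` as the implementation (this article does not specify one). The function `shapiro_wilk_decision` and the synthetic sample are hypothetical and serve only to demonstrate the rule.

```python
import numpy as np
from scipy import stats


def shapiro_wilk_decision(values, alpha=0.05):
    """Run the Shapiro-Wilk test and apply the rule 'reject if p-value <= alpha'.

    Returns (W statistic, p-value, reject_null). The name and return
    layout are illustrative, not part of any library.
    """
    w_stat, p_value = stats.shapiro(values)
    reject_null = bool(p_value <= alpha)
    return float(w_stat), float(p_value), reject_null


# Synthetic, clearly non-normal data (exponential), used only for demonstration.
rng = np.random.default_rng(seed=0)
skewed_sample = rng.exponential(scale=2.0, size=200)

w_stat, p_value, reject_null = shapiro_wilk_decision(skewed_sample, alpha=0.05)
print(f"W = {w_stat:.4f}, p-value = {p_value:.3g}, alpha = 0.05")
if reject_null:
    print("Null hypothesis rejected: significant departure from normality.")
else:
    print("No significant departure from normality was found.")
```

The three printed values correspond to the W Statistic, p-Value, and alpha outputs of the test.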

The test generates a W Statistic value, which depends on the ordered sample values and on constants derived from the means, variances, and covariances of the order statistics of a sample from a standard normal distribution. W lies between 0 and 1, and values close to 1 are consistent with normality. If the W Statistic value is sufficiently small, the null hypothesis is rejected, and it can be concluded that the random sample is not normally distributed.
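For reference, the W Statistic in its standard textbook form is shown below. The notation is an assumption introduced here, not taken from this article: x_(i) is the i-th smallest sample value, x̄ is the sample mean, m is the vector of expected values of the order statistics of a standard normal sample of size n, and V is the covariance matrix of those order statistics.

```latex
W = \frac{\left( \sum_{i=1}^{n} a_i \, x_{(i)} \right)^{2}}
         {\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^{2}},
\qquad
(a_1, \dots, a_n) = \frac{m^{\top} V^{-1}}
                         {\left( m^{\top} V^{-1} V^{-1} m \right)^{1/2}}
```

The numerator measures how well the ordered data line up with the order statistics expected under normality, and the denominator is the usual sum of squared deviations from the mean, so W is close to 1 for normal-looking data.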
