Pearson Correlation | |||
Description | The Pearson correlation coefficient, also known as Pearson's r or simply the correlation coefficient, is a statistical measure of the strength and direction of the linear association between two chosen variables. It is commonly used alongside linear regression. | ||
Why to use |
| ||
When to use |
| When not to use |
|
Prerequisites |
| ||
Input | Any dataset containing numerical variables. | Output |
|
Statistical Methods Used |
| Limitations |
|
The Pearson correlation coefficient, denoted as "r," is a statistical measure that quantifies the strength and direction of the linear relationship between two variables.
The mathematical expression of "r" is:
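$$r = \frac{\sum_{i=1}^{n} \bigl(x_i - \mathrm{mean}(x)\bigr)\bigl(y_i - \mathrm{mean}(y)\bigr)}{\sqrt{\sum_{i=1}^{n} \bigl(x_i - \mathrm{mean}(x)\bigr)^2}\,\sqrt{\sum_{i=1}^{n} \bigl(y_i - \mathrm{mean}(y)\bigr)^2}}$$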
Here, i takes the values 1, 2, …, n, where n is the number of observations. The mean values of the selected features x and y are denoted mean(x) and mean(y). If larger values of x are associated with larger values of y, and smaller values of x with smaller values of y, then r is positive. If larger values of x are associated with smaller values of y, then r is negative.
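The formula can be sketched directly in code. The following is a minimal illustration in plain Python, not a reference implementation; the function name `pearson_r` and its error handling are assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Hypothetical helper written for illustration only.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Deviations of each observation from its mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Numerator: sum of products of paired deviations.
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Denominator: product of the square roots of the summed squared deviations.
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))

    if denominator == 0:
        # A constant variable has zero spread, so r is undefined.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator
```

For example, `pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])` evaluates to approximately 1, because y increases in exact proportion to x.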
The coefficient r ranges from -1 to 1: a positive value indicates a positive linear relationship, a negative value indicates a negative linear relationship, and a value of 0 indicates no linear relationship.
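The three cases can be illustrated with small made-up datasets. The sketch below uses a compact helper `r` (a hypothetical name) that applies the same formula and assumes equal-length, non-constant inputs. The data are invented for illustration only.

```python
import math


def r(xs, ys):
    # Compact form of the Pearson formula; assumes equal lengths and
    # that neither variable is constant (otherwise the denominator is 0).
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den


xs = [-2, -1, 0, 1, 2]

r_up = r(xs, [2 * x + 1 for x in xs])      # points on an increasing line
r_down = r(xs, [-3 * x + 5 for x in xs])   # points on a decreasing line
r_curve = r(xs, [x ** 2 for x in xs])      # symmetric parabola, not a line

print(r_up, r_down, r_curve)
```

The increasing line gives r close to 1 and the decreasing line gives r close to -1. The parabola gives r equal to 0 even though y is fully determined by x, which shows that r = 0 rules out only a linear relationship.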
It is widely used to assess the association between variables in various fields of study. Because r is unit-free, it provides a standardized measure that can be compared across pairs of variables measured on different scales.