Pearson Correlation

Description	The Pearson correlation coefficient is also known as Pearson's r or simply the correlation coefficient. It is a statistical measure of the strength and direction of the linear association between two chosen variables. It is very commonly used in linear regression.
Why to use	Measurement of association; direction and magnitude; standardization; statistical significance
When to use	Continuous variables; scale-invariant analysis; linear model assumption; normal distribution assumptions; data visualization
When not to use	Non-linear relationships; non-continuous variables; outliers; non-normal distributions; limited range of data; confounding factors
Prerequisites	Linearity; continuous variables; bivariate normality; independence; a minimum of two variables
Input	Any dataset containing numerical variables
Output	Correlation matrix; correlation score
Statistical Methods Used	Mean; correlation coefficient
Limitations	Limited to linear relationships; sensitive to outliers; depends on the range and distribution of the data; does not capture all relationships; influenced by sample size; confounding factors; limited to numeric variables

The Pearson correlation coefficient, denoted as "r," is a statistical measure that quantifies the strength and direction of the linear relationship between two variables.
The mathematical expression of "r" is:

r = Σᵢ (xᵢ − mean(x))(yᵢ − mean(y)) / [ √(Σᵢ (xᵢ − mean(x))²) · √(Σᵢ (yᵢ − mean(y))²) ]

Here, i takes the values 1, 2, …, n, and mean(x) and mean(y) denote the mean values of the selected features x and y. If larger values of x are associated with larger values of y, and smaller values of x with smaller values of y, then r is positive. If larger values of x are associated with smaller values of y, then r is negative.
The coefficient ranges from -1 to 1: a positive value indicates a positive linear relationship, a negative value indicates a negative linear relationship, and a value of 0 indicates no linear relationship.
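As a rough illustration of the formula and the sign behaviour described above, the following Python sketch computes r directly from its definition. The function name `pearson_r` and the sample values are assumptions made for this example; they are not part of any Rubiscape API.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("at least two observations are required")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Deviations of each observation from its variable's mean.
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]

    # Numerator: sum of products of paired deviations.
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Denominator: product of the square roots of the summed squared deviations.
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when a variable is constant")
    return numerator / denominator


# Illustrative (made-up) data.
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])    # y increases with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])    # y decreases as x increases
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x**2, non-linear

print(r_pos, r_neg, r_curve)
```

With these inputs, `r_pos` and `r_neg` come out at approximately 1 and -1. `r_curve` is 0 even though y is fully determined by x, which illustrates that r measures only linear association.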
The Pearson correlation coefficient is widely used across fields of study to assess the association between variables. Because it is unitless, it provides a standardized measure for comparison.
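When a dataset contains more than two numerical variables, the pairwise coefficients are typically collected into a correlation matrix. The sketch below shows one way this could be done in plain Python. The helper names (`pearson_r`, `correlation_matrix`) and the column names and values are hypothetical, chosen only for illustration, and do not describe how Rubiscape computes its output.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient, computed from its definition."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return numerator / denominator


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is r between columns a and b.

    `columns` maps each variable name to its list of numeric values.
    """
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical dataset with three numerical variables.
data = {
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 61, 70, 74],
    "absences": [9, 7, 6, 3, 1],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(row[other], 3) for other in matrix])
```

Each diagonal entry is approximately 1 because every variable is perfectly correlated with itself, and the matrix is symmetric because r does not depend on the order of the two variables.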
