Pearson Correlation

Description

The Pearson correlation coefficient is also known as Pearson's r or simply the correlation coefficient. It is a statistical measure of the strength and direction of the linear association between two chosen variables. It is commonly used in linear regression.

Why to use

  • Measurement of association
  • Direction and magnitude
  • Standardization
  • Statistical significance

When to use

  • Continuous variables
  • Scale-invariant analysis
  • Linear model assumption
  • Normal distribution assumptions
  • Data visualization

When not to use

  • Non-linear relationships
  • Non-continuous variables
  • Outliers
  • Non-normal distributions
  • Limited range of data
  • Confounding factors

Prerequisites

  • Linearity
  • Continuous variables
  • Bivariate normality
  • Independence
  • A minimum of two variables is required

Input

Any dataset containing numerical variables.

Output

  • Correlation Matrix
  • Correlation Score

Statistical Methods Used

  • Mean
  • Correlation Coefficient

Limitations

  • Limited to linear relationships
  • Sensitivity to outliers
  • Depends on the range and distribution of data
  • It does not capture all relationships
  • Influenced by sample size
  • Confounding factors
  • Limited to numeric variables
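
Two of these limitations are easy to see on small, made-up data. The sketch below is illustrative only: it is not the platform's implementation, the helper pearson_r is a hypothetical pure-Python function based on the standard definition of r, and all values are invented for the example.

```python
import math


def pearson_r(x, y):
    """Pearson's r for two equal-length numeric sequences (illustrative helper)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)) * math.sqrt(
        sum((b - my) ** 2 for b in y)
    )
    return num / den


# Limited to linear relationships: y depends entirely on x (y = x^2),
# yet over a symmetric range the linear association cancels out.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [v ** 2 for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# Sensitivity to outliers: five perfectly aligned points, then one extreme point.
x_clean = [1, 2, 3, 4, 5]
y_clean = [1, 2, 3, 4, 5]
r_clean = pearson_r(x_clean, y_clean)
r_outlier = pearson_r(x_clean + [6], y_clean + [-20])

print("non-linear:", r_curve)
print("clean:", r_clean)
print("with one outlier:", r_outlier)
```

In this made-up data the curved relationship produces an r of zero even though y is fully determined by x, and a single extreme point is enough to turn a near-perfect positive correlation negative.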

The Pearson correlation coefficient, denoted as "r," is a statistical measure that quantifies the strength and direction of the linear relationship between two variables.
The mathematical expression of "r" is:

r = Σᵢ (xᵢ − mean(x))(yᵢ − mean(y)) / ( √(Σᵢ (xᵢ − mean(x))²) × √(Σᵢ (yᵢ − mean(y))²) )

Here, i takes the values 1, 2, …, n, and mean(x) and mean(y) denote the mean values of the selected features x and y. If larger values of x are associated with larger values of y (and smaller values with smaller values), r is positive. If larger values of x are associated with smaller values of y, r is negative.
It ranges from -1 to 1: a positive value indicates a positive linear relationship, a negative value indicates a negative linear relationship, and a value of 0 indicates no linear relationship.
It is widely used to assess the association between variables in various fields of study and provides a standardized measure for comparison.
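
The expression for r can be followed term by term in a few lines of Python. This is a minimal sketch under stated assumptions, not the platform's own code: the function name pearson_r and the sample columns hours, score, and errors are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Compute Pearson's r for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Numerator: sum of the products of paired deviations from the means.
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # Denominator: product of the square roots of the summed squared deviations.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")

    return numerator / (math.sqrt(ss_x) * math.sqrt(ss_y))


# Hypothetical sample data.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 6, 8, 10]   # larger hours go with larger score
errors = [10, 8, 6, 4, 2]  # larger hours go with smaller errors

r_pos = pearson_r(hours, score)
r_neg = pearson_r(hours, errors)
print(round(r_pos, 6), round(r_neg, 6))
```

Because both sample relationships are exactly linear, the two results sit at the ends of the range: close to 1 for hours against score and close to -1 for hours against errors, up to floating-point rounding.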
