Linear Regression

Linear Regression

Regression is predictive modeling. It is a statistical method used in finance, investment, and other disciplines that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables).
Regression helps investment and financial managers value assets and understand the relationships between variables, such as commodity prices and the stocks of businesses dealing in those commodities.

Description

Linear regression is a statistical and ML method to establish a linear relationship between the input variables (x) and a single output variable (y). The value of y can be calculated from a linear combination of variables x.

Why to use

To perform the Predictive Modeling for the dependent variable.

When to useWhen you want to predict a value depending upon single or multiple independent variables.When not to useOn textual data.

Prerequisites

There should not be any missing values in the data.
The output variable (y)  must be a continuous data type.

  • Linearity: The relationship between x and y is linear.
  • Homoscedasticity: Residual variance is the same for any value of x.
  • Independence: There is no correlation in the independent variables.
  • Normality: Residuals must be normally distributed.
Input
  • A single numeric Dependent variable
  • A set of Independent variables, a combination of numeric and categorical variables
OutputPredicted value.
Statistical Methods used
  • Regression
  • Mean
  • ANOVA
  • R square
  • Adjusted R square
  • Residual
  • Coefficient
  • Significance level (alpha)
  • Durbin Watson – Auto Correlation Factor
  • Variance Inflation Factor
  • p value
  • W Statistic
Limitations
  • It cannot be used on textual data.
  • It is sensitive to Outliers.
  • It is limited to linear



The equation for linear regression is, y=mx+c

  • y is the dependent variable (output)
  • m is the slope or weight
  • x is the independent variable (input)
  • c is the constant or intercept

The output variable is the dependent variable or scalar response, while the input variables are independent or explanatory. The output variable (y) must be a continuous data type.
There are two types of linear regression, simple and multiple. In simple linear regression, there is only one input variable (x), while in multiple linear regression, there are multiple input variables (x1,x2,x3,x4). As input variables increase, slopes or weights will also increase.
Linear regression aims to determine which predictors are significant in predicting the output variable, their efficiency in predicting the output variable, and how they impact the output variable.

    • Related Articles

    • Poisson Regression

      Poisson Regression Description Poisson Regression is a type of linear regression used to model the countable data. Why to use For regression analysis of count data When to use For numerical variables When not to use For textual variables ...
    • Polynomial Regression

      Polynomial Regression Description Polynomial Regression is a supervised learning method in which the relationship between the independent and dependent variables is modeled as an nth degree polynomial. Why to use Predictive Modeling When to use When ...
    • Ridge Regression

      Ridge Regression Description Predict and analyze data points as output for multiple regression data that suffer from multicollinearity by controlling the magnitude of coefficients to avoid over-fitting. Why to use Predictive Modeling When to use To ...
    • Lasso Regression

      Lasso Regression Description Lasso Regression is used to penalize the regression method to select a subset of variables by imposing a constraint on model parameters. Why to use Predictive Modeling When to use For variables having high ...
    • Regression

      Regression is predictive modeling. It is a statistical method, used in finance, investment, and other disciplines, that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a ...