Factor Analysis

Description

  • Factor Analysis, also known as exploratory Factor Analysis, is a technique for data reduction.
  • It is a technique of examining interdependent variables without distinguishing between dependent and independent variables.
  • Factor Analysis extracts the maximum common variance from all the variables in the data and puts it into a common score.
  • It groups variables and reduces data. It helps researchers investigate various concepts that are otherwise difficult to measure directly.
  • Factor Analysis identifies the underlying factors in given data. These underlying factors explain the correlation between the set of variables in the given data.
  • It is mostly used as a prelude to further multivariate analysis.
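
As a sketch of what "underlying factors" means, the standard common factor model can be written as follows. The notation is not from the original text and is assumed here: $x$ is the vector of $p$ standardized observed variables, $f$ is the vector of $k < p$ common factors, $\Lambda$ is the $p \times k$ matrix of factor loadings, and $\varepsilon$ is the part of each variable that the common factors do not explain.

```latex
x = \Lambda f + \varepsilon,
\qquad
\operatorname{Cov}(x) = \Lambda \Lambda^{\top} + \Psi
```

The second equation assumes that the factors are uncorrelated with unit variance and uncorrelated with $\varepsilon$, and that $\Psi$ is the diagonal matrix of unique variances. The term $\Lambda \Lambda^{\top}$ is the common variance shared across variables through the factors.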

Why to use

  • For reduction and summarization of data.
  • To remove irrelevant data.
  • To estimate the number of factors required to explain the common themes among a given set of variables.
  • To determine the extent to which each variable in the dataset is related to the common factors.

When to use

  • As a data preparation method before using unsupervised machine learning models.
  • When you want to identify a small set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis.
  • When you want to develop a hypothesis about the relationship between variables.
  • When you want to spot trends and themes in your dataset.

When not to use

  • When the dependent and independent variables in the data have already been identified.
  • When the variables are limited in number and sufficient to describe the trends in data.

Prerequisites

  • The dataset should be sufficiently large with a large number of variables.
  • Outliers should not be present in the data.
  • The variables in the data should not be perfectly multicollinear.
  • The relationships between variables should be linear.
  • Only relevant variables should be included in the analysis, and there should be a true correlation between the variables and factors
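
Two of these prerequisites can be screened for before running the analysis. The following is a minimal Python sketch using only the standard library. The helper names, the z-score rule, the 2.5 threshold, and the sample values are all assumptions made for illustration, not part of the original method. The pairwise check only catches variables that are perfect linear functions of one other variable; it does not detect a variable that is a combination of several others.

```python
from math import sqrt
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def perfectly_correlated_pairs(columns, tol=1e-9):
    """Return pairs of variable names whose |r| equals 1 within tol."""
    names = list(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(abs(r) - 1.0) < tol:
                pairs.append((first, second))
    return pairs

def outlier_rows(values, threshold=2.5):
    """Return row indices whose z-score magnitude exceeds threshold (assumed rule)."""
    mu, sigma = mean(values), pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical data: price_cents is an exact multiple of price,
# and the last price value is far from the rest.
price = [10, 12, 11, 13, 12, 11, 10, 12, 11, 95]
columns = {
    "price": price,
    "price_cents": [value * 100 for value in price],
    "fragrance": [3, 5, 2, 4, 5, 1, 2, 4, 3, 5],
}

redundant = perfectly_correlated_pairs(columns)
suspect_rows = outlier_rows(columns["price"])
print("perfectly correlated pairs:", redundant)
print("possible outlier rows in price:", suspect_rows)
```

One variable from each flagged pair would typically be dropped, and flagged rows reviewed, before continuing.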

Input

Variables that are not classified as independent or dependent

Output

Factors that group the variables based on their correlation

Statistical Methods used

  • Factor loading scores
  • Kaiser-Meyer-Olkin (KMO) test score (a measure of sampling adequacy)
  • Uniqueness extraction
  • Communality extraction
  • Correlation Chart
  • Factor Plot
  • Scree Plot
  • Bartlett Test of Sphericity
  • Chi-square value
  • Degrees of Freedom
  • Sigma (significance, the p-value of the test)
  • Eigenvalue
  • PCA
  • Maximum Likelihood

Limitations

  • Factor Analysis is not appropriate when the relationships between variables are already defined, for example when the dependent and independent variables are known.
  • After a factor is identified, naming the factor can be a difficult task
  • Factor Analysis can reveal an apparent structure even in random data, which can make it unclear whether a factor genuinely explains the data.
  • It is difficult to decide the number of factors to be retained.
  • The interpretation of the significance of factors is subjective, and the reasoning given by different people can be different.

To illustrate the significance of Factor Analysis, let's consider an example.
Consider three groups of customers choosing three different detergent powder brands. Each group has its own reasons for selecting a particular brand. These reasons are compiled into three datasets, each containing variables (features) that describe the group's choices. In this example, Factor Analysis brings out the factors responsible for the choice of brand.
Two hypotheses are tested to check whether the data is suitable for Factor Analysis (for example, by the Bartlett Test of Sphericity).

  • Null Hypothesis: There is no significant correlation between the variables.
  • Alternative Hypothesis: There is a significant correlation between the variables.
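
These hypotheses match the ones tested by the Bartlett Test of Sphericity listed under Statistical Methods used. Assuming that is the intended test, the sketch below computes its Chi-square value and Degrees of Freedom from a correlation matrix in plain Python. The correlation values and the sample size are hypothetical, and the helper names are illustrative.

```python
from math import log

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix (perfect multicollinearity)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_sphericity(corr, n_samples):
    """Return (chi_square, degrees_of_freedom) for Bartlett's test.

    chi_square = -((n - 1) - (2p + 5) / 6) * ln|R|, with df = p(p - 1) / 2,
    where p is the number of variables and R the correlation matrix.
    """
    p = len(corr)
    chi_square = -((n_samples - 1) - (2 * p + 5) / 6) * log(determinant(corr))
    return chi_square, p * (p - 1) // 2

# Hypothetical correlation matrix of three variables from 100 observations.
corr = [
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
chi_square, df = bartlett_sphericity(corr, n_samples=100)
print(f"chi-square = {chi_square:.2f}, df = {df}")

# The p-value is the upper tail of a chi-square distribution with df degrees
# of freedom, e.g. scipy.stats.chi2.sf(chi_square, df) if SciPy is available.
```

A large Chi-square value relative to its Degrees of Freedom (a small p-value) rejects the null hypothesis, which indicates that the variables are correlated enough for Factor Analysis to be worthwhile.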

Factor Analysis analyzes variables that are interdependent. Because variables are not classified as dependent or independent beforehand, this interdependence is examined and established only once the Factor Analysis is complete. Until then, all variables are treated alike.
Using Factor Analysis, we reduce the number of variables by grouping similar variables and removing irrelevant ones.

Steps in Factor Analysis:

  1. Define the problem statement, that is, why you want to perform Factor Analysis.
  2. Construct the Correlation Matrix.
  3. Determine the Pearson Correlation between variables and identify which variables are correlated.
  4. Decide on the method to use for Factor Analysis:
    1. Varimax rotation
    2. Maximum Likelihood
  5. Determine the number of relevant factors for the study. For example, you may have a set of seven variables that you want to reduce to three factors. This is an individual decision based on the dataset and analytical requirements, and it is mostly made by trial and error (the eigenvalues and the Scree Plot can guide the choice).
  6. Rotate the factors and interpret the results.
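
Steps 2, 3, and 5 above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the product's implementation: the ratings are hypothetical, the eigenvalues are computed with a simple Jacobi rotation routine, and the number of factors is chosen with the common "eigenvalue greater than 1" rule (the Kaiser criterion), which the original text does not prescribe. Factor extraction and rotation (steps 4 and 6) are not covered.

```python
from math import atan, cos, pi, sin, sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def correlation_matrix(columns):
    """Steps 2-3: pairwise Pearson correlations between all variables."""
    return [[pearson(x, y) for y in columns] for x in columns]

def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes the (p, q) entry.
                diff = a[q][q] - a[p][p]
                theta = pi / 4 if diff == 0 else 0.5 * atan(2 * a[p][q] / diff)
                c, s = cos(theta), sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Hypothetical ratings given by eight customers to four detergent attributes.
ratings = {
    "cleaning_power": [1, 2, 3, 4, 5, 6, 7, 8],
    "stain_removal": [2, 2, 3, 5, 5, 6, 8, 8],
    "fragrance": [2, 8, 4, 6, 5, 1, 7, 3],
    "freshness": [3, 8, 4, 7, 5, 2, 7, 3],
}

corr = correlation_matrix(list(ratings.values()))
eigenvalues = symmetric_eigenvalues(corr)

# Step 5 (assumed rule): keep factors whose eigenvalue exceeds 1.
n_factors = sum(1 for value in eigenvalues if value > 1.0)
print("eigenvalues:", [round(value, 3) for value in eigenvalues])
print("factors to retain:", n_factors)
```

With these hypothetical ratings, two eigenvalues exceed 1, which matches the two groups of correlated variables built into the data (cleaning-related and scent-related). In practice, a numerical library would normally replace the hand-written eigenvalue routine.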