Box-Cox Transformation

Description

The Box-Cox transformation is a power transformation that reshapes non-normal or skewed data so that its distribution is closer to normal.

Why to use

  • Normality
  • Homoscedasticity
  • Linearity

When to use

  • Non-Normal data


When not to use

  • Negative or zero values

Prerequisites

  • Continuous variable
  • Positive values
  • Handling Outliers
  • Choice of lambda

Input

Any dataset containing continuous numerical variables with strictly positive values.

Output

A transformed dataset that follows a more normal distribution.

Statistical Methods Used

  • Maximum Likelihood Estimation (MLE)
  • Method of moments
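A minimal sketch of how lambda could be chosen by maximum likelihood, written in plain Python. It evaluates the Box-Cox profile log-likelihood on a grid of candidate values and keeps the best one. The function names, the search range of -2 to 2, and the synthetic log-normal sample are assumptions made for illustration; the source does not state which search procedure is actually used.

```python
import math
import random


def box_cox(y, lam):
    """Standard Box-Cox transform of one strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def log_likelihood(data, lam):
    """Profile log-likelihood of lambda, with constant terms dropped."""
    n = len(data)
    z = [box_cox(y, lam) for y in data]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n  # MLE variance (divides by n)
    return (lam - 1.0) * sum(math.log(y) for y in data) - 0.5 * n * math.log(var)


def estimate_lambda(data, lo=-2.0, hi=2.0, steps=161):
    """Grid search for the lambda that maximizes the log-likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: log_likelihood(data, lam))


# Synthetic, right-skewed sample (log-normal), used only for illustration.
random.seed(0)
sample = [random.lognormvariate(0.0, 1.0) for _ in range(1000)]

best_lambda = estimate_lambda(sample)
print(f"estimated lambda: {best_lambda:.3f}")
```

Because the logarithm of log-normal data is normally distributed, the estimate for this sample should land close to 0. In practice a library routine such as `scipy.stats.boxcox`, which also returns a maximum likelihood estimate of lambda, would usually replace the hand-written grid search.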

Limitations

  • Limited applicability
  • Sensitivity to outliers
  • Subjectivity in lambda selection
  • Interpretability
  • Data assumptions
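The interpretability limitation arises because transformed values are no longer in the original units. A common remedy is to map results back with the inverse of the transformation. The sketch below assumes the standard one-parameter Box-Cox definition; the function name and the example values are hypothetical.

```python
import math


def inverse_box_cox(z, lam):
    """Map a Box-Cox-transformed value z back to the original scale."""
    if lam == 0:
        return math.exp(z)  # inverse of ln(y)
    base = lam * z + 1.0
    if base <= 0:
        raise ValueError("z is outside the range of the transform for this lambda")
    return base ** (1.0 / lam)  # inverse of (y**lam - 1) / lam


# Hypothetical value on the transformed scale, obtained with lambda = 0.5.
restored = inverse_box_cox(2.0, 0.5)
print(restored)
```

The same lambda that was used for the forward transformation must be used for the inverse; otherwise the restored values will not match the original data.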

The Box-Cox transformation applies a power transformation to the data, controlled by a parameter lambda, so that non-normal or skewed data becomes closer to normally distributed. The transformation is defined as:

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln y, & \lambda = 0
\end{cases}
$$

where,

$y^{(\lambda)}$ is the transformed variable,

$y$ is the original variable, and

$\lambda$ is the transformation parameter.
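As a worked illustration of the power transformation, take a single observation $y = 4$ (an arbitrary value chosen here for easy arithmetic, not taken from any dataset) and evaluate the standard Box-Cox definition for a few values of $\lambda$. The case $\lambda = 0$ uses the natural logarithm:

```latex
\begin{align*}
\lambda = 1:   &\quad y^{(\lambda)} = \frac{4^{1} - 1}{1} = 3 \\
\lambda = 0.5: &\quad y^{(\lambda)} = \frac{4^{0.5} - 1}{0.5} = 2 \\
\lambda = 0:   &\quad y^{(\lambda)} = \ln 4 \approx 1.386 \\
\lambda = -1:  &\quad y^{(\lambda)} = \frac{4^{-1} - 1}{-1} = 0.75
\end{align*}
```

With $\lambda = 1$ the data is only shifted by one unit, so its shape is unchanged. Smaller values of $\lambda$ compress large observations progressively more strongly, which is what reduces right skew.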

