Time-series Data Preparation

Time-series Data Preparation

Time-series Data Preparation organizes and formats transactional data into time-series data to predict trends and seasonality in the data.

Transactional data is timestamped data recorded over a period at no specific frequency, while time-series data is timestamped data recorded over a period at a particular frequency. The frequency or time interval can be from seconds to yearly or any other decided interval. 

Transactional data is converted into time-series data by attributing frequency to the data. This is achieved by bundling or aggregating the transactional data into time-series data of the selected interval. Transactional data, when attributed with frequency, can be analyzed as time-series data.

Why is Time-series Data Preparation required

You can predict trends and seasonal variations in the time-series data that are not visible in transactional data.

How is Time-series Data Preparation done in rubiscape

Using the Rubiscape Time-series Data Preparation feature, you can analyze your time-series data by performing the following tests.

  • Accumulation
  • Missing value imputation
  • Transformation
  • Differencing

The feature converts the irregularly recorded timestamped data into data at the time interval defined by you. You can then run either one or more of the four data preparation tests in rubiscape, in any order you want. You can also run all four available tests simultaneously. When all the tests are selected to run, the tests are executed in the order – accumulation, missing value imputation, transformation, and differencing.  

  1. In accumulation, the time-series data is aggregated in the selected time interval.
  2. In missing value imputation, the missing values in the data are removed or replaced in the accumulated time-series data.
  3. In transformation, the time-series data is normalized. The data is rescaled from the original range to a new range between 0 and 1.

In differencing, the seasonality in the time-series data is identified and removed to make the data stationary.

Time-series Data Preparation Tests in Forecasting

The different tests available in Time-series Data Preparation under Forecasting are given below.

Accumulation

Accumulation

Description

Accumulation is data aggregation in the time domain. Accumulation combines the data within the same time interval to give a summary output for that period.

Why to use

The data accumulation process is used to

  • Convert time-series data with no fixed time interval into time-series data with a fixed time interval (weekly, monthly, or yearly).
  • Convert time-series data with fixed time intervals into time-series data of a higher frequency time interval (for example, daily to monthly) or lower frequency time interval (for example, monthly into weekly, yearly into monthly).

When to use

To discover trends and seasonal variations based on time-series data.

When not to use

On non-interval data.

Prerequisites

  • The selected independent variable should be of interval type.
  • The time interval for the data to be analyzed should be specified.

Input

Any dataset that contains a time interval.

Output

The summarized time-series data based on the selected time interval.

Statistical Methods used

  • Sum
  • Mean
  • Minimum
  • Median
  • Maximum

Limitations

Not applicable for sequential datasets (that is, data that does not include the datetime column).

Functions of Accumulation Test

The table given below describes the functions of the Accumulation test.

Function

Description

Remark

Sum

It gives the accumulation of the time-series data by the sum of the values in the given interval.

It can be performed only on numerical data.

Mean

It gives the accumulation of the time-series data by the mean of the values in the given interval.

It can be performed only on numerical data.

Minimum

It gives the accumulation of the time-series data by the minimum value in the given interval.

It can be performed only on numerical data.

Median

It gives the accumulation of the time-series data by the median value in the given interval.

It can be performed only on numerical data.

Maximum

It gives the accumulation of the time-series data by the maximum value in the given interval.

It can be performed only on numerical data.

Missing Value

Missing Value

Description

The time-series data may contain missing values that need to be imputed. The time-series missing value interpretation imputes the missing information in time-series data.

The time-series missing value interpretation is performed only on missing values. The values present in the dataset remain unchanged.

Why to use

To impute missing values in the time-series data.

When to use

For analysis of time-series data without losing the variation in the data.

When not to use

When data do not contain any missing values.

Prerequisites

The time interval for the data to be analyzed should be specified.

Input

Time-series data with fixed time interval or time-series data.

Output

A complete time-series data for the specified time interval having no missing values.

Statistical Methods used

  • Mean
  • Median
  • Min
  • Max
  • Remove
  • Constant
  • Random

Limitations

  • It does not account for the uncertainty in the imputations.
  • It can introduce bias in the data.

Functions of Missing Value Test

The table given below describes the functions of the Missing Value test.

Function

Description

Remark

Mean

It replaces the missing values with the mean of the non-missing values within each column separately and independently from the others.

  • It only works on the column level.
  • It can only be used with numerical data.

Median

It replaces the missing values with the median of the non-missing values within each column separately and independently from the others.

  • It only works on the column level.
  • It can only be used with numerical data.

Min

It replaces the missing values with the minimum value present in that column.

Max

It replaces the missing values with the maximum value present in that column.

Remove

It discards the rows that contain missing values.

  • It can be used for a small amount of missing data (20-30%)
  • Removing a large amount of data may cause considerable variations in the results.
  • If there is a large amount of missing data, it is recommended to remove the complete column (If you want to remove a column, do not select that column while analyzing).

Constant

It replaces the missing values with the constant value that you have entered.

  • It can only be used with numerical data.
  • It can introduce bias in the data.

Random

It replaces the missing values with random values from that column.

It can only be used with numerical data.

Transformation

Transformation

Description

Data transformation in time-series data removes noise and improves the signal in time-series forecasting.

There are different functions used to transform time-series data, useful for visualizing time-series data and modeling the time-series data. An inverse transform is applied to the predictions of a transform function applied to time-series data. This ensures that the resultant performance measures are on the same scale as the output variable.

The transformation method assumes that the time-series data is positive and non-zero.

Why to use

To stabilize the variance across time in time-series data for more accurate forecasts.

When to use

To simplify patterns in time-series data by removing variation across time or by making the pattern consistent across the dataset.

When not to use

When the data is already normalized between 0 and 1.

Prerequisites

  • The selected independent variable should be of interval type.
  • The time interval for the data to be analyzed should be specified.

Input

High variance time-series data.

Output

Time-series data with less variance and consistent pattern.

Statistical Methods used

  • Exponential
  • Square
  • Square Root
  • Natural Log
  • Log
  • Inverse Exponential
  • Inverse Square
  • Inverse Square Root
  • Inverse Natural Log
  • Inverse Log

Limitations

Log and Natural Log functions are not applicable to data that contains zero. Since log(0) = -Inf.

Functions of Transformation Test

The table given below describes the functions of the Transformation test.

Function

Description

Remark

Exponential

It transforms the time-series by taking the exponential e (2.7183) of the values in the given interval.

It can only be used with numerical data.

Square

It transforms the time-series by taking the square of the values in the given interval.

It can only be used with numerical data.

Square Root

It transforms the time-series by taking the square root of the values in the given interval.

It can only be used with numerical data.

Natural Log

It transforms the time-series by taking the natural logarithm (logarithm to the base 10) of the values in the given interval.

It can only be used with numerical data.

Log

It transforms the time-series by taking the logarithm (logarithm to the base 2) of the values in the given interval.

It can only be used with numerical data.

Inverse Exponential

It transforms the time-series by taking the inverse exponential of the values in the given interval.

It can only be used with numerical data.

Inverse Square

It transforms the time-series by taking the inverse square of the values in the given interval.

It can only be used with numerical data.

Inverse Square Root

It transforms the time-series by taking the inverse square root of the values in the given interval.

It can only be used with numerical data.

Inverse Natural Log

It transforms the time-series by taking the inverse natural logarithm of the values in the given interval.

It can only be used with numerical data.

Inverse Log

It transforms the time-series by taking the inverse logarithm of the values in the given interval.

It can only be used with numerical data.

Differencing 

Differencing

Description

Differencing is a method of transforming time-series data by removing the trends and seasonality in the data to make the time-series data stationary.

In non-stationary time-series data, trends result in varying mean over time, while seasonality results in variance over time. Stationary datasets have a stable mean and variance and hence are easier to model.

In differencing, the previous observation is subtracted from the current observation.

In a time-series data with a lag of n, differencing converts every ith observation of the series into its difference from the (i-n)th observation.

Why to use

To make time-series data stationary since stationary data with stable mean and variance is easier to model.

When to use

For removing trends and seasonality in time-series data before modeling.

When not to use

When data is already stationary.

Prerequisites

  • The selected independent variable should be of interval type.
  • The time interval for the data to be analyzed should be specified.
  • A transformation test should be performed.

Input

The transformed time-series data.

Output

Stationary time-series data.

Statistical Methods used

Lag Difference


Limitations

The lag difference should be less than the number of data points in the data.


    • Related Articles

    • Time-series Data Preparation Tests in Forecasting

      The different tests available in Time-series Data Preparation under Forecasting are given below. Accumulation Missing Value Transformation Differencing Data Preparation Description The time-series data may contain missing values that need to be ...
    • Data Preparation

      Data preparation is the process of cleaning and transforming raw data into organized data so that it can be processed and analyzed further. In data preparation, data is reformatted, corrected, and combined to enrich the data. Data preparation is ...
    • Data Preparation

      What is Data Preparation Data preparation is the process of cleaning and transforming raw data into organized data so that it can be processed and further analyzed. In data preparation, data is reformatted, corrected, and combined so that it gets ...
    • Data Preparation in Forecasting

      Data Preparation is the process of cleaning and transforming raw data into organized data so that it can be processed and analyzed further. In data preparation, data is reformatted, corrected, and combined to enrich the data. Data preparation is ...
    • Data Joiner

      Data Joiner Description Data joiner is a method to join two or more datasets. It is used to join rows based on a related column present in two or more datasets. Why to use For Data Preparation When to use When you want to join two or more datasets. ...