Time-series Data Preparation organizes and formats transactional data into time-series data so that trends and seasonality in the data can be predicted.
Transactional data is timestamped data recorded over a period at no specific frequency, while time-series data is timestamped data recorded over a period at a particular frequency. The frequency, or time interval, can range from seconds to years, or be any other interval you choose.
Transactional data is converted into time-series data by attributing a frequency to it. This is achieved by bundling or aggregating the transactional records that fall within each period of the selected interval. Once a frequency has been attributed, the data can be analyzed as time-series data.
Why is Time-series Data Preparation required?
You can predict trends and seasonal variations in the time-series data that are not visible in transactional data.
How is Time-series Data Preparation done in Rubiscape?
Using the Rubiscape Time-series Data Preparation feature, you can prepare your data for time-series analysis by running the tests described in this section.
The feature converts irregularly recorded timestamped data into data at the time interval that you define. You can then run one or more of the four data preparation tests, in any order you want, or select all four at once. When all four tests are selected, they are executed in this order: accumulation, missing value imputation, transformation, and differencing.
In differencing, the seasonality in the time-series data is identified and removed to make the data stationary.
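The execution order described above can be illustrated outside Rubiscape with a minimal pandas sketch. It is not Rubiscape's implementation: the sample data, the `prepare` function, the column names, and the specific choices made at each step (daily sum, mean imputation, natural log, lag-1 difference) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical transactional data: timestamps at no fixed frequency.
transactions = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 09:15",
                "2024-01-01 17:40",
                "2024-01-02 11:05",
                "2024-01-04 08:30",
                "2024-01-05 13:20",
                "2024-01-05 19:45",
            ]
        ),
        "amount": [120.0, 80.0, 150.0, 90.0, 200.0, 60.0],
    }
)


def prepare(df, interval="D", lag=1):
    """Illustrative pipeline; each step is one assumed choice among many."""
    series = df.set_index("timestamp")["amount"]
    # 1. Accumulation: sum per interval; intervals with no records stay missing.
    accumulated = series.resample(interval).sum(min_count=1)
    # 2. Missing value imputation: fill gaps with the mean of observed intervals.
    imputed = accumulated.fillna(accumulated.mean())
    # 3. Transformation: natural log (requires strictly positive values).
    transformed = np.log(imputed)
    # 4. Differencing: subtract the observation `lag` intervals earlier.
    return transformed.diff(lag).dropna()


prepared = prepare(transactions)
print(prepared)
```

In this sample, 3 January has no transactions, so accumulation leaves it missing and the imputation step fills it before the log and the difference are taken.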
Time-series Data Preparation Tests in Forecasting
The different tests available in Time-series Data Preparation under Forecasting are given below.
Accumulation | |||
Description | Accumulation is data aggregation in the time domain. Accumulation combines the data within the same time interval to give a summary output for that period. | ||
Why to use | The data accumulation process is used to
| ||
When to use | To discover trends and seasonal variations based on time-series data. | When not to use | On non-interval data. |
Prerequisites |
| ||
Input | Any dataset that contains a time interval. | Output | The summarized time-series data based on the selected time interval. |
Statistical Methods used |
| Limitations | Not applicable to sequential datasets (that is, data that does not include a datetime column). |
The table given below describes the functions of the Accumulation test.
Function | Description | Remark |
Sum | It accumulates the time-series data as the sum of the values in each interval. | It can be performed only on numerical data. |
Mean | It accumulates the time-series data as the mean of the values in each interval. | It can be performed only on numerical data. |
Minimum | It accumulates the time-series data as the minimum value in each interval. | It can be performed only on numerical data. |
Median | It accumulates the time-series data as the median value in each interval. | It can be performed only on numerical data. |
Maximum | It accumulates the time-series data as the maximum value in each interval. | It can be performed only on numerical data. |
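As a rough illustration of the five accumulation functions, the following pandas sketch summarizes irregular readings per day. The sample values and the names `readings` and `accumulated` are hypothetical, and the code shows the general technique rather than how Rubiscape computes its output.

```python
import pandas as pd

# Hypothetical numerical readings recorded at irregular times.
readings = pd.Series(
    [10.0, 30.0, 20.0, 5.0, 15.0, 40.0],
    index=pd.to_datetime(
        [
            "2024-01-01 08:00",
            "2024-01-01 12:30",
            "2024-01-01 21:10",
            "2024-01-02 07:45",
            "2024-01-02 16:20",
            "2024-01-03 10:00",
        ]
    ),
    name="value",
)

# One row per day, one column per accumulation function.
accumulated = readings.resample("D").agg(["sum", "mean", "min", "median", "max"])
print(accumulated)
```

Each row of the result summarizes all readings that fall on the same day, so three readings on 1 January collapse into a single row.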
Missing Value | |||
Description | The time-series data may contain missing values that need to be imputed. Time-series missing value imputation fills in the missing information in time-series data. It is performed only on missing values; the values already present in the dataset remain unchanged. | ||
Why to use | To impute missing values in the time-series data. | ||
When to use | For analysis of time-series data without losing the variation in the data. | When not to use | When the data does not contain any missing values. |
Prerequisites | The time interval for the data to be analyzed should be specified. | ||
Input | Time-series data with fixed time interval or time-series data. | Output | Complete time-series data for the specified time interval, with no missing values. |
Statistical Methods used |
| Limitations |
|
The table given below describes the functions of the Missing Value test.
Function | Description | Remark |
Mean | It replaces the missing values with the mean of the non-missing values within each column separately and independently from the others. |
|
Median | It replaces the missing values with the median of the non-missing values within each column separately and independently from the others. |
|
Min | It replaces the missing values with the minimum value present in that column. | – |
Max | It replaces the missing values with the maximum value present in that column. | – |
Remove | It discards the rows that contain missing values. |
|
Constant | It replaces the missing values with the constant value that you have entered. |
|
Random | It replaces the missing values with random values from that column. | It can only be used with numerical data. |
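The imputation functions in the table can be sketched for a single column as follows. This is an illustrative approximation, not Rubiscape's code: the `impute` function, its `method`, `constant`, and `seed` parameters, and the sample series are all hypothetical.

```python
import pandas as pd


def impute(series, method, constant=None, seed=0):
    """Fill or drop missing values in one column; observed values are untouched."""
    if method == "mean":
        return series.fillna(series.mean())
    if method == "median":
        return series.fillna(series.median())
    if method == "min":
        return series.fillna(series.min())
    if method == "max":
        return series.fillna(series.max())
    if method == "remove":
        return series.dropna()
    if method == "constant":
        if constant is None:
            raise ValueError("a constant value is required")
        return series.fillna(constant)
    if method == "random":
        # Draw replacements (with replacement) from the observed values.
        missing = series.isna()
        draws = series.dropna().sample(
            n=int(missing.sum()), replace=True, random_state=seed
        )
        filled = series.copy()
        filled[missing] = draws.to_numpy()
        return filled
    raise ValueError(f"unknown method: {method}")


# Hypothetical daily series with two missing intervals.
daily = pd.Series(
    [10.0, None, 30.0, 20.0, None, 40.0],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)
print(impute(daily, "mean"))
```

The mean and median are computed from the non-missing values only, which matches the behavior described in the table; `fillna` never overwrites a value that is already present.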
Transformation | |||
Description | Data transformation removes noise and improves the signal in time-series forecasting. Several functions are available to transform time-series data; they are useful for both visualizing and modeling the data. When a forecast is made on transformed data, the inverse transform is applied to the predictions so that the resulting performance measures are on the same scale as the output variable. The transformation method assumes that the time-series data is positive and non-zero. | ||
Why to use | To stabilize the variance across time in time-series data for more accurate forecasts. | ||
When to use | To simplify patterns in time-series data by removing variation across time or by making the pattern consistent across the dataset. | When not to use | When the data is already normalized between 0 and 1. |
Prerequisites |
| ||
Input | High-variance time-series data. | Output | Time-series data with lower variance and a consistent pattern. |
Statistical Methods used |
| Limitations | Log and Natural Log functions are not applicable to data that contains zero, because log(0) is undefined (it tends to negative infinity). |
The table given below describes the functions of the Transformation test.
Function | Description | Remark |
Exponential | It transforms the time-series by raising e (approximately 2.7183) to the power of each value in the given interval. | It can only be used with numerical data. |
Square | It transforms the time-series by taking the square of the values in the given interval. | It can only be used with numerical data. |
Square Root | It transforms the time-series by taking the square root of the values in the given interval. | It can only be used with numerical data. |
Natural Log | It transforms the time-series by taking the natural logarithm (logarithm to the base e) of the values in the given interval. | It can only be used with numerical data. |
Log | It transforms the time-series by taking the logarithm (logarithm to the base 2) of the values in the given interval. | It can only be used with numerical data. |
Inverse Exponential | It transforms the time-series by taking the inverse exponential of the values in the given interval. | It can only be used with numerical data. |
Inverse Square | It transforms the time-series by taking the inverse square of the values in the given interval. | It can only be used with numerical data. |
Inverse Square Root | It transforms the time-series by taking the inverse square root of the values in the given interval. | It can only be used with numerical data. |
Inverse Natural Log | It transforms the time-series by taking the inverse natural logarithm of the values in the given interval. | It can only be used with numerical data. |
Inverse Log | It transforms the time-series by taking the inverse logarithm of the values in the given interval. | It can only be used with numerical data. |
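The idea of transforming a series and later mapping values back to the original scale can be sketched as below. The `TRANSFORMS` mapping and the `transform` and `back_transform` helpers are hypothetical names, and the sketch assumes that an inverse function is one that undoes the forward transform on positive data; this page does not state how Rubiscape defines its inverse functions.

```python
import numpy as np
import pandas as pd

# Each entry pairs a forward transform with the function that undoes it
# on strictly positive data (an assumption of this sketch).
TRANSFORMS = {
    "natural_log": (np.log, np.exp),
    "square_root": (np.sqrt, np.square),
    "square": (np.square, np.sqrt),
}


def transform(series, name):
    """Apply the forward transform after checking the positivity assumption."""
    if (series <= 0).any():
        raise ValueError("this sketch assumes strictly positive values")
    forward, _ = TRANSFORMS[name]
    return forward(series)


def back_transform(series, name):
    """Map transformed values (for example, predictions) back to the original scale."""
    _, inverse = TRANSFORMS[name]
    return inverse(series)


# Hypothetical series growing by 10% per day.
sales = pd.Series(
    [100.0, 110.0, 121.0, 133.1],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)
logged = transform(sales, "natural_log")
restored = back_transform(logged, "natural_log")
print(logged.diff().dropna())  # constant steps after the log transform
```

Taking the log turns the multiplicative growth in `sales` into equal additive steps, which is one way a transformation makes a pattern consistent across the dataset.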
Differencing | |||
Description | Differencing is a method of transforming time-series data by removing the trends and seasonality in the data to make the time-series data stationary. In non-stationary time-series data, trends result in a mean that varies over time, while seasonality results in a variance that varies over time. Stationary datasets have a stable mean and variance and hence are easier to model. In the simplest case, with a lag of 1, the previous observation is subtracted from the current observation. More generally, with a lag of n, differencing replaces the i-th observation of the series with its difference from the (i-n)-th observation. | ||
Why to use | To make time-series data stationary since stationary data with stable mean and variance is easier to model. | ||
When to use | For removing trends and seasonality in time-series data before modeling. | When not to use | When data is already stationary. |
Prerequisites |
| ||
Input | The transformed time-series data. | Output | Stationary time-series data. |
Statistical Methods used | Lag Difference | Limitations | The lag difference should be less than the number of data points in the data. |
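A lag difference as described above can be sketched with pandas as follows. The `lag_difference` helper and the sample series are hypothetical; the check on `lag` mirrors the stated limitation that the lag must be less than the number of data points.

```python
import pandas as pd


def lag_difference(series, lag=1):
    """Replace each observation with its difference from the one `lag` steps earlier."""
    if not 0 < lag < len(series):
        raise ValueError("lag must be positive and less than the number of data points")
    # The first `lag` observations have no earlier value to subtract, so drop them.
    return series.diff(lag).dropna()


# Hypothetical series: a pattern that repeats every 4 steps, plus an upward trend.
values = pd.Series(
    [10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 32.0, 42.0, 14.0, 24.0, 34.0, 44.0],
    index=pd.date_range("2024-01-01", periods=12, freq="D"),
)

seasonal_diff = lag_difference(values, lag=4)
print(seasonal_diff)
```

With a lag equal to the length of the repeating pattern, the repeated shape cancels out and only the step between cycles remains, which in this sample is the same for every observation.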