Train Test Split in Forecasting

Description

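The data is split into train data and test data. In Forecasting, the split is sequential rather than random: the earlier records form the train data and the most recent records form the test data. Typically, the split is in the ratio of 70:30 or 80:20 for train and test.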
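Because Forecasting does not shuffle the data (see Prerequisites), the oldest records become the train data and the most recent records become the test data. The following is a minimal sketch of such a split. It assumes the records are already sorted by date. The function name `chronological_split`, its `train_ratio` parameter, and the sample values are illustrative and do not come from the product.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    records: Sequence[T], train_ratio: float = 0.8
) -> Tuple[List[T], List[T]]:
    """Split time-ordered records into train and test data without shuffling.

    The first `train_ratio` share of the records (the oldest) becomes the
    train data; the remaining, most recent records become the test data.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    # Number of records kept for training; the rest are held out for testing.
    n_train = round(len(records) * train_ratio)
    return list(records[:n_train]), list(records[n_train:])


# Ten monthly observations (illustrative values), oldest first.
monthly_sales = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]

train_80, test_20 = chronological_split(monthly_sales, 0.8)  # 80:20 split
train_70, test_30 = chronological_split(monthly_sales, 0.7)  # 70:30 split
print(train_80, test_20)
print(train_70, test_30)
```

In scikit-learn, `train_test_split` produces the same kind of split when it is called with `shuffle=False`. In that case, the test data is taken from the end of the sequence.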

Why to use

To evaluate the accuracy of the model on data that it has not seen during training.

When to use

The dataset contains a large number of rows.

When not to use

Limited data is available.

Prerequisites

In Forecasting, the data must NOT be shuffled, because the records are ordered in time and may contain seasonality. The date stamp and the sequence of the records are therefore crucial. For this reason, shuffling is disabled in Forecasting: the shuffle option is set to False for Train Test Split.
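The following sketch shows why the sequence matters. It is an illustration, not the product's internal check. It verifies that every train date comes before every test date. A sequential split passes this check. A split that mixes later dates into the train data, which a random shuffle would usually do, fails it. The helper name `is_chronological` is hypothetical.

```python
from datetime import date
from typing import Sequence


def is_chronological(train_dates: Sequence[date], test_dates: Sequence[date]) -> bool:
    """Return True if the latest train date is earlier than the earliest test date.

    If this is False, the model is trained on observations that lie in the
    "future" relative to part of the test data, which can make the accuracy
    estimate overly optimistic.
    """
    if not train_dates or not test_dates:
        raise ValueError("both train and test data must be non-empty")
    return max(train_dates) < min(test_dates)


# Twelve month-start dates for one year, in order.
dates = [date(2023, month, 1) for month in range(1, 13)]

# Sequential split: first 9 months for training, last 3 months for testing.
sequential_ok = is_chronological(dates[:9], dates[9:])

# A mixed split, like one a random shuffle could produce: December leaks into train.
mixed_train = dates[:8] + [dates[11]]
mixed_test = dates[8:11]
mixed_ok = is_chronological(mixed_train, mixed_test)

print(sequential_ok, mixed_ok)  # True False
```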

Input

Any dataset that contains any form of data: textual, categorical, date, or numerical.

Output

The dataset split into two parts: Train data and Test data.

Statistical Methods used

--

Limitations

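If the data is limited, there is a possibility of high bias. Holding back the test data leaves too few records to train the model, and the small test set may not give a reliable estimate of accuracy.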


The train-test split is a technique for evaluating the accuracy of a model. It is suited to large datasets and is appropriate when a quick estimate of model performance is required.

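In this technique, the input dataset is divided into two datasets: Train and Test. The Train dataset is used to fit the model, and the expected output for these records is known during training. The Test dataset is held back. The model makes predictions on it as if it were unseen data. These predictions are then compared with the actual values to evaluate how the model performs on new data. In Forecasting, the Test dataset consists of the most recent records, so the model is evaluated on periods that come after the ones it was trained on.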
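The workflow in the paragraph above has three steps: fit on Train, predict Test, and compare the predictions with the known actual values. The following sketch illustrates it. The "model" is a deliberately simple stand-in. It averages the last `window` train values and uses that average as a flat forecast. Error is measured with mean absolute error (MAE). Both are illustrative choices, not the model or metric used by the product, and the series values are made up.

```python
from typing import List, Sequence


def fit_moving_average(train: Sequence[float], window: int = 3) -> float:
    """'Train' the stand-in model: average the last `window` training values."""
    if window < 1 or window > len(train):
        raise ValueError("window must be between 1 and len(train)")
    recent = train[-window:]
    return sum(recent) / window


def forecast(level: float, horizon: int) -> List[float]:
    """Predict a flat forecast equal to the fitted level for each test period."""
    return [level] * horizon


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


series = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]  # illustrative, oldest first
n_train = 8  # 80:20 sequential split, no shuffling
train, test = series[:n_train], series[n_train:]

level = fit_moving_average(train, window=3)   # (20 + 22 + 24) / 3 = 22.0
predictions = forecast(level, len(test))      # [22.0, 22.0]
mae = mean_absolute_error(test, predictions)  # (4 + 6) / 2 = 5.0
print(level, predictions, mae)
```

The MAE of 5.0 shows that the flat forecast lags behind the upward trend in the test period. Evaluating on held-out, later data is meant to reveal exactly this kind of weakness.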

Ideally, both the Train and Test sets should represent the problem, with enough records to cover both the common and the uncommon cases of the problem or situation. If the dataset is too small for this, the model may overfit or underfit.
