Train Test Split

Description: The data is split randomly into train data and test data. Typically, the split is in the ratio of 70:30 or 80:20 for train and test.
Why to use: To evaluate the accuracy of the model on data it has not seen during training.
When to use: The dataset contains a large number of rows.
When not to use: Only limited data is available.
Prerequisites:
Input: Any dataset that contains any form of data: textual, categorical, date, or numerical.
Output: The dataset split into two parts, train data and test data.
Statistical Methods used: Confusion Matrix, F Score, Adjusted R Square, R Square, Root Mean Square Error.
Limitations: If the data is limited, there is a possibility of high bias, and the result can change noticeably depending on which rows end up in the test set.
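
The random split described in the table can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function name `random_split` and its `test_ratio` and `seed` parameters are hypothetical. In practice, scikit-learn provides `sklearn.model_selection.train_test_split`, with `test_size` and `random_state` parameters, for the same purpose.

```python
import random

def random_split(rows, test_ratio=0.3, seed=None):
    """Randomly split rows into (train, test).

    test_ratio=0.3 gives a 70:30 split; test_ratio=0.2 gives 80:20.
    """
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(rows)                 # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled) # seeded shuffle makes the split reproducible
    n_test = round(len(shuffled) * test_ratio)
    test = shuffled[:n_test]
    train = shuffled[n_test:]
    return train, test

# Usage: 100 rows split 80:20
data = list(range(100))
train, test = random_split(data, test_ratio=0.2, seed=42)
print(len(train), len(test))  # 80 20
```

Fixing the seed means different models can be compared on exactly the same test rows.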

The train-test split is a technique for evaluating the accuracy of a model. It is suited to large datasets and is appropriate when a quick, reasonably good estimate of model performance is required.
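
For a classification model, the predictions made on the test data can be compared with the known labels using the Confusion Matrix and F Score listed in the table above. The sketch below assumes a binary problem with labels 0 and 1; the helper names `confusion_matrix` and `f_score` and the example labels are hypothetical, chosen only to show the arithmetic.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) counts for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif t == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn

def f_score(tp, fp, fn, beta=1.0):
    """F-beta score; beta=1 gives F1, the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical test-set labels and the model's predictions for those rows
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_matrix(y_test, y_pred)
print(tp, fp, fn, tn)                 # 3 1 1 3
print(round(f_score(tp, fp, fn), 3))  # 0.75
```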

In this technique, the input dataset is divided into two datasets, train and test. The train dataset is used to fit the model; the expected output for each of its rows is known and is used during training. The test dataset is held back: the model makes predictions on it as if it were unknown data, and those predictions are compared with the known outputs to evaluate how the model performs on new data.
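
As an illustration of this workflow, the sketch below fits a straight line on the train rows only, predicts the held-out test rows, and scores those predictions with Root Mean Square Error, R Square and Adjusted R Square. The synthetic data (y = 3 + 2x plus Gaussian noise), the 70:30 split and the helper names `fit_line` and `evaluate` are assumptions made for the example.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, fitted on training rows only."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def evaluate(y_true, y_pred, n_features=1):
    """RMSE, R Square and Adjusted R Square computed on the test rows."""
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return rmse, r2, adj_r2

# Synthetic data (assumption for illustration): y = 3 + 2x + noise
rng = random.Random(0)
rows = [(x, 3 + 2 * x + rng.gauss(0, 1)) for x in range(50)]
rng.shuffle(rows)
test, train = rows[:15], rows[15:]             # 70:30 split

a, b = fit_line([x for x, _ in train], [y for _, y in train])
predictions = [a + b * x for x, _ in test]     # the model never saw these rows
rmse, r2, adj_r2 = evaluate([y for _, y in test], predictions)
print(f"RMSE={rmse:.2f}  R2={r2:.3f}  adjR2={adj_r2:.3f}")
```

Because the metrics are computed only on rows excluded from fitting, they estimate how the line would perform on new data rather than how well it memorized the training rows.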

The train-test split is used when a sufficiently large dataset is available. Ideally, the data in each of the train and test sets represents the problem: there should be enough records to cover all common and uncommon cases of the problem or situation. If the dataset is too small, the model may overfit or underfit, and the test result may not reflect how the model performs on new data.
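
Stratified splitting is not part of the method described here, but it is one common way to keep both sets representative when some cases are rare: each class is sampled separately so that its share is roughly preserved in train and test. The sketch below is hypothetical (`stratified_split` and its `label_of` parameter are illustrative names); scikit-learn offers the same behaviour through the `stratify` argument of `train_test_split`.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_of, test_ratio=0.2, seed=None):
    """Split rows so each label keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        if len(group) >= 2:
            # keep every label present in both sets when it has at least two rows
            n_test = min(max(n_test, 1), len(group) - 1)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Usage: 95 common cases and 5 rare cases, split 80:20
rows = [("common", i) for i in range(95)] + [("rare", i) for i in range(5)]
train, test = stratified_split(rows, label_of=lambda r: r[0], test_ratio=0.2, seed=1)
print(sum(r[0] == "rare" for r in train), sum(r[0] == "rare" for r in test))  # 4 1
```

A purely random 80:20 split of the same rows could, by chance, place all five rare rows in the train set, leaving the test set with no uncommon cases to evaluate.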