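Missing Value Imputation
Description | Missing value imputation is the substitution of estimated values for missing values in a real-world dataset.
Why to use | Numerical Analysis – Data Preparation
When to use | When the data contains missing values.
When not to use |
Prerequisites | Numerical data: statistics such as the mean and median are only defined for numerical columns.
Input | A dataset containing missing values (for example, NaN entries).
Output | The same dataset with the missing values replaced by imputed values.
Statistical Methods used | Mean, median, or a constant value (univariate); a model of each incomplete feature as a function of the other features (multivariate).
Limitations |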
There are many ways data can end up with missing values. For example, a survey respondent may decline to answer a question, or a sensor may fail to record a reading.
Python libraries represent missing numbers as NaN, which is short for "not a number".
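NaN behaves unusually: it compares unequal even to itself, so an equality check cannot find it. In plain Python the value is `float("nan")` (NumPy exposes it as `np.nan`). The following minimal sketch detects missing values with `math.isnan`. The dataset and the `count_missing` helper are hypothetical.

```python
import math

# Hypothetical dataset as a list of rows; float("nan") marks a missing number.
nan = float("nan")
rows = [
    [1.0, 2.0],
    [nan, 3.0],
    [4.0, nan],
    [5.0, 6.0],
]

# NaN is not equal to itself, so `x == nan` is never a valid test.
print(nan == nan)        # False
print(math.isnan(nan))   # True

def count_missing(rows):
    """Return the number of NaN entries in each column (hypothetical helper)."""
    n_cols = len(rows[0])
    return [sum(1 for row in rows if math.isnan(row[j])) for j in range(n_cols)]

print(count_missing(rows))  # [1, 1]
```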
Many libraries, including most scikit-learn estimators, raise an error if you try to fit a model on data with missing values. You therefore need to handle those values first, typically by choosing an imputation strategy.
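Raising an error is a deliberate choice, because plain arithmetic on NaN does not fail. Instead, NaN silently spreads into every result it touches. The sketch below mimics the kind of input validation that makes estimators reject such data. It uses a hypothetical `check_no_missing` helper, which is not a scikit-learn function.

```python
import math

nan = float("nan")
column = [2.0, nan, 4.0]

# Arithmetic does not raise on NaN; it propagates NaN into the result.
print(math.isnan(sum(column)))  # True

def check_no_missing(rows):
    """Hypothetical input check: raise ValueError if any value is NaN."""
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if math.isnan(value):
                raise ValueError(f"missing value at row {i}, column {j}")
    return rows

try:
    check_no_missing([[1.0, 2.0], [nan, 3.0]])
except ValueError as err:
    print(err)  # missing value at row 1, column 0
```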
Real-world datasets often contain missing values. Such datasets are incompatible with many scikit-learn estimators, which assume that all values are numerical and meaningful. Dropping every row that contains a missing value is one workaround, but it can discard important and relevant data. Imputation instead fills the gaps by inferring values from the known part of the data.
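To see how much data row deletion can cost, consider a hypothetical table in which only three of fifteen cells are missing. The following minimal sketch in plain Python compares what deletion keeps with what was observed.

```python
import math

nan = float("nan")
# Hypothetical dataset: 5 rows x 3 columns, with one missing value in each of 3 rows.
rows = [
    [1.0, 10.0, 100.0],
    [2.0, nan, 200.0],
    [3.0, 30.0, nan],
    [nan, 40.0, 400.0],
    [5.0, 50.0, 500.0],
]

# Row deletion: keep only rows with no missing values.
complete = [row for row in rows if not any(math.isnan(v) for v in row)]
print(len(rows), "->", len(complete))  # 5 -> 2

# Observed (non-missing) values thrown away together with the incomplete rows.
observed = sum(not math.isnan(v) for row in rows for v in row)
kept = sum(len(row) for row in complete)
print(f"discarded {observed - kept} of {observed} observed values")  # discarded 6 of 12 observed values
```

Half of the observed values disappear even though only 20% of the cells were missing. Imputation keeps all of them.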
Missing value imputation can be univariate or multivariate. In univariate imputation, a missing value is replaced using only the values of its own column: either a constant, or a statistic such as the column's mean or median. In multivariate imputation, each feature with missing values is modeled as a function of the other features, and the model's predictions are used to fill the missing entries.
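Both approaches can be sketched in plain Python. The helper names (`impute_univariate`, `impute_by_regression`) and the toy data are hypothetical. The multivariate part is a deliberately simplified, single-pass version that fits a least-squares line on one other column. scikit-learn's own tools are more general: `SimpleImputer` covers the univariate strategies, and the experimental `IterativeImputer` handles the multivariate case.

```python
import math
import statistics

nan = float("nan")

def impute_univariate(rows, strategy="mean", fill_value=0.0):
    """Fill NaNs column by column using only that column's observed values."""
    fills = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        if strategy == "mean":
            fills.append(statistics.mean(observed))
        elif strategy == "median":
            fills.append(statistics.median(observed))
        elif strategy == "constant":
            fills.append(fill_value)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    # Build new rows; the input is left unchanged.
    return [[fills[j] if math.isnan(v) else v for j, v in enumerate(row)] for row in rows]

def impute_by_regression(rows, target, predictor):
    """Fill NaNs in column `target` from a least-squares line on column `predictor`.

    Simplification: the line is fit once, on rows where both columns are
    observed, and `predictor` must be observed wherever `target` is missing.
    """
    pairs = [(r[predictor], r[target]) for r in rows
             if not math.isnan(r[predictor]) and not math.isnan(r[target])]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in pairs)
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    out = [list(r) for r in rows]  # copy so the input is not modified
    for r in out:
        if math.isnan(r[target]):
            r[target] = intercept + slope * r[predictor]
    return out

data = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, nan],
    [4.0, 8.0],
]
# Univariate: the NaN becomes the mean of 2.0, 4.0 and 8.0, ignoring column 0.
print(impute_univariate(data, "mean"))
# Multivariate: the NaN is predicted from column 0 (close to 6.0 here).
print(impute_by_regression(data, target=1, predictor=0))
```

The contrast shows why multivariate imputation can help when features are related. The column mean (about 4.67) ignores that column 1 here is roughly twice column 0, while the regression recovers a value near 6.0.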