Combined Data Cleansing | | | |
---|---|---|---|
Description | | | |
Why to use | Data Preprocessing to | | |
When to use | When you want to remove any deformity or anomaly in the data. | When not to use | When the data is clean and already preprocessed. |
Prerequisites | - | | |
Input | Unclean data | Output | Cleansed data |
Statistical Methods used | - | Limitations | - |

The table below describes the various data cleansing methods and their corresponding sub-methods.
Method | Sub-Method | Description |
---|---|---|
Remove Unwanted Characters | Remove leading and trailing whitespaces | Remove any extra whitespace before or after a string, word, or number. |
Remove Unwanted Characters | Remove tabs, line breaks, and duplicate whitespaces | Remove any extra tabs, line breaks, or repeated whitespace in a string. |
Remove Unwanted Characters | Remove all whitespaces | Remove all whitespace: leading, trailing, in-between, and duplicate. |
Remove Unwanted Characters | Remove letters | Remove alphabetic characters from a string or a number. |
Remove Unwanted Characters | Remove numbers | Remove digits from strings or words. |
Remove Unwanted Characters | Remove punctuations | Remove all punctuation marks from strings, words, or numbers. |
Modify Case | Upper Case | Convert all letters or words in a string to upper case. |
Modify Case | Lower Case | Convert all letters or words in a string to lower case. |
Modify Case | Title Case | Capitalize each word in a phrase or a sentence. |
Replace Null Data | Replace with blanks (string columns) | Replace any null value in a string column with a blank space. The cell renders as empty. |
Replace Null Data | Replace with zero (numeric columns) | Replace any null value in a numerical column with zero (0). |
Remove Null Data | Remove Null Rows | Remove all rows with a null value in any column. |
Remove Null Data | Remove Null Rows | Only remove rows that have a null value in every column. |
Remove Null Data | Remove Null Columns | Remove all columns with a null value in any row. |
Remove Null Data | Remove Null Columns | Only remove columns that have a null value in every row. |
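
The methods in the table can be illustrated with a short Python sketch. This is not the product's implementation; it only shows one plausible reading of each sub-method using the Python standard library. All function names are hypothetical, rows are assumed to be dictionaries with `None` standing in for null, "blank" is assumed to mean an empty string, and the tab/line-break cleanup is assumed to collapse each run of whitespace into a single space.

```python
import re
import string

# --- Remove Unwanted Characters (hypothetical helper names) ---
def strip_edges(text):
    """Remove leading and trailing whitespace."""
    return text.strip()

def collapse_whitespace(text):
    """Collapse tabs, line breaks, and repeated whitespace into one space (assumption)."""
    return re.sub(r"\s+", " ", text)

def remove_all_whitespace(text):
    """Remove every whitespace character, wherever it occurs."""
    return re.sub(r"\s+", "", text)

def remove_letters(text):
    """Remove ASCII letters only; other alphabets are left alone in this sketch."""
    return re.sub(r"[A-Za-z]", "", text)

def remove_numbers(text):
    """Remove digit characters."""
    return re.sub(r"\d", "", text)

def remove_punctuation(text):
    """Remove ASCII punctuation marks as defined by string.punctuation."""
    return text.translate(str.maketrans("", "", string.punctuation))

# --- Modify Case: the built-in string methods cover all three options ---
CASE_FUNCTIONS = {"upper": str.upper, "lower": str.lower, "title": str.title}

# --- Replace Null Data ---
def replace_nulls(rows, string_columns, numeric_columns):
    """Return copies of rows with None replaced by "" (string) or 0 (numeric)."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for col in string_columns:
            if new_row[col] is None:
                new_row[col] = ""
        for col in numeric_columns:
            if new_row[col] is None:
                new_row[col] = 0
        cleaned.append(new_row)
    return cleaned

# --- Remove Null Data ---
def remove_null_rows(rows, how="any"):
    """Drop rows with a null in any column (how="any") or in every column (how="all")."""
    check = any if how == "any" else all
    return [row for row in rows if not check(v is None for v in row.values())]

def remove_null_columns(rows, how="any"):
    """Drop columns with a null in any row (how="any") or in every row (how="all")."""
    if not rows:
        return []
    check = any if how == "any" else all
    drop = {col for col in rows[0] if check(row[col] is None for row in rows)}
    return [{c: v for c, v in row.items() if c not in drop} for row in rows]
```

The `how` argument mirrors the two options listed for each Remove Null Data sub-method: `"any"` is the stricter choice and can discard a lot of data, while `"all"` only removes rows or columns that carry no information at all.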