Combined Data Cleansing

Description

  • It is a data preprocessing task that fixes data quality issues such as unwanted characters, inconsistent casing, and null values.
  • You can apply several cleansing operations to any categorical or numerical data.

Why to use

Use it for data preprocessing to:

  • Remove:
      • Whitespaces
      • Line breaks
      • Null rows and columns
      • Numbers from strings
      • Letters from numbers
      • Punctuation and special characters
  • Replace null values with blanks or zeros
  • Modify the case of textual data
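The character-level and casing operations listed above can be sketched with Python's standard re and string modules. This is a minimal sketch, not the product's implementation: the function names are hypothetical, and treating letters, numbers, and punctuation as ASCII-only is an assumption.

```python
import re
import string

# Hypothetical helpers, one per cleansing operation listed above.
# Assumptions: "letters" means ASCII A-Z and a-z, "numbers" means the digits 0-9,
# and "punctuation" means the characters in string.punctuation.


def trim(text: str) -> str:
    """Remove leading and trailing whitespace."""
    return text.strip()


def collapse_whitespace(text: str) -> str:
    """Replace each run of tabs, line breaks, or spaces with a single space."""
    return re.sub(r"\s+", " ", text)


def remove_all_whitespace(text: str) -> str:
    """Remove every whitespace character: leading, trailing, and in-between."""
    return re.sub(r"\s+", "", text)


def remove_letters(text: str) -> str:
    """Remove ASCII letters, e.g. a unit suffix in a numeric field."""
    return re.sub(r"[A-Za-z]", "", text)


def remove_numbers(text: str) -> str:
    """Remove the digits 0-9."""
    return re.sub(r"[0-9]", "", text)


def remove_punctuation(text: str) -> str:
    """Remove the ASCII punctuation and symbol characters in string.punctuation."""
    return text.translate(str.maketrans("", "", string.punctuation))


def change_case(text: str, mode: str) -> str:
    """Convert text to 'upper', 'lower', or 'title' case."""
    if mode == "upper":
        return text.upper()
    if mode == "lower":
        return text.lower()
    if mode == "title":
        return text.title()
    raise ValueError(f"unknown case mode: {mode!r}")


print(trim("  42 kg \n"))                          # 42 kg
print(collapse_whitespace("New\tYork\n\n  City"))  # New York City
print(remove_all_whitespace(" 1 234 567 "))        # 1234567
print(remove_letters("42kg"))                      # 42
print(remove_numbers("Room 101B"))                 # Room B
print(remove_punctuation("Hello, world!"))         # Hello world
print(change_case("data cleansing", "title"))      # Data Cleansing
```

Python's str.title() treats an apostrophe as a word boundary, so "it's" becomes "It'S"; a production title-case option may need a smarter rule.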

When to use

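When the data contains defects or anomalies, such as stray whitespace, unwanted characters, inconsistent casing, or null values, that must be fixed before analysis.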

When not to use

When the data is already clean and preprocessed.

Prerequisites

 -

Input

Unclean data

Output

Cleansed Data

Statistical Methods used

-

Limitations

-



The table below describes the various data cleansing methods and their corresponding sub-methods.

| Method | Sub-Method | Description |
|---|---|---|
| Remove Unwanted Characters | Remove leading and trailing whitespaces | Remove any whitespace before or after a string, word, or number. |
| | Remove tabs, line breaks, and duplicate whitespaces | Remove tabs, line breaks, and repeated whitespace within a string. |
| | Remove all whitespaces | Remove every whitespace: leading, trailing, in-between, and duplicate. |
| | Remove letters | Remove letters from a string or a number. |
| | Remove numbers | Remove digits from strings or words. |
| | Remove punctuations | Remove all punctuation marks from strings, words, or numbers. |
| Modify Case | Upper Case | Convert all letters in a string to upper case. |
| | Lower Case | Convert all letters in a string to lower case. |
| | Title Case | Capitalize the first letter of each word in a phrase or sentence. |
| Replace Null Data | Replace with blanks (string columns) | Replace each null value in a string column with a blank, leaving the cell empty. |
| | Replace with zero (numeric columns) | Replace each null value in a numeric column with zero (0). |
| Remove Null Data | Remove Null Rows | Remove every row that has a null value in any column. |
| | | Alternatively, remove only rows that are null in every column. |
| | Remove Null Columns | Remove every column that has a null value in any row. |
| | | Alternatively, remove only columns that are null in every row. |
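The null-handling methods in the table differ mainly in whether a row or column is removed when any of its cells is null or only when every cell is null. The sketch below illustrates that distinction, along with the two fill rules, in plain Python. It is not the product's implementation: it assumes a table stored as a header list plus a list of rows with None marking a null cell, and the function names and the string_cols/numeric_cols parameters are hypothetical (here the caller states which columns are string or numeric).

```python
from typing import Any, Dict, List, Optional, Sequence, Set, Tuple

# Assumptions: a table is a header list plus a list of rows, None marks a null cell,
# and every row has one cell per header column.
Row = List[Optional[Any]]


def _check_how(how: str) -> None:
    if how not in ("any", "all"):
        raise ValueError(f"how must be 'any' or 'all', got {how!r}")


def drop_null_rows(rows: Sequence[Row], how: str = "any") -> List[Row]:
    """Drop rows with a null in any column (how="any") or in every column (how="all")."""
    _check_how(how)
    is_droppable = any if how == "any" else all
    return [row for row in rows if not is_droppable(v is None for v in row)]


def drop_null_columns(
    header: Sequence[str], rows: Sequence[Row], how: str = "any"
) -> Tuple[List[str], List[Row]]:
    """Drop columns with a null in any row (how="any") or in every row (how="all")."""
    _check_how(how)
    if not rows:  # no data rows: nothing to judge, so keep every column
        return list(header), []
    is_droppable = any if how == "any" else all
    keep = [j for j in range(len(header)) if not is_droppable(row[j] is None for row in rows)]
    return [header[j] for j in keep], [[row[j] for j in keep] for row in rows]


def replace_nulls(
    header: Sequence[str], rows: Sequence[Row], string_cols: Set[str], numeric_cols: Set[str]
) -> List[Row]:
    """Fill nulls with "" in string columns and 0 in numeric columns; other nulls stay None."""
    fill: Dict[int, Any] = {}
    for j, name in enumerate(header):
        if name in string_cols:
            fill[j] = ""
        elif name in numeric_cols:
            fill[j] = 0
    return [[fill.get(j, v) if v is None else v for j, v in enumerate(row)] for row in rows]


header = ["name", "city", "sales", "notes"]
rows = [
    ["Ana", "Lisbon", 120, None],
    [None, None, None, None],
    ["Ben", None, None, None],
]

rows = drop_null_rows(rows, how="all")                     # drops only the all-null second row
header, rows = drop_null_columns(header, rows, how="all")  # drops "notes", null in every row
rows = replace_nulls(header, rows, string_cols={"name", "city"}, numeric_cols={"sales"})
print(header)  # ['name', 'city', 'sales']
print(rows)    # [['Ana', 'Lisbon', 120], ['Ben', '', 0]]
```

With how="any", the same sample would lose all three rows, because even Ana's row has a null note, which shows how much more aggressive the "any" option is. Order also matters in this sketch: once nulls are replaced with blanks or zeros, nothing is left for the removal steps to detect, so removal runs first.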
