Duplicate Records Handling / Deduplication

Overview

A new Deduplication node has been added to the Data Preparation module. It identifies and removes duplicate records, comparing either entire rows or a selected set of columns.


Feature Enhancements

  • Added Deduplication option under the Data Preparation module.

  • Users can remove duplicates using entire rows or selected columns.

  • Multiple columns can be selected together as the basis for deduplication.

  • Support added for Keep First, Keep Last, and Remove All duplicate handling strategies.

  • The dataset is validated before execution.

  • The record count is updated after deduplication runs.

  • All predecessor columns are retained in the output dataset.
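
The three handling strategies listed above can be illustrated with a minimal Python sketch. This is not the node's actual implementation. The function name `deduplicate`, the strategy strings, and the row format (a list of dictionaries) are hypothetical. The sketch also assumes that Keep First retains the earliest occurrence of each duplicate group, Keep Last retains the latest, and Remove All drops every record that has at least one duplicate.

```python
from collections import Counter


def deduplicate(rows, columns=None, strategy="keep_first"):
    """Illustrative only: rows is a list of dicts with hashable values.

    columns=None compares entire rows; otherwise only the listed columns
    are compared. Every column of a surviving row is kept in the output.
    """
    if not rows:
        return []
    key_columns = list(columns) if columns else list(rows[0].keys())
    # One comparison key per row, built from the selected columns.
    keys = [tuple(row[c] for c in key_columns) for row in rows]

    if strategy == "keep_first":
        seen = set()
        result = []
        for row, key in zip(rows, keys):
            if key not in seen:
                seen.add(key)
                result.append(row)
        return result
    if strategy == "keep_last":
        # Later rows overwrite earlier ones, so this holds the last index per key.
        last_index = {key: i for i, key in enumerate(keys)}
        return [row for i, row in enumerate(rows) if last_index[keys[i]] == i]
    if strategy == "remove_all":
        counts = Counter(keys)
        return [row for row, key in zip(rows, keys) if counts[key] == 1]
    raise ValueError(f"unknown strategy: {strategy}")


rows = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Pune"},
    {"id": 3, "city": "Delhi"},
]
print([r["id"] for r in deduplicate(rows, ["city"], "keep_first")])
print([r["id"] for r in deduplicate(rows, ["city"], "keep_last")])
print([r["id"] for r in deduplicate(rows, ["city"], "remove_all")])
```

In this sample, deduplicating on `city` treats the two Pune records as duplicates, while comparing entire rows would keep all three because their `id` values differ.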


Deduplication Module

The Deduplication option is available under Data Preparation.



Deduplication Configuration

Users can configure the columns used to detect duplicates and the strategy applied to them.
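
As a rough sketch of how such a configuration might be checked before execution, the Python below models the two settings and validates them against a dataset's column names. The article does not describe what the product's validation actually checks, so the `DedupConfig` class, the `validate_config` function, the strategy strings, and the specific checks are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the Keep First, Keep Last, and Remove All strategies.
ALLOWED_STRATEGIES = {"keep_first", "keep_last", "remove_all"}


@dataclass
class DedupConfig:
    # An empty list is taken here to mean "compare entire rows".
    columns: list = field(default_factory=list)
    strategy: str = "keep_first"


def validate_config(config, dataset_columns):
    """Return a list of problems; an empty list means the config looks usable."""
    errors = []
    if not dataset_columns:
        errors.append("dataset has no columns")
    if config.strategy not in ALLOWED_STRATEGIES:
        errors.append(f"unknown strategy: {config.strategy}")
    missing = [c for c in config.columns if c not in dataset_columns]
    if missing:
        errors.append(f"columns not in dataset: {', '.join(missing)}")
    return errors


good = DedupConfig(columns=["city"], strategy="keep_last")
bad = DedupConfig(columns=["zip"], strategy="keep_any")
print(validate_config(good, ["id", "city"]))
print(validate_config(bad, ["id", "city"]))
```

Returning a list of messages rather than raising on the first problem lets a caller report every configuration issue at once.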



Benefits

  • Improved data quality and consistency.

  • Flexible duplicate handling options.

  • Better preprocessing support for analytics workflows.

  • Simplified duplicate record management.
