Duplicate Records Handling / Deduplication
Overview
A new Deduplication feature node has been introduced under the Data Preparation module to identify and remove duplicate records based on full rows or selected columns.
Feature Enhancements
Added Deduplication option under the Data Preparation module.
Users can remove duplicates using entire rows or selected columns.
Multiple column selection supported for deduplication.
Support added for Keep First, Keep Last, and Remove All duplicate handling strategies.
Dataset validation added before execution.
The record count is updated after deduplication runs.
All columns from the predecessor node are retained in the output dataset.
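The three strategies listed above can be illustrated with a minimal sketch in plain Python. This is not the product's implementation. The function name `deduplicate`, its parameters, and the strategy identifiers are hypothetical. The sketch also assumes that Remove All drops every row that has at least one duplicate, which the release note does not spell out.

```python
from collections import Counter


def deduplicate(rows, columns=None, strategy="keep_first"):
    """Illustrative only: remove duplicates from a list of dict rows.

    columns  -- column names that define a duplicate; None compares full rows.
    strategy -- "keep_first", "keep_last", or "remove_all" (hypothetical names).
    """
    if strategy not in ("keep_first", "keep_last", "remove_all"):
        raise ValueError(f"unknown strategy: {strategy}")

    def key_of(row):
        # Build a hashable key from the selected columns (or all columns).
        names = columns if columns is not None else sorted(row)
        return tuple(row[name] for name in names)

    keys = [key_of(row) for row in rows]

    if strategy == "remove_all":
        # Assumption: every row whose key occurs more than once is dropped.
        counts = Counter(keys)
        return [row for row, key in zip(rows, keys) if counts[key] == 1]

    if strategy == "keep_last":
        # Remember the position of the last occurrence of each key.
        last_index = {key: i for i, key in enumerate(keys)}
        return [row for i, row in enumerate(rows) if last_index[keys[i]] == i]

    # keep_first: keep a row only the first time its key is seen.
    seen = set()
    result = []
    for row, key in zip(rows, keys):
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [
    {"id": 1, "email": "a@example.com", "city": "Pune"},
    {"id": 2, "email": "b@example.com", "city": "Delhi"},
    {"id": 3, "email": "a@example.com", "city": "Mumbai"},
]

first = deduplicate(rows, columns=["email"], strategy="keep_first")
last = deduplicate(rows, columns=["email"], strategy="keep_last")
unique_only = deduplicate(rows, columns=["email"], strategy="remove_all")
full_row = deduplicate(rows)  # no two rows are identical, so nothing is removed
```

In this sample, deduplicating on `email` keeps ids 1 and 2 with Keep First, ids 2 and 3 with Keep Last, and only id 2 with Remove All. Every surviving row still carries all of its original columns, and the record count drops accordingly.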
Deduplication Module
Deduplication option available under Data Preparation.

Deduplication Configuration
Users can configure the columns used to detect duplicates and the duplicate handling strategy.
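As a rough sketch of how such a configuration might be checked before execution, the snippet below models the settings as a small data class and validates them against the dataset's column names. `DedupConfig`, `validate`, and the strategy identifiers are hypothetical names chosen for illustration; the actual checks performed by the product are not documented here.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for Keep First, Keep Last, and Remove All.
STRATEGIES = ("keep_first", "keep_last", "remove_all")


@dataclass
class DedupConfig:
    # An empty list is taken to mean "compare entire rows" (assumption).
    columns: list = field(default_factory=list)
    strategy: str = "keep_first"


def validate(config, dataset_columns):
    """Return a list of problems; an empty list means the node can run."""
    problems = []
    if not dataset_columns:
        problems.append("dataset has no columns")
    if config.strategy not in STRATEGIES:
        problems.append(f"unknown strategy: {config.strategy}")
    missing = [c for c in config.columns if c not in dataset_columns]
    if missing:
        problems.append("columns not in dataset: " + ", ".join(missing))
    return problems


ok = validate(DedupConfig(["email"], "keep_last"), ["id", "email"])
bad = validate(DedupConfig(["phone"], "drop"), ["id", "email"])
```

Returning a list of problems rather than raising on the first one lets a user interface show every configuration issue at once.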

Benefits
Improved data quality and consistency.
Flexible duplicate handling options.
Better preprocessing support for analytics workflows.
Simplified duplicate record management.