Column Lineage

Overview

Column Lineage helps you understand how your data moves and is transformed across your pipeline. This guide covers all lineage features, from lineage generation to the visual tree display, in one place.

1. Lineage Generation (Pipeline)

- Accessed from the pipeline kebab menu → 'Calculate Lineage'.
- The option is enabled only after the pipeline is saved.
- Lineage must be regenerated after any change to the pipeline.
- Lineage is deleted automatically when the workflow is deleted.
- The lineage can be downloaded as JSON.
- Lineage tracking stops at the following tasks: PCA, Deep Learning, File Management, Textual Analysis, Control, Factor Analysis, and all Pro Code tasks except Python.
- Lineage is fully tracked through missing value imputation, expression-based columns, filters, metadata refresh, and Save As workflows.
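To make the stopping rule concrete, here is a minimal Python sketch that finds the first task at which lineage stops. It assumes, for illustration only, a linear pipeline and a hypothetical `Task` record with a category and a Pro Code language field; these names are not taken from Rubiscape, and real pipelines may branch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Task categories at which lineage tracking stops, as listed above.
# The category strings are hypothetical labels used only in this sketch.
LINEAGE_BREAKING_CATEGORIES = {
    "PCA",
    "Deep Learning",
    "File Management",
    "Textual Analysis",
    "Control",
    "Factor Analysis",
}


@dataclass
class Task:
    name: str
    category: str
    # Hypothetical field: language of a Pro Code task, None for other tasks.
    pro_code_language: Optional[str] = None


def stops_lineage(task: Task) -> bool:
    """Return True if lineage cannot be traced through this task."""
    if task.category == "Pro Code":
        # All Pro Code tasks stop lineage except Python.
        return task.pro_code_language != "Python"
    return task.category in LINEAGE_BREAKING_CATEGORIES


def first_lineage_break(tasks: List[Task]) -> Optional[Task]:
    """Return the first task (in pipeline order) that stops lineage, if any."""
    for task in tasks:
        if stops_lineage(task):
            return task
    return None


# Hypothetical pipeline: the Python Pro Code task keeps lineage, PCA stops it.
pipeline = [
    Task("Reader", "Reader"),
    Task("Impute", "Data Preparation"),
    Task("Script", "Pro Code", pro_code_language="Python"),
    Task("Reduce", "PCA"),
    Task("Writer", "Writer"),
]
print(first_lineage_break(pipeline).name)  # prints Reduce
```

Columns produced after the stopping task can still appear in lineage, but their history cannot be traced back past it.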

2. Writer & Model Task Handling

- Writer nodes store the subDatasetKey, or fall back to the subDatasetName when no key is available.
- If a writer has multiple output configurations, only the active configuration is used for lineage.
- Model tasks generate multiple lineage rows for each mapped output column.
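The writer rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Rubiscape's implementation: writer configurations are modeled as plain dictionaries, `isActive` is a hypothetical flag name, and an empty subDatasetKey is treated the same as a missing one.

```python
from typing import Any, Dict, List, Optional


def writer_dataset_id(config: Dict[str, Any]) -> Optional[str]:
    """Prefer subDatasetKey; fall back to subDatasetName if the key is missing or empty."""
    return config.get("subDatasetKey") or config.get("subDatasetName")


def lineage_writer_target(configs: List[Dict[str, Any]]) -> Optional[str]:
    """Use only the active writer configuration; all others are ignored."""
    for config in configs:
        # "isActive" is a hypothetical flag name used only in this sketch.
        if config.get("isActive"):
            return writer_dataset_id(config)
    return None


configs = [
    {"subDatasetName": "sales_v1", "isActive": False},
    {"subDatasetKey": "", "subDatasetName": "sales_v2", "isActive": True},
]
print(lineage_writer_target(configs))  # prints sales_v2
```

Keying on the stored identifier first and the name second mirrors the fallback order described above; an inactive configuration never contributes to lineage, even if it is listed first.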

3. Column Lineage UI (Tree View + JSON)

- Access from the Pipeline page → 'View Column Lineage'.
- Select a dataset, then a column.
- Click 'Get Column Lineage' to load the lineage.
- Left panel: visual tree representation (rendered with GoJS).
- Right panel: raw JSON lineage (the same structure returned by the API, for example when called from Postman).
- The JSON can be downloaded; this works with all storage types.
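The downloaded JSON can also be processed outside the UI. The Python sketch below walks a lineage tree recursively, rendering it as indented lines (similar in spirit to the left-panel tree) and collecting the origin columns at its leaves. The node layout (`dataset`, `column`, `task`, `sources`) is a hypothetical schema invented for this example, not the documented format; check the structure of your own downloaded file and adapt the field names.

```python
import json
from typing import Any, Dict, List

# Hypothetical lineage document. The field names are assumptions for this
# sketch and are not the product's actual JSON schema.
SAMPLE = """
{
  "dataset": "sales_out", "column": "net_amount", "task": "Writer",
  "sources": [
    {"dataset": "sales_out", "column": "net_amount", "task": "Expression",
     "sources": [
       {"dataset": "sales_raw", "column": "amount", "task": "Reader", "sources": []},
       {"dataset": "sales_raw", "column": "discount", "task": "Reader", "sources": []}
     ]}
  ]
}
"""


def render(node: Dict[str, Any], depth: int = 0) -> List[str]:
    """Flatten the lineage tree into indented lines, one per node."""
    lines = ["  " * depth + f"{node['dataset']}.{node['column']} ({node['task']})"]
    for source in node.get("sources", []):
        lines.extend(render(source, depth + 1))
    return lines


def origin_columns(node: Dict[str, Any]) -> List[str]:
    """Return the leaf columns, i.e. where the traced column ultimately comes from."""
    sources = node.get("sources", [])
    if not sources:
        return [f"{node['dataset']}.{node['column']}"]
    result: List[str] = []
    for source in sources:
        result.extend(origin_columns(source))
    return result


tree = json.loads(SAMPLE)
print("\n".join(render(tree)))
print(origin_columns(tree))  # prints ['sales_raw.amount', 'sales_raw.discount']
```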

4. Steps

1️⃣ Save the pipeline.
2️⃣ Open the kebab menu → Calculate Lineage.
3️⃣ Open View Column Lineage → select the dataset and column.
4️⃣ Click 'Get Column Lineage'.
5️⃣ Review the lineage tree and JSON details.

5. Note

If a user creates a template file inside a workbook and later uses that template as a Reader in another pipeline, the column lineage cannot be tracked for that dataset.