Overview
Column Lineage shows how your data moves and transforms across your pipeline.
This guide covers both lineage features, generation and the visual tree
display, in one place.
1. Lineage Generation (Pipeline)
- Accessed from Pipeline kebab menu → 'Calculate Lineage'.
- Enabled only if pipeline is saved.
- Must be regenerated after pipeline changes.
- Lineage deleted automatically if workflow is deleted.
- JSON lineage can be downloaded.
- Tasks that stop lineage: PCA, Deep Learning, File Management, Textual
Analysis, Control, Factor Analysis, and all Pro Code tasks except Python.
- Fully tracks missing value imputation, expression-based columns, filters,
metadata refresh, and Save As workflows.
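The list of tasks that stop lineage can be expressed as a small check, which may help when reviewing a pipeline before running 'Calculate Lineage'. This is a minimal sketch only: the task-type strings, the `is_pro_code` flag, and both function names are assumptions made for illustration and are not the product's actual identifiers or API.

```python
# Task types the guide lists as stopping lineage.
# The string labels are illustrative; internal identifiers may differ.
LINEAGE_BLOCKING_TASKS = {
    "PCA",
    "Deep Learning",
    "File Management",
    "Textual Analysis",
    "Control",
    "Factor Analysis",
}


def stops_lineage(task_type, is_pro_code=False):
    """Return True if a task of this type stops lineage (hypothetical helper)."""
    if is_pro_code:
        # All Pro Code tasks stop lineage except Python.
        return task_type != "Python"
    return task_type in LINEAGE_BLOCKING_TASKS


def first_blocking_task(tasks):
    """Return the first task type in an ordered list of (task_type, is_pro_code)
    pairs that stops lineage, or None if lineage can be tracked end to end."""
    for task_type, is_pro_code in tasks:
        if stops_lineage(task_type, is_pro_code):
            return task_type
    return None
```

For example, `first_blocking_task([("Filter", False), ("PCA", False)])` returns `"PCA"`, while a pipeline made only of filters and Pro Code Python tasks returns `None`.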
2. Writer & Model Task Handling
- Writer nodes store subDatasetKey, or fall back to subDatasetName.
- Multiple writer outputs → only active configuration is used for lineage.
- Model tasks generate multiple lineage rows for each mapped output column.
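The writer rules above can be sketched as two small helpers. Only the field names subDatasetKey and subDatasetName come from this guide; the dictionary shape of a writer node, the `isActive` flag, and the function names are assumptions for illustration, not the product's implementation.

```python
def writer_dataset_ref(writer_node):
    """Return subDatasetKey when present, otherwise fall back to subDatasetName."""
    key = writer_node.get("subDatasetKey")
    # An empty or missing key triggers the fallback.
    return key if key else writer_node.get("subDatasetName")


def active_writer_config(configs):
    """Pick the single active configuration among multiple writer outputs.

    "isActive" is a hypothetical flag name used only for this sketch.
    """
    for config in configs:
        if config.get("isActive"):
            return config
    return None
```

With two writer outputs configured, only the one returned by `active_writer_config` would contribute to lineage, and `writer_dataset_ref` decides which identifier is recorded for it.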
3. Column Lineage UI (Tree View + JSON)
- Access from Pipeline page → 'View Column Lineage'.
- User selects Dataset → Column.
- Click 'Get Column Lineage' to load the lineage.
- Left panel: visual tree representation (GoJS).
- Right panel: raw JSON lineage (same structure returned via API/Postman).
- JSON is downloadable; works on all storage types.
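Once the JSON lineage is downloaded, it can be inspected outside the UI. The sketch below renders a lineage document as an indented text tree, similar in spirit to the left panel. The JSON shape used here (the keys `dataset`, `column`, and `parents`) is purely an assumption for illustration; this guide does not specify the real schema, so adjust the key names to match the file you actually download.

```python
import json


def render_tree(node, depth=0):
    """Return indented lines for a lineage node and its upstream columns.

    Assumes each node has "dataset" and "column" keys and an optional
    "parents" list of nodes with the same shape (hypothetical schema).
    """
    lines = ["  " * depth + f'{node["dataset"]}.{node["column"]}']
    for parent in node.get("parents", []):
        lines.extend(render_tree(parent, depth + 1))
    return lines


# Made-up sample document; not real product output.
sample = json.loads("""
{
  "dataset": "sales_out",
  "column": "net_amount",
  "parents": [
    {"dataset": "sales_raw", "column": "amount", "parents": []},
    {"dataset": "sales_raw", "column": "discount"}
  ]
}
""")

tree_lines = render_tree(sample)
print("\n".join(tree_lines))
```

A recursive walk is used because a lineage tree can be arbitrarily deep, with each column pointing back to the columns it was derived from.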
4. Steps
1️⃣ Save Pipeline.
2️⃣ Open kebab menu → Calculate Lineage.
3️⃣ Open View Column Lineage → select dataset & column.
4️⃣ Click 'Get Column Lineage'.
5️⃣ Review lineage tree and JSON details.
5. Note:
If a user creates a template file inside a workbook and later uses that
template as a Reader in another pipeline, the column lineage cannot be tracked
for that dataset.