PCA – Principal Component Analysis

Overview

Principal Component Analysis (PCA) transforms high-dimensional data into a smaller set of uncorrelated components that retain most of the variance. It helps simplify data, reduce noise, and prepare data for further modeling.

Where to Find PCA

Pipeline → Model Studio → Data Preparation → PCA


Before You Start

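• The dataset must contain continuous numerical columns.
• Standardization is recommended: PCA is sensitive to scale, so columns with larger variance would otherwise dominate the components.
• Null values in the selected columns must be handled (removed or imputed) before running PCA.
• PCA is an end task for lineage: lineage tracking stops at this node.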
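
The preparation steps above can be sketched outside the product as follows. This is a minimal illustration using NumPy, not the platform's own preprocessing: the function name `prepare_for_pca` and the sample values are hypothetical, and dropping incomplete rows is only one way to handle nulls (imputation is another).

```python
import numpy as np

def prepare_for_pca(X):
    """Drop rows with missing values, then standardize each column."""
    X = np.asarray(X, dtype=float)
    # Keep only rows where every selected column has a value.
    complete = ~np.isnan(X).any(axis=1)
    X = X[complete]
    # Standardize: zero mean and unit variance per column.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns stay at zero instead of dividing by 0
    return (X - mean) / std

# Hypothetical sample: height in cm, weight in kg, one missing weight.
raw = np.array([
    [170.0, 65.0],
    [180.0, np.nan],
    [160.0, 55.0],
    [175.0, 72.0],
])
prepared = prepare_for_pca(raw)
```

After this step every column contributes on the same scale, so no single column dominates the components only because of its units.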

Configuring PCA

• Features – Select numerical columns.

| Field | Result | Remarks |
|---|---|---|
| Independent Variables | Columns selected for PCA input. | Must be numeric; no null values; multi-select. |
| Copy | Whether data is copied before processing. | True recommended; False modifies data in place. |
| Whiten | Scales components to unit variance. | May reduce interpretability. |
| SVD Solver | Method for Singular Value Decomposition. | auto, full, arpack, randomized. |
| Tolerance | Threshold for convergence. | Applies mainly to arpack solver. |
| Iterated Power | Iterations to improve randomized SVD accuracy. | Used when solver=randomized. |
| N Oversamples | Extra vectors for approximation. | Default 10; used for randomized solver. |


Viewing Results

The Explore view shows:
• Eigenvalues
• Eigenvectors
• Proportion of Variance
• Component loadings
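
To make these four quantities concrete, here is a minimal NumPy sketch that derives them from the covariance matrix of a small dataset. The function name `pca_summary` and the sample values are hypothetical. Loadings are computed here as eigenvectors scaled by the square root of their eigenvalues, which is one common convention; the Explore view may use a different definition.

```python
import numpy as np

def pca_summary(X):
    """Return eigenvalues, eigenvectors, proportion of variance, and loadings."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)            # center each column
    cov = np.cov(Xc, rowvar=False)     # sample covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # eigh returns ascending order; sort so the largest component comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]  # column j is the j-th component
    proportion = eigenvalues / eigenvalues.sum()
    # Loadings (assumed convention): eigenvector times sqrt(eigenvalue).
    loadings = eigenvectors * np.sqrt(np.clip(eigenvalues, 0.0, None))
    return eigenvalues, eigenvectors, proportion, loadings

# Hypothetical sample with two correlated columns.
data = np.array([
    [2.0, 1.0],
    [4.0, 3.0],
    [6.0, 5.0],
    [8.0, 8.0],
])
eigenvalues, eigenvectors, proportion, loadings = pca_summary(data)
```

Each eigenvalue is the variance captured by its component, and the proportion of variance is that eigenvalue divided by the sum of all eigenvalues.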

Notes

• PCA does not track lineage forward.
• Works with import/export, save‑as, and metadata refresh.
• Compatible with Python, RubiNotebook, RubiSQL, RubiSpark.

