PCA – Principal Component Analysis

Overview

Principal Component Analysis (PCA) projects high‑dimensional data onto a smaller set of uncorrelated components that retain most of the variance. It helps simplify data, reduce noise, and prepare the data for further modeling.

Where to Find PCA

Pipeline → Model Studio → Data Preparation → PCA

Before You Start

• Dataset must contain continuous numerical columns.

• Standardization is recommended before running PCA; otherwise columns with larger scales dominate the components.

• Null values in selected columns must be handled.

• PCA is an end‑task for lineage (lineage stops here).
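The null-handling and standardization prerequisites above can be sketched outside the tool as follows. This is a minimal illustration using only the Python standard library; the sample rows and the column names `height` and `weight` are hypothetical, and dropping incomplete rows is only one way to handle nulls (imputation is another).

```python
from statistics import mean, stdev

# Hypothetical sample data; None marks a null value.
rows = [
    {"height": 170.0, "weight": 65.0},
    {"height": 182.0, "weight": None},
    {"height": 158.0, "weight": 52.0},
    {"height": 175.0, "weight": 80.0},
]
columns = ["height", "weight"]

# Handle nulls: keep only rows where every selected column has a value.
complete = [r for r in rows if all(r[c] is not None for c in columns)]

def standardize(values):
    """Return z-scores: subtract the mean, divide by the sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

# Standardize each selected column so that no column dominates by scale.
standardized = {c: standardize([r[c] for r in complete]) for c in columns}
```

After this step each column has mean 0 and standard deviation 1, so PCA weighs the columns by their correlation structure rather than by their units.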

Configuring PCA

• Features – Select numerical columns.

Field	Description	Remarks
Independent Variables	Columns used as PCA input.	Must be numeric with no null values; multiple columns can be selected.
Copy	Whether the data is copied before processing.	True is recommended; False modifies the input data in place.
Whiten	Scales the component scores to unit variance.	Discards the relative variance of the components, which may reduce interpretability.
SVD Solver	Method used for Singular Value Decomposition.	Options: auto, full, arpack, randomized.
Tolerance	Convergence threshold for the computed singular values.	Applies mainly to the arpack solver.
Iterated Power	Number of power iterations used to improve randomized SVD accuracy.	Used when the solver is randomized.
N Oversamples	Extra random vectors sampled for the approximation.	Default is 10; used with the randomized solver.

Viewing Results

The Explore view shows:

• Eigenvalues

• Eigenvectors

• Proportion of Variance

• Component loadings
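To make the four quantities listed above concrete, here is a minimal NumPy sketch that computes them from standardized data. It illustrates the standard definitions and is not necessarily how the Explore view derives its values; the synthetic data is hypothetical, and loadings are taken as eigenvectors scaled by the square root of their eigenvalues, which is one common convention.

```python
import numpy as np

# Synthetic data with two correlated columns (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
X[:, 1] += 2.0 * X[:, 0]

# Standardize, then form the covariance matrix of the standardized data
# (which equals the correlation matrix of X).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov = np.cov(Z, rowvar=False)

# Eigen-decomposition of the symmetric matrix; eigh returns ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]      # sort components by variance, largest first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]      # column j is the j-th principal axis

# Proportion of variance explained by each component.
proportion = eigenvalues / eigenvalues.sum()

# Loadings: each eigenvector scaled by the square root of its eigenvalue.
loadings = eigenvectors * np.sqrt(eigenvalues)
```

Because the data is standardized, the eigenvalues sum to the number of columns, and each entry of `loadings` is the correlation between an original column and a component.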

Notes

• PCA does not track lineage forward.

• Works with import/export, save‑as, and metadata refresh.

• Compatible with Python, RubiNotebook, RubiSQL, RubiSpark.

Related Articles
Factor Analysis
Factor Analysis is located under Model Studio in Data Preparation, in the left task pane. Use the drag-and-drop method to use the algorithm in the canvas. Click the algorithm to view and select different properties for analysis. Refer to ...
Decision Tree
Decision Tree is located under Machine Learning in Classification, in the task pane on the left. Use the drag-and-drop method to use the algorithm in the canvas. Click the algorithm to view and select different properties for analysis. Refer to ...
Decision Tree Regression
Decision Tree Regression is located under Machine Learning > Regression > Decision Tree Regression. Use the drag-and-drop method (or double-click on the node) to use the algorithm in the canvas. Click the algorithm to view and select different ...
Random Forest Regression
Random Forest Regression is located under Machine Learning > Regression > Random Forest Regression. Use the drag-and-drop method (or double-click on the node) to use the algorithm in the canvas. Click the algorithm to view and select different ...
