Data Preparation
Train Test Split
Description: The data is split randomly into training data and test data. The split is commonly in the ratio of 70:30 or 80:20 for train and test. Why to use: To evaluate the accuracy of the model on data it has not seen during training. When to use: The ...
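A random split of this kind can be sketched in a few lines of plain Python. This is an illustration only, not the product's implementation; the function name `train_test_split`, the `test_ratio` parameter, and the fixed `seed` are assumptions made for the example.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))
train, test = train_test_split(data, test_ratio=0.2)  # an 80:20 split
print(len(train), len(test))
```

Every row lands in exactly one of the two lists, so the model is never scored on a row it was trained on.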
Stratified Sampling
Description: Stratified sampling divides the population into homogeneous subpopulations, or strata, based on shared characteristics. Each member is assigned to exactly one stratum. Why to use: To obtain a sample that best represents the entire ...
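As a rough sketch of the idea, the snippet below groups rows into strata and draws the same fraction from each one. The helper name `stratified_sample`, the `stratum_of` callback, and the sample customer data are hypothetical and only serve the illustration.

```python
import random
from collections import defaultdict

def stratified_sample(rows, stratum_of, fraction, seed=0):
    """Draw the same fraction of rows from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row)].append(row)  # each row joins exactly one stratum
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # keep at least one row per stratum
        sample.extend(rng.sample(members, k))
    return sample

customers = [{"id": i, "segment": "retail" if i < 80 else "corporate"}
             for i in range(100)]
picked = stratified_sample(customers, lambda r: r["segment"], fraction=0.1)
```

Because the fraction is applied per stratum, the 80:20 mix of segments in the population is preserved in the sample.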
SMOTE
Description: SMOTE (Synthetic Minority Oversampling Technique) is an oversampling technique that generates synthetic samples for the minority class in an imbalanced classification dataset. Why to use: To solve the imbalanced data problem. To ...
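The core step of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `smote`, the parameters `n_synthetic` and `k`, and the toy points are invented for the example, and a production implementation would handle many more details.

```python
import math
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6)
```

Each synthetic point lies on a line segment between two real minority samples, so the new data stays inside the region the minority class already occupies.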
Cross Validation
Description: The dataset is divided randomly into k groups called folds. In each iteration, one fold is held out as test data and the remaining folds are used as training data. This is repeated until each of the ...
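In the usual k-fold formulation, each fold is held out for testing exactly once while the other folds are used for training. A minimal sketch of the index bookkeeping, with the hypothetical helper name `k_fold_indices`, might look like this:

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # random assignment to folds
    folds = [indices[i::k] for i in range(k)]   # k roughly equal-sized folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(10, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

A model would be trained and scored once per pair, and the k scores averaged to give the cross-validated estimate.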
Sampling
Sampling is a technique used in statistical analysis in which a fixed number of data points is selected from a large dataset. Working with this smaller subset makes analysis faster and computationally cheaper. There are different ...
Sorting
Description: Sorting numerical data arranges the data points in either ascending or descending order. Why to use: Numerical Analysis – Data Preparation. When to use: When you want to arrange the numerical data in a particular ...
Sequence Generator
Description: Sequence Generator adds a sequence column to your dataset. Why to use: To add surrogate keys or primary keys to the dataset. When to use: When you want to add a sequence column to your dataset. When not to use: - ...
Outlier Detection
Description: Outlier Detection identifies the extreme values that deviate from the rest of the data in a real-world dataset. Why to use: Numerical Analysis – Data Preparation. When to use: When there are certain values in the data which ...
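The excerpt does not say which detection method is applied. As one common possibility, the sketch below flags values that fall more than 1.5 interquartile ranges outside the quartiles. The function name `iqr_outliers` and the `factor` parameter are assumptions for illustration.

```python
import statistics

def iqr_outliers(values, factor=1.5):
    """Return the values lying outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 12, 11, 13, 12, 11, 10, 95]
outliers = iqr_outliers(readings)
print(outliers)
```

The rule is robust to the outliers themselves because quartiles, unlike the mean, are barely moved by a single extreme value.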
Missing Value Imputation
Description: Missing value imputation is the substitution of estimated values for missing values in a real-world dataset. Why to use: Numerical Analysis – Data Preparation. When to use: When there are missing values in the data. ...
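Mean imputation is one simple strategy among several; the excerpt does not state which strategies are offered. A minimal sketch, assuming missing entries are represented as `None` and using the hypothetical helper name `impute_mean`:

```python
import statistics

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

ages = [25, None, 35, 30, None]
filled = impute_mean(ages)
print(filled)
```

The mean is computed from the observed values only, so the missing entries do not bias the fill value.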
Lookup for Categorical Variables
The fuzzy lookup is based on fuzzy logic in mathematics. It is supported only for categorical variables. Methods: There are three methods for this feature. Threshold Matching: It compares the string values based on fuzzy logic and calculates a ...
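The idea of threshold-based string matching can be illustrated with Python's standard `difflib` module. The similarity measure here (`SequenceMatcher.ratio`) is an assumption chosen for the example and is not necessarily the score the feature computes; the function name `fuzzy_lookup` and the city list are likewise hypothetical.

```python
from difflib import SequenceMatcher

def fuzzy_lookup(value, candidates, threshold=0.8):
    """Return (best_match, score); best_match is None if the score is below threshold."""
    best, best_score = None, 0.0
    for candidate in candidates:
        # ratio() gives a similarity between 0.0 (nothing shared) and 1.0 (identical)
        score = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return (best, best_score) if best_score >= threshold else (None, best_score)

cities = ["Mumbai", "Pune", "Bengaluru"]
match, score = fuzzy_lookup("Mumbay", cities)
no_match, _ = fuzzy_lookup("Delhi", cities)
```

Raising the threshold makes matching stricter, which reduces false matches at the cost of missing more misspelled values.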
Lookup
Description: Lookup helps you match the values of specified fields in two data sources. Why to use: To compare values in data sources. When to use: To determine whether a particular field from one data source is present in another data source. When ...
Filtering
Description: Filtering selects numerical, textual, or categorical data based on a provided filtering expression. Why to use: To filter out certain values from a dataset – Data Preparation. When to use: When you want to use a subset of the dataset ...
File Management
Description: File management allows you to manage files in different types of storage such as GCP, Azure, S3, and FTP. You can perform various operations such as create, copy and paste, zip, rename, delete, cut and paste, wait, and ...
Factor Analysis
Description: Factor Analysis, also known as Exploratory Factor Analysis, is a data reduction technique. It examines interdependent variables without distinguishing between dependent and independent variables. Factor Analysis ...
Expression
Description: Expression creates additional features in a dataset by combining existing features in different ways using various expressions and functions. Why to use: For Data Preparation. When to use: To create additional features ...
Descriptive Statistics
Description: Descriptive statistics involves the calculation of various statistical measures, such as measures of central tendency, measures of variability, and percentiles, as well as the diagrammatic and graphical representation ...
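The numerical measures mentioned above can be computed with Python's standard `statistics` module. The helper name `describe` and the particular set of measures returned are choices made for this sketch.

```python
import statistics

def describe(values):
    """Summarise central tendency, variability, and quartiles of a numeric list."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),      # central tendency
        "median": statistics.median(values),  # central tendency
        "stdev": statistics.stdev(values),    # variability (sample standard deviation)
        "min": min(values),
        "max": max(values),
        "q1": q1,                             # 25th percentile
        "q3": q3,                             # 75th percentile
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(scores)
print(summary)
```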
Data Unpivot
Description: Data Unpivot is a way of transforming data from a wide format to a tall format. The source data is rearranged so that values from multiple columns are stacked into a single column in the new dataset. Why to use: To transform the column data into row data ...
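A wide-to-tall transformation can be sketched with plain dictionaries. The function name `unpivot`, the output column names `variable` and `value`, and the store data are assumptions for the example.

```python
def unpivot(rows, id_column, value_columns, var_name="variable", value_name="value"):
    """Turn each (row, value column) pair into its own output row."""
    tall = []
    for row in rows:
        for column in value_columns:
            tall.append({
                id_column: row[id_column],  # identifier repeated on every output row
                var_name: column,           # former column name becomes a value
                value_name: row[column],    # former cell becomes a value
            })
    return tall

wide = [
    {"store": "A", "jan": 100, "feb": 120},
    {"store": "B", "jan": 90, "feb": 95},
]
tall = unpivot(wide, "store", ["jan", "feb"])
for record in tall:
    print(record)
```

Two input rows with two value columns each become four output rows, one per original cell.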
Data Pivot
Description: Data Pivot is a way of transforming data from a tall format to a wide format. The source data is rearranged so that unique values are converted into columns. Why to use: To transform the row data into column data. When to ...
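The tall-to-wide direction can be sketched the same way: the unique values of one column become new column names. The function name `pivot` and its parameter names are hypothetical.

```python
def pivot(rows, index, columns, values):
    """Build one wide row per index value; unique `columns` values become keys."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[index], {index: row[index]})
        record[row[columns]] = row[values]  # a repeated key would overwrite the earlier value
    return list(wide.values())

tall = [
    {"store": "A", "month": "jan", "sales": 100},
    {"store": "A", "month": "feb", "sales": 120},
    {"store": "B", "month": "jan", "sales": 90},
    {"store": "B", "month": "feb", "sales": 95},
]
wide = pivot(tall, index="store", columns="month", values="sales")
for record in wide:
    print(record)
```

This sketch assumes each (index, column) pair occurs once; duplicated pairs would need an aggregation rule.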
Data Merge
Description: Data Merge combines rows from two or more tables into one table. Why to use: For Data Preparation. When to use: When you want to merge two or more dataset tables into one table where at least one column is ...
Data Joiner
Description: Data Joiner is a method for joining two or more datasets. It joins rows based on a related column present in the datasets. Why to use: For Data Preparation. When to use: When you want to join two or more datasets. ...
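Joining on a related column can be illustrated with an inner join over lists of dictionaries. The excerpt does not say which join types are supported, so the inner join here is only one example; the function name `inner_join` and the sample tables are invented for the sketch.

```python
from collections import defaultdict

def inner_join(left, right, key):
    """Combine rows from left and right that share the same value in `key`."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)  # index the right side once for fast matching
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

customers = [
    {"cust_id": 1, "name": "Asha"},
    {"cust_id": 2, "name": "Ravi"},
    {"cust_id": 3, "name": "Meera"},
]
orders = [
    {"cust_id": 1, "amount": 250},
    {"cust_id": 1, "amount": 100},
    {"cust_id": 3, "amount": 75},
]
joined = inner_join(customers, orders, "cust_id")
for record in joined:
    print(record)
```

Customers without orders drop out of an inner join, and a customer with several orders appears once per order.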
Combined Data Cleansing
Description: Combined Data Cleansing is a data preprocessing task that fixes data quality issues and enhances data quality. You can perform several operations on any categorical or numerical data. Why to use: Data Preprocessing to Remove Whitespaces Line ...
Aggregation
Description: Aggregation of categorical data involves gathering information for statistical analysis and expressing it in a summarized form. Why to use: Numerical Analysis – Data Preparation. When to use: When you want to collect ...
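Summarizing a numeric column per category can be sketched as a group-by with a count, sum, and mean. The function name `aggregate` and the chosen summary measures are assumptions for the example.

```python
from collections import defaultdict

def aggregate(rows, group_by, value):
    """Return count, sum, and mean of `value` for each distinct `group_by` value."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for row in rows:
        bucket = totals[row[group_by]]
        bucket["count"] += 1
        bucket["sum"] += row[value]
    # add the mean once all rows have been accumulated
    return {k: {**v, "mean": v["sum"] / v["count"]} for k, v in totals.items()}

sales = [
    {"region": "north", "amount": 100},
    {"region": "south", "amount": 50},
    {"region": "north", "amount": 300},
]
summary = aggregate(sales, group_by="region", value="amount")
print(summary)
```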
Data Preparation
Data preparation is the process of cleaning and transforming raw data into organized data so that it can be processed and analyzed further. During data preparation, data is reformatted, corrected, and combined to enrich it. Data preparation is ...