Data Preparation
Train Test Split
Description: The data is split randomly into training data and test data. A typical split ratio is 70:30 or 80:20 (train:test). Why to use: To evaluate the accuracy of the model on data it has not seen. When to use: The ...
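A minimal sketch of a random split, using only the Python standard library. The function name `train_test_split`, the `test_ratio` parameter, and the fixed seed are illustrative assumptions, not the product's own API.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))
train, test = train_test_split(data, test_ratio=0.3)  # a 70:30 split
print(len(train), len(test))
```

Every row lands in exactly one of the two lists, so the test rows are never seen during training.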
Stratified Sampling
Description: Stratified sampling divides the population into homogeneous subpopulations, or strata, based on shared characteristics. Each member is assigned to exactly one stratum. Why to use: To obtain a sample that best represents the entire ...
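A minimal sketch of proportional stratified sampling: rows are grouped into strata by a key, and the same fraction is drawn from each stratum. The names `stratified_sample`, `key`, and `fraction` are hypothetical, and drawing at least one row per stratum is an assumption of this sketch.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=0):
    """Draw the same fraction of rows from every stratum."""
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)  # each row belongs to exactly one stratum
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # assumption: at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

rows = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
sample = stratified_sample(rows, key=lambda r: r[0], fraction=0.1)
```

Here the 80:20 proportion of the two strata is preserved in the sample.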
SMOTE
The Synthetic Minority Oversampling Technique (SMOTE) balances classification datasets that have a severe class imbalance. It is a data augmentation technique in which synthetic samples are generated for the minority class. SMOTE ...
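A simplified sketch of the idea behind SMOTE: each synthetic sample is placed at a random point on the line segment between a minority-class point and one of its nearest minority-class neighbours. The function name `smote_sketch` and its parameters are hypothetical, and this is not a full implementation of the technique.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen point
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
new_points = smote_sketch(minority, n_new=5)
```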
Cross Validation
Description: The dataset is divided randomly into k groups called folds. In each iteration, one fold is held out as test data and the remaining folds are used as training data. This is repeated until each of the ...
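A minimal sketch of k-fold splitting, in which each fold is held out once as test data while the other folds form the training data. The function name `k_fold_indices` and the `seed` parameter are assumptions made for illustration.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random assignment to folds
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(n_samples=10, k=3))
for train, test in splits:
    print(len(train), len(test))
```

A model would be trained and scored once per split, and the k scores averaged.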
Sampling
Sampling is a technique used in Statistical Analysis in which a fixed number of data points are selected from a large dataset. This selection of smaller subsets helps to perform analysis faster and at a low computational cost. There are different ...
Sequence Generator
Description: Sequence Generator adds a sequence column to your dataset. Why to use: To add surrogate keys or primary keys to the dataset. When to use: When you want to add a sequence column to your dataset. When not to use: - ...
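A minimal sketch of adding a sequence column to rows held as dictionaries. The column name `seq_id` and the `start` and `step` parameters are hypothetical choices for illustration.

```python
def add_sequence(rows, column="seq_id", start=1, step=1):
    """Return new rows with an added sequence column usable as a surrogate key."""
    return [{column: start + i * step, **row} for i, row in enumerate(rows)]

people = [{"name": "Ann"}, {"name": "Bob"}]
keyed = add_sequence(people)
print(keyed)
```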
Outlier Detection
Description: Outlier Detection reveals extreme values that deviate from the rest of the data in a real-world dataset. Why to use: Numerical Analysis – Data Preparation. When to use: When there are certain values in the data which ...
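A minimal sketch of one common way to flag outliers, the 1.5 x IQR rule. The source does not say which method the component uses, so the rule, the function name `iqr_outliers`, and the `factor` parameter are all assumptions.

```python
import statistics

def iqr_outliers(values, factor=1.5):
    """Return the values lying outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]

values = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(values))  # [95]
```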
Missing Value Imputation
Description: Missing value imputation is the substitution of estimated values for missing values in a real-world dataset. Why to use: Numerical Analysis – Data Preparation. When to use: When there are missing values in the data. ...
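A minimal sketch of mean imputation, one of several possible strategies. Representing missing values as `None` and the function name `impute_mean` are assumptions made for this example.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

print(impute_mean([4.0, None, 6.0, None, 8.0]))  # [4.0, 6.0, 6.0, 6.0, 8.0]
```

Median or mode imputation follows the same shape, with a different fill value.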
Lookup
Description: Lookup helps you match the values of specified fields in two data sources. Why to use: To compare values in data sources. When to use: To determine the presence of a particular field from one data source in another data source. When ...
Factor Analysis
Description: Factor Analysis, also known as Exploratory Factor Analysis, is used for data reduction. It is a technique for examining interdependent variables without distinguishing between dependent and independent variables. Factor Analysis ...
Data Unpivot
Description: Data Unpivot transforms data from a wide format to a tall format. The source data is rearranged so that it becomes part of a single column in the new dataset. Why to use: To transform the column data into row data ...
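A minimal sketch of unpivoting rows held as dictionaries. The output column names `variable` and `value` and the function name `unpivot` are hypothetical.

```python
def unpivot(rows, id_column, value_columns):
    """Turn each listed column of a wide row into its own (variable, value) row."""
    long_rows = []
    for row in rows:
        for col in value_columns:
            long_rows.append(
                {id_column: row[id_column], "variable": col, "value": row[col]}
            )
    return long_rows

wide = [{"id": 1, "q1": 10, "q2": 20}]
tall = unpivot(wide, id_column="id", value_columns=["q1", "q2"])
print(tall)
```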
Sorting
Description: Sorting numerical data arranges the data points in either ascending or descending order. Why to use: Numerical Analysis – Data Preparation. When to use: When you want to arrange the numerical data in a particular ...
Data Pivot
Description: Data Pivot transforms data from a tall format to a wide format. The source data is rearranged so that unique values are converted into columns. Why to use: To transform the row data into column data. When to ...
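A minimal sketch of pivoting tall rows into wide rows. The function name `pivot` and its parameters `index`, `columns`, and `values` are hypothetical names chosen for this example.

```python
def pivot(rows, index, columns, values):
    """Convert the unique values of `columns` into columns, one output row per index value."""
    wide = {}
    for row in rows:
        target = wide.setdefault(row[index], {index: row[index]})
        target[row[columns]] = row[values]
    return list(wide.values())

tall_rows = [
    {"id": 1, "variable": "q1", "value": 10},
    {"id": 1, "variable": "q2", "value": 20},
    {"id": 2, "variable": "q1", "value": 30},
]
wide_rows = pivot(tall_rows, index="id", columns="variable", values="value")
print(wide_rows)
```

Combinations that never occur in the tall data (here `q2` for `id` 2) are simply absent from the output row.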
Data Merge
Description: Data Merge combines rows from two or more tables into one table. Why to use: For Data Preparation. When to use: When you want to merge two or more dataset tables into one table where at least one column is ...
Data Joiner
Description: Data Joiner joins two or more datasets. It joins rows based on a related column present in those datasets. Why to use: For Data Preparation. When to use: When you want to join two or more datasets. ...
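A minimal sketch of joining two datasets on a related column. It shows an inner join only; the source does not list which join types are supported, so the join type and the function name `inner_join` are assumptions.

```python
from collections import defaultdict

def inner_join(left, right, key):
    """Combine rows from both datasets that share the same value in `key`."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)  # index the right side by the join column
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
cities = [{"id": 1, "city": "Pune"}, {"id": 3, "city": "Goa"}]
joined = inner_join(customers, cities, key="id")
print(joined)
```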
Filtering
Description: Filtering selects numerical, textual, or categorical data based on a provided filtering expression. Why to use: To filter out certain values from a dataset – Data Preparation. When to use: When you want to use a subset of the dataset ...
Expression
Description: Expression creates additional features in a dataset by combining existing features in different ways using various expressions and functions. Why to use: For Data Preparation. When to use: To create additional features ...
Combined Data Cleansing
Description: Combined Data Cleansing is a data preprocessing task that fixes data quality issues and enhances data quality. You can perform several operations on any categorical or numerical data. Why to use: Data Preprocessing to Remove Whitespaces Line ...
Aggregation
Description: Aggregation of categorical data gathers information for statistical analysis and expresses it in a summarized form. Why to use: Numerical Analysis – Data Preparation. When to use: When you want to collect ...
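A minimal sketch of aggregation: rows are grouped by a categorical column and a numeric column is summarized per group. The function name `aggregate` and the chosen statistics (count, sum, mean) are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

def aggregate(rows, group_by, value):
    """Summarize `value` for each distinct value of `group_by`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[value])
    return {
        g: {"count": len(v), "sum": sum(v), "mean": mean(v)}
        for g, v in groups.items()
    }

sales = [
    {"region": "N", "sales": 10},
    {"region": "N", "sales": 20},
    {"region": "S", "sales": 5},
]
summary = aggregate(sales, group_by="region", value="sales")
print(summary)
```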