Aggregation

Description

Aggregation gathers information for statistical analysis and expresses it in a summarized form, typically as one summary value per category of a categorical variable.

Why to use

Numerical Analysis – Data Preparation

When to use

When you want summary information about particular groups defined by specific variables.

When not to use

On textual data.


Prerequisites

Use it on datasets that contain numerical and categorical data.

Input

Any dataset that contains categorical as well as numerical data.

Output

Aggregated numerical or categorical data.

Statistical Methods used

  • Sum
  • Mean
  • Mode
  • Minimum
  • Maximum
  • Count
  • Count (Distinct)
  • Standard Deviation
  • Variance
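
As a rough illustration of what the methods listed above compute, the following sketch applies each one to a single numerical column using only the Python standard library. The sample values and the dictionary keys are hypothetical, and the sample (n - 1) forms of standard deviation and variance are an assumption; the tool may use the population forms instead.

```python
import statistics

# Hypothetical sample values for a single numerical column.
values = [10, 20, 20, 30, 40]

summary = {
    "sum": sum(values),
    "mean": statistics.mean(values),
    "mode": statistics.mode(values),
    "minimum": min(values),
    "maximum": max(values),
    "count": len(values),
    "count_distinct": len(set(values)),
    # Sample (n - 1) versions; an assumption, not confirmed by the tool's documentation.
    "standard_deviation": statistics.stdev(values),
    "variance": statistics.variance(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Swapping in `statistics.pstdev` and `statistics.pvariance` would give the population forms.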

Limitations

Aggregation alone is sometimes not enough, because it provides only a single level of analysis. You may need to combine it with other methods to get accurate results.


Aggregation is a group-by algorithm in which the given data is grouped by a categorical variable such as name, date, color, or educational level. The numerical data in each group is then summarized by the selected Aggregate Function. You can use this algorithm without selecting the GroupBy function.
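
The group-by idea can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the `aggregate` helper, the `rows` records, and the column names `color` and `price` are all hypothetical. The behaviour when no group-by variable is given (a single result over the whole column) is also an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: "color" is the categorical variable, "price" the numerical one.
rows = [
    {"color": "red", "price": 10.0},
    {"color": "blue", "price": 20.0},
    {"color": "red", "price": 30.0},
    {"color": "blue", "price": 40.0},
    {"color": "green", "price": 50.0},
]


def aggregate(rows, value_key, func, group_key=None):
    """Apply func to value_key, per group if group_key is given, else over all rows."""
    if group_key is None:
        # Assumed behaviour without a group-by variable: one result for the whole column.
        return func([row[value_key] for row in rows])
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: func(vals) for group, vals in groups.items()}


mean_by_color = aggregate(rows, "price", mean, group_key="color")
overall_sum = aggregate(rows, "price", sum)

print(mean_by_color)
print(overall_sum)
```

Any function that reduces a list of numbers to one value (for example `min`, `max`, or `len`) can be passed as `func` in the same way.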