Batch Processing

Working with Batches

A workflow in Data Integrator lets you divide a dataset into batches and process it one batch at a time. Batch processing is mainly used to simplify ETL operations such as missing value imputation, expressions, and data validation. You can specify the batch size, which is called the chunk size. The steps below walk through batch processing.

Batch Processing Steps

  • Click the three dots in the right-hand corner of the Data Integrator landing page. The batch processing option is displayed.
  • Click the batch processing option. A pop-up is displayed.
  • The default chunk size is 10000. If you replace it with zero, an error message is displayed. Batches are created according to the chunk size.
  • The data page is displayed.
  • The records are processed according to the specified batch size. When you explore the output, all the processed data appears together. You can view a maximum of 10000 records per page.
  • Pagination is applied when the dataset contains more than 1000 records.
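
To make the role of the chunk size concrete, the following is a minimal Python sketch of chunked processing. It is not the Data Integrator implementation; the names `iter_batches` and `process_in_batches` are hypothetical and only illustrate how a chunk size of 10000 splits a dataset and why a chunk size of zero has to be rejected.

```python
from typing import Callable, Iterator, List, Sequence

DEFAULT_CHUNK_SIZE = 10000  # default chunk size stated in this article


def iter_batches(records: Sequence, chunk_size: int = DEFAULT_CHUNK_SIZE) -> Iterator[Sequence]:
    """Yield consecutive slices of at most chunk_size records (hypothetical helper)."""
    if chunk_size <= 0:
        # A chunk size of zero cannot produce batches, so it is rejected.
        raise ValueError("chunk size must be a positive integer")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def process_in_batches(records: Sequence, transform: Callable,
                       chunk_size: int = DEFAULT_CHUNK_SIZE) -> List:
    """Apply a row-wise transform batch by batch and return the combined output."""
    output = []
    for batch in iter_batches(records, chunk_size):
        output.extend(transform(row) for row in batch)
    return output


rows = list(range(25000))
batch_sizes = [len(batch) for batch in iter_batches(rows)]
doubled = process_in_batches(rows, lambda value: value * 2)
```

With 25000 records and the default chunk size, this sketch produces three batches of 10000, 10000, and 5000 records, and the combined output contains all 25000 processed rows together.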

Advantages

  • Processing becomes faster. Batch processing speeds up many ETL operations such as missing value imputation, expressions, cleansing, and model testing.
  • Parallel processing becomes faster. Consider the following workflow:
    • The dataset reads the data from the file.
    • In one process, you build an expression and save the result in the output file.
    • In another process, you build the model.
    • When you apply batch processing, both processes run concurrently.
    • The following figure illustrates the parallel processing.

  • You can explore the model while it is still processing.
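
The concurrent workflow described above can be sketched in Python as two branches that each receive every batch as soon as it is read. This is an illustrative assumption about the behavior, not the product's internals: the branch functions, the doubling expression, and the running count and total that stand in for model building are all hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of the batch stream


def read_batches(records, chunk_size):
    """Yield the dataset in consecutive batches of at most chunk_size records."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def expression_branch(inbox, output):
    """Apply a row-wise expression (here: doubling) to each incoming batch."""
    while (batch := inbox.get()) is not SENTINEL:
        output.extend(value * 2 for value in batch)


def model_branch(inbox, state):
    """Stand-in for model building: update running statistics per batch."""
    while (batch := inbox.get()) is not SENTINEL:
        state["count"] += len(batch)
        state["total"] += sum(batch)


records = list(range(10))
expr_queue, model_queue = queue.Queue(), queue.Queue()
expr_output, model_state = [], {"count": 0, "total": 0}

workers = [
    threading.Thread(target=expression_branch, args=(expr_queue, expr_output)),
    threading.Thread(target=model_branch, args=(model_queue, model_state)),
]
for worker in workers:
    worker.start()

# Each batch is handed to both branches, so neither waits for the whole dataset.
for batch in read_batches(records, chunk_size=4):
    expr_queue.put(batch)
    model_queue.put(batch)
expr_queue.put(SENTINEL)
model_queue.put(SENTINEL)

for worker in workers:
    worker.join()
```

Because each branch starts working on the first batch immediately, partial results (such as `model_state` in this sketch) exist before the full dataset has been read, which is the idea behind exploring a model while it is still processing.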

Limitations

  • Apply batch processing only to workflows.
  • Do not apply batch processing to operations that work on an entire column, such as averages and totals.
  • Batch processing is not allowed on datasets generated using the following:
    • SSAS RDBMS
    • Twitter
    • JSON file format
    • Google News
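
The restriction on entire-column operations can be illustrated with a small Python sketch. It assumes, for illustration only, that a column operation would be evaluated separately on each batch; the values and the chunk size of 2 are made up to keep the arithmetic easy to follow.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


values = [10, 20, 30, 40, 50]
chunk_size = 2  # deliberately tiny so the batches are easy to inspect

# Split the column into batches: [10, 20], [30, 40], [50]
batches = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]

# Average computed independently inside each batch.
per_batch_means = [mean(batch) for batch in batches]

# Combining the per-batch averages does not recover the column average,
# because the last batch holds fewer records than the others.
mean_of_batch_means = mean(per_batch_means)
overall_mean = mean(values)
```

Here the per-batch averages are 15, 35, and 50, and their average is about 33.3, while the true column average is 30. A row-wise operation such as an expression gives the same result with or without batching, which is why it is a better fit for batch processing.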
