Refreshing Metadata of Dataset

Rubiscape supports the Metadata Refresh feature for the following dataset types:

  • Google Spreadsheet dataset
  • RDBMS dataset
  • File type dataset from AWS S3 cloud storage (applicable to all flat files: Excel, CSV, JSON, and Text)

This feature is available in both the workbook and workflow canvas to update the Reader when a dataset is modified.

Consider a scenario in which a Google Spreadsheet dataset created in Rubiscape is used across different workbooks and workflows. Suppose its source file, which is stored on the cloud, is modified at some point. The dataset that uses the source file is then updated to display the modified columns. However, the modified columns are not updated in the existing Readers used across the workbooks and workflows. To get the expected result, each Reader connected to the dataset should reflect the updated columns. Deleting, dragging and dropping, and reconnecting the Reader node every time its corresponding dataset is modified is inconvenient.

The Metadata Refresh feature saves time and effort and keeps the existing algorithm flow undisturbed. It updates the Reader at the click of a button from the workbook or workflow canvas you are working on. Metadata Refresh fetches the features from the modified dataset and updates the selected Reader accordingly.

In Metadata Refresh, the access to fetch features for a Reader is restricted to its corresponding dataset.

In Rubiscape, a dataset can access columns from the following source files during dataset creation:

  • Files stored on your computer
  • Files on the AWS S3 cloud storage
  • Files stored online

Metadata Refresh is applicable only to Readers whose source files are stored online or on the AWS S3 cloud storage.

Metadata Refresh updates the Reader with the modifications made to the source file and the corresponding dataset. The modifications can be one or more of the following:

  • Addition of one or more features
  • Deletion of one or more features
  • Modification of one or more features (for example, renaming a feature or changing its variable type)
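
The kinds of modification listed above can be illustrated by comparing a Reader's fields before and after a refresh. The Python sketch below is only an illustration and not Rubiscape's implementation: the function `diff_fields`, the `FieldChanges` type, the dictionary representation of the fields, and the variable types shown are all assumptions made for this example. Because the comparison is by name only, a renamed feature would show up as one deletion plus one addition.

```python
from typing import Dict, List, NamedTuple, Tuple


class FieldChanges(NamedTuple):
    """Hypothetical summary of how a Reader's fields changed."""
    added: List[str]
    deleted: List[str]
    type_changed: List[Tuple[str, str, str]]  # (name, old type, new type)


def diff_fields(old: Dict[str, str], new: Dict[str, str]) -> FieldChanges:
    """Compare two {feature name: variable type} mappings by name."""
    added = [name for name in new if name not in old]
    deleted = [name for name in old if name not in new]
    type_changed = [
        (name, old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    ]
    return FieldChanges(added, deleted, type_changed)


# Assumed example data: the variable types are invented for illustration.
fields_before = {"Sr No#": "Numeric", "Emp Code": "Text", "Branch Code": "Text"}
fields_after = {"Sr No#": "Numeric", "Emp Code": "Numeric"}

changes = diff_fields(fields_before, fields_after)
print("Added:", changes.added)
print("Deleted:", changes.deleted)
print("Type changed:", changes.type_changed)
```

In this assumed data, Branch Code is reported as deleted and Emp Code as having a changed variable type.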

Notes:

  • After updating the Reader using Metadata Refresh, the updated columns are displayed under the Data Fields drop-down in the Properties pane.
  • To view the modified data of the Reader from the workbook or workflow canvas, you can explore the Reader after its successful execution.
  • When you click the Metadata Refresh button, the data fields are refreshed, and the list is renewed.
  • Suppose a workflow contains functionalities or algorithms below the Reader node. After a Metadata Refresh, when you click Validate, you may get validation errors like the ones given below.
    • Input variables of the task are not available in the predecessor. Please make the required changes.
    • Data type of the reader is changed. Please make the necessary changes.
    In this case, you need to reconfigure the algorithms and run them again.
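
The two validation errors in the last note can be read as two checks on each task below the Reader: whether its input variables still exist, and whether their data types are unchanged. The following Python sketch shows that idea under stated assumptions. It is not Rubiscape's validator; the function `validate_task`, its parameters, and the sample fields and types are hypothetical. Only the two message texts are taken from the note above.

```python
from typing import Dict, List

MISSING_MSG = (
    "Input variables of the task are not available in the predecessor. "
    "Please make the required changes."
)
TYPE_MSG = "Data type of the reader is changed. Please make the necessary changes."


def validate_task(task_inputs: Dict[str, str], reader_fields: Dict[str, str]) -> List[str]:
    """Return validation messages for one task configured against a Reader.

    task_inputs:   {variable name: type the task was configured with}
    reader_fields: {feature name: type after the metadata refresh}
    """
    errors: List[str] = []
    # Check 1: every configured input must still exist in the Reader.
    if any(name not in reader_fields for name in task_inputs):
        errors.append(MISSING_MSG)
    # Check 2: inputs that still exist must keep their configured type.
    if any(
        name in reader_fields and reader_fields[name] != expected
        for name, expected in task_inputs.items()
    ):
        errors.append(TYPE_MSG)
    return errors


# Assumed example: the task used Branch Code, which was later deleted.
task_inputs = {"Emp Code": "Text", "Branch Code": "Text"}
refreshed_fields = {"Sr No#": "Numeric", "Emp Code": "Text"}

messages = validate_task(task_inputs, refreshed_fields)
for message in messages:
    print(message)
```

An empty result means the task's configuration still matches the refreshed Reader, so no reconfiguration is needed in this simplified model.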

Refreshing Metadata of Google Spreadsheet Dataset

Consider the Google_Spreadsheet dataset.

To perform Metadata Refresh on a Google Spreadsheet dataset, follow the steps given below.

  1. Modify the JSON source file (containing the private key) used in creating the Google Spreadsheet dataset.
    In this example, we add two new columns in the JSON source file to perform modifications.
  2. Perform steps 1 to 4 of Editing a Dataset to edit the Google Spreadsheet dataset.
  3. Click Fetch.
    The modified columns are retrieved from the source file and displayed in the Features box.
  4. Click Update.
    The Google Spreadsheet dataset is updated, and a confirmation message is displayed.

    You are returned to the Rubiscape home page.
  5. Open a workbook or workflow that contains the Google Spreadsheet dataset. Refer to Opening a Workbook and Opening a Workflow.
  6. Click the Reader on the canvas.
    The Properties pane is displayed. The original columns in the dataset are displayed in the Data Fields drop-down.


  7. Click Metadata Refresh.
    The modified columns are displayed in the Data Fields drop-down.

    The Reader is updated with the modified columns.

    In this example, the Total Profit and Units Sold columns were added to the dataset.

Refreshing Metadata of RDBMS Dataset

Consider the SQL12 dataset with three columns: Sr No#, Emp Code, and Branch Code.

To perform Metadata Refresh on an RDBMS dataset, follow the steps given below.

  1. Modify the source file on the SQL server used in creating the RDBMS SQL dataset.
    In this example, we delete a column from the source file to perform modifications.
  2. Perform steps 1 to 4 of Editing a Dataset to edit the SQL dataset.
  3. Click the Refresh icon next to the RDBMS table name to update the features in the RDBMS table.
    The modified columns are retrieved from the source file and displayed in the Features box.
  4. Click Update.
    The SQL dataset is updated, and a confirmation message is displayed.


    You are returned to the Rubiscape home page.

  5. Open a workbook or workflow that contains the RDBMS SQL dataset. Refer to Opening a Workbook and Opening a Workflow.
  6. Click the Reader on the canvas.
    The Properties pane is displayed. The original columns in the dataset are displayed in the Data Fields drop-down.


  7. Click Metadata Refresh.
    The modified columns are displayed in the Data Fields drop-down.


    The Reader is updated with the modified columns.
    In this example, the Branch Code column was deleted from the dataset.

Refreshing Metadata of File Type Dataset

Consider the Superstore dataset.

To perform Metadata Refresh on a File type dataset, follow the steps given below.

  1. Modify the source file on the AWS S3 cloud storage used in creating the CSV dataset.
    In this example, we add a new column in the source file to perform modifications.
  2. Perform steps 1 to 4 of Editing a Dataset to edit the CSV dataset.
  3. Click Metadata Refresh.
    The modified columns are retrieved from the source file and displayed in the Features box.
  4. Click Update.
    The CSV dataset is updated, and a confirmation message is displayed.


    You are returned to the Rubiscape home page.

  5. Open a workbook or workflow that contains the CSV dataset. Refer to Opening a Workbook and Opening a Workflow.
  6. Click the Reader on the canvas.
    The Properties pane is displayed. The original columns in the dataset are displayed in the Data Fields drop-down.


  7. Click Metadata Refresh.
    The modified columns are displayed in the Data Fields drop-down.

    The Reader is updated with the modified columns.

    In this example, the Quarter column was added to the dataset.

