Rubiscape supports the Metadata Refresh feature for the following dataset types:
- Google Spreadsheet dataset
- RDBMS dataset
- File type dataset from AWS S3 cloud storage (applicable to all flat files: Excel, CSV, JSON, Text)
This feature is available in both the workbook and workflow canvas to update the Reader when a dataset is modified.
Consider a scenario in which a Google Spreadsheet dataset created in Rubiscape is used across several workbooks and workflows, and its source file, which is stored on the cloud, is modified at some point. When the dataset that uses the source file is updated, it displays the modified columns. However, the existing Readers used across the workbooks and workflows do not pick up the modified columns. To get the expected result, each Reader connected to the dataset must reflect the updated columns in Rubiscape. Deleting, dragging and dropping, and reconnecting the Reader node each time its corresponding dataset is modified is inconvenient.
The Metadata Refresh feature saves this time and effort and keeps the existing algorithm flow undisturbed. It updates the Reader at the click of a button from the workbook or workflow canvas you are working on. It fetches the features from the modified dataset and updates the selected Reader accordingly.
In Metadata Refresh, the access to fetch features for a Reader is restricted to its corresponding dataset.
In Rubiscape, a dataset can access columns from the following source files during dataset creation:
- Files stored on your computer
- Files on the AWS S3 cloud storage
- Files stored online
Metadata Refresh is only applicable to the Readers that have their source files stored online or on the AWS S3 cloud.
Metadata Refresh updates the Reader with the modifications made to the source file and the corresponding dataset. The modifications can be one or more of the following:
- Addition of one or more features
- Deletion of one or more features
- Modification of one or more features (rename features, change in variable type)
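The three kinds of modification above can be pictured as a comparison between the columns a Reader currently holds and the columns of the updated dataset. The following is a minimal sketch under stated assumptions, not Rubiscape's actual implementation: `MetadataDiff`, `diff_metadata`, and the variable type labels are hypothetical names introduced for illustration, and the column names are borrowed from the SQL12 example later on this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MetadataDiff:
    """Hypothetical container for the differences between two column sets."""
    added: List[str]                     # columns present only in the dataset
    deleted: List[str]                   # columns present only in the Reader
    retyped: Dict[str, Tuple[str, str]]  # column -> (old type, new type)


def diff_metadata(reader_fields: Dict[str, str],
                  dataset_fields: Dict[str, str]) -> MetadataDiff:
    """Compare {column name: variable type} mappings.

    A renamed column cannot be told apart from a deletion plus an
    addition by name alone, so it shows up in both lists.
    """
    added = [name for name in dataset_fields if name not in reader_fields]
    deleted = [name for name in reader_fields if name not in dataset_fields]
    retyped = {
        name: (reader_fields[name], dataset_fields[name])
        for name in reader_fields
        if name in dataset_fields and reader_fields[name] != dataset_fields[name]
    }
    return MetadataDiff(added, deleted, retyped)


# The variable types below are assumed for illustration only.
reader_fields = {
    "Sr No#": "Numerical",
    "Emp Code": "Categorical",
    "Branch Code": "Categorical",
}
dataset_fields = {"Sr No#": "Numerical", "Emp Code": "Categorical"}

diff = diff_metadata(reader_fields, dataset_fields)
print(diff.deleted)  # ['Branch Code']
```

In this sketch a rename appears as one deleted and one added column, which is why downstream tasks that refer to the old name may need to be re-configured after a refresh.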
- After you update the Reader using Metadata Refresh, the updated columns are displayed under the Data Fields drop-down in the Properties pane.
- To view the modified data of the Reader from the workbook or workflow canvas, explore the Reader after it executes successfully.
- When you click the Metadata Refresh button, the data fields are refreshed, and the list is renewed.
- If a workflow contains functionalities or algorithms below the Reader node, those tasks may still refer to the columns as they were before the refresh. Hence, when you click Validate, you may get validation errors like the ones given below.
  - Input variables of the task are not available in the predecessor. Please make the required changes.
  - Data type of the reader is changed. Please make the necessary changes.
- In this case, you need to re-configure the algorithms and run them again.
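To see why these validation errors can appear, consider a sketch that checks each downstream task's configured inputs against the Reader's refreshed columns. This is only an illustration under stated assumptions: `validate_tasks`, the task names, and the variable type labels are hypothetical and do not describe Rubiscape's internal validation. Only the two message strings are taken from this page.

```python
from typing import Dict, List, Tuple

# Message texts as quoted on this page.
MISSING_INPUT = ("Input variables of the task are not available in the "
                 "predecessor. Please make the required changes.")
TYPE_CHANGED = ("Data type of the reader is changed. "
                "Please make the necessary changes.")


def validate_tasks(reader_fields: Dict[str, str],
                   tasks: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (task name, column, message) for every stale task input.

    reader_fields maps column name -> variable type after the refresh.
    tasks maps task name -> {input column: type the task was configured with}.
    """
    errors = []
    for task_name, inputs in tasks.items():
        for column, expected_type in inputs.items():
            if column not in reader_fields:
                # The column was deleted or renamed in the dataset.
                errors.append((task_name, column, MISSING_INPUT))
            elif reader_fields[column] != expected_type:
                # The column still exists but its variable type changed.
                errors.append((task_name, column, TYPE_CHANGED))
    return errors


# Hypothetical workflow: one task still expects the deleted Branch Code column.
refreshed_fields = {"Sr No#": "Numerical", "Emp Code": "Categorical"}
tasks = {
    "Sorting": {"Branch Code": "Categorical"},
    "Aggregation": {"Emp Code": "Categorical"},
}

for task_name, column, message in validate_tasks(refreshed_fields, tasks):
    print(f"{task_name} ({column}): {message}")
```

Here only the hypothetical Sorting task is reported, because its input column no longer exists in the refreshed Reader. Re-configuring that task to use an available column would clear the error.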
Consider the Google_Spreadsheet dataset.
To perform Metadata Refresh on a Google Spreadsheet dataset, follow the steps given below.
- Modify the JSON source file (containing the private key) used in creating the Google Spreadsheet dataset.
In this example, we modify the JSON source file by adding two new columns.
- Perform steps 1 to 4 of Editing a Dataset to edit the Google Spreadsheet dataset.
- Click Fetch.
The modified columns are retrieved from the source file and displayed in the Features box.
- Click Update.
The Google Spreadsheet dataset is updated, and a confirmation message is displayed.
You are returned to the Rubiscape home page.
- Open a workbook or workflow that contains the Google Spreadsheet dataset. Refer to Opening a Workbook and Opening a Workflow.
- Click the Reader on the canvas.
The Properties pane is displayed. The original columns in the dataset are displayed in the Data Fields drop-down.
- Click Metadata Refresh.
The modified columns are displayed in the Data Fields drop-down.
The Reader is updated with the modified columns.
In this example, the Total Profit and Units Sold columns were added to the dataset.
Consider the SQL12 dataset with three columns: Sr No#, Emp Code, and Branch Code.
To perform Metadata Refresh on an RDBMS dataset, follow the steps given below.
- Modify the source file on the SQL server used in creating the RDBMS SQL dataset.
In this example, we modify the source file by deleting a column.
- Perform steps 1 to 4 of Editing a Dataset to edit the SQL dataset.
- Click the Refresh icon next to the RDBMS table name to update the features in the RDBMS table.
The modified columns are retrieved from the source file and displayed in the Features box.
- Click Update.
The SQL dataset is updated, and a confirmation message is displayed.
You are returned to the Rubiscape home page.
- Open a workbook or workflow that contains the RDBMS SQL dataset. Refer to Opening a Workbook and Opening a Workflow.
- Click the Reader on the canvas.
The Properties pane is displayed. The original columns in the dataset are displayed in the Data Fields drop-down.
- Click Metadata Refresh.
The modified columns are displayed in the Data Fields drop-down.
The Reader is updated with the modified columns.
In this example, the Branch Code column was deleted from the dataset.
Consider the Superstore dataset.
To perform Metadata Refresh on a File type dataset, follow the steps given below.
- Modify the source file on the AWS S3 cloud storage used in creating the CSV dataset.
In this example, we modify the source file by adding a new column.
- Perform steps 1 to 4 of Editing a Dataset to edit the CSV dataset.
- Click Metadata Refresh.
The modified columns are retrieved from the source file and displayed in the Features box.
- Click Update.
The CSV dataset is updated, and a confirmation message is displayed.
You are returned to the Rubiscape home page.
- Open a workbook or workflow that contains the CSV dataset. Refer to Opening a Workbook and Opening a Workflow.
- Click the Reader on the canvas.
The Properties pane is displayed. The original columns in the dataset are displayed in the Data Fields drop-down.
- Click Metadata Refresh.
The modified columns are displayed in the Data Fields drop-down.
The Reader is updated with the modified columns.
In this example, the Quarter column was added to the dataset.