Datasets

A dataset is a collection of data, usually in tabular form. However, a dataset can also be non-tabular, as in the case of an XML file, where data appears as marked-up strings of characters.
On the Datasets tab, you can

  • Search, Add, Edit, Export, and Delete a Dataset

  • Add Users and Groups to the Dataset

  • Edit Access Control of the Dataset

Notes:
  • Hover over any dataset and click it. You are navigated to the edit dataset page. In the top-right corner, you see the following buttons:

    • Export Dataset

    • Edit Dataset

    • Edit Access Control

  • You can use these functionalities as required.

The image below shows the contents below the Datasets tab.



The following table explains various fields on the Datasets tab.

Field/Icon

Description

Datasets KPI Card

  • Gives the count of the current number of datasets in the Tenant

  • Click to see the contents in the Datasets List tab

Datasets (Number)

Datasets List Tab

  • View a list and number of all datasets present in the selected Workspace of the Tenant

  • Details for each Dataset:

    • Name

    • Description

    • The name of the dataset creator

    • Date and Time of creation

Workspace Drop-down

Select a workspace whose datasets you want to see in the list.

Search Dataset

  • Click to search for any dataset.

  • Partial names are allowed for searching.

  • A list of dataset names matching the search string is populated as you start typing.

Add New Dataset

  • Click to add a new dataset

  • Navigates you to the RubiConnect page

  • Create the following types of datasets:

    • Social Media

    • RDBMS

    • File

    • Hadoop

    • API

    • Email

Menu Icon

  • Click to do any of the following:

    • Edit a dataset

    • Edit Access Control to a dataset

    • Export a dataset

    • Delete a dataset

RubiConnect

RubiConnect is one of the first products you come across on the Rubiscape platform. It provides seamless, direct, and secure access to multiple data sources with native interfaces and integration standards to source data. No custom coding or expertise in SQL or other query languages is required.
RubiConnect is tightly integrated with, and used by, all Rubiscape solutions for third-party data integration. It helps organizations that build enormous reservoirs of big data and want to dive deep into diverse data to come out with innovation and insights.
RubiConnect is needed for

  • Connectivity Gateway

  • Data & Workflow Designs

  • Data Integration

  • Metadata Management

  • Alerts & Rule Engines

The supported integration standards in RubiConnect include

  • ODBC

  • JDBC

  • OLE DB

Note

Users can create datasets from the RubiConnect page. The created datasets are added to the workspace selected in the Workspace drop-down.

Types of Datasets

In machine learning, data variables are mostly categorized into the following types.

  • Numerical Data

  • Categorical Data

  • Interval Data

  • Textual Data

  • Geographical Data

The data types and corresponding datasets supported in Rubiscape are given below.

Data Types

Datasets

Social media

  • Twitter

  • RSS

  • Facebook

Hadoop

  • HDFS

  • Hive

  • HBase

  • Impala

RDBMS

  • PostgreSQL

  • SQL

  • MySQL

  • Oracle

  • ODBC

  • SSAS

API

  • Google News

  • Video Stream

  • Google Spreadsheet

  • Google Big Query

File

  • Excel

  • CSV

  • Text

  • JSON

Email

Email



As shown above, Rubiscape supports various data sources under each dataset type.
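
The dataset types and data sources listed above can be captured as a simple lookup table. The following Python sketch is illustrative only: the names `DATASET_SOURCES` and `dataset_type_of` are hypothetical and are not part of any Rubiscape API.

```python
# Illustrative lookup of the dataset types and data sources listed in this article.
# DATASET_SOURCES and dataset_type_of are hypothetical names, not a Rubiscape API.
DATASET_SOURCES = {
    "Social media": ["Twitter", "RSS", "Facebook"],
    "Hadoop": ["HDFS", "Hive", "HBase", "Impala"],
    "RDBMS": ["PostgreSQL", "SQL", "MySQL", "Oracle", "ODBC", "SSAS"],
    "API": ["Google News", "Video Stream", "Google Spreadsheet", "Google Big Query"],
    "File": ["Excel", "CSV", "Text", "JSON"],
    "Email": ["Email"],
}


def dataset_type_of(source):
    """Return the dataset type that lists the given source, or None if not found."""
    wanted = source.strip().lower()
    for dataset_type, sources in DATASET_SOURCES.items():
        if any(wanted == s.lower() for s in sources):
            return dataset_type
    return None


print(dataset_type_of("Hive"))  # Hadoop
print(dataset_type_of("csv"))   # File
```

A lookup like this can help when deciding which option to pick on the RubiConnect page for a given source.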

Adding a New Dataset

Datasets are created at the workspace level and can therefore be consumed by any entity in any project in the Rubiscape module.
To add a dataset,

  1. In the Dataset tab, select the Workspace to which you want to add the Dataset.

  2. Click Add New Dataset on the extreme right. You are navigated to the RubiConnect page for dataset selection.

  3. From the available options, select and click the required dataset type. A window to create the Dataset is displayed.

  4. Fill in the required details. Refer to Connect for all types of datasets. After creating the Dataset, you are navigated back to the Dataset tab on the Admin Module's Content page.

Searching a Dataset

Finding a dataset in a long list can be time-consuming. The Search Dataset field helps you find the required Dataset quickly.
To search a Dataset,

  1. In the Dataset tab, click inside the Search Dataset field.

  2. Type the name of the dataset you want to search for.

Partial names are allowed for searching. A list of dataset names matching the search string is populated as you start typing.
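
The partial-name search described above can be pictured with a short Python sketch. It assumes case-insensitive substring matching, which this article does not confirm; the function `search_datasets` and the sample dataset names are hypothetical.

```python
def search_datasets(names, query):
    """Return the names that contain the query text.

    Assumption: matching is a case-insensitive substring test. The product's
    actual matching rule is not documented in this article.
    """
    needle = query.strip().lower()
    if not needle:
        # An empty search box leaves the list unfiltered.
        return list(names)
    return [name for name in names if needle in name.lower()]


# Hypothetical dataset names, for illustration only.
datasets = ["Sales_2023", "Customer Churn", "Retail Sales", "Tweets"]
print(search_datasets(datasets, "sal"))  # ['Sales_2023', 'Retail Sales']
```

Calling the function again on every keystroke would reproduce the "populated as you start typing" behavior.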

Editing a Dataset

After you add a dataset, you can edit it.
To edit a dataset,

  1. Hover over the Dataset you want to edit, and do one of the following:

    • Click anywhere on the highlighted row and click Edit Dataset on the next page.

    • Click on the menu icon and click Edit.

     You are navigated to the Update Dataset page.

  2. Make changes to various fields as required.

  3. Click Update. The updated Dataset appears in the datasets list on the Contents page.

Adding Users and Groups To Dataset

As an administrator, you can add users and groups to access a dataset. You can identify a user or a group from their icons.

User

Group

To add a user or a group,

  1. In the Dataset tab, search and navigate to the required Workspace.

  2. Search and navigate to the Dataset to which you want to add users and groups.

  3. Click anywhere on the highlighted row and click Edit Dataset on the next page. Alternatively, click on the menu icon and click Edit Access Control.

  4. Click Add Users/Groups on the extreme right in the Access Control section. The Add Users/Groups page is displayed.

  5. Click the Search/Select drop-down.

  6. To select the users and groups,

    1. Scroll down the list and add users or groups as required by selecting the corresponding checkboxes or

    2. Type the names of users or groups you want to select in the search field or

    3. Select the Select All check box for all users and groups.

  7. After you have selected the required users and groups, click Add. The added users and groups list is displayed in the Access Control section.

  8. Click Save in the top-right corner. The selected users and groups are added to the Dataset.

Providing Access to Dataset

As an administrator, you can control a user's (or group's) access to a dataset. You can give access to modify, view, and delete a dataset. Also, you can deny any of these accesses to users or groups.

| Access | Users/Groups can | Additionally, Users/Groups can |
|--------|------------------|--------------------------------|
| View   | Only view a dataset | |
| Modify | Modify a dataset | View a dataset |
| Delete | Delete a dataset | View and Modify a dataset |
| Deny   | NOT modify, view, or delete a dataset | |
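
The access levels above imply one another: Delete includes View and Modify, and Modify includes View. The Python sketch below models that reading of the table. It is an assumption-based illustration, not the product's implementation; in particular, treating Deny as overriding every other grant is an assumption, and the names `IMPLIED` and `effective_access` are hypothetical.

```python
# Assumed model of the access table: each level maps to every action it allows.
IMPLIED = {
    "View": {"View"},
    "Modify": {"Modify", "View"},
    "Delete": {"Delete", "Modify", "View"},
}


def effective_access(granted):
    """Expand the granted access levels into the full set of allowed actions."""
    # Assumption: Deny takes precedence over any other selected level.
    if "Deny" in granted:
        return set()
    allowed = set()
    for level in granted:
        allowed |= IMPLIED[level]
    return allowed


print(sorted(effective_access({"Modify"})))        # ['Modify', 'View']
print(sorted(effective_access({"Delete"})))        # ['Delete', 'Modify', 'View']
print(sorted(effective_access({"View", "Deny"})))  # []
```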

To provide access,

  1. Follow steps 1 to 6 in Adding Users and Groups To Dataset.

  2. Select the access permission checkboxes for the controls you want to provide.

  3. Alternatively, to provide identical access controls to all the selected users and groups, select the access permission checkboxes directly on the Add Users/Groups page.
  4. Click Save in the top-right corner. The access controls are applied to the selected users and groups.

Notes:

In the Access Control section, you can identify a

  • User by the email address in the Email column

  • Group by name in the Email column (a group cannot have an email address)

Removing Access Control For Dataset

As an administrator, you can control a user's or group's access to a dataset. You can remove the access to modify, view, and delete a dataset.

To remove access,

  1. In the Dataset tab, select the Workspace to which your Dataset belongs.

  2. Search and navigate to the required Dataset.

  3. Click anywhere on the highlighted row and click Edit Access Control on the next page. Alternatively, click on the menu icon and click Edit Access Control.

  4. As required, in the Access Control section, clear the checkboxes for Modify, View, and Delete options for the user/group. The individual user's or group's access permissions are removed.

  5. If you want to remove a user or group from the list, hover over the user/group and click the Delete icon on the extreme right.

  6. Click Save in the top-right corner. The modified selections are saved for the Dataset.
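
The two removal actions in the steps above, clearing individual permissions and removing a user or group entirely, can be sketched in Python as operations on an access control list. The dictionary layout, the function names, and the sample user and group are hypothetical and only illustrate the idea.

```python
def clear_access(acl, principal, levels):
    """Return a copy of acl with the given access levels cleared for principal."""
    updated = {name: set(perms) for name, perms in acl.items()}
    if principal in updated:
        updated[principal] -= set(levels)
    return updated


def remove_principal(acl, principal):
    """Return a copy of acl without the given user or group."""
    return {name: set(perms) for name, perms in acl.items() if name != principal}


# Hypothetical entries: a user is keyed by email address, a group by its name.
acl = {
    "analyst@example.com": {"View", "Modify"},
    "Data Science Team": {"View"},
}
acl = clear_access(acl, "analyst@example.com", ["Modify"])  # like clearing a checkbox
acl = remove_principal(acl, "Data Science Team")            # like clicking the Delete icon
print(acl)  # {'analyst@example.com': {'View'}}
```

Both helpers return new dictionaries rather than changing the input, which mirrors the fact that nothing takes effect in the product until you click Save.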

Exporting a Dataset

You can export the Dataset to save it on your system. You can use the exported Dataset again by importing it into Rubiscape.
To export a dataset,

  1. In the Dataset tab, select the Workspace to which your Dataset belongs.

  2. Search and navigate to the required Dataset.

  3. Click anywhere on the highlighted row and click Export on the next page. Alternatively, click on the menu icon and click Export. The Dataset is saved to your default download folder.

Tip

  • You might be prompted to select the location based on your browser settings.

  • Select the destination folder, and then click Save.

Deleting A Dataset

As an administrator, you can delete a dataset that is no longer required.

To delete a dataset,

  1. In the Dataset tab, select the Workspace to which your Dataset belongs.

  2. Search and navigate to the required Dataset.

  3. Hover over the Dataset you want to delete, click on the menu icon, and click Delete. A confirmation message to delete the Dataset pops up.

  4. Click Delete on the message. The Dataset is deleted, and a confirmation message is displayed.
