Features Used to Upload Datasets

Filename/Wildcard

There are two wildcard characters you can use to select files on the server.

  1. Star (*): Matches any sequence of characters. For example, *.xlsx matches every dataset with the .xlsx extension, so all of these datasets are verified.

  2. Question mark (?): Matches exactly one character. For example, IRIS_?.xlsx matches files such as IRIS_2.xlsx and IRIS_3.xlsx, but not IRIS_10.xlsx.
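The two wildcards follow the usual shell-style glob rules. As an illustration only (not Rubiscape's own implementation), Python's standard `fnmatch.fnmatchcase` applies the same `*` and `?` semantics. The file listing and the `match` helper below are hypothetical, and case-sensitive matching is an assumption.

```python
from fnmatch import fnmatchcase

# Hypothetical listing of files on the server (illustration only).
server_files = ["IRIS_2.xlsx", "IRIS_3.xlsx", "IRIS_10.xlsx", "sales.xlsx", "notes.txt"]

def match(pattern, names):
    """Return the names that match a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]

# '*' matches any run of characters, so every .xlsx file is selected.
print(match("*.xlsx", server_files))
# ['IRIS_2.xlsx', 'IRIS_3.xlsx', 'IRIS_10.xlsx', 'sales.xlsx']

# '?' matches exactly one character, so IRIS_10.xlsx (two characters) is skipped.
print(match("IRIS_?.xlsx", server_files))
# ['IRIS_2.xlsx', 'IRIS_3.xlsx']
```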

Note:

All files are expected to have the same number of data columns. If they do not, the columns missing from a file are added to that file's rows and filled with NaN values.
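The sketch below illustrates the NaN-filling behavior described in the note, in plain Python with each row stored as a dict. It is not Rubiscape's merge logic; the `combine` helper and the sample files are hypothetical.

```python
import math

def combine(tables):
    """Stack rows from several files into one table.

    The result's columns are the union of every file's columns, in first-seen
    order; a column missing from a file is filled with NaN for that file's rows.
    """
    columns = list(dict.fromkeys(col for rows in tables for row in rows for col in row))
    combined = [{col: row.get(col, math.nan) for col in columns}
                for rows in tables for row in rows]
    return columns, combined

# Two hypothetical files: the second has an extra "petal_width" column.
file_a = [{"sepal_length": 5.1, "species": "setosa"}]
file_b = [{"sepal_length": 6.3, "petal_width": 1.8, "species": "virginica"}]

columns, rows = combine([file_a, file_b])
print(columns)  # ['sepal_length', 'species', 'petal_width']
print(rows[0])  # {'sepal_length': 5.1, 'species': 'setosa', 'petal_width': nan}
```

In pandas, `pd.concat` with its default `join="outer"` produces the same kind of NaN-filled union of columns.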

Traverse Subdirectory

This feature searches for matching files in the subdirectories of the selected root folder, in addition to the root folder itself. It traverses nested subdirectories up to the nth level, that is, at any depth.
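A minimal sketch of the difference between searching only the root folder and traversing subdirectories, using Python's `pathlib` on a temporary local folder. This assumes a local file system rather than the FTP server, and the `find_files` helper and folder layout are hypothetical.

```python
import tempfile
from pathlib import Path

def find_files(root, pattern, traverse_subdirectories=False):
    """List files under root whose names match a wildcard pattern.

    With traverse_subdirectories=True, nested folders at every depth are
    searched (Path.rglob); otherwise only the root folder itself (Path.glob).
    """
    base = Path(root)
    candidates = base.rglob(pattern) if traverse_subdirectories else base.glob(pattern)
    return sorted(p.relative_to(base).as_posix() for p in candidates if p.is_file())

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one file in the root, two in nested subfolders.
    for rel in ["IRIS_1.xlsx", "2023/IRIS_2.xlsx", "2023/q4/IRIS_3.xlsx"]:
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    top_only = find_files(tmp, "*.xlsx")
    recursive = find_files(tmp, "*.xlsx", traverse_subdirectories=True)

print(top_only)   # ['IRIS_1.xlsx']
print(recursive)  # ['2023/IRIS_2.xlsx', '2023/q4/IRIS_3.xlsx', 'IRIS_1.xlsx']
```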

Delimiter

  • In text files, delimiters separate the independent regions (fields) of the data stream.
  • For example, ';' is the delimiter in a series of semicolon-separated values.
  • If no delimiter is specified, these regions are merged into one.
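A minimal sketch of what the delimiter changes, using Python's standard `csv` module: with ';' each line splits into separate fields, and with no delimiter each line stays a single merged field. The sample text and the `parse` helper are hypothetical.

```python
import csv
import io

# Hypothetical semicolon-separated text file contents.
text = "sepal_length;species\n5.1;setosa\n6.3;virginica\n"

def parse(text, delimiter=None):
    """Split each line into fields; with no delimiter the whole line stays one field."""
    if delimiter is None:
        return [[line] for line in text.splitlines()]
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

print(parse(text, ";"))
# [['sepal_length', 'species'], ['5.1', 'setosa'], ['6.3', 'virginica']]
print(parse(text))
# [['sepal_length;species'], ['5.1;setosa'], ['6.3;virginica']]
```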


Notes:

  • For text files, you can use *.txt to select all the text files only if every file on the FTP server uses the same delimiter. If the files use different delimiters, *.txt cannot be used.
  • The wildcard option is not available for JSON files. Instead, enter the full file URL to access the data file on the FTP server.
  • These options make it easy to access and select files on the server.
  • You can also upload a single file instead of selecting multiple files. This works for all supported file types: Excel, JSON, CSV, and .txt.
  • Every file must have a unique filename.
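The two constraints on a *.txt batch (one shared delimiter, unique filenames) can be checked before uploading. The sketch below is a hypothetical pre-check, not a Rubiscape feature: `guess_delimiter` uses a simple assumed heuristic (a candidate character that appears the same non-zero number of times on every line), the candidate set is an assumption, and uniqueness is assumed to apply to the bare filename.

```python
from collections import Counter
from pathlib import PurePosixPath

CANDIDATES = [",", ";", "\t", "|"]  # assumed set of delimiters to look for

def guess_delimiter(text):
    """Hypothetical heuristic: return the first candidate that appears the same,
    non-zero number of times on every non-empty line, or None."""
    lines = [line for line in text.splitlines() if line.strip()]
    for cand in CANDIDATES:
        counts = {line.count(cand) for line in lines}
        if len(counts) == 1 and counts != {0}:
            return cand
    return None

def check_txt_batch(files):
    """files maps a server path to its text content; returns a list of problems."""
    problems = []
    # Assumption: uniqueness is checked on the bare filename, not the full path.
    names = Counter(PurePosixPath(path).name for path in files)
    problems += [f"duplicate filename: {name}" for name, n in names.items() if n > 1]
    delimiters = {path: guess_delimiter(text) for path, text in files.items()}
    if len(set(delimiters.values())) > 1:
        problems.append(f"mixed delimiters: {delimiters}")
    return problems

# Hypothetical batch: a.txt appears twice, and b.txt uses ',' instead of ';'.
batch = {
    "/data/a.txt": "x;y\n1;2\n",
    "/data/2023/b.txt": "x,y\n3,4\n",
    "/data/2024/a.txt": "x;y\n5;6\n",
}
problems = check_txt_batch(batch)
for problem in problems:
    print(problem)
```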