Excel

Excel

Creating Excel Dataset

Rubiscape supports Excel files (.xls and .xlsx) for dataset creation.
To create an Excel dataset using any of the modes, follow the basic steps given below.

  1. On the home page, click Create icon ().
    The Product Selection page is displayed.
  2. Hover over the Data Connect tile and click Create Dataset.



    Dataset Selection page is displayed.
  3. From the File option, Select Excel.


    Create Excel Dataset page is displayed. You can use an Excel file stored on your computer or from the AWS S3 storage network to create an Excel dataset. The dataset creation for these types is explained in the sections below.

Creating Excel Dataset by Uploading Excel File

To create an Excel dataset by uploading an Excel file from your computer, follow the steps given below.

  1. Follow steps 1 to 3 of Creating Excel Dataset.
  2. Enter Name and Description for your dataset.
  3. To upload an Excel file from your computer, select the Upload File  in Source of Data dropdown .
  4. Click Browse.
    The File Browser window is displayed.
  5. Browse to your file location and select an Excel file.
  6. Click the Excel Worksheet drop-down.
  7. From the available worksheets in the selected Excel file, select the worksheet you want to import.
    The Features (columns) in the Excel worksheet are displayed in the Features box.
  8. To change the variable type of the features, refer to Configuring Feature Type.

  9. If you wish to remove any of the features, hover over the feature name, and click the Close icon ().



    A list of available worksheets in the selected Excel file is displayed.

  10. From the list, select the worksheet you want to import.

  11. If required, you can select the encoding format of your Excel file from the Select Encoding drop-down.
    This is an optional step. The default selection is None. For the supported encodings, refer to File Encoding Formats.

  12. Click Create.



A confirmation message is displayed. The Excel dataset is created in the current workspace and is available for use in workbooks, workflows, and dashboards.

Notes

  • Enabling the "Disable Cache" option allows you to create a dataset without generating a dataset cache.
  • When you select to "Disable Cache", the dashboard will not offer the "Enable Direct Query" option. For more information, please refer to the "Enable Direct Query" document.

Creating Excel Dataset using S3 Bucket Storage

To create an Excel dataset by uploading an Excel file from the AWS S3 bucket storage, follow the steps given below.

  1. Follow steps 1 to 3 of Creating Excel Dataset.
  2. Enter Name and Description for your dataset.
  3. To upload a file from AWS storage, select the S3 radio button.
    A new set of dataset creation options are displayed.
  4. Enter the following details for the cloud storage.
    • Bucket Name
    • Aws Access Key Id
    • Aws Secret Access Key
    • File Directory URL (for the folder created by you on the S3 browser)
    • Filename or Wildcard


Notes:

  • By default, the number in the Skip Rows field is zero (0). You can enter an integer value in the field corresponding to the number of rows in the dataset you wish to delete.
  • For example, if you enter 15 in the Skip Rows field, the first 15 rows will be skipped/deleted while creating the dataset.
  • The administrator provides Aws Access Key Id and Aws Secret Access Key.
  • If a file is already present in the root directory, you can access the file from the folder using a slash (/) symbol.
  • If a folder is already present, you can give its path in the File Directory URL field.
  • The File Directory URL is for the folder created by you on the S3 browser. This folder contains the dataset files whose Filename/Wildcard is mentioned in the next field.
  • You can use special characters/symbols to search file names like an asterisk (*) and question mark (?).
  • An asterisk (*) symbol searches file names with multiple (any number) characters in the specified place.
  • For example, a filename with Data_*_* searches all file names containing multiple characters between the underscore marks and after the last underscore mark.
  • question mark (?) is used to search file names with a single character in the specified place.
  • For example, a filename with Data_? _??? searches all file names containing one character between the underscore marks and three characters after the last underscore mark.
  • Hence, a filename with Data_??_* searches all file names containing two characters between the underscore marks and multiple characters after the last underscore mark.
  • You can search for the dataset files in all folders and sub-folders of the root directory by selecting the Traverse Subdirectory checkbox.

To continue the Excel Dataset creation process,

  1. To validate the connection parameters, click Verify.
    If the parameters are valid, a Verification Success message is displayed. Also, the Show Filename(s) button gets activated.
  2. To see the files detected, click Show Filename(s).
  3. The list of detected files is displayed in a separate window.



  4. Click the Excel Worksheet drop-down.
  5. From the available worksheets in the selected Excel file, select the worksheet you want to import.
    The Features (columns) in the Excel worksheet are displayed in the Features box.
  6. To change the datatype of the features, refer to Configuring Feature Type.
  7. If you wish to remove any of the features, hover over the feature name, and click the Close icon ().


  8. To insert additional features (along with the features already present in the dataset), click Additional Output Features.
    Create Excel Dataset window is displayed. It displays the features that can be inserted along with the existing ones. These are Full File Name, Short File Name, File Path, File Extension, File Size, and File Last Modified Date.
    Click Done.

  9. Click Create.



     
    Notes:
    • The Additional Output Features are not selected by default.
    • You can choose anyone, multiple, or all of them to be inserted into the dataset





    Notes:

    • Make sure to use a standard date format for datasets containing date as a variable.
    • In case the dataset contains geographical variables, the Map Locations page is displayed. Verify that all locations are mapped and click Done to create the dataset. Refer to Creating a Dataset with Geographical Variable Type.
    • Enabling the "Disable Cache" option allows you to create a dataset without generating a dataset cache.
    • When you select to "Disable Cache", the dashboard will not offer the "Enable Direct Query" option. For more information, please refer to the "Enable Direct Query" document.


    The Excel dataset is created in the current workspace and is available for use in workbooks, workflows.

    • Related Articles

    • Skip Rows

      While creating a dataset, sometimes the first few rows contain the meta and other related information. This meta might not be required in the actual pre-processing and further analysis. In Rubiscape, you can remove this meta and other unwanted first ...
    • File Path

      Note The File Path option is disabled for use in SAAS. It will work in Desktop application only. Rubiscape provides you six options to select the source to create a dataset. The available options are: Upload File S3 FTP Azure Blob Storage File Path ...
    • Rubiscape File Server

      The Rubiscape File Server lets you upload files/folders on cloud storage for the current workspace. You can use these files while creating a dataset or working with RubiNotebook or RubiPython. Rubiscape provides you six options to select the source ...
    • Features of AutoML Wizard

      The Auto ML Wizard recommends the best fit model for a selected dataset, This recommendation is especially helpful if you do not know which algorithm to use for the selected dataset. Features: Auto-creation of a workbook Import, search or choose a ...
    • MinIO

      MinIO is a high-performance, distributed object storage system. In Rubiscape, you have the capability to access files stored on MinIO storage to create datasets for analysis and processing. By configuring the necessary parameters, you can establish ...