Rubiscape supports Parquet files (.parquet) for dataset creation.
To create a Parquet dataset, follow the steps given below.
On the home page, click the Create icon.
The Product Selection page is displayed.
Hover over the Data Connect tile and click Create Dataset.
The Dataset Selection page is displayed.
From the File options, select Parquet.
The Create Parquet Dataset page is displayed.
You can create a Parquet dataset from a Parquet file stored on your computer or in AWS S3 storage.
The dataset creation for these types is explained in the sections below.
To create a Parquet dataset by uploading a Parquet file from your computer, follow the steps given below.
Follow steps 1 to 3 above.
Enter Name and Description for your dataset.
Click the Source of Data dropdown and select the desired source. Available options are S3, FTP, Azure Blob Storage, MinIO, and Rubiscape One Drive.
Click Browse. The File Browser window is displayed.
Browse to your file location and select a Parquet file. The features (columns) in the Parquet file are displayed in the Features box.
To change the datatype of the features, refer to Configuring Feature Type.
To remove a feature, hover over the feature name and click the Close icon.
Click Create.
A confirmation message is displayed.
The Parquet dataset is created in the current workspace and is available in workbooks, pipelines, and dashboards.
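Before uploading, it can help to confirm that the selected file really is a Parquet file. The following is a minimal sketch, not part of Rubiscape: it relies only on the fact that a standard Parquet file begins and ends with the four magic bytes `PAR1`. The function name `looks_like_parquet` is hypothetical, and the check does not validate the file's schema or contents.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the start and end of a standard Parquet file


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes.

    Hypothetical helper for a quick local check; it is not a Rubiscape API.
    """
    p = Path(path)
    # Smallest plausible file: header magic (4) + footer length (4) + footer magic (4).
    if not p.is_file() or p.stat().st_size < 12:
        return False
    with p.open("rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # move to 4 bytes before the end of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

A file that fails this check (for example, a CSV renamed to .parquet) is unlikely to be read correctly as a Parquet dataset.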
To create a Parquet dataset by uploading a Parquet file from an AWS S3 bucket, follow the steps given below.
Follow steps 1 to 3 above.
Enter Name and Description for your dataset.
To upload a file from AWS storage, select the S3 option from the Source of Data dropdown.
A new set of dataset creation options is displayed.
Enter the following details for the cloud storage.
Bucket Name
Aws Access Key Id
Aws Secret Access Key
File Directory URL (for the folder created by you on the S3 browser)
Filename or Wildcard
To validate the connection parameters, click Verify.
If the parameters are valid, a Verification Success message is displayed and the Show Filename(s) button is enabled.
To see the files detected, click Show Filename(s).
The list of detected files is displayed in a separate window.
The features (columns) in the Parquet file are also displayed in the Features box.
To change the datatype of the features, refer to Configuring Feature Type.
To remove a feature, hover over the feature name and click the Close icon.
To insert additional features (along with the features already present in the dataset), click Additional Output Features.
The Create Parquet Dataset window is displayed. It lists the following features, which can be inserted along with the existing ones.
Full File Name
Short File Name
File Path
File Extension
Select the checkboxes for the features that you want to insert, and click Done.
Click Create.
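To illustrate how a Filename or Wildcard entry and the additional output features might relate to the files in a bucket folder, here is a small sketch in Python. It is not Rubiscape's implementation: it assumes shell-style wildcards (such as `*.parquet`), assumes that only files directly inside the given directory are considered, and assumes one plausible meaning for each additional feature. The function names and the sample object keys are hypothetical.

```python
from fnmatch import fnmatchcase
import posixpath


def match_files(keys, directory, pattern):
    """Return the keys directly under `directory` whose file name matches `pattern`.

    Assumption: shell-style wildcards; the wildcard syntax Rubiscape accepts may differ.
    """
    prefix = directory.strip("/")
    matched = []
    for key in keys:
        folder, name = posixpath.split(key)
        if folder.strip("/") == prefix and fnmatchcase(name, pattern):
            matched.append(key)
    return matched


def additional_features(key):
    """Derive the four additional output features from one object key.

    Assumption: these definitions are one reasonable interpretation of the labels.
    """
    folder, name = posixpath.split(key)
    return {
        "Full File Name": key,
        "Short File Name": name,
        "File Path": folder,
        "File Extension": posixpath.splitext(name)[1],
    }


# Hypothetical object keys in a bucket.
keys = [
    "sales/2023/jan.parquet",
    "sales/2023/feb.parquet",
    "sales/2023/notes.txt",
    "sales/2024/jan.parquet",
]

for key in match_files(keys, "sales/2023", "*.parquet"):
    print(additional_features(key))
```

When a wildcard matches several files, features such as Short File Name let you tell apart rows that came from different source files.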