Databricks

Creating Databricks Dataset

To create a Databricks dataset, follow the steps given below.

  1. On the home page, click the Create icon. The Product Selection page is displayed.
  2. Hover over the Data Connect tile and click Create Dataset.
    The following figure shows the Product Selection page.



    The Data Connect page for choosing your dataset type is displayed.
  3. From the RDBMS options, select Databricks.
    The following figure shows the RDBMS options.


    The Create Databricks Dataset page is displayed.


  4. Enter a suitable Name for the dataset.
  5. In the Connection Parameter section, enter the following details.

    -- Host (the hostname or IP address of the Databricks server that hosts your data)

    -- Access Key (the security token used to authenticate with Databricks, typically a Databricks personal access token)

    -- Http Path (the HTTP path of the Databricks compute resource, such as a SQL warehouse or cluster, that runs your queries; it is shown in the connection details of that resource)

    -- Catalog (the Databricks catalog that contains the schemas and tables you want to access)

    -- Schema (the name of the schema(s) that contain the table(s) you want to access)

  6. Click Test Connection. If the parameters are correct, the message "Database Connection Successful" is displayed in green. After a successful connection, the Schema dropdown is populated with a list of available schemas.

  7. From the Schema dropdown, select the schema(s) that contain your table(s) and click Done. After you select the schema(s), the Select Tables dropdown is populated with a list of all available tables in the schema(s).
  8. From the Select Tables dropdown, select the required tables and click Done.

  9. To add a custom query that selects data from the tables, click Add Custom Query. The Add Custom Query screen is displayed. Refer to Adding A Custom Query for details.

  10. Click Create. The Databricks dataset is created in Rubiscape and is available for use in your workbooks and pipelines.
    The following figure shows the Create Databricks Dataset page.
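
If Test Connection fails, it can help to check the same values outside Rubiscape. The sketch below is a minimal example, assuming the official Databricks Python client `databricks-sql-connector` is installed and that the Access Key is a Databricks personal access token. The Host, Http Path, Access Key, Catalog, and Schema values map onto the client's `server_hostname`, `http_path`, `access_token`, `catalog`, and `schema` arguments. The names `DatabricksParams`, `normalize_params`, `check_connection`, and `preview_query` are hypothetical helpers, not part of Rubiscape or of the connector.

```python
"""Check Databricks connection parameters outside Rubiscape (a sketch).

Assumptions: the `databricks-sql-connector` package is installed
(pip install databricks-sql-connector) and the Access Key is a Databricks
personal access token. All helper names here are hypothetical.
"""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatabricksParams:
    host: str                      # Host field, e.g. "adb-1234567890123456.7.azuredatabricks.net"
    http_path: str                 # Http Path field, e.g. "/sql/1.0/warehouses/abc123"
    access_key: str                # Access Key field (personal access token)
    catalog: Optional[str] = None  # Catalog field
    schema: Optional[str] = None   # Schema field


def normalize_params(p: DatabricksParams) -> DatabricksParams:
    """Fix common copy/paste mistakes and reject missing required values."""
    host = p.host.strip()
    # The connector expects a bare hostname, so drop any URL scheme and trailing slash.
    for prefix in ("https://", "http://"):
        if host.lower().startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")
    # HTTP paths such as "/sql/1.0/warehouses/<id>" start with a slash.
    http_path = p.http_path.strip()
    if not http_path.startswith("/"):
        http_path = "/" + http_path
    access_key = p.access_key.strip()
    if not host or http_path == "/" or not access_key:
        raise ValueError("Host, Http Path and Access Key are required")
    return DatabricksParams(host, http_path, access_key, p.catalog, p.schema)


def _connect(p: DatabricksParams):
    """Open a connection with the official connector (imported lazily)."""
    from databricks import sql

    p = normalize_params(p)
    return sql.connect(
        server_hostname=p.host,
        http_path=p.http_path,
        access_token=p.access_key,
        catalog=p.catalog,
        schema=p.schema,
    )


def check_connection(p: DatabricksParams) -> bool:
    """Rough equivalent of Test Connection: open a session and run SELECT 1."""
    with _connect(p) as conn, conn.cursor() as cursor:
        cursor.execute("SELECT 1")
        return cursor.fetchone()[0] == 1


def preview_query(p: DatabricksParams, query: str, rows: int = 10) -> List:
    """Run a candidate custom query and return at most `rows` result rows."""
    with _connect(p) as conn, conn.cursor() as cursor:
        cursor.execute(query)
        return cursor.fetchmany(rows)
```

For example, check_connection(DatabricksParams(host="<your-host>", http_path="<your-http-path>", access_key="<your-token>", catalog="<your-catalog>")) should return True when the workspace accepts the parameters, and the connector raises an error otherwise. preview_query can be used to try a SELECT statement before you enter it as a custom query in step 9. The connector is imported only inside _connect, so the parameter clean-up can be reused without it.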

Notes:

  • You can select multiple schemas from the available options in the Schema dropdown.

  • You can select multiple tables from the available options in the Select Tables dropdown.

  • Enabling the "Disable Cache" option allows you to create a dataset without generating a dataset cache.
  • When "Disable Cache" is selected, the dashboard does not offer the "Enable Direct Query" option. For more information, refer to the "Enable Direct Query" document.