Custom Functions in RubiPython

Custom Functions in RubiPython

Reading Data from File

A sample Python code to read data from file is shown in the image below.

The table below explains the above code snippet.

Line of Code

Result

getReaderData(“datasetName”,“subdatasetName”)

This custom function checks the type of the dataset (Excel, CSV, and Text) and accordingly reads the dataset and returns it as an output of the function of dictionary type.

print2log(data)

Prints the Reader output data as a dictionary to the console log or custom component log.

 

(info)

Note:

In the getReaderData custom function, the dataset and the sub-dataset names are the same for all datasets except for RDBMS datasets. In the case of RDBMS datasets, the sub-dataset name is the name of the table added in the RDBMS dataset.

Writing Data to File

A sample Python code to write data to file is shown in the image below.

In the above code, the writeDataToFile custom function is used to append a row to the selected Carbo Fitness file (dataset).

  • dataToWrite stores the output data of the type dictionary to write to a file.
  • writeDataToFile(dataToWrite,“action”,“delimiter”,“datasetName”)custom function appends or overwrites the output data to the selected file (Excel, CSV, or Text datasets).

The table below describes the writeDataToFile function and parameters.

Function

Parameter

Remarks

writeDataToFile(dataToWrite,“action”,

“delimiter”,“datasetName”)

dataToWrite – Output data of type “dictionary” to be written to the file.

 —

action – The action to perform on the file. You can append or overwrite more than one row in the selected dataset.

It is of type String, and values can be overwrite and append.

delimiter – The character that separates the columns in a dataset.

It is of type String, and values can be – “,” / “|” / “            ” / “ ” (comma, pipe, tab, and space).

datasetName – Name of the dataset file to which you want to write.

It is of type String.

Another example of reading and writing to a file is shown below.

Reading Data from Table

You can use getReaderData to read the data from RDBMS Table.

In the above code,

  • getReaderData(“datasetName”,“subdatasetName”) custom function checks the type of the dataset and accordingly reads the dataset from the Reader and returns it as an output of the function of type dictionary.
  • print2log(dataToWrite) prints the output data as dictionary to the console log or custom component log.

    (info)

    Note:

    In the getReaderData custom function, the dataset and the sub-dataset names are the same for all datasets except for RDBMS datasets. In case of RDBMS datasets, sub-dataset name is the name of the table added in the RDBMS dataset.

Writing Data to Table

Similarly, you can use writeDataToTable to write the data into RDBMS Table.

The table below describes the writeDataToTable function and parameters.

Function

Parameters

Remarks

writeDataToTable(dataToWrite, “dropAndCreate”, “datasetName”, “tableName”, “strategy”, keyColumns)

dataToWrite – Output data of type “dictionary” to be written to RDBMS table.

 —

dropAndCreate – Flag that decides whether to delete an existing table and create a new one or overwrite the existing table.

  • It is of type Boolean.
  • The value can be True or False.
  • The default value is True that deletes the existing rows in the table.

datasetName – Name of the RDBMS dataset that contains the table you want to edit.

It is of type String.

tableName – Name of existing/new table in the RDBMS dataset that you want to edit.

It is of type String.

strategy – The database operation you want to perform on the specified table.

 

  • It is of type String, and possible values are insert, update, and delete.
  • You can insert, update, or delete more than one row or column in the selected table in the RDBMS dataset.

keyColumns – A list of column names that need to be defined as a primary key or are already defined as a primary key.

  • In the case of insert, the keyColumns parameter should be kept empty.
  • You cannot perform update action on columns that are defined as a primary key.

 

(info)

Notes:

  • The dataset to which you are writing should be already existing.
  • You can add an existing or a new table in the dataset while creating an RDBMS dataset. You can add more than one table.
  • You can apply the above custom functions on all types of datasets present in Rubiscape.
  • Ensure the variable types, data types, the number of columns, and the column names of your output and its corresponding input dataset are the same. If there is a mismatch, RubiPython gives an error.

Uploading File to Cloud Storage

RubiPython provides a custom function to upload files to S3 server. The code syntax to upload a file to cloud storage is shown below.

filetoUpload = <Path of the local file to be uploaded to S3 server>

s3FileName = <Path on s3 server to which file is to be uploaded>

putObject = CloudFactory().uploadFile(filetoUpload, s3FileName)

Downloading File from Cloud Storage

RubiPython provides a custom function to download files stored on S3 server. The code syntax to download a file from cloud storage is shown below.

import pandas

localFileName = <Local file path>

s3FileName = <S3 server path of the file to be downloaded>

downloadObject = CloudFactory().downloadFile(s3FileName, localFileName)

If the downloaded file is a CSV, you can use the below syntax to read the contents of the downloaded file.

data = pd.read_csv(<Local file path>)

print2log(data)

Creating Connection with  Database

The "getConnectionObject" is a custom function in RubiPython. It allows access to the database without attaching the reader node to the RubiPython. It provides the single encrypted connection object in all python scripts.
To get the connection with the database, follow the steps given below.

Notes

  • The dataset with you want to get connection should already present.
  • You can apply the above custom function on all types of RDBMS datasets present in Rubiscape.
  • For database, use double quote ('' '') to maintain case sensitivity.

    Example-
    query = ''' DELETE FROM public."SaleData" WHERE "Customer Name"= 'Ken Black' '''
  • To add multiple line queries, use triple quotes (''' ''') before and after the query.

    Example-
    query = ''' UPDATE public."SaleData" SET "Customer Name"= 'Ken Black' WHERE "Row ID" = '130' '''

A sample Python code to make the connection to the database is shown below.

The table below explains the above code snippet.

Line of Code

Result

import pandas as pd

Imports the Pandas library into the current Python script or and assigns it the alias "pd"

conn = getConnectionObject("SalePy")

It creates a connection to a "SalePy" dataset.

query = 'SELECT * FROM public."SaleData" '

It holds a SQL query, which selects all columns ( * ) from a table named "SaleData" in the "public" schema.

df = pd.read_sql(query, con=conn)

Uses the pd.read_sql() function from the pandas library to execute the SQL query and retrieve data into a DataFrame.

print2log(df.head(10))

Prints the first 10 rows of the DataFrame 'df' to the log.

In addition to the functionality, you have the capability to Insert, Update, or Delete the row within an existing dataset. For more info, refer the sample code snippets.

Adding a row



Deleting a row.



Updating a Row


Using Variables in RubiPython

The variables defined at the workbook/workflow level can be used in the RubiPython custom component.

To use a user-defined variable in RubiPython, follow the steps given below.

  1. Create your algorithm flow. Refer to Building Algorithm Flow in a Workbook Canvas.
  2. Drag and drop RubiPython on your workbook canvas.
  3. If required, connect other nodes to the RubiPython node in your algorithm flow.
  4. Select RubiPython and in the Properties pane, click Configure.
    The configuration page is displayed.
  5. Write the code in the RubiPython Code Editor. Refer to Writing Custom Code using RubiPython.
  6. To use a workbook/workflow level variable in the code statements, use @@ symbols before and after the variable name.
    For example, if the variable name is var1, you can use it in RubiPython as @@var1@@.
    A sample RubiPython code containing a user-defined variable is shown in the figure below.

Publishing RubiPython Code

You can publish the RubiPython code from a workbook and reuse it in another workbook or workflow. This feature is similar to publishing models in RubiStudio.

To publish RubiPython code, follow the steps given below.

  1. Write the RubiPython code as required. Refer to Writing Custom Code using RubiPython.
  2. Run the RubiPython node.
  3. After the node is successfully executed, select the node, click the vertical ellipsis ( ), and click Publish code.

After the code is successfully published, a confirmation message is displayed. This code is listed under Reusable Codes on the Rubiscape Home page.

(info)

Notes:

  • The published code is also available under Reusable Codes in Code Fusion under rubistudio in the Task Pane of a workbook.
  • The name of the published RubiPython node should be unique. If a RubiPython code with the same name already exists, the platform gives an error.

Reusing RubiPython Code

The published RubiPython code is available for reuse in workbooks and workflows in the same workspace.

To reuse a published code, follow the steps given below.

  1. Open the workbook or create a workbook. Refer to Opening a Workbook and Creating a Workbook.
  2. Click Reusable Codes under Code Fusion in rubistudio in the Task Pane.

    The available reusable codes are displayed as shown in the figure below.

  3. Double-click or drag-and-drop the node on the workbook canvas.
  4. To run the code, select the node, click the vertical ellipsis (), and click Run.

After the node executes successfully, a confirmation message is displayed.

Custom Function to Access File from Rubiscape File Server

Using the 'getFileFromServer' custom function, you can now read data from a file on the Rubiscape File Server and access it directly in the RubiPython and RubiNotebook node without creating a dataset.
A sample code snippet to get the file from server in RubiPython is provided below:


Line of Code

Result

a = getFileFromServer("/Bank.xlsx")

'getFileFromServer' is a function, which retrives the file 'Bank.xlsx' from the rubiscape file server and assigns it o variable 'a'.

import pandas as pd 
from io import BytesIO

Imports 'pandas' libraries in python for data manipulation and ;BytesIO' from 'IO' to handle binary data.

data = BytesIO(a.getvalue())

In this line creates an in-memory binary stream using BytesIO from the binary content of a. The getvalue() method is called on a to retrieve the binary content.

df = pd.read_excel(data)

Reads the excel file from the 'data' and streams into pandas dataframe.

print2log(df.head(5))

Prints the first five rows of the DataFrame df to the log using a custom function print2log().

return df

Returns the DataFrame df, which contains the data from the Excel file.

After successfully running the RubiPython node, the output variables from the data will be available in the Custom Output Variables.

Similarly, a sample code snippet to get the file from server in RubiNotebook is provided below:



    • Related Articles

    • Custom Functions in RubiPython

      Reading Data from File A sample Python code to read data from file is shown in the image below. The table below explains the above code snippet. Line of Code Result getReaderData(“datasetName”,“subdatasetName”) This custom function checks the type of ...
    • RubiPython

      Writing Custom Code using RubiPython To write your custom code using RubiPython, follow the steps given below. Create your algorithm flow. Refer to Building Algorithm Flow in a Workbook Canvas. Drag and drop RubiPython on your workbook canvas. Notes: ...
    • RubiPython

      Writing Custom Code using RubiPython To write your custom code using RubiPython, follow the steps given below. Create your algorithm flow. Refer to Building Algorithm Flow in a Workbook Canvas. Drag and drop RubiPython on your workbook canvas. Notes: ...
    • Adding A Custom Query

      To add a Custom Query while creating a dataset, follow the steps given below. Click Custom Query on any dataset creation page. The Add Custom Query page is displayed. Enter a suitable Name for your custom query. Enter a suitable Description for your ...
    • Custom Word Remover

      Custom Words Remover is located under Textual Analysis ( ) in Pre Processing, in the task pane on the left. Use drag-and-drop method to use algorithm in the canvas. Click the algorithm to view and select different properties for analysis. One of the ...