Extract PDF Node (LLM Extraction)

Overview

The new Extract PDF Node, available under the Rubi AI category, extracts structured data from PDF documents using Large Language Models (LLMs). It supports configurable, prompt-based extraction pipelines that generate structured, reusable outputs from PDF files.


Step 1 – Extract PDF Node under Rubi AI

The Extract PDF Node is available under the Rubi AI category and can be added directly to the pipeline canvas.



Step 2 – Choose Business Context

Users can select the business context required for document extraction, such as Finance, Procurement, HR, Operations, IT, or Custom.



Step 3 – Choose Extraction Prompt

Users can select predefined extraction prompts for commonly used document extraction use cases.



Step 4 – Add Custom Prompt

Users can also create custom extraction prompts to define specific extraction requirements.



Step 5 – Configure Custom Prompt

Custom prompts support user-defined prompt names and extraction instructions for tailored document processing.
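As an illustration only, the two fields of a custom prompt (a name and extraction instructions) can be pictured as a small data structure that is combined with the chosen business context and the document text. The sketch below is not Rubiscape's implementation: `CustomPrompt`, `render_prompt`, the prompt layout, and the sample field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomPrompt:
    """Hypothetical stand-in for the two fields a custom prompt carries."""
    name: str
    instructions: str


def render_prompt(prompt: CustomPrompt, business_context: str, document_text: str) -> str:
    """Combine context, instructions, and document text into one prompt string.

    The layout is an assumption for illustration; the product may build
    its prompts differently.
    """
    if not prompt.name.strip():
        raise ValueError("prompt name must not be empty")
    if not prompt.instructions.strip():
        raise ValueError("extraction instructions must not be empty")
    return (
        f"Business context: {business_context}\n"
        f"Task ({prompt.name}): {prompt.instructions.strip()}\n"
        "Return the result as a JSON object.\n"
        "--- DOCUMENT ---\n"
        f"{document_text}"
    )


# Example: a Finance prompt that names the exact fields to extract.
invoice_prompt = CustomPrompt(
    name="Invoice header fields",
    instructions="Extract invoice_number, invoice_date, vendor_name, and total_amount.",
)
rendered = render_prompt(invoice_prompt, "Finance", "Invoice INV-001 ...")
```

Instructions that list the exact fields to return, as in this example, tend to give more consistent structured output than open-ended requests.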



Step 6 – Model & Execution Settings

Users can configure the LLM and execution settings before running the extraction.



Feature Enhancements

  • Added Extract PDF Node under the Rubi AI category.

  • Supports input from PDF Reader and Split PDF nodes.

  • Allows multiple PDF nodes to be connected as input.

  • Supports predefined and custom prompt-based extraction pipelines.

  • Structured extracted results are available in the Explore section.

  • Users can rerun extraction with modified prompts without recreating the node.

  • Generated output can be passed to downstream pipeline nodes.
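To make the idea of structured, reusable output concrete, the sketch below shows one way a downstream step might check an extraction result before using it. It assumes the result arrives as a JSON object; the output format of the node is not stated here, and `REQUIRED_FIELDS`, `parse_extraction`, and the sample response are hypothetical.

```python
import json

# Hypothetical schema: the fields a downstream step expects to find.
REQUIRED_FIELDS = ("invoice_number", "invoice_date", "total_amount")


def parse_extraction(raw_response: str) -> dict:
    """Parse a response as JSON and confirm the expected fields are present."""
    record = json.loads(raw_response)  # raises ValueError on malformed JSON
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return record


# Sample response, invented for illustration.
sample_response = (
    '{"invoice_number": "INV-001", "invoice_date": "2024-01-15", "total_amount": 1250.5}'
)
record = parse_extraction(sample_response)
```

A check of this kind helps catch incomplete extractions early, so a prompt can be adjusted and rerun before the output reaches later nodes.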


Benefits

  • Automated extraction of structured data from PDFs.

  • Reduced manual document processing effort.

  • Flexible AI-powered extraction pipelines.

  • Reusable structured outputs for analytics and automation.

  • Improved document intelligence capabilities.
