
LLM Labs (beta)

Enable your integration with models from Datasaur LLM Labs

Last updated 1 month ago

Supported Labeling Types: Span Labeling, Row Labeling, Document Labeling

Integrate with models from Datasaur's LLM Labs. If you have already tested and deployed your experiment in the Datasaur LLM Labs Sandbox, this integration lets you use that deployed Sandbox to assist your labeling process.

Creating an LLM Labs Sandbox

To begin using ML-Assisted Labeling with LLM Labs, you first need to create and deploy a Sandbox. See the LLM Labs documentation for details on how to deploy an LLM Labs Sandbox.

The output of the LLM Labs Sandbox must be in JSON object format, aligned with the label set defined in your NLP project. This ensures compatibility with regex-based string matching for labeling in your NLP platform.

  1. Suppose you have a label/question set with Category and Suggestion.

  2. The expected output from the LLM Labs Sandbox should then look something like this:

{
  "Category": ["Minute of Meeting"],
  "Suggestion": ["Move the 'Issues' section to the top of the notes instead of keeping it at the bottom."]
}
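
Before relying on a Sandbox response, it can help to check that it parses as a JSON object and matches the label set. The following is a minimal sketch, not part of Datasaur's product or API: the function name `validate_llm_output` and the `label_set` structure (each label name mapped to its allowed answers, or `None` for free text) are assumptions made for illustration.

```python
import json
from typing import Dict, List, Optional


def validate_llm_output(
    raw: str, label_set: Dict[str, Optional[List[str]]]
) -> Dict[str, List[str]]:
    """Parse a model response and check it against a label set.

    label_set (hypothetical structure) maps each label/question name to
    its allowed answers, or to None when any free-text answer is accepted.
    """
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")

    # The keys must match the label set exactly.
    missing = set(label_set) - set(data)
    extra = set(data) - set(label_set)
    if missing or extra:
        raise ValueError(
            f"missing keys: {sorted(missing)}, unexpected keys: {sorted(extra)}"
        )

    for name, answers in data.items():
        # Each value is expected to be a list of strings, as in the example above.
        if not isinstance(answers, list) or not all(
            isinstance(a, str) for a in answers
        ):
            raise ValueError(f"{name!r} must be a list of strings")
        allowed = label_set[name]
        if allowed is not None:
            invalid = [a for a in answers if a not in allowed]
            if invalid:
                raise ValueError(f"{name!r} has answers outside the label set: {invalid}")
    return data
```

A response whose Category is not one of the defined labels raises `ValueError`, which makes formatting problems visible before the labels are applied.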

Prompt example to generate the correct JSON Object format

To generate a correct response in JSON object format from the LLM, you need to adjust the user instruction prompt so that the response is returned in the expected format. Here is an example prompt you can adapt:

Given the document text, please extract the following information and present it in JSON format as shown below:
- *Category*: The type of text or notes provided. Please choose one from "Minute of Meeting", "Draft", or "Budget Plan".
- *Suggestion*: A recommendation based on best practices for creating better notes for "Minute of Meeting", "Draft", or "Budget Plan".

Instructions Summary:
1. Extract and present the information in the specified JSON format.
2. Ensure that all extracted data is accurate and corresponds directly to the content of each document.

Return the value of extracted fields in JSON structure in plain text, following this JSON FORMAT
{
"Category": [list of Category answer],
"Suggestion": [list of Suggestion answer]
}

VERY IMPORTANT
RETURN THE ANSWER WITHOUT ```json
EXTRACT ANSWERS EXACTLY AS WRITTEN IN THE DOCUMENT CONTEXT
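
The instruction to extract answers exactly as written matters for span labeling, because the returned strings are matched against the document text. The sketch below illustrates the general idea of regex-based string matching. It is not Datasaur's actual implementation, and the function name `find_answer_spans` is hypothetical.

```python
import re
from typing import List, Tuple


def find_answer_spans(text: str, answers: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, answer) for every literal occurrence of each answer.

    Matching is exact and case-sensitive, so an answer that paraphrases
    the document yields no span.
    """
    spans = []
    for answer in answers:
        if not answer:
            continue  # an empty pattern would match at every position
        # re.escape treats the answer as literal text, not as a regex pattern.
        for match in re.finditer(re.escape(answer), text):
            spans.append((match.start(), match.end(), answer))
    return sorted(spans)


document = "Budget Plan for Q3. The Budget Plan is final."
exact = find_answer_spans(document, ["Budget Plan"])
paraphrased = find_answer_spans(document, ["budget plans"])
```

Here `exact` contains two spans, one per occurrence, while `paraphrased` is empty because the string never appears verbatim in the document.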

Advanced hyperparameters in the LLM Labs Sandbox

Besides adjusting the user instruction prompt in LLM Labs, if you are using an OpenAI model, you can also set advanced hyperparameters in the Hyperparameter configurations.

Here is an example of advanced hyperparameters:

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "question_answers",
      "schema": {
        "type": "object",
        "properties": {
          "Category": {
            "type": "string",
            "enum": [
              "Minute of Meeting",
              "Draft",
              "Budget Plan"
            ],
            "description": "The answer for question 1, which must be one of the predefined answers or null."
          },
          "Suggestion": {
            "type": "string",
            "description": "The answer for question 2, which can be any string or null."
          }
        },
        "required": [
          "Category",
          "Suggestion"
        ]
      }
    }
  }
}

Adjust the `properties` and `required` fields to match your label or question set.
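
If you maintain several label or question sets, the hyperparameter JSON can be generated instead of edited by hand. The sketch below assumes a hypothetical helper, `build_response_format`, that takes each question name mapped to its allowed answers (or `None` for free text) and produces the same structure as the example above. OpenAI's Structured Outputs documentation describes further options that are not covered here.

```python
import json
from typing import Dict, List, Optional


def build_response_format(
    questions: Dict[str, Optional[List[str]]], name: str = "question_answers"
) -> dict:
    """Build a response_format hyperparameter from a question set.

    questions (hypothetical structure) maps each question name to its
    allowed answers, or to None when any string is accepted.
    """
    properties = {}
    for question, choices in questions.items():
        prop = {"type": "string"}
        if choices is not None:
            prop["enum"] = list(choices)  # restrict to the predefined answers
        properties[question] = prop

    return {
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": name,
                "schema": {
                    "type": "object",
                    "properties": properties,
                    "required": list(questions),  # every question must be answered
                },
            },
        }
    }


hyperparameters = build_response_format(
    {"Category": ["Minute of Meeting", "Draft", "Budget Plan"], "Suggestion": None}
)
# json.dumps always emits valid JSON, so there is no trailing-comma risk.
print(json.dumps(hyperparameters, indent=2))
```

The printed JSON can then be pasted into the Hyperparameter configurations.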

Accessing Your Deployed LLM Labs Sandbox in ML Assisted Labeling

Follow these steps to access your deployed LLM Labs Sandbox (from Datasaur LLM Labs) in ML Assisted Labeling:

  1. Create a custom project for Row Labeling, Span Labeling or Document Labeling.

  2. Click "Manage Extension" in the right sidebar.

  3. In the Manage Extension pop-up that appears, enable the Datasaur ML Assisted extension.

  4. Once it is enabled, select "LLM Labs" as the provider. You will see the following menu:

  5. Target text: the column(s) containing the data that the ML assistance is based on.

  6. Target question: the column(s) you wish to answer.

  7. Target pages: define specific page(s) you want to extract from a document.

Prediction Process

After setting up the above options, simply click “Predict Labels” to start predicting and obtaining labels from your deployed LLM Application from Datasaur LLM Labs.

In the LLM Labs Sandbox, where you configure your model application, you can click the gear icon in the application to open the Hyperparameter configurations. From there, you can add advanced hyperparameters following the OpenAI supported schema for Structured Outputs.

LLM application: the name of your deployed LLM Sandbox from LLM Labs.

API token: your API key for accessing the deployed LLM Application. You can visit LLM Labs, go to `Settings` on the left sidebar, then select the `API Keys` menu.
