LLM Labs (beta)

Enable your integration with models from Datasaur LLM Labs

Supported Labeling Types: Span Labeling, Row Labeling, Document Labeling

Easily integrate with models from Datasaur's LLM Labs. If you have already tested and deployed an experiment in the Datasaur LLM Labs Sandbox, this integration lets you use that deployed Sandbox to assist your labeling process.

Creating an LLM Labs Sandbox

To begin using ML-Assisted Labeling with LLM Labs, you first need to create and deploy a Sandbox. See this page for details on how to deploy an LLM Labs Sandbox.

The output of the LLM Labs Sandbox must be in JSON object format, aligned with the label set defined in your NLP project. This ensures compatibility with regex-based string matching for labeling in your NLP platform.

For example, suppose your project has a label/question set with two entries, Category and Suggestion. The expected output from the LLM Labs Sandbox should then look like this:

{
  "Category": ["Minute of Meeting"],
  "Suggestion": ["Move the 'Issues' section to the top of the notes instead of keeping it at the bottom."]
}
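Before relying on a deployed Sandbox, it can help to check a sample response against your label set locally. The following is a minimal sketch, not part of Datasaur's tooling: the function name `validate_sandbox_output` and the `LABEL_SET` constant are hypothetical, and it assumes every label maps to a list of strings, as in the example above.

```python
import json

# Hypothetical label/question set; replace with the names used in your project.
LABEL_SET = {"Category", "Suggestion"}


def validate_sandbox_output(raw, label_set=LABEL_SET):
    """Parse a Sandbox response and check that it matches the label set."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")

    # The keys must be exactly the labels defined in the project.
    unknown = set(data) - set(label_set)
    missing = set(label_set) - set(data)
    if unknown or missing:
        raise ValueError(
            f"unexpected keys: {sorted(unknown)}, missing keys: {sorted(missing)}"
        )

    # Assumption: each label holds a list of string answers.
    for key, value in data.items():
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key!r} must be a list of strings")
    return data


sample = '{"Category": ["Minute of Meeting"], "Suggestion": ["Move the Issues section to the top."]}'
result = validate_sandbox_output(sample)
print(result["Category"])
```

A response that fails this check would likely also fail to map onto your labels, so it is a cheap way to spot prompt problems early.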

Prompt example to generate the correct JSON Object format

To generate a correct response in JSON object format from the LLM, you need to adjust the user instruction prompt so that the response returns in the expected format. Here are some example prompts you can try:

Given the document text, please extract the following information and present it in JSON format as shown below:
- *Category*: The type of text or notes provided. Please choose one from "Minute of Meeting", "Draft", or "Budget Plan".
- *Suggestion*: A recommendation based on best practices for creating better notes for "Minute of Meeting", "Draft", or "Budget Plan".

Instructions Summary:
1. Extract and present the information in the specified JSON format.
2. Ensure that all extracted data is accurate and corresponds directly to the content of each document.

Return the value of extracted fields in JSON structure in plain text, following this JSON FORMAT
{
"Category": [list of Category answer],
"Suggestion": [list of Suggestion answer]
}

VERY IMPORTANT
RETURN THE ANSWER WITHOUT ```json
EXTRACT ONLY ANSWERS THAT ARE WRITTEN PRECISELY IN THE DOCUMENT CONTEXT
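Even with these instructions, a model may occasionally wrap its answer in a Markdown code fence or return text that does not appear verbatim in the document. The sketch below shows one way to check for both cases locally. The helpers `strip_code_fence` and `find_span` are hypothetical names, and the exact-match search only illustrates the idea of regex-based string matching; it is an assumption, not Datasaur's actual implementation.

```python
import json
import re

# A Markdown code fence marker (three backticks), built programmatically.
FENCE = "`" * 3
FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"(?:json)?\s*\n(.*?)\n?\s*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_code_fence(text):
    """Return the payload without a surrounding code fence, if one is present."""
    match = FENCE_RE.match(text)
    return match.group(1) if match else text.strip()


def find_span(document, answer):
    """Return (start, end) of the first exact occurrence of answer, or None."""
    match = re.search(re.escape(answer), document)
    return match.span() if match else None


document = "Issues: the budget was not approved. Next meeting is on Friday."
response = (
    FENCE + "json\n"
    + '{"Category": ["Budget Plan"], "Suggestion": ["the budget was not approved"]}\n'
    + FENCE
)

labels = json.loads(strip_code_fence(response))
for label, answers in labels.items():
    for answer in answers:
        # None means the answer is not written verbatim in the document.
        print(label, answer, find_span(document, answer))
```

An answer that returns `None` here was paraphrased or invented by the model, which suggests tightening the prompt.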

Advanced hyperparameters in the LLM Labs Sandbox

Besides adjusting the user instruction prompt in LLM Labs, if you are using an OpenAI model, you can also set advanced hyperparameters in the Hyperparameter configurations.

In the LLM Labs Sandbox, where you configure your model application, click the gear icon in the application to open the Hyperparameter configurations. From there, you can add advanced hyperparameters that follow the OpenAI-supported schema for Structured Outputs.

Image of Hyperparameter configurations modal

Here is an example of the advanced hyperparameters:

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "question_answers",
      "schema": {
        "type": "object",
        "properties": {
          "Category": {
            "type": "string",
            "enum": [
              "Minute of Meeting",
              "Draft",
              "Budget Plan"
            ],
            "description": "The answer for question 1, which must be one of the predefined answers or null."
          },
          "Suggestion": {
            "type": "string",
            "description": "The answer for question 2, which can be any string or null."
          }
        },
        "required": [
          "Category",
          "Suggestion"
        ]
      }
    }
  }
}

Adjust the "properties" and "required" entries so that they match your label or question set.
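If your label set is large, you may prefer to generate this configuration rather than write it by hand. The following is a minimal sketch that reproduces the structure of the example above; the helper name `build_response_format` and its input format are hypothetical, and the optional "description" fields are left out for brevity.

```python
import json


def build_response_format(questions, name="question_answers"):
    """Build the advanced hyperparameters for a label/question set.

    `questions` maps each label name to a list of allowed answers,
    or to None when any string is acceptable.
    """
    properties = {}
    for label, choices in questions.items():
        prop = {"type": "string"}
        if choices:
            # Restrict the answer to the predefined choices.
            prop["enum"] = list(choices)
        properties[label] = prop

    return {
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": name,
                "schema": {
                    "type": "object",
                    "properties": properties,
                    "required": list(questions),
                },
            },
        }
    }


config = build_response_format(
    {
        "Category": ["Minute of Meeting", "Draft", "Budget Plan"],
        "Suggestion": None,
    }
)
# json.dumps always emits valid JSON, so the result can be pasted as-is.
print(json.dumps(config, indent=2))
```

Generating the JSON this way also avoids syntax slips such as trailing commas, which make the configuration invalid.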

Accessing Your Deployed LLM Labs Sandbox in ML-Assisted Labeling

Follow these steps to access your deployed LLM Labs Sandbox (from Datasaur LLM Labs) in ML-Assisted Labeling:

1. Create a custom project for Row Labeling, Span Labeling, or Document Labeling.
2. Click "Manage Extension" in the right sidebar.
3. In the Manage Extension pop-up that appears, enable the Datasaur ML Assisted extension.
4. Once it is enabled, select "LLM Labs" as the provider. The following menu appears:
ML Assisted Labeling with LLM Labs for Span Based
ML Assisted Labeling with LLM Labs for Row Based
ML Assisted Labeling with LLM Labs for Document Based
- Target text: the column(s) containing the data that the ML assistance is based on.
- Target question: the column(s) you want the model to answer.
- LLM application: the name of your deployed LLM Sandbox from LLM Labs.
- API token: your API key for accessing the deployed LLM application. To find it, open LLM Labs, go to `Settings` in the left sidebar, then select the `API Keys` menu.
- Target pages: the specific page(s) you want to extract from a document.

Prediction Process

After setting up the above options, simply click “Predict Labels” to start predicting and obtaining labels from your deployed LLM Application from Datasaur LLM Labs.

Image of ML Assisted with LLM Labs Provider Result for Span Based

Image of ML Assisted with LLM Labs Provider Result for Row Based

Image of ML Assisted with LLM Labs Provider Result for Document Labeling
