
Custom Provider

Integrate easily with your existing LLM Provider using custom request and response format mappings.

With the Custom LLM Assisted Labeling Provider, you can define custom request and response format mappings. These mappings are used to construct the HTTP request payload (body and headers) following the specified JSON structure, allowing easy integration with any existing LLM Provider.

The mapping system is based on the OpenAPI Specification, with additional rules customized by Datasaur for processing.

Once you enable ML Assisted Labeling in your project and choose LLM Assisted Labeling, you can access several fields under the extension. These fields include:

  1. LLM Provider (select, required): “Custom LLM Provider” must be selected.

  2. Target Text (multiple select, required): Define the text column(s) that will be treated as input and prompt context.

  3. Target Question (select, required): Select your question to be answered.

  4. System Prompt (text, optional): Sets the behavior and context for the language model.

  5. User Prompt (text area, required): Your definition of the task to be completed in a specific labeling workflow.

  6. API Version (text, optional): The API version of your Azure OpenAI service.

  7. API URL (text, required): The URL for your LLM provider API.

  8. Model ID (text, optional): The name or ID of the model.

  9. Top P (number, optional): Limits predictions to the smallest set with a cumulative probability of P.

  10. Temperature (number, optional): Controls randomness; lower values make responses more predictable.

  11. Additional Input (textarea, optional) – JSON format

    1. The attributes of the JSON will be used when transforming the HTTP request body.

  12. Request Headers (secret textarea, optional) – JSON format

    1. The attributes of the JSON will be used as HTTP request headers when Datasaur calls the LLM Provider API.

    2. For example, Datasaur will create an HTTP request with the header "Authorization" based on the value below; another sketch with an extra custom header follows this list. { "Authorization": "Bearer <access token>" }

  13. Request Format Mapping (textarea, required) – JSON format. See the Request Format Mapping Schema section below.

  14. Response Format Mapping (textarea, required) – JSON format. See the Response Format Mapping Schema section below.
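
For instance, a provider that also expects a custom identifier header alongside the bearer token could be configured with a Request Headers value like the following (a hypothetical sketch; "x-client-id" is a made-up header name and the token is a placeholder):

{
  "Authorization": "Bearer <access token>",
  "x-client-id": "<your client id>"
}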


Request Format Mapping Schema

This section defines the JSON mapping for constructing the HTTP request payload before sending it to the LLM Provider. The HTTP request payload is generated by following the schema and interpolating variables from the Form Input Fields based on the mapping.

Schema Structure

The schema follows the OpenAPI Specification, with some adjustments. It supports string, integer, number, object, and array.

String, Integer, and Number

The value must contain a variable that allows Datasaur to retrieve the actual value when constructing the payload, e.g., "input.row", "input.model_id", etc. See the illustration:

Sample Payload

{ "prompt": "<actual value that interpolated from input.row.user_prompt>" }

Sample Mapping Schema

{
  "type": "object",
  "required": true,
  "properties": {
    "prompt": {
      "type": "string",
      "value": "input.row.user_prompt"
    }
  }
}
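
Numeric fields work the same way. For instance, a minimal sketch that maps the Temperature form field onto a numeric "temperature" attribute, assuming the target provider accepts one:

{
  "type": "object",
  "properties": {
    "temperature": {
      "type": "number",
      "value": "input.temperature"
    }
  }
}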

Array

Populate the "items" field with the variable containing the information to be represented as an array. Typically, "input.row" is used for this purpose. The field can accept either an actual array or a single object. If the data source is a single object, the resulting payload will be an array containing that single object, i.e. [ { … } ]. If the data source is an array, the payload will be an array of mapped items.

  • "items_mapping": Must be filled with instructions on how to map the data. Within "items_mapping", use the "item." variable to access individual items.

Sample Payload

[{ "prompt": "<actual value that interpolated from input.row.user_prompt>" }]

Sample Mapping Schema

{
  "type": "array",
  "items": "input.row",
  "items_mapping": {
    "type": "object",
    "properties": {
      "prompt": {
        "type": "string",
        "value": "item.user_prompt"
      }
    }
  }
}
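
If the data source holds multiple rows, the same mapping produces one entry per row. An illustrative sketch of the resulting payload, where each placeholder stands for that row's interpolated user_prompt:

[
  { "prompt": "<user_prompt interpolated from row 0>" },
  { "prompt": "<user_prompt interpolated from row 1>" }
]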

Available Variables

  1. "input": Represents the Form Input Fields above. The mapping process interpolates the values of this variable.

    1. "input.row": Represents the row from the document, with the following properties.

      • "row_id": The row number.

      • "user_prompt": The combined User Prompt and the content from Target Text and Target Question.

  2. "additional_input": Used for any additional values that need to be included. Use dot notation to access the custom attributes.

  3. "item": Used when mapping arrays: each item within the array is mapped individually. This attribute can handle either an array or a single object. Datasaur automatically infers the actual value from the variable provided in the "items" attribute above. Built-in attributes are available to access the data.

    1. "item.user_prompt": Refers to the user_prompt of each object stored in the input.row variable.

Here is the OpenAPI specification for the "input" and "additional_input" variables:

{
  "input": {
    "type": "object",
    "properties": {
      "row": {
        "type": "object",
        "required": true,
        "properties": {
          "row_id": {
            "type": "integer",
            "required": true
          },
          "user_prompt": {
            "type": "string",
            "required": true
          }
        }
      },
      "system_prompt": {
        "type": "string",
        "required": false
      },
      "model_id": {
        "type": "string",
        "required": false
      },
      "top_p": {
        "type": "number",
        "required": false
      },
      "temperature": {
        "type": "number",
        "required": false
      },
      "api_key": {
        "type": "string",
        "required": false
      },
      "api_version": {
        "type": "string",
        "required": false
      }
    }
  },
  "additional_input": {
    "type": "object",
    "required": false
  }
}
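
For illustration, the interpolation context for a single row might look like this at runtime (a hypothetical sketch built from the sample data in the Examples section below):

{
  "input": {
    "row": {
      "row_id": 0,
      "user_prompt": "Text: I feel good.\n What is the sentiment for the text above? Choose one from the options below\n positive\n negative\n Answer:"
    },
    "model_id": "gpt-4o-mini",
    "temperature": 0.7
  },
  "additional_input": {
    "role": "user"
  }
}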

Examples

This document contains three examples: Datasaur Custom API, OpenAI, and Gemini. For easier reference, the data below is used across all the examples.

  • User Prompt: "Text: {targetText}\n What is the sentiment for the text above? Choose one from the options below\n {targetOptions}\n Answer:"

  • targetText (part of the user prompt): "I feel good".

  • targetOptions (part of the user prompt, built from the options of the selected Target Question, one option per line): "positive\n negative\n".

Each example consists of:

  1. The expected HTTP request body of a specific LLM provider.

  2. The request format mapping example that processes the data and transforms it into the expected HTTP request body format above.

  3. Any additional input that may be needed for a specific LLM provider to transform the data as expected.

Datasaur Custom API Request Format

Expected HTTP Request Body

[
  {
    "id": 0,
    "text": "Text: I feel good.\n What is the sentiment for the text above ? Choose one from the options below\n positive\n negative\n Answer:"
  }
]

Request Format Mapping Schema

{
  "type": "array",
  "items": "input.row",
  "items_mapping": {
    "type": "object",
    "properties": {
      "id": {
        "type": "integer",
        "value": "item.row_id"
      },
      "text": {
        "type": "string",
        "value": "item.user_prompt"
      }
    }
  }
}

OpenAI Request Format

Expected HTTP Request Body

{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": "Text: I feel good.\n What is the sentiment for the text above ? Choose one from the options below\n positive\n negative\n Answer:"
    }
  ],
  "temperature": 0.7
}

Additional Input

Use the "additional_input.role" to provide the value of the required attribute of "user".

{ "role": "user" }

Request Format Mapping Schema

{
  "type": "object",
  "properties": {
    "model": {
      "type": "string",
      "value": "input.model_id"
    },
    "messages": {
      "type": "array",
      "items": "input.row",
      "items_mapping": {
        "type": "object",
        "properties": {
          "role": {
            "type": "string",
            "value": "additional_input.role"
          },
          "content": {
            "type": "string",
            "value": "item.user_prompt"
          }
        }
      }
    },
    "temperature": {
      "type": "number",
      "value": "input.temperature"
    }
  }
}
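
To authenticate the call, the Request Headers field would typically carry the OpenAI API key as a bearer token, along the lines of this sketch (the key is a placeholder):

{ "Authorization": "Bearer <your OpenAI API key>" }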

Gemini Request Format

Expected HTTP Request Body

{
  "contents": [
    {
      "parts": [
        { "text": "Text: I feel good.\n What is the sentiment for the text above ? Choose one from the options below\n positive\n negative\n Answer:" }
      ]
    }
  ]
}

Request Format Mapping Schema

{
  "type": "object",
  "properties": {
    "contents": {
      "type": "array",
      "items": "input.row",
      "items_mapping": {
        "type": "object",
        "properties": {
          "parts": {
            "type": "array",
            "items": [
              {
                "type": "object",
                "properties": {
                  "text": {
                    "type": "string",
                    "value": "item.user_prompt"
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
}
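
The Gemini API accepts the key as a request header; assuming header-based authentication, the Request Headers field might contain something like this sketch (the key is a placeholder):

{ "x-goog-api-key": "<your Gemini API key>" }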

Response Format Mapping Schema

This section defines the JSON mapping for transforming the LLM Provider's response into Datasaur's expected response format, ensuring that labels are ingested and applied correctly.

The mapping uses the "response" keyword with dot notation to access attributes.

Datasaur Expected Response Format

The whole HTTP response body will be referenced as the "response" variable.

{ "label": "<put the label inferenced from the LLM here>" }
{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "required": true,
      "description": "The predicted label for the row"
    } 
  }
}

In short, create the mapping by filling in the variables in the placeholder below.

{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "value": "response.<the label>"
    }
  }
}
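
For instance, if a hypothetical provider simply returned { "result": "POSITIVE" } as its response body, the mapping would be:

{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "value": "response.result"
    }
  }
}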

Examples

The examples below show how to define a response format mapping for a specific API response so that it is compatible for ingestion into Datasaur.

Datasaur Custom API Response Format

Original HTTP Response Body Format

[
  {
    "id": 0,
    "label": "POSITIVE"
  }
]

Response Format Mapping Schema

The label is taken from the first element of the response array; the "id" attribute corresponds to the "row_id" sent in the request.

{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "value": "response[0].label"
    }
  }
}
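
Applied to the sample response above, this mapping yields Datasaur's expected format:

{ "label": "POSITIVE" }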

OpenAI Response Format

Original HTTP Response Body Format

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Test"
      },
      "logprobs": null,
      "finish_reason": "stop",
      "index": 0
    }
  ]
}

Response Format Mapping Schema

The mapping below reads the label from "response.choices[0].message.content", i.e. the content of the first message in the "choices" array.

{
  "type": "object",
  "properties" {
    "label": {
      "type": "string",
      "value": "response.choices[0].message.content"
    }
  }
}
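
Applied to the sample OpenAI response above, this mapping likewise yields:

{ "label": "Test" }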

Gemini Response Format

Original HTTP Response Body Format

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "Test"
          },
          {
            "inline_data": {
              "mime_type": "image/jpeg",
              "data": "'$(base64 -w0 image.jpg)'"
            }
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0
    }
  ]
}

Response Format Mapping Schema

Ignore the inline image data and select the label from the response as follows:

{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "value": "response.candidates[0].content.parts[0].text"
    }
  }
}
