Custom Provider

Integrate your existing LLM provider by defining custom request and response format mappings.

With the Custom LLM Assisted Labeling Provider, users can provide custom request and response format mappings. These mappings are used to construct the HTTP request payload (body and headers) following the specified JSON structure and to read the provider's response. This allows integration with any existing LLM provider.

The mapping system is based on the OpenAPI Specification, with additional rules customized by Datasaur for processing.

Once you enable ML Assisted Labeling in your project and choose LLM Assisted Labeling, the following fields become available under the extension:

LLM Provider (select, required): “Custom LLM Provider” must be selected.
Target Text (multiple select, required): Select the text column(s) to be used as input and prompt context.
Target Question (select, required): Select your question to be answered.
System Prompt (text, optional): Sets the behavior and context for the language model.
User Prompt (text area, required): User definition of a task to be completed in a specific labeling workflow.
API Version (text, optional): The API Version from your Azure OpenAI.
API URL (text, required): The URL for your LLM provider API.
Model ID (text, optional): The name or ID of the model.
Top P (number, optional): Limits predictions to the smallest set with a cumulative probability of P.
Temperature (number, optional): Controls randomness; lower values make responses more predictable.
Additional Input (textarea, optional) – JSON format
1. The attributes of this JSON are available when constructing the HTTP request body.
Request Headers (secret textarea, optional) – JSON format
1. The attributes of the JSON will be used as HTTP Request Headers when Datasaur calls the LLM Provider API.
2. For example, Datasaur will create an HTTP request with an "Authorization" header based on the value below. { "Authorization": "Bearer <access token>" }
Request Format Mapping Schema (textarea, required) – JSON Format.
Response Format Mapping Schema (textarea, required) – JSON Format.
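
To illustrate how the API URL, Request Headers, and a mapped body might come together, here is a minimal Python sketch. It only builds the request object and sends nothing. The URL, the token placeholder, the default Content-Type header, and the build_request helper are all hypothetical; how Datasaur assembles requests internally is not described here.

```python
import json
import urllib.request


def build_request(api_url: str, headers_json: str, body: object) -> urllib.request.Request:
    """Build (but do not send) a POST request from the form values."""
    # The Request Headers field is optional, so an empty value means no headers.
    headers = json.loads(headers_json) if headers_json.strip() else {}
    # Assumption: the body is sent as JSON, so a Content-Type header is added by default.
    headers.setdefault("Content-Type", "application/json")
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(api_url, data=data, headers=headers, method="POST")


# Hypothetical values standing in for the API URL and Request Headers fields.
request = build_request(
    "https://llm.example.com/v1/generate",
    '{ "Authorization": "Bearer <access token>" }',
    {"prompt": "Hello"},
)
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```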

Request Format Mapping Schema

This section defines the JSON mapping for constructing the HTTP request payload before sending it to the LLM Provider. The HTTP request payload is generated by following the schema and interpolating variables from the Form Input Fields based on the mapping.

The schema follows the OpenAPI Specification, with some adjustments. It supports string, integer, number, object, and array.

Schema Structure

String, Integer, and Number

The value must contain a variable that allows Datasaur to retrieve the actual value when constructing the payload, e.g., "input.row", "input.model_id", etc. See the illustration:

Sample Payload

{ "prompt": "<actual value interpolated from input.row.user_prompt>" }

Sample Mapping Schema

{
  "type": "object",
  "required": true,
  "properties": {
    "prompt": {
      "type": "string",
      "value": "input.row.user_prompt"
    }
  }
}
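
The interpolation described above can be sketched in Python. This is an illustrative approximation, not Datasaur's implementation: the resolve and map_schema helpers are hypothetical names, and the sketch assumes every non-object property carries a dot-notation "value".

```python
from functools import reduce


def resolve(path: str, variables: dict):
    """Follow a dot-notation path such as "input.row.user_prompt"."""
    return reduce(lambda node, key: node[key], path.split("."), variables)


def map_schema(schema: dict, variables: dict):
    """Map an object schema whose properties are string, integer, or number."""
    if schema["type"] == "object":
        return {
            name: map_schema(child, variables)
            for name, child in schema["properties"].items()
        }
    # string, integer, number: replace the variable with its actual value.
    return resolve(schema["value"], variables)


# Sample variables; the row content is made up for this example.
variables = {"input": {"row": {"row_id": 0, "user_prompt": "Text: I feel good"}}}
schema = {
    "type": "object",
    "required": True,
    "properties": {"prompt": {"type": "string", "value": "input.row.user_prompt"}},
}
payload = map_schema(schema, variables)
print(payload)
```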

Array

Populate the "items" field with the variable containing the information to be represented as an array. Typically, "input.row" is used for this purpose. The field can accept either an actual array or a single object. If the data source is a single object, the resulting payload will be an array containing that single object, i.e. [ { … } ]. If the data source is an array, the payload will be an array of mapped items.

"items_mapping": Must be filled with instructions on how to map the data. Within "items_mapping", use the "item." variable to access individual items.

Sample Payload

[{ "prompt": "<actual value interpolated from input.row.user_prompt>" }]

Sample Mapping Schema

{
  "type": "array",
  "items": "input.row",
  "items_mapping": {
    "type": "object",
    "properties": {
      "prompt": {
        "type": "string",
        "value": "item.user_prompt"
      }
    }
  }
}
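
The array rule above (a single object becomes a one-element array, while an array is mapped item by item) can be sketched as follows. The map_array helper is a hypothetical name, and the sketch only handles "item.<attribute>" values inside "items_mapping"; it is an assumption about the behavior, not Datasaur's code.

```python
def map_array(schema: dict, variables: dict) -> list:
    """Map an array schema: "items" names the source, "items_mapping" maps each item."""
    source = variables
    for key in schema["items"].split("."):
        source = source[key]
    # A single object is treated as a one-element array.
    items = source if isinstance(source, list) else [source]
    properties = schema["items_mapping"]["properties"]
    result = []
    for item in items:
        mapped = {}
        for name, prop in properties.items():
            # Strip the leading "item." to get the attribute name.
            attribute = prop["value"].split(".", 1)[1]
            mapped[name] = item[attribute]
        result.append(mapped)
    return result


schema = {
    "type": "array",
    "items": "input.row",
    "items_mapping": {
        "type": "object",
        "properties": {"prompt": {"type": "string", "value": "item.user_prompt"}},
    },
}

single = map_array(schema, {"input": {"row": {"row_id": 0, "user_prompt": "A"}}})
several = map_array(
    schema,
    {"input": {"row": [{"row_id": 0, "user_prompt": "A"}, {"row_id": 1, "user_prompt": "B"}]}},
)
print(single)
print(several)
```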

Available Variables

"input": Represents the Form Input Fields above. The mapping process interpolates the value of this variable.
1. "input.row": Represents the row from the document, with the following properties.
  - "row_id": The row number.
  - "user_prompt": The combined User Prompt and content from Target Text and Target Question.
"additional_input": Used for any additional values that need to be included. Use dot notation to access the custom attributes.
"item": Arrays can be mapped by implementing a mapping for each item within the array. This variable can map an array or a single object. Datasaur automatically infers the actual value from the variable provided in the "items" attribute above. Built-in attributes are available to access the data.
1. "item.user_prompt": Refers to the user_prompt of each object stored in the input.row variable.

Here is the OpenAPI Specification for "input" and "additional_input" variables:

{
  "input": {
    "type": "object",
    "properties": {
      "row": {
        "type": "object",
        "required": true,
        "properties": {
          "row_id": {
            "type": "integer",
            "required": true
          },
          "user_prompt": {
            "type": "string",
            "required": true
          }
        }
      },
      "system_prompt": {
        "type": "string",
        "required": false
      },
      "model_id": {
        "type": "string",
        "required": false
      },
      "top_p": {
        "type": "number",
        "required": false
      },
      "temperature": {
        "type": "number",
        "required": false
      },
      "api_key": {
        "type": "string",
        "required": false
      },
      "api_version": {
        "type": "string",
        "required": false
      }
    }
  },
  "additional_input": {
    "type": "object",
    "required": false
  }
}

Examples

This document contains three examples: Datasaur Custom API, OpenAI, and Gemini. For easier reference, the data below is used across all examples.

User Prompt: "Text: {targetText}\n What is the sentiment for the text above? Choose one from the options below\n {targetOptions}\n Answer:"
targetText (part of the user prompt): "I feel good".
targetOptions (part of the user prompt, inferred from options in selected Target Question, separated by new line for each option): "positive\n negative\n".

Each example consists of:

The expected HTTP request body of a specific LLM provider.
The request format mapping that transforms the data into the expected HTTP request body above.
Any additional input that may be needed for a specific LLM provider to transform the data as expected.

Datasaur Custom API Request Format

Expected HTTP Request Body

[
  {
    "id": 0,
    "text": "Text: I feel good\n What is the sentiment for the text above? Choose one from the options below\n positive\n negative\n Answer:"
  }
]

Request Format Mapping Schema

{
  "type": "array",
  "items": "input.row",
  "items_mapping": {
    "type": "object",
    "properties": {
      "id": {
        "type": "integer",
        "value": "item.row_id"
      },
      "text": {
        "type": "string",
        "value": "item.user_prompt"
      }
    }
  }
}

OpenAI Request Format

Expected HTTP Request Body

{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": "Text: I feel good\n What is the sentiment for the text above? Choose one from the options below\n positive\n negative\n Answer:"
    }
  ],
  "temperature": 0.7
}

Additional Input

Use "additional_input.role" to supply the value "user" for the required "role" attribute.

{ "role": "user" }

Request Format Mapping Schema

{
  "type": "object",
  "properties": {
    "model": {
      "type": "string",
      "value": "input.model_id"
    },
    "messages": {
      "type": "array",
      "items": "input.row",
      "items_mapping": {
        "type": "object",
        "properties": {
          "role": {
            "type": "string",
            "value": "additional_input.role"
          },
          "content": {
            "type": "string",
            "value": "item.user_prompt"
          }
        }
      }
    },
    "temperature": {
      "type": "number",
      "value": "input.temperature"
    }
  }
}
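
As a rough check of this mapping, the following Python sketch applies it to the sample data and prints the resulting body. The resolve and build helpers are hypothetical, and the recursion is an assumption about how such a schema could be evaluated, not a description of Datasaur's internals. It only supports "items" given as a variable name.

```python
import json


def resolve(path, scope):
    """Follow a dot-notation path such as "input.model_id" through nested dicts."""
    node = scope
    for key in path.split("."):
        node = node[key]
    return node


def build(schema, scope):
    """Recursively build a payload from a mapping schema (illustrative only)."""
    kind = schema["type"]
    if kind == "object":
        return {name: build(child, scope) for name, child in schema["properties"].items()}
    if kind == "array":
        source = resolve(schema["items"], scope)
        items = source if isinstance(source, list) else [source]
        # Each element is exposed to "items_mapping" as the "item" variable.
        return [build(schema["items_mapping"], {**scope, "item": item}) for item in items]
    # string, integer, number: look up the variable named in "value".
    return resolve(schema["value"], scope)


schema = {
    "type": "object",
    "properties": {
        "model": {"type": "string", "value": "input.model_id"},
        "messages": {
            "type": "array",
            "items": "input.row",
            "items_mapping": {
                "type": "object",
                "properties": {
                    "role": {"type": "string", "value": "additional_input.role"},
                    "content": {"type": "string", "value": "item.user_prompt"},
                },
            },
        },
        "temperature": {"type": "number", "value": "input.temperature"},
    },
}

prompt = (
    "Text: I feel good\n What is the sentiment for the text above? "
    "Choose one from the options below\n positive\n negative\n Answer:"
)
scope = {
    "input": {
        "row": {"row_id": 0, "user_prompt": prompt},
        "model_id": "gpt-4o-mini",
        "temperature": 0.7,
    },
    "additional_input": {"role": "user"},
}

body = build(schema, scope)
print(json.dumps(body, indent=2))
```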

Gemini Request Format

Expected HTTP Request Body

{
  "contents": [
    {
      "parts": [
        { "text": "Text: I feel good\n What is the sentiment for the text above? Choose one from the options below\n positive\n negative\n Answer:" }
      ]
    }
  ]
}

Request Format Mapping Schema

{
  "type": "object",
  "properties": {
    "contents": {
      "type": "array",
      "items": "input.row",
      "items_mapping": {
        "type": "object",
        "properties": {
          "parts": {
            "type": "array",
            "items": [
              {
                "type": "object",
                "properties": {
                  "text": {
                    "type": "string",
                    "value": "item.user_prompt"
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
}

Response Format Mapping Schema

This section is dedicated to the JSON mapping for transforming the LLM Provider's response to match Datasaur's Expected Response Format, ensuring proper label ingestion and applying the label correctly.

The mapping uses the keyword "response" with dot notation to access attributes.

Datasaur Expected Response Format

The whole HTTP response body will be referenced as the "response" variable.

{ "label": "<the label inferred by the LLM>" }

Here is the OpenAPI specification:

{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "required": true,
      "description": "The predicted label for the row"
    }
  }
}
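
A mapped response can be checked against this format before ingestion. The sketch below is a hypothetical validator written for illustration; the function name and error messages are assumptions and not part of Datasaur.

```python
def validate_label_response(body) -> str:
    """Return the label if the body matches the expected format, else raise."""
    if not isinstance(body, dict):
        raise ValueError("response body must be a JSON object")
    label = body.get("label")
    # "label" is required and must be a string.
    if not isinstance(label, str):
        raise ValueError('"label" is required and must be a string')
    return label


label = validate_label_response({"label": "positive"})
print(label)
```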

In short, create the mapping from the template below by filling in the variable.

{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "value": "response.<the label>"
    }
  }
}
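
To make the "response" path syntax concrete, here is a minimal Python sketch that resolves paths combining dot notation with bracketed indices, such as "response[0].label". The resolve_response and map_response helpers and the sample provider response are hypothetical; the exact path grammar Datasaur accepts is an assumption.

```python
import re

# Matches either an attribute name or a bracketed index such as [0].
TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve_response(path, response):
    """Resolve a path such as "response.results[0].label" against a response body."""
    tokens = TOKEN.findall(path)
    # The first token must be the "response" keyword itself.
    if not tokens or tokens[0][0] != "response":
        raise ValueError('path must start with "response"')
    node = response
    for name, index in tokens[1:]:
        node = node[int(index)] if index else node[name]
    return node


def map_response(schema, response):
    """Build the expected response object from a mapping schema."""
    return {
        name: resolve_response(prop["value"], response)
        for name, prop in schema["properties"].items()
    }


# Hypothetical provider response, used only for illustration.
provider_response = {"results": [{"prediction": {"label": "positive"}}]}
schema = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "value": "response.results[0].prediction.label"}
    },
}
mapped = map_response(schema, provider_response)
print(mapped)
```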

Examples

The examples below show how to define a response format mapping for a specific API response so that it is compatible with ingestion in Datasaur.

Datasaur Custom API Response Format

Original HTTP Response Body Format

[
  {
    "id": 0,
    "label": "POSITIVE"
  }
]

Response Format Mapping Schema

The response is an array, so the mapping reads the label from its first element.

{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "value": "response[0].label"
    }
  }
}

OpenAI Response Format

Original HTTP Response Body Format

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Test"
      },
      "logprobs": null,
      "finish_reason": "stop",
      "index": 0
    }
  ]
}

Response Format Mapping Schema

The mapping below reads the label from the message content of the first element of "response.choices".

{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "value": "response.choices[0].message.content"
    }
  }
}

Gemini Response Format

Original HTTP Response Body Format

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "Test"
          },
          {
            "inline_data": {
              "mime_type": "image/jpeg",
              "data": "'$(base64 -w0 image.jpg)'"
            }
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0
    }
  ]
}

Response Format Mapping Schema

Ignore the inline image and select the label from the text of the first part:

{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "value": "response.candidates[0].content.parts[0].text"
    }
  }
}
