# Custom Provider

**LLM Assisted Labeling** with a custom provider lets you define custom request and response mappings. Datasaur uses these mappings to construct the HTTP request payload (body and headers) from a JSON structure you specify, which enables integration with any LLM provider.

The mapping system is based on the [OpenAPI specification](https://swagger.io/specification/), with additional rules customized by Datasaur for processing.

<figure><img src="/files/JfCmW5nFFS1wTdq9dvuY" alt=""><figcaption></figcaption></figure>

## Configuration

After enabling ML-assisted labeling and selecting **LLM Assisted Labeling**, choose **Custom** as the provider. The following fields will be available:

* **Target text**: Select input columns to be used as context.
* **Target question**: Select the question to be predicted.
* **System prompt** (optional): Define the behavior and context for the language model.
* **User prompt**: Define the labeling task for the model.
* **API version** (optional): The API version used by your Azure OpenAI service.
* **API URL**: The URL for your LLM provider API.
* **Model ID** (optional): The name or ID of the model.
* API configuration
  * **Additional Input** (optional) – JSON format
    1. The attributes of this JSON are available through the `additional_input` variable when Datasaur builds the HTTP request body.
  * **Request Headers** (secret textarea, optional) – JSON format
    1. The attributes of the JSON will be used as HTTP Request Headers when Datasaur calls the LLM Provider API.
    2. For example, given the value below, Datasaur sends the HTTP request with an `Authorization` header.\
       `{ "Authorization": "Bearer <access token>" }`
  * [Request Format Mapping Schema](#request-format-mapping-schema) – JSON Format.
  * [Response Format Mapping Schema](#response-format-mapping-schema) – JSON Format.
* Advanced settings
  * **Top P** (optional): Limits predictions to the smallest set with a cumulative probability of P.
  * **Temperature** (optional): Controls randomness; lower values make responses more predictable.
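
The **Request Headers** field described above can be pictured as a JSON object whose attributes become header names and values. The sketch below is illustrative only: `build_headers` is a hypothetical helper, and the default `Content-Type` header is an assumption, not documented Datasaur behavior.

```python
import json


def build_headers(request_headers_json: str = "") -> dict:
    """Turn the Request Headers field (a JSON object) into an HTTP header dict."""
    # Assumed default for a JSON request body; not taken from the Datasaur docs.
    headers = {"Content-Type": "application/json"}
    if request_headers_json.strip():
        custom = json.loads(request_headers_json)
        if not isinstance(custom, dict):
            raise ValueError("Request Headers must be a JSON object")
        # Header values travel as strings, so coerce anything else.
        headers.update({str(name): str(value) for name, value in custom.items()})
    return headers


headers = build_headers('{ "Authorization": "Bearer <access token>" }')
print(headers["Authorization"])  # Bearer <access token>
```

Leaving the field empty yields only the assumed default header, which matches the field being optional.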

## Request format mapping schema

This section defines the JSON mapping for constructing the HTTP request payload before sending it to the LLM Provider. The HTTP request payload is generated by following the schema and interpolating variables from the Form Input Fields based on the mapping.

The schema follows the [OpenAPI specification](https://swagger.io/specification/), with some adjustments. It supports string, integer, number, object, and array.

### Schema structure

#### String, integer, and number

The `value` field must name a variable that Datasaur resolves to the actual value when constructing the payload, for example `input.row.user_prompt` or `input.model_id`. For example:

**Sample payload**

```json
{ "prompt": "<actual value interpolated from input.row.user_prompt>" }
```

**Sample mapping schema**

```json
{
  "type": "object",
  "required": true,
  "properties": {
    "prompt": {
      "type": "string",
      "value": "input.row.user_prompt"
    }
  }
}
```
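
To make the interpolation step concrete, here is a minimal Python sketch of how a dotted variable could be resolved and cast to the declared type. The names `resolve`, `build_object`, and `CASTS` are hypothetical, and the casting rule is an assumption; Datasaur's actual implementation is not shown in this document.

```python
def resolve(path: str, scope: dict):
    """Follow a dotted variable such as 'input.row.user_prompt' through nested dicts."""
    value = scope
    for key in path.split("."):
        value = value[key]  # raises KeyError if the variable does not exist
    return value


# Assumed casts per declared schema type.
CASTS = {"string": str, "integer": int, "number": float}


def build_object(schema: dict, scope: dict) -> dict:
    """Build a flat JSON object from a mapping schema with scalar properties."""
    return {
        name: CASTS[prop["type"]](resolve(prop["value"], scope))
        for name, prop in schema["properties"].items()
    }


schema = {
    "type": "object",
    "properties": {"prompt": {"type": "string", "value": "input.row.user_prompt"}},
}
scope = {"input": {"row": {"row_id": 0, "user_prompt": "Is this positive?"}}}

payload = build_object(schema, scope)
print(payload)  # {'prompt': 'Is this positive?'}
```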

#### Array

Populate the `items` field with the variable containing the information to be represented as an array. Typically, `input.row` is used for this purpose. The field can accept either an **actual array or a single object**. If the data source is a single object, the resulting payload will be an array containing that single object: `[ { … } ]`. If the data source is an array, the payload will be an array of mapped items.

* `items_mapping`: Describes how to map each item. Within `items_mapping`, use the `item.` prefix to access the fields of an individual item.

**Sample payload**

```json
[{ "prompt": "<actual value interpolated from input.row.user_prompt>" }]
```

**Sample mapping schema**

```json
{
  "type": "array",
  "items": "input.row",
  "items_mapping": {
    "type": "object",
    "properties": {
      "prompt": {
        "type": "string",
        "value": "item.user_prompt"
      }
    }
  }
}
```
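
The single-object-versus-array rule can be sketched in a few lines of Python. This is an illustration under stated assumptions: `as_items` and `map_items` are hypothetical names, and the sketch only handles flat `items_mapping` schemas whose values all start with `item.`.

```python
def as_items(source) -> list:
    """Wrap a single object in a list; pass an existing list through unchanged."""
    return source if isinstance(source, list) else [source]


def map_items(source, items_mapping: dict) -> list:
    """Apply a flat object items_mapping to every item of the source."""
    mapped = []
    for item in as_items(source):
        obj = {}
        for name, prop in items_mapping["properties"].items():
            field = prop["value"].split(".", 1)[1]  # 'item.user_prompt' -> 'user_prompt'
            obj[name] = item[field]
        mapped.append(obj)
    return mapped


items_mapping = {
    "type": "object",
    "properties": {"prompt": {"type": "string", "value": "item.user_prompt"}},
}

single_row = {"row_id": 0, "user_prompt": "first prompt"}
print(map_items(single_row, items_mapping))  # [{'prompt': 'first prompt'}]

many_rows = [single_row, {"row_id": 1, "user_prompt": "second prompt"}]
print(map_items(many_rows, items_mapping))
# [{'prompt': 'first prompt'}, {'prompt': 'second prompt'}]
```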

#### Available variables

1. `input`: Represents the Form Input Fields above. The mapping process interpolates the value of this variable.
   1. `input.row`: Represents the row from the document, with the following properties.
      * `row_id`: The row number.
      * `user_prompt`: The combined user prompt and content from target text and target question.
2. `additional_input`: Used for any additional values that need to be included. Use dot notation to access the custom attributes.
3. `item`: Available inside `items_mapping`, where it refers to the element currently being mapped. The source named in the `items` attribute can be an array or a single object, and Datasaur infers the value of `item` from it. Built-in attributes are available to access the data.
   1. `item.user_prompt`: Refers to the `user_prompt` for each object stored in the `input.row` variable.

Here is the [OpenAPI Specification](https://swagger.io/specification/) for `input` and `additional_input` variables:

```json
{
  "input": {
    "type": "object",
    "properties": {
      "row": {
        "type": "object",
        "required": true,
        "properties": {
          "row_id": {
            "type": "integer",
            "required": true
          },
          "user_prompt": {
            "type": "string",
            "required": true
          }
        }
      },
      "system_prompt": {
        "type": "string",
        "required": false
      },
      "model_id": {
        "type": "string",
        "required": false
      },
      "top_p": {
        "type": "number",
        "required": false
      },
      "temperature": {
        "type": "number",
        "required": false
      },
      "api_key": {
        "type": "string",
        "required": false
      },
      "api_version": {
        "type": "string",
        "required": false
      }
    }
  },
  "additional_input": {
    "type": "object",
    "required": false
  }
}
```
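
One way to read the `required` flags in the specification above is as a checklist of variables a mapping can always rely on. The Python sketch below walks a schema written in the same style and lists required properties that are absent. `missing_required` is a hypothetical helper, and whether Datasaur validates inputs this way is an assumption.

```python
def missing_required(schema: dict, value, path: str = "") -> list:
    """Return dotted paths of properties marked required but absent from value."""
    missing = []
    for name, prop in schema.get("properties", {}).items():
        child_path = f"{path}.{name}" if path else name
        child = value.get(name) if isinstance(value, dict) else None
        if child is None:
            if prop.get("required"):
                missing.append(child_path)
        elif prop.get("type") == "object":
            # Descend into nested objects such as input.row.
            missing.extend(missing_required(prop, child, child_path))
    return missing


# Abbreviated copy of the `input` specification, for illustration only.
input_schema = {
    "type": "object",
    "properties": {
        "row": {
            "type": "object",
            "required": True,
            "properties": {
                "row_id": {"type": "integer", "required": True},
                "user_prompt": {"type": "string", "required": True},
            },
        },
        "model_id": {"type": "string", "required": False},
    },
}

print(missing_required(input_schema, {"row": {"row_id": 0}}, "input"))
# ['input.row.user_prompt']
```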

### Examples

This document includes three examples: Datasaur Custom API, OpenAI, and Gemini. The following data is used across all examples for consistency:

* **User prompt**: `"Text: {targetText}\n What is the sentiment for the text above? Choose one from the options below\n {targetOptions}\n Answer:"`
* **targetText** (part of the user prompt): `"I feel good."`
* **targetOptions** (part of the user prompt, inferred from the options in the selected target question, with one option per line): `"positive\n negative\n"`

Each example includes:

* The **expected HTTP request body** for a specific LLM provider.
* The **request format mapping** that transforms the data into the expected HTTP request body above.
* Any **additional input** that may be needed for a specific LLM provider to transform the data as expected.

#### Datasaur custom API request format

**Expected HTTP request body**

```json
[
  {
    "id": 0,
    "text": "Text: I feel good.\n What is the sentiment for the text above? Choose one from the options below\n positive\n negative\n Answer:"
  }
]
```

**Request format mapping schema**

```json
{
  "type": "array",
  "items": "input.row",
  "items_mapping": {
    "type": "object",
    "properties": {
      "id": {
        "type": "integer",
        "value": "item.row_id"
      },
      "text": {
        "type": "string",
        "value": "item.user_prompt"
      }
    }
  }
}
```

#### OpenAI request format

**Expected HTTP request body**

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": "Text: I feel good.\n What is the sentiment for the text above? Choose one from the options below\n positive\n negative\n Answer:"
    }
  ],
  "temperature": 0.7
}
```

**Additional input**

Use `additional_input.role` to supply the value `user` for the required `role` attribute.

```json
{ "role": "user" }
```

**Request format mapping schema**

```json
{
  "type": "object",
  "properties": {
    "model": {
      "type": "string",
      "value": "input.model_id"
    },
    "messages": {
      "type": "array",
      "items": "input.row",
      "items_mapping": {
        "type": "object",
        "properties": {
          "role": {
            "type": "string",
            "value": "additional_input.role"
          },
          "content": {
            "type": "string",
            "value": "item.user_prompt"
          }
        }
      }
    },
    "temperature": {
      "type": "number",
      "value": "input.temperature"
    }
  }
}
```
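
The OpenAI mapping above combines all three building blocks: scalar values, an array with `items_mapping`, and `additional_input`. The following Python sketch shows one way such a schema could be evaluated recursively. `resolve` and `build` are hypothetical names, the prompt is shortened for readability, and the sketch does not handle missing optional fields (a missing variable raises `KeyError`).

```python
def resolve(path: str, scope: dict):
    """Follow a dotted variable through nested dicts."""
    value = scope
    for key in path.split("."):
        value = value[key]
    return value


def build(schema: dict, scope: dict):
    """Recursively build a payload from a request format mapping schema."""
    kind = schema["type"]
    if kind == "object":
        return {name: build(prop, scope) for name, prop in schema["properties"].items()}
    if kind == "array":
        source = resolve(schema["items"], scope)
        items = source if isinstance(source, list) else [source]
        # Each element is exposed to items_mapping under the name "item".
        return [build(schema["items_mapping"], {**scope, "item": item}) for item in items]
    return resolve(schema["value"], scope)  # string, integer, number


schema = {
    "type": "object",
    "properties": {
        "model": {"type": "string", "value": "input.model_id"},
        "messages": {
            "type": "array",
            "items": "input.row",
            "items_mapping": {
                "type": "object",
                "properties": {
                    "role": {"type": "string", "value": "additional_input.role"},
                    "content": {"type": "string", "value": "item.user_prompt"},
                },
            },
        },
        "temperature": {"type": "number", "value": "input.temperature"},
    },
}
scope = {
    "input": {
        "row": {"row_id": 0, "user_prompt": "Text: I feel good.\n Answer:"},
        "model_id": "gpt-4o-mini",
        "temperature": 0.7,
    },
    "additional_input": {"role": "user"},
}

body = build(schema, scope)
print(body["messages"])  # [{'role': 'user', 'content': 'Text: I feel good.\n Answer:'}]
```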

#### Gemini request format

**Expected HTTP request body**

```json
{
  "contents": [
    {
      "parts": [
        { "text": "Text: I feel good.\n What is the sentiment for the text above? Choose one from the options below\n positive\n negative\n Answer:" }
      ]
    }
  ]
}
```

**Request format mapping schema**

```json
{
  "type": "object",
  "properties": {
    "contents": {
      "type": "array",
      "items": "input.row",
      "items_mapping": {
        "type": "object",
        "properties": {
          "parts": {
            "type": "array",
            "items": [
              {
                "type": "object",
                "properties": {
                  "text": {
                    "type": "string",
                    "value": "item.user_prompt"
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
}
```

## Response format mapping schema

This section defines the JSON mapping used to transform the LLM provider's response to match Datasaur's expected response format so labels can be applied correctly.

The mapping uses the `response` keyword with dot notation to access nested attributes.

### Datasaur expected response format

The mapping must produce an object in the format below. The whole HTTP response body from the provider is available to the mapping as the `response` variable.

```json
{ "label": "<the label inferred by the LLM>" }
```

Here is the [OpenAPI specification](https://swagger.io/specification/):

```json
{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "required": true,
      "description": "The predicted label for the row"
    }
  }
}
```

Use the template below to define the mapping, replacing `<the label>` with the path to the label in the provider's response:

```json
{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "value": "response.<the label>"
    }
  }
}
```

### Examples

The examples below show how to define a response mapping for a specific API response so it matches Datasaur’s expected format.

#### Datasaur custom API response format

**Original HTTP response body format**

```json
[
  {
    "id": 0,
    "label": "POSITIVE"
  }
]
```

**Response format mapping schema**

The response is an array, so the mapping reads the label from its first element.

```json
{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "value": "response[0].label"
    }
  }
}
```

#### OpenAI response format

**Original HTTP response body format**

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Test"
      },
      "logprobs": null,
      "finish_reason": "stop",
      "index": 0
    }
  ]
}
```

**Response format mapping schema**

The mapping below reads the label from the message content of the first element of `response.choices`.

```json
{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "value": "response.choices[0].message.content"
    }
  }
}
```

#### Gemini response format

**Original HTTP response body format**

```json
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "Test"
          },
          {
            "inline_data": {
              "mime_type": "image/jpeg",
              "data": "'$(base64 -w0 image.jpg)'"
            }
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0
    }
  ]
}
```

**Response format mapping schema**

Ignore the inline image part and read the label from the text of the first part:

```json
{
  "type": "object",
  "properties": {
    "label": {
      "type": "string",
      "value": "response.candidates[0].content.parts[0].text"
    }
  }
}
```
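
The response mappings in this section all rely on `response` paths that mix dot notation with `[index]` access. The Python sketch below shows one way such paths could be resolved against a parsed response body. `resolve_response`, `map_response`, and `label_mapping` are hypothetical names, and the path grammar is inferred from the examples rather than taken from a published specification.

```python
import re


def resolve_response(path: str, body):
    """Resolve a path such as 'response.choices[0].message.content'."""
    tokens = re.findall(r"[^.\[\]]+", path)
    if not tokens or tokens[0] != "response":
        raise ValueError("path must start with 'response'")
    value = body
    for token in tokens[1:]:
        # Lists are indexed by number, objects by attribute name.
        value = value[int(token)] if isinstance(value, list) else value[token]
    return value


def map_response(mapping: dict, body) -> dict:
    """Build Datasaur's expected object from a response format mapping schema."""
    return {
        name: resolve_response(prop["value"], body)
        for name, prop in mapping["properties"].items()
    }


def label_mapping(path: str) -> dict:
    """Convenience wrapper that builds a label-only mapping schema."""
    return {"type": "object", "properties": {"label": {"type": "string", "value": path}}}


datasaur_body = [{"id": 0, "label": "POSITIVE"}]
openai_body = {"choices": [{"message": {"role": "assistant", "content": "Test"}, "index": 0}]}
gemini_body = {"candidates": [{"content": {"parts": [{"text": "Test"}], "role": "model"}}]}

print(map_response(label_mapping("response[0].label"), datasaur_body))
# {'label': 'POSITIVE'}
print(map_response(label_mapping("response.choices[0].message.content"), openai_body))
# {'label': 'Test'}
print(map_response(label_mapping("response.candidates[0].content.parts[0].text"), gemini_body))
# {'label': 'Test'}
```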
