Custom model

Overview

The LLM Labs Custom Models feature allows you to integrate your own large language models (LLMs) from various providers into Datasaur's LLM Labs. This lets you leverage your specialized models alongside Datasaur's built-in capabilities.

Integration

While Datasaur offers direct integrations with some providers, you can also connect to models hosted on various third-party platforms or your own infrastructure using the "Custom model" feature.

To simplify integration, Datasaur's Custom Model connection adheres to the widely adopted API structure defined by OpenAI for its completions or chat completions endpoints. This means that if your self-hosted model or third-party serving framework exposes an endpoint that mimics the OpenAI API format, connecting it to Datasaur is straightforward.

API specification

To make things easy, Datasaur connects to these custom models using a common method: an API structure that looks just like the one used by OpenAI (specifically, the "Chat Completions" API).

Think of it like using a standard plug: if your model hosting tool (like TGI) provides this standard API interface (an OpenAI-compatible API), Datasaur can plug right in.

How the API connection works

Please note that the streaming option is currently disabled for custom models.

When Datasaur uses your custom model, here’s basically what happens (a short Python sketch mirroring this exchange follows these steps):

  1. Datasaur Sends a Request: Datasaur sends information to your model's address using a standard web method (POST). This request goes to a specific path, usually /v1/chat/completions, added to the main address you provide. The request includes:

    • Extra Info (Headers): Tells the server the data is in JSON format. If you added an API Key in Datasaur, it sends that key for security (Authorization: Bearer YOUR_API_KEY).

    • The Actual Data (JSON Body): This contains the important bits:

      • model: The name of the specific model you want to use.

      • messages: The conversation history, including instructions ("system" message) and the user's input ("user" message).

      • Optional settings like temperature (for creativity) or max_tokens (to limit response length).

    Example Request Data:

    {
      "model": "tgi",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "Explain what an LLM is in simple terms."
        }
      ],
      "temperature": 0.7,
      "max_tokens": 150
    }
  2. Your Model Sends a Response: If everything works, your model hosting tool sends back a success message (200 OK) with its own JSON data, including:

    • id: A unique ID for this conversation turn.

    • model: Which model actually answered.

    • choices: An array (usually just one item) containing the model's reply:

      • message: The actual text generated by the model (content) and its role (assistant).

    • usage (Optional): Information about how many "pieces" of text (tokens) were used for the prompt and the answer.

    Example Response Data:

    {
      "id": "chatcmpl-randomid12345",
      "object": "chat.completion",
      "created": 1713867000, // Timestamp
      "model": "your-model-name",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "An LLM, or Large Language Model, is like a super-smart computer program that's read tons of text, so it can understand and write text almost like a human!"
          },
          "finish_reason": "stop" // Why the model stopped writing
        }
      ],
      "usage": {
        "prompt_tokens": 30,
        "completion_tokens": 55,
        "total_tokens": 85
      }
    }
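
For illustration, here is a minimal Python sketch of the same exchange, using the requests library. The base URL and API key are placeholders for your own deployment, and the model name "tgi" matches the request example above.

    import requests

    # Placeholders: substitute your own endpoint address and API key.
    BASE_URL = "http://your-model-host:8080"
    API_KEY = "YOUR_API_KEY"

    response = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers={
            "Content-Type": "application/json",    # the data is JSON
            "Authorization": f"Bearer {API_KEY}",  # only needed if you set an API key
        },
        json={
            "model": "tgi",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Explain what an LLM is in simple terms."},
            ],
            "temperature": 0.7,
            "max_tokens": 150,
        },
        timeout=60,
    )
    response.raise_for_status()  # anything other than 200 OK raises an error

    data = response.json()
    print(data["choices"][0]["message"]["content"])  # the assistant's reply
    print(data.get("usage"))  # token counts, if the server reports them

If this script prints a reply, your endpoint follows the structure Datasaur expects.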
    

Text Generation Inference (TGI) & Hugging Face

  • Main Address (Base URL): Enter the address where your TGI server is running, ending in /v1. This might look like http://your-tgi-server-ip:8080/v1, or use port 80 with certain hosting options such as Hugging Face Inference Endpoints.

  • API Key: If you're using TGI, especially through a service like Hugging Face, you'll likely need an API key or access token. Get this from your TGI provider (for example, your Hugging Face Access Token) and enter it in the API Key field in Datasaur's Custom Model settings.
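
Since TGI exposes an OpenAI-compatible API, one quick way to sanity-check your endpoint before adding it to Datasaur is to point the official openai Python client at it. The server address and token below are placeholders.

    from openai import OpenAI

    # Placeholders: substitute your TGI server address and Hugging Face token.
    client = OpenAI(
        base_url="http://your-tgi-server-ip:8080/v1",
        api_key="hf_your_access_token",
    )

    completion = client.chat.completions.create(
        model="tgi",  # TGI serves one model per instance; "tgi" is accepted as the name
        messages=[{"role": "user", "content": "Say hello in one short sentence."}],
        max_tokens=50,
    )
    print(completion.choices[0].message.content)

If this returns a sensible reply, the same address and token should work in Datasaur's Custom Model settings.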

Connect your custom model

To connect to a custom model:

  1. Navigate to the Models page.

  2. Click the Manage providers button.

  3. Click the Custom model button.

  4. Input your model credentials. The required credentials are:

    1. Endpoint URL: The URL where your model is served.

    2. API key: The key used to authenticate requests to your model endpoint.

    3. Model name: The name you want the model to appear under in LLM Labs.

If you are adding a custom model from an LLM provider like Hugging Face, you only need to input the endpoint URL without the /v1/chat/completions suffix; Datasaur appends that path for you (see the short sketch after these steps).

  5. Once you’ve added your credentials, click the Add custom model button, and your custom model will be available in LLM Labs.
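
To be concrete about which form of the URL to enter, here is a small Python sketch of the behavior described above: you provide the base endpoint, and Datasaur appends the /v1/chat/completions path itself. The Hugging Face endpoint URL shown is hypothetical.

    # Hypothetical Hugging Face Inference Endpoint URL: this is what you enter in Datasaur.
    base_url = "https://abc123.us-east-1.aws.endpoints.huggingface.cloud"

    # Datasaur derives the full request path itself, so you should not include it yourself.
    request_url = base_url.rstrip("/") + "/v1/chat/completions"
    print(request_url)
    # https://abc123.us-east-1.aws.endpoints.huggingface.cloud/v1/chat/completions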

Managing custom models

To manage your custom model, click the three-dots icon on the model card.

From there, you can:

  1. View Details

  2. Try in Sandbox

  3. Edit

  4. Delete

View details

Once you click the View details menu, you will see the endpoint URL of the model displayed in the details dialog.

Try in Sandbox

Once you click the Try in Sandbox menu, you can test your custom model with your own prompts in the Sandbox.

Edit custom model

Once you click on the Edit menu, you will be able to modify the endpoint URL, API key, and model name. Once you've updated the model credentials, click the Edit custom model button to save your changes.

Delete custom model

Once you click on the Delete menu, you will be able to delete your model from LLM Labs. Confirm the deletion by clicking the Delete custom model button.
