Deployment API
Overview
This page describes how to call the Deployment API, with detailed instructions and examples for common scenarios.
Run Deployed LLM Application with Chat Completion
Executes a RAG-enhanced chat completion request against a deployed LLM application.
Endpoint
POST https://deployment.datasaur.ai/api/deployment/:teamId/:deploymentId/chat/completions
Path Parameters
teamId (string): Your team identifier
deploymentId (string): The ID of your LLM application deployment
Query Parameters
source (string, optional): Source identifier
sourceId (string, optional): Specific source ID
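As a minimal sketch of how the path and query parameters above fit into the endpoint, the helper below assembles the full URL. The function name `build_chat_url` and the sample values are hypothetical; only the base URL and the parameter names come from this page.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://deployment.datasaur.ai/api/deployment"


def build_chat_url(team_id: str, deployment_id: str,
                   source: Optional[str] = None,
                   source_id: Optional[str] = None) -> str:
    """Fill in the path parameters and append any optional query parameters."""
    # Percent-encode the IDs so unexpected characters cannot alter the path.
    path = f"{quote(team_id, safe='')}/{quote(deployment_id, safe='')}"
    url = f"{BASE_URL}/{path}/chat/completions"
    # Only send the optional query parameters that were actually provided.
    pairs = (("source", source), ("sourceId", source_id))
    query = {key: value for key, value in pairs if value is not None}
    return f"{url}?{urlencode(query)}" if query else url
```

Calling `build_chat_url("team-1", "dep-1")` yields the endpoint with no query string; passing `source` or `source_id` appends them as `source=...&sourceId=...`.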
Request Body
The request body follows the OpenAI-compatible chat completion format, with additional RAG-specific enhancements.
Key Properties
messages (array, required): Array of message objects representing the conversation.
stream (boolean, optional): Enables streaming responses.
include_usage (boolean, optional)
rewrite_query (string, optional): Overrides the default behavior of summarizing the messages into a retrieval query; the text you provide is used as the query instead.
filter_metadata (object | string, optional): Filters the retrieved data based on the File Properties attached to each file.
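The properties above could be assembled into a POST request along the following lines. This is a sketch under assumptions, not the official client: the function name `build_chat_request` is hypothetical, and the `Authorization: Bearer` header is an assumed convention, since this page only says the API key goes in the request header without naming it.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_chat_request(url: str, api_key: str, messages: List[Dict[str, Any]],
                       stream: bool = False,
                       rewrite_query: Optional[str] = None,
                       filter_metadata: Optional[Any] = None) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    if not messages:
        # The error-handling guidance asks for a non-empty message array.
        raise ValueError("messages must contain at least one message")
    body: Dict[str, Any] = {"messages": messages, "stream": stream}
    # Optional properties are included only when the caller sets them.
    if rewrite_query is not None:
        body["rewrite_query"] = rewrite_query
    if filter_metadata is not None:
        body["filter_metadata"] = filter_metadata
    headers = {
        "Content-Type": "application/json",
        # Assumption: bearer-token scheme; check your API key instructions.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                  headers=headers, method="POST")
```

The returned object can be passed to `urllib.request.urlopen` to send it; building and sending are kept separate so the payload can be inspected or logged first.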
Message Types
Messages can have the following roles:
user: User inputs
assistant: Assistant responses
tool: Tool/function responses
Advanced Features
Multimedia Content Types
The content field in messages supports various types:
type ContentPart =
| TextContent
| ImageContent
| URLContent;
Text Content
{
"type": "text",
"text": "Your text here"
}
Image Content
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg",
"detail": "high"
}
}
URL Content
{
"type": "url",
"url": "https://example.com",
"name": "Optional name",
"options": {
"select_pages": "1-3",
"include_page_screenshot_as_image": false
}
}
The URL content type supports both standard web URLs and base64-encoded data URLs, allowing you to:
Reference external web content: Use standard URLs (https://example.com)
Embed file content directly: Use base64 data URLs for PDF, HTML, or other content types
"url": "data:application/pdf;base64,JVBERi0xLjMKJcTl8uXrp..."
Options:
select_pages: Specifies which pages to process from multi-page documents (PDFs)
Format: "1-5" (range), "1,3,5" (specific pages), or "1-3,7,9-11" (combination)
Example: "select_pages": "1-3,5,8-10"
Default: All pages if not specified
include_page_screenshot_as_image: When set to true, includes visual representations of pages. These images are sent to the model alongside the text only if the model selected in the sandbox application supports visual input.
For PDFs: Renders page as image for visual analysis
For websites: Captures screenshot of the rendered page
Enables the model to analyze visual layouts, charts, and non-text elements
Default: false
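To make the select_pages format concrete, the sketch below expands a specification such as "1-3,5,8-10" into an explicit page list on the client side, which can help catch malformed values before a request is sent. The function name `expand_select_pages` is hypothetical, and the assumption that pages are numbered from 1 is inferred from the examples above rather than stated by the API.

```python
from typing import List


def expand_select_pages(spec: str) -> List[int]:
    """Expand a select_pages string into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "8-10" covers both endpoints.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            # A single page such as "5"; int() rejects empty or non-numeric text.
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)
```

For the example value above, `expand_select_pages("1-3,5,8-10")` returns `[1, 2, 3, 5, 8, 9, 10]`.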
Base64 encoding is particularly useful for:
Embedding content directly without requiring separate file uploads
Processing temporary or dynamically generated content
Working with content that doesn't have a public URL
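A base64 data URL of the kind described above can be produced with the standard library. The sketch below encodes raw bytes and wraps them in a URL content part; the helper names `to_data_url` and `url_content_part` are hypothetical, while the field names follow the URL Content structure shown earlier.

```python
import base64
from typing import Any, Dict, Optional


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL, e.g. for PDF or HTML content."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def url_content_part(url: str, name: Optional[str] = None,
                     select_pages: Optional[str] = None,
                     include_page_screenshot_as_image: bool = False) -> Dict[str, Any]:
    """Build a URL content part for the messages array."""
    options: Dict[str, Any] = {
        "include_page_screenshot_as_image": include_page_screenshot_as_image,
    }
    if select_pages is not None:
        options["select_pages"] = select_pages
    part: Dict[str, Any] = {"type": "url", "url": url, "options": options}
    if name is not None:
        part["name"] = name
    return part
```

For example, `to_data_url(b"Hello World", "text/html")` returns `data:text/html;base64,SGVsbG8gV29ybGQ=`, which can then be passed as the `url` argument of `url_content_part`.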
Advanced Retrieval: Query Rewriting
Query rewriting is a technique used in information retrieval and search systems to modify or enhance the original search query to improve search results. By default, the messages are summarized into a query. If you prefer to control that query yourself, specify rewrite_query with your own text. For example:
{
"messages": [
{ "role": "user", "content": "What's the weather in Bali?" },
{ "role": "assistant", "content": "It's hot and humid." },
{ "role": "user", "content": "How about in Jakarta?" }
],
"rewrite_query": "What's the weather in Jakarta?"
}
Advanced Retrieval: Metadata Filtering
Another way to improve retrieval accuracy is to apply additional filtering so that only relevant information is retrieved. Below is an example that filters search results by jurisdiction and date.
{
"filter_metadata": {
"bool" : {
"must" : [
{
"bool" : {
"should" : [
{ "term" : { "jurisdiction" : "alabama" } },
{ "term" : { "jurisdiction" : "florida" } },
{ "term" : { "jurisdiction" : "nevada" } }
]
}
},
{
"range" : { "date" : {"gt": "2024-10-29" } }
}
]
}
}
}
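Nested filters like the one above are easy to mistype by hand, so it may help to compose them from small helpers. The sketch below rebuilds the same structure; the helper names `any_of`, `after`, and `all_of` are hypothetical, and reading "should" as "match any" and "must" as "match all" is an assumption based on the Elasticsearch-style shape of the example, not something this page states.

```python
from typing import Any, Dict, Iterable


def any_of(field: str, values: Iterable[str]) -> Dict[str, Any]:
    """One term clause per value, grouped under "should"."""
    return {"bool": {"should": [{"term": {field: value}} for value in values]}}


def after(field: str, value: str) -> Dict[str, Any]:
    """Range clause using the "gt" (greater than) operator from the example."""
    return {"range": {field: {"gt": value}}}


def all_of(*clauses: Dict[str, Any]) -> Dict[str, Any]:
    """Group clauses under "must"."""
    return {"bool": {"must": list(clauses)}}


# Reproduces the filter_metadata value from the example above.
filter_metadata = all_of(
    any_of("jurisdiction", ["alabama", "florida", "nevada"]),
    after("date", "2024-10-29"),
)
```

The resulting dictionary can be placed under the `filter_metadata` key of the request body as-is.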
Example Usage
Simple Text Query
{
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}
Multi-turn Conversation
{
"messages": [
{
"role": "system",
"content": "You are a helpful assistant who specializes in geography."
},
{
"role": "user",
"content": "What is the capital of France?"
},
{
"role": "assistant",
"content": "The capital of France is Paris. It's often called the 'City of Light' (Ville Lumière)."
},
{
"role": "user",
"content": "Tell me more about its population."
}
]
}
Text with Images
{
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg",
"detail": "high"
}
}
]
}
]
}
Text with PDF Analysis
{
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Please summarize this research paper"
},
{
"type": "url",
"url": "https://doompdf.pages.dev/doom.pdf",
"options": {
"select_pages": "1-5",
"include_page_screenshot_as_image": true
}
}
]
}
]
}
Base64 Data URL with Screenshot
{
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What does this webpage contain?"
},
{
"type": "url",
"url": "data:text/html;base64,SGVsbG8gV29ybGQ=",
"name": "Example Page",
"options": {
"include_page_screenshot_as_image": true
}
}
]
}
]
}
Metadata Filtering
{
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in pages 2-5 of the technical documentation?"
}
]
}
],
"filter_metadata": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"term": {
"jurisdiction": "alabama"
}
},
{
"term": {
"jurisdiction": "florida"
}
},
{
"term": {
"jurisdiction": "nevada"
}
}
]
}
},
{
"range": {
"date": {
"gt": "2024-10-29"
}
}
}
]
}
}
}
Response
Non-streaming Response
{
id: string;
choices: Array<{
message: {
role: 'assistant';
content: string;
tool_calls?: Array<ToolCall>;
};
finish_reason: 'stop' | 'length' | 'tool_calls' | 'content_filter';
index: number;
}>;
usage: {
prompt_tokens: number;
completion_tokens: number;
total_tokens: number;
prompt_embedding_tokens?: number;
};
contexts?: Array<RagContext>;
}
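As an illustration of reading the non-streaming shape above, the sketch below pulls out the fields a caller usually needs once the JSON body has been decoded into a dictionary. The function name `summarize_response` and the sample values are hypothetical; because the structure of `RagContext` is not described on this page, the contexts are only counted.

```python
from typing import Any, Dict


def summarize_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the answer text and basic bookkeeping from a decoded response."""
    choice = response["choices"][0]
    usage = response.get("usage") or {}
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # "length" means the completion stopped at a token limit.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens"),
        # contexts is optional, so treat a missing value as empty.
        "context_count": len(response.get("contexts") or []),
    }


# Hypothetical sample shaped like the type above.
sample = {
    "id": "example-id",
    "choices": [{
        "message": {"role": "assistant", "content": "Paris."},
        "finish_reason": "stop",
        "index": 0,
    }],
    "usage": {"prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14},
}
summary = summarize_response(sample)
```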
Streaming Response
The response is sent in chunks as Server-Sent Events (SSE), each with the following format:
{
id: string;
choices: Array<{
delta: {
content?: string;
role?: 'assistant';
tool_calls?: Array<ToolCall>;
};
finish_reason: string | null;
index: number;
}>;
usage: null | UsageInfo;
contexts?: Array<RagContext> | null;
}
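A streamed answer is reassembled by concatenating the `delta.content` pieces from each chunk. The sketch below does this over an iterable of already-decoded SSE lines. It assumes each chunk arrives on a `data:` line and that the stream may end with an OpenAI-style `[DONE]` sentinel; neither detail is confirmed on this page. The function name `collect_stream_text` is hypothetical.

```python
import json
from typing import Iterable, List


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from the first choice of each SSE chunk."""
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        if payload == "[DONE]":  # assumption: OpenAI-style end-of-stream marker
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # keep only the first choice to avoid interleaving
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

The function takes plain strings, so it can be fed from any HTTP client that exposes the response body line by line.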
Error Handling
The API uses standard HTTP status codes. Each code below lists its cause and the recommended actions to resolve it:
400 Bad Request
Cause: Invalid parameters or malformed request
Recommended resolution:
Check request body format and required fields
Validate parameter types and values
Ensure message array is not empty
Check if URLs are properly formatted and accessible
401 Unauthorized
Cause: Missing or invalid authentication
Recommended resolution:
Check if API key is included in the request header
Verify API key is valid and not expired
Ensure API key has correct format
Generate a new API key if necessary
Contact support if you believe the API key should be valid
403 Forbidden
Cause: Insufficient permissions for the requested operation
Recommended resolution:
Verify team membership and permissions
Check if you have access to the specified LLM application
Ensure your subscription covers the requested features
Request necessary permissions from team admin
Upgrade subscription tier if needed
429 Too Many Requests
Cause: Rate limit exceeded (see the Rate Limiting section below for details)
Recommended resolution:
Implement exponential backoff retry logic
Check rate limits in response headers
Reduce request frequency
Consider upgrading your plan for higher limits
Optimize batch operations to reduce API calls
500 Internal Server Error
Cause: Server-side error
Recommended resolution:
Retry request after a brief delay
Check system status page for outages
Verify request payload size is within limits
Contact support if the error persists
Save error response for troubleshooting
For all errors, the response includes a detailed error message to help diagnose the issue. If problems persist after taking the recommended actions, please contact support with the error details.
Rate Limiting
The API enforces the following rate limits. If any of these limits is reached, subsequent requests will be rejected with a 429 (Too Many Requests) status code until the limit resets:
Origin IP address-based: 1500 requests per 60 seconds
Deployment ID-based: 300 requests per 60 seconds
Team-based: Daily limits apply for free trial accounts
Note: These limits are evaluated independently - hitting any single limit will result in request rejection, regardless of the status of other limits.
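The exponential backoff recommended for 429 responses can be sketched as follows. The function name `call_with_backoff`, its parameters, and the choice to also retry 500 are hypothetical; the delays are illustrative defaults rather than values published by the API. The caller supplies `send`, a function that performs one request and returns a `(status, body)` pair.

```python
import random
import time
from typing import Any, Callable, Tuple


def call_with_backoff(send: Callable[[], Tuple[int, Any]],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 30.0,
                      retry_on: Tuple[int, ...] = (429, 500),
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, Any]:
    """Retry send() with exponentially growing, jittered delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        is_last = attempt == max_attempts - 1
        if status not in retry_on or is_last:
            return status, body
        # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Jitter spreads out retries from clients that were limited together.
        sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. If the last attempt still fails, the final status and body are returned so the caller can log the error response.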
Best Practices
Set appropriate temperature values in the sandbox:
Lower (0.2) for factual responses
Higher (0.8) for creative responses
Enable stream: true for real-time responses
Use rewrite_query for optimized RAG queries
Include relevant file and URL content for context-aware responses