Datasaur

Knowledge base

Last updated 5 months ago

What is a Knowledge base?

A knowledge base is a central repository where you can upload and manage files that you want to embed and use within the LLM Labs platform. It stores documents that can serve various purposes, such as providing context to a model and being used in the Sandbox for application development.

Get started

You can visit the Knowledge base page by selecting the Knowledge base option located in the LLM Labs sidebar.

Knowledge base Creation

  1. Click the Create new knowledge base button.

  2. Enter your knowledge base name, and click the Create button.

  3. Once the knowledge base is created, you will be redirected into the knowledge base. Here, you can upload your desired files into the knowledge base by clicking the Upload file button. The maximum file size to be uploaded to the knowledge base is 500MB.

  4. After you select the files, click the Update button to initiate the embedding process. The embedding process might take some time, depending on the file size and the number of files.

You can upload more files in the Update knowledge base dialog by clicking the Upload more files button. You can also remove unwanted files by clicking the Delete button next to each file.

  5. After you click the Update knowledge base button, you will need to configure the knowledge base settings. You are asked for these only once per knowledge base, and they are saved for future embeddings. You can change the settings later, but doing so re-embeds the existing files.

    The configurations are:

    • Embedding model: Your preferred embedding model. By default, Datasaur supports several embedding models from these providers:

      • OpenAI

        • text-embedding-ada-002

        • text-embedding-3-small

        • text-embedding-3-large

        • Text Embedding Ada 002

        • Text Embedding 3 Small

        • Text Embedding 3 Large

      • Amazon Bedrock

        • amazon.titan-embed-text-v1

        • amazon.titan-embed-image-v1

        • amazon.titan-embed-text-v2:0

        • cohere.embed-english-v3

        • cohere.embed-multilingual-v3

      • Vertex AI

        • textembedding-gecko@003

        • text-embedding-004

        • textembedding-gecko-multilingual@001

        • text-multilingual-embedding-002

    • Chunk size: The maximum number of characters that a chunk can contain. The larger the number, the bigger each chunk will be, allowing more data to be included within it.

    • Overlap: The number of characters shared between two adjacent chunks. The larger the overlap, the more information each chunk shares with its neighboring chunks.

Click the Save and update knowledge base button to save the settings.

  6. After the embedding process completes, you can preview the files and use them for Retrieval-Augmented Generation (RAG) in LLM Labs. In this example, we embed our sample Patient Records, which will be used for the RAG process in LLM Labs.
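To make the Chunk size and Overlap settings concrete, here is a minimal Python sketch of character-based chunking with overlap. It is only an illustration under simple assumptions: the function name split_into_chunks is hypothetical, and the actual splitting logic used by LLM Labs is not described on this page and may differ (for example, by respecting sentence boundaries).

```python
def split_into_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Adjacent chunks share `overlap` characters. Hypothetical helper for
    illustration only; not the LLM Labs implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_into_chunks(sample, chunk_size=10, overlap=3)
for chunk in chunks:
    print(chunk)
```

With a chunk size of 10 and an overlap of 3, each chunk starts with the last three characters of the previous one. Raising the overlap repeats more text between neighbors, while raising the chunk size produces fewer, larger chunks.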

Search

The search function allows you to validate the effectiveness of your knowledge base in providing context. The search results are shown in chunks that follow the chunk size and overlap value you specified. Each chunk will have a similarity score along with its source. A higher similarity score means the chunk content is more related to the given prompt.
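As a rough illustration of how a similarity score can rank chunks, the sketch below scores toy embedding vectors with cosine similarity. This is an assumption made for the example: this page does not state which similarity metric LLM Labs uses, and the vectors, chunk texts, and file names are invented.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_chunks(query_vector, chunks):
    """Score each (text, source, vector) chunk and sort by descending score."""
    scored = [
        (cosine_similarity(query_vector, vector), text, source)
        for text, source, vector in chunks
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Toy two-dimensional "embeddings"; real embedding vectors are much longer.
toy_chunks = [
    ("Billing address and insurance details", "billing.txt", [0.0, 1.0]),
    ("Blood pressure readings for the patient", "records.txt", [1.0, 0.0]),
    ("Visit summary with vitals and billing", "summary.txt", [1.0, 1.0]),
]
ranked = rank_chunks([1.0, 0.0], toy_chunks)
for score, text, source in ranked:
    print(f"{score:.3f}  {source}  {text}")
```

The chunk whose vector points in the same direction as the query vector ranks first, which mirrors how a higher score indicates content more closely related to the prompt.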

Activity

The Activity feature logs all actions performed on your knowledge base, making it easier to track changes and actions. You can filter the activity based on member, file, file source, and date.

Add URLs

In LLM Labs, you can add URLs directly to a knowledge base, expanding the sources of information beyond file uploads.

  1. Open your knowledge base and click the Upload Files button. In the dropdown menu, choose Add URLs.

  2. A dialog box will appear where you can paste the desired URL.

  3. Click the + button to add the URL to your knowledge base. Once you've added your URLs, click the Update knowledge base button. The URLs will be automatically processed and indexed for search and retrieval within the project.

RAG Example: Healthcare Assistant

Here is how a knowledge base can streamline the development of a Retrieval-Augmented Generation (RAG) based Healthcare Assistant in LLM Labs:

  1. From the Knowledge base dropdown, select the knowledge base you've created.

  2. Write your prompt asking about a patient's health condition. The results from the knowledge base will then be displayed.

  3. You can also view the corresponding chunks from the knowledge base and the source.

This is just one example! Knowledge base empowers you to build various LLM applications that rely on efficient retrieval of semantically related information.
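Conceptually, the RAG flow above combines your instructions, the retrieved chunks, and the user's question into one prompt. The following Python sketch shows one way this could look. It is not how LLM Labs builds prompts internally; the build_rag_prompt function, the prompt layout, and the sample chunks are all hypothetical.

```python
def build_rag_prompt(system_instruction, user_question, retrieved_chunks, max_chunks=3):
    """Assemble a prompt from the top retrieved chunks (hypothetical layout)."""
    top_chunks = retrieved_chunks[:max_chunks]  # assumes chunks are already ranked
    context = "\n\n".join(
        f"[{index}] (source: {chunk['source']})\n{chunk['text']}"
        for index, chunk in enumerate(top_chunks, start=1)
    )
    return (
        f"{system_instruction}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_question}"
    )


# Invented sample data standing in for chunks returned by a knowledge base search.
sample_chunks = [
    {"source": "patient_records.pdf", "text": "Patient A reports mild seasonal allergies."},
    {"source": "patient_records.pdf", "text": "Patient A was prescribed an antihistamine."},
]
prompt = build_rag_prompt(
    "You are a healthcare assistant. Answer only from the provided context.",
    "What is Patient A's current condition?",
    sample_chunks,
)
print(prompt)
```

Keeping the source next to each chunk in the prompt makes it easier to trace an answer back to the file it came from, similar to viewing the corresponding chunks and sources in the steps above.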

Ready to Streamline Your Workflow?

You can also add files from external object storage. Learn more about adding files from external object storage on the External Object Storage page.

Advanced settings: Additional settings can enhance your data organization by letting you provide information about each file using the File Properties feature.

Create the Sandbox with the User Instruction and System Instruction you've prepared.

Explore the LLM Labs documentation for detailed instructions based on your plan and functionalities. If you need further assistance, contact us at support@datasaur.ai. Our support team is always happy to help!