Knowledge base

What is a Knowledge base?

Knowledge base is a central repository where you can upload and manage files that you want to embed and utilize within LLM Labs platform. It is designed to store documents that can be used for various purposes, such as enhancing understanding and leveraging them in the Sandbox for application development.

Get started

You can visit the Knowledge base page by selecting the Knowledge base option located in the LLM Labs sidebar.

Knowledge base Creation

Click the Create new knowledge base button.
Enter your knowledge base name, and click the Create button.
Once the knowledge base is created, you will be redirected into the knowledge base. Here, you can upload your desired files into the knowledge base by clicking the Upload file button. The maximum file size to be uploaded to the knowledge base is 500MB.

You can also add files from external object storage. Learn more about adding files from external object storage.

After you select the files, click the Update button to initiate the embedding process. The embedding process might take some time, depending on the file size and the number of files.

You can upload more files in the Update knowledge base dialog by clicking the Upload more files button. You can also remove unwanted files by clicking the Delete button next to each file.

Once you’ve clicked the Update knowledge base button, you will need to configure the knowledge base setting. You will only be asked about this once for each knowledge base, and it will be saved for future embeddings. Don’t worry; you will be able to change the settings later, but existing files will be re-embedded.
The configurations are:
- Embedding model: Your preferred embedding models. Datasaur supports several embedding models by default from these providers:
  - OpenAI
    text-embedding-ada-002
    text-embedding-3-small
    text-embedding-3-large
    Text Embedding Ada 002
    Text Embedding 3 Small
    Text Embedding 3 Large
  - Amazon Bedrock
    amazon.titan-embed-text-v1
    amazon.titan-embed-image-v1
    amazon.titan-embed-text-v2:0
    cohere.embed-english-v3
    cohere.embed-multilingual-v3
  - Vertex AI
    textembedding-gecko@003
    text-embedding-004
    textembedding-gecko-multilingual@001
    text-multilingual-embedding-002
  - Chunk size: The maximum number of characters that a chunk can contain. The larger the numbers, the bigger each chunk will be, allowing more data to be included within it.
  - Overlap: The number of characters that should overlap between two adjacent chunks. The larger the overlap, the more information each chunk shares with its neighboring chunks.
  - Advanced settings: Additional settings can enhance your data organization by enabling you to provide information about the file using the File Properties feature.

Click the Save and update knowledge base button to save the settings.

After completing the embedding process, you can preview the files and use them to conduct Retrieval-Augmented Generation (RAG) in LLM Labs. In this example, we embed our sample Patient Records which will be used for the RAG process in LLM Labs.

Search

The search function allows you to validate the effectiveness of your knowledge base in providing context. The search results are shown in chunks that follow the chunk size and overlap value you specified. Each chunk will have a similarity score along with its source. A higher similarity score means the chunk content is more related to the given prompt.

Activity

The Activity feature logs all actions performed on your knowledge base, making it easier to track changes and actions. You can filter the activity based on member, file, file source, and date.

Add URLs

In LLM Labs, you can now add URLs directly to the Knowledge Base, expanding the sources of information beyond file uploads.

Open your knowledge base, click on the Upload Files button. In the dropdown menu, choose Add URLs.
A dialog box will appear where you can paste the desired URL.
Click + button to add the URLs to your knowledge base. Once you've added the URL, click the Update knowledge base button, the URLs will be automatically processed and indexed for search and retrieval within the project.

RAG Example: Healthcare Assistant

Here is how Knowledge base can streamline the development of a Retrieval-Augmented Generation (RAG) based Healthcare Assistant in LLM Labs:

Create the Sandbox with the User Instruction and System Instruction you've prepared.
From the Knowledge base dropdown, select the knowledge base you've created.
Write your prompt asking about a patient's health condition. The results from the knowledge base will then be displayed.
You can also view the corresponding chunks from the knowledge base and the source.

This is just one example! Knowledge base empowers you to build various LLM applications that rely on efficient retrieval of semantically related information.

Ready to Streamline Your Workflow?

Explore the LLM Labs documentation for detailed instructions based on your plan and functionalities. Contact us at [email protected] if you need further assistance, our support team is always happy to help!

Last updated 8 months ago