# Knowledge base

### What is a Knowledge base?

A knowledge base is a central repository where you upload and manage the files that you want to embed and use within the LLM Labs platform. The documents it stores can supply context to your models, for example when you use them in the Sandbox during model development.

### Get started

You can visit the Knowledge base page by selecting the **Knowledge base** option located in the LLM Labs sidebar.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-3622f17b50ad59064ef9e436ef16c69274af4bdb%2FKnowledge%20base%20-%20main%20page%20-%20sidebar.png?alt=media" alt=""><figcaption></figcaption></figure>

#### Knowledge base Creation

1. Click the **Create new knowledge base** button.
2. Enter your knowledge base name, then click **Create**.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-c809165f5b3d1ba0e7b580643bc5898c17c93d35%2FKnowledge%20base%20-%20main%20page%20-%20create%20knowledge%20base.png?alt=media" alt=""><figcaption></figcaption></figure>
3. Once the knowledge base is created, you are redirected to it. From there, upload your files by clicking the **Upload file** button. The maximum file size is 500 MB.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-47634e313df05b97cab849a9cbd566f6fd69d78d%2FKnowledge%20base%20-%20Newly%20added%20page.png?alt=media" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
You can also add files from external object storage. [Learn more about adding files from external object storage](https://docs.datasaur.ai/llm-projects/vector-store/external-object-storage).
{% endhint %}

4. After you select the files, configure the global knowledge base settings. You are asked for these only once per knowledge base, and they are saved for future embeddings. You can change the settings later, either globally or individually.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-769a292679e7d958f8ada37e5463872d5d81de76%2FKnowledge%20base%20-%20global%20configuration%20-%20first%20time.png?alt=media" alt=""><figcaption></figcaption></figure>

   The configurations are:

   * **Embedding model**: The model used to embed your files. Datasaur supports several embedding models by default from these providers:
     * **OpenAI**
       * text-embedding-ada-002
       * text-embedding-3-small
       * text-embedding-3-large
       * Text Embedding Ada 002
       * Text Embedding 3 Small
       * Text Embedding 3 Large
     * **Amazon Bedrock**
       * amazon.titan-embed-text-v1
       * amazon.titan-embed-image-v1
       * amazon.titan-embed-text-v2:0
       * cohere.embed-english-v3
       * cohere.embed-multilingual-v3
     * **Vertex AI**
       * textembedding-gecko\@003
       * text-embedding-004
       * textembedding-gecko-multilingual\@001
       * text-multilingual-embedding-002
   * **Chunk size**: The maximum number of characters that a chunk can contain. The larger the number, the bigger each chunk will be, allowing more data to be included within it.
   * **Overlap**: The number of characters that overlap between two adjacent chunks. The larger the overlap, the more information each chunk shares with its neighboring chunks.
   * **Advanced settings**: Additional settings that help organize your data by letting you provide information about a file through the [File Properties](https://docs.datasaur.ai/llm-projects/vector-store/file-properties) feature.
5. You can either process the files immediately by clicking **Process files**, or preview the chunking results and make modifications by clicking **Preview and edit**.
6. After the embedding process completes, you can preview the files and use them for Retrieval-Augmented Generation (RAG) in LLM Labs.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-64f8b75788614c014081c65aa5605bf687213d33%2FKnowledge%20base%20-%20File%20processed.png?alt=media" alt=""><figcaption></figcaption></figure>
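
To make the **Chunk size** and **Overlap** settings from step 4 more concrete, here is a minimal sketch of character-based chunking. It is an illustration only: the function name `chunk_text` is hypothetical, and the actual splitting logic used by LLM Labs is not described on this page and may differ (for example, by respecting sentence or paragraph boundaries).

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters.

    Adjacent chunks share `overlap` characters. This is a naive
    illustration, not the LLM Labs implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

With a chunk size of 4 and an overlap of 2, every chunk repeats the last two characters of the previous one. A larger overlap preserves more context across chunk boundaries at the cost of producing more chunks.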

#### Search

The search function lets you check how well your knowledge base provides context for a given query. The results are shown as chunks that follow the chunk size and overlap values you specified. Each chunk is listed with a similarity score and its source. A higher similarity score means the chunk content is more closely related to the query.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-264697c2ed22eb7f4d3b45d27ef3eaa60100ad2f%2FKnowledge%20base%20-%20Search.png?alt=media" alt=""><figcaption></figcaption></figure>
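
As a rough illustration of how a similarity score can rank chunks, the sketch below uses cosine similarity between embedding vectors. This is an assumption: the page does not state which metric LLM Labs uses, and the vectors, chunk labels, and function names here are made up for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec: list[float],
                chunk_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (chunk label, score) pairs, highest score first."""
    scored = [(label, cosine_similarity(query_vec, vec))
              for label, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy two-dimensional "embeddings"; real embeddings have many more dimensions.
chunks = {
    "a.pdf (chunk 1)": [1.0, 0.0],
    "b.pdf (chunk 1)": [0.0, 1.0],
    "c.pdf (chunk 1)": [1.0, 1.0],
}
for label, score in rank_chunks([1.0, 0.0], chunks):
    print(f"{score:.2f}  {label}")
```

The chunk whose vector points in the same direction as the query vector ranks first, which mirrors how a higher score in the search results indicates a closer match.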

#### Activity

The Activity tab logs all actions performed on your knowledge base, making changes easier to track. You can filter activity by member, source, file, or date.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-083888bf86956c0db889c77ea1468ffa5cdf6857%2FKnowledge%20base%20-%20Activity.png?alt=media" alt=""><figcaption></figcaption></figure>

### Add URLs

You can also add URLs to the knowledge base, expanding the sources of information beyond file uploads.

1. Open your knowledge base, click the more menu on the **Upload files** button, and select **Add URLs**.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-18bd6b8422ac02316b4df1303811fce01f8c2caf%2FKnowledge%20base%20-%20Add%20files%20more%20menu.png?alt=media" alt=""><figcaption></figcaption></figure>
2. A dialog box will appear where you can type your URLs.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-df7bb0946fa337583f98fc90283478102e825f2a%2FKnowledge%20base%20-%20Add%20URLs%20-%20URL%20typed.png?alt=media" alt=""><figcaption></figcaption></figure>
3. Click the **+** button or press **Enter** to add URLs to the list. Once you're done, click **Update knowledge base**.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-6aa5d9026c143478cb3c9ba999806fbb0be8f022%2FKnowledge%20base%20-%20Add%20URLs%20-%20URL%20added.png?alt=media" alt=""><figcaption></figcaption></figure>
4. You can either process the URLs immediately by clicking **Process files**, or preview the chunking results and make modifications by clicking **Preview and edit**.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-0e28e1ca4641c9b0ecdaf73a33e4ee374a30f003%2FKnowledge%20base%20-%20Add%20URLs%20-%20URL%20added%20-%20update%20confirmation.png?alt=media" alt=""><figcaption></figcaption></figure>

### RAG Example: Healthcare Assistant

Here is how Knowledge base can streamline the development of a Retrieval-Augmented Generation (RAG) based Healthcare Assistant in LLM Labs:

1. Create a [Sandbox](https://docs.datasaur.ai/llm-projects/sandbox).
2. Navigate to a model, and select the knowledge base you've created.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-083450e69e99af50c27087805da694a1bbdb52ba%2FSandbox%20-%20Knowledge%20base%20-%20highlight%20dropdown.png?alt=media" alt=""><figcaption></figcaption></figure>
3. Write a prompt related to the knowledge base's content, then click **Run selected**. The results will be generated using the knowledge base content.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-98924c651c081dd4f1d6f740c63cbd19aba8788b%2FSandbox%20-%20Knowledge%20base%20-%20Run.png?alt=media" alt=""><figcaption></figcaption></figure>
4. Below the generated completion, you can view the corresponding chunks. These are the parts of your knowledge base used to create the response.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-a4bff67e5d803a2887fffa40d3866b8d8fb44778%2FSandbox%20-%20Knowledge%20base%20-%20Run%20-%20Corresponding%20chunks.png?alt=media" alt=""><figcaption></figcaption></figure>
5. You can also view the source file for the corresponding chunks by clicking the source link.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-0ce5bce0c299993e2f33ce7ad276de7e5aeb5aaa%2FSandbox%20-%20Knowledge%20base%20-%20Run%20-%20File%20preview.png?alt=media" alt=""><figcaption></figcaption></figure>
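
Conceptually, the flow above retrieves the most relevant chunks and passes them to the model together with your prompt. The sketch below shows one generic way to assemble such a prompt. The function `build_rag_prompt`, the template wording, and the sample chunks are all hypothetical and are not the prompt that LLM Labs builds internally.

```python
def build_rag_prompt(question: str,
                     chunks: list[tuple[str, str]],
                     top_k: int = 3) -> str:
    """Combine a question with the top-k retrieved chunks.

    `chunks` holds (source, text) pairs ordered from most to least relevant.
    """
    context_lines = [
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(chunks[:top_k], start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up chunks standing in for search results from a knowledge base.
retrieved = [
    ("handbook.pdf", "Adults should have a routine check-up once a year."),
    ("faq.pdf", "Appointments can be booked online or by phone."),
    ("policy.pdf", "Records are kept for the period required by law."),
]
print(build_rag_prompt("How often should adults get a check-up?", retrieved, top_k=2))
```

Numbering each chunk and keeping its source next to the text makes it easier to trace a generated answer back to the file it came from, similar to the corresponding chunks and source links shown in the Sandbox.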

**This is just one example**! Knowledge base empowers you to build various models that rely on efficient retrieval of semantically related information.

**Ready to streamline your workflow?**

Explore the LLM Labs documentation for detailed instructions based on your plan and its features. Contact us at <support@datasaur.ai> if you need further assistance. Our support team is always happy to help!
