# ML Assisted Labeling

## Introduction

Datasaur's **ML-assisted labeling** extension enhances data labeling efficiency and accuracy for NLP projects. It integrates open-source models, large language models (LLMs), and custom models, providing automatic labeling for span labeling, row labeling, bounding box labeling and document labeling projects. This tool streamlines the data labeling process, automating your labeling workflow to save time and improve data quality.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-390796b06b4d55bae049640ffae4fb64d48006d1%2FExtension%20-%20ML-assisted%20labeling%20-%20service%20provider%20highlights.png?alt=media" alt="Service provider for ML-assisted Labeling"><figcaption></figcaption></figure>

## Key Features

1. **Label multiple items at once**: Datasaur **ML-assisted labeling** applies labels to multiple items within a label set in a single pass, so you don't have to label each item individually. This is especially useful for span-based projects.
2. **Integration with popular models**: The extension works with widely used libraries and tasks such as spaCy, named entity recognition (NER), part-of-speech (POS) tagging, and sentiment analysis. It also supports various LLMs, Hugging Face, and other model platforms.
3. **Time-saving efficiency**: Automating the labeling process with Datasaur **ML-assisted labeling** saves time and resources. You can focus on critical tasks, then quickly review the automated labels to make sure you deliver good-quality data.

## Quick Start Guide

To set up the **ML-assisted labeling** extension, follow these steps:

1. Go to the **Manage extensions** dialog and enable the **ML-assisted labeling** extension.
2. The **ML-assisted labeling** extension should appear on the right side.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-09dc1df21aa4c147c6f0848588c813a6fae4ac71%2FExtension%20-%20ML-assisted%20Labeling%20-%20Span%20labeling%20-%20spaCy%20-%20highlight.png?alt=media" alt="Image of ML Assisted Labeling Menu"><figcaption><p>ML-assisted labeling extension with spaCy</p></figcaption></figure>
3. Select a service provider for assisted labeling.
4. Fill in any additional information the provider asks for. Most providers don’t need any.
5. Click **Predict labels** to apply labels to your document.

{% hint style="info" %}
For row labeling projects, users have several additional options and features:

1. Select specific rows for prediction: Choose which rows to include in the prediction.
2. Target text: Pick which text to use as input for reference.
3. Target question: Decide which output column to predict.
4. Faster prediction speed: Toggle this option to run predictions faster via the backend.
{% endhint %}
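As a rough illustration of how the row labeling options relate to one another, the sketch below applies a prediction function only to the selected rows, reads its input from the target text column, and writes its output to the target question column. The function `predict_selected_rows`, its parameters, and the sample column names are hypothetical and only model the idea; they are not Datasaur's implementation or API.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def predict_selected_rows(
    rows: List[Row],
    selected: Iterable[int],
    target_text: str,
    target_question: str,
    predict: Callable[[str], str],
) -> List[Row]:
    """Return copies of the rows, with predictions filled in for selected rows only.

    All names here are hypothetical and only mirror the options described above.
    """
    chosen = set(selected)
    result = []
    for index, row in enumerate(rows):
        updated = dict(row)  # copy, so the input rows are left untouched
        if index in chosen:
            # "Target text" is the input column, "Target question" the output column.
            updated[target_question] = predict(row[target_text])
        result.append(updated)
    return result


# Example: predict only the first row, using a toy rule in place of a real model.
sample_rows = [{"review": "Great product"}, {"review": "Arrived broken"}]
labeled = predict_selected_rows(
    sample_rows,
    selected=[0],
    target_text="review",
    target_question="sentiment",
    predict=lambda text: "positive" if "great" in text.lower() else "negative",
)
```

Rows that are not selected are returned unchanged, which mirrors choosing which rows to include in the prediction.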

## Supported Model Providers

<table><thead><tr><th width="198">Row Labeling</th><th width="211">Span Labeling</th><th width="162">BBox Labeling</th><th>Doc Labeling</th></tr></thead><tbody><tr><td><a href="ml-assisted-labeling/sentiment-analysis">Sentiment Analysis</a></td><td><a href="ml-assisted-labeling/spacy">SpaCy</a></td><td><a href="ml-assisted-using-custom-api#custom-api-for-bounding-box-labeling">Custom Model</a></td><td><a href="ml-assisted-labeling/llm-labs-beta">Datasaur LLM Labs</a></td></tr><tr><td><a href="ml-assisted-labeling/llm-assisted-labeling">LLM Assisted Labeling</a></td><td><a href="ml-assisted-labeling/llm-assisted-labeling">LLM Assisted Labeling</a></td><td></td><td></td></tr><tr><td><a href="ml-assisted-labeling/amazon-comprehend">Amazon Comprehend</a></td><td><a href="ml-assisted-labeling/nltk">NLTK</a></td><td></td><td></td></tr><tr><td><a href="ml-assisted-labeling/google-vertex-ai">Google Vertex AI</a></td><td><a href="ml-assisted-labeling/corenlp-ner">CoreNLP</a> and <a href="ml-assisted-labeling/sparknlp-ner">SparkNLP</a> NER</td><td></td><td></td></tr><tr><td><a href="ml-assisted-labeling/amazon-sagemaker">Amazon SageMaker</a></td><td><a href="ml-assisted-labeling/corenlp-pos">CoreNLP</a> and <a href="ml-assisted-labeling/sparknlp-pos">SparkNLP</a> POS</td><td></td><td></td></tr><tr><td><a href="ml-assisted-labeling/llm-labs-beta">Datasaur LLM Labs</a></td><td><a href="ml-assisted-labeling/llm-labs-beta">Datasaur LLM Labs</a></td><td></td><td></td></tr><tr><td><a href="ml-assisted-labeling/azure">Azure</a></td><td><a href="ml-assisted-labeling/fewnerd">FewNERD</a></td><td></td><td></td></tr><tr><td><a href="ml-assisted-labeling/ml-assisted-using-huggingface">Hugging Face</a></td><td><a href="ml-assisted-labeling/ml-assisted-using-huggingface">Hugging Face</a></td><td></td><td></td></tr><tr><td><a href="ml-assisted-using-custom-api#custom-api-for-row-based">Custom Model</a></td><td><a href="ml-assisted-using-custom-api#custom-api-for-span-based">Custom 
Model</a></td><td></td><td></td></tr></tbody></table>

We group model providers into several key categories to help you better understand their capabilities and use cases.

* Datasaur Hosted
  * These models are **hosted by Datasaur.**
  * They are ready-to-use models for common NLP tasks like Named Entity Recognition (NER), Sentiment Analysis, Part-of-Speech (POS) Tagging, and Dependency Parsing.
  * Examples include spaCy, CoreNLP, and FewNERD, which provide pre-trained models for various NLP tasks.
* Cloud Provider
  * These models require access to **external AI services** hosted on various cloud providers.
  * Users can leverage pretrained or fine-tuned models deployed on cloud platforms, providing scalability and access to large models.
  * Examples include Hugging Face Inference API, Azure ML, Google Vertex AI, and Amazon SageMaker, where models can be hosted and queried via API.
* LLM Assisted Labeling
  * These models **integrate directly with LLM providers** and require the user to input an **API key** for access.
  * They are typically used for generating labels, suggestions, or annotations to assist with labeling tasks.
  * Examples include OpenAI (GPT models), Azure OpenAI, Anthropic (Claude models), Gemini (Google AI), Cohere, and custom models that can be connected via API.
* LLM Labs
  * These models are deployed through **Datasaur’s LLM Labs**, which provides access to **100+ LLM providers** through an integrated endpoint.
  * This allows users to switch between different models without manually configuring each provider separately.
* Custom Model
  * This option allows users to connect their own models via a **custom REST API**.
  * The API must follow a specific request format to ensure compatibility with Datasaur.
  * It offers flexibility for organizations that have internally trained models or want to use self-hosted LLMs.
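To make the custom model option more concrete, here is a minimal sketch of a self-hosted prediction endpoint built with Python's standard library. The JSON shapes used here (a `texts` list in the request, a `predictions` list in the response) and the names `predict_labels`, `PredictionHandler`, and `make_server` are assumptions for illustration only. They are not the request format Datasaur requires; that format is defined in the custom API documentation linked in the table above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_labels(payload: dict) -> dict:
    """Toy stand-in for a real model: tag each text as POSITIVE or NEGATIVE.

    The "texts" and "predictions" keys are hypothetical, not Datasaur's schema.
    """
    predictions = []
    for text in payload.get("texts", []):
        label = "POSITIVE" if "good" in text.lower() else "NEGATIVE"
        predictions.append({"text": text, "label": label})
    return {"predictions": predictions}


class PredictionHandler(BaseHTTPRequestHandler):
    """Accepts a JSON POST body and responds with JSON predictions."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "Request body must be valid JSON")
            return
        if not isinstance(payload, dict):
            self.send_error(400, "Request body must be a JSON object")
            return
        body = json.dumps(predict_labels(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) a local server exposing the handler."""
    return HTTPServer(("127.0.0.1", port), PredictionHandler)
```

Calling `make_server().serve_forever()` would start the endpoint locally. A real integration would replace the toy rule in `predict_labels` with a call to your own model and adapt the request and response bodies to the format Datasaur specifies.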

The diagram and table below show which providers belong to each category:

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-4d01c5e754f0843505e0675e4e7fd75ec0978d74%2FExtension%20-%20ML-assisted%20labeling%20diagram.png?alt=media" alt="Tree-diagram image of ML-assisted Labeling provider in Datasaur"><figcaption><p>ML-assisted Labeling provider in Datasaur</p></figcaption></figure>

| Category              | Providers                                                   |
| --------------------- | ----------------------------------------------------------- |
| Datasaur Hosted       | spaCy, CoreNLP, SparkNLP, NLTK, Sentiment Analysis, FewNERD |
| Cloud Provider        | Hugging Face, Azure ML, Google Vertex AI, Amazon SageMaker  |
| LLM Assisted Labeling | OpenAI, Azure OpenAI, Anthropic, Gemini, Cohere, Custom     |
| LLM Labs              | 100+ LLM providers                                          |
| Custom Model          | Depends on users’ internal API                              |
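If you track provider choices in your own tooling, the named providers in the table above can be expressed as a simple lookup. This is only a sketch: `PROVIDER_CATEGORIES` and `category_of` are hypothetical names, and the LLM Labs and Custom Model categories are left out because the table does not list specific providers for them.

```python
from typing import Optional

# Category -> providers, copied from the table above.
PROVIDER_CATEGORIES = {
    "Datasaur Hosted": [
        "spaCy", "CoreNLP", "SparkNLP", "NLTK", "Sentiment Analysis", "FewNERD",
    ],
    "Cloud Provider": [
        "Hugging Face", "Azure ML", "Google Vertex AI", "Amazon SageMaker",
    ],
    "LLM Assisted Labeling": [
        "OpenAI", "Azure OpenAI", "Anthropic", "Gemini", "Cohere", "Custom",
    ],
}


def category_of(provider: str) -> Optional[str]:
    """Return the category for a provider name (case-insensitive), or None."""
    wanted = provider.strip().lower()
    for category, providers in PROVIDER_CATEGORIES.items():
        if wanted in (name.lower() for name in providers):
            return category
    return None
```

For example, `category_of("Azure OpenAI")` resolves to the LLM Assisted Labeling category, while `category_of("Azure ML")` resolves to Cloud Provider.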

### Enforce ML Assisted Labeling Settings from Admin or Reviewer to Labeler

This feature ensures consistency by allowing admins or reviewers to enforce their **ML-assisted labeling** settings for labelers. When enabled, labelers will follow the exact setup specified by the admin or reviewer, ensuring uniformity in results.

To change this behavior, admins or reviewers can click the three-dot button next to the **ML-assisted labeling** header and adjust the **Modify service provider settings** option. Selecting **All assignees** allows everyone in the project to modify their own **ML-assisted labeling** settings. Choosing **Admin or reviewer only** enables the enforcement feature, restricting changes to the admin or reviewer.

Once activated, labelers will not be able to switch to a different service provider. However, admins or reviewers retain the ability to modify the settings as needed.
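The rule described above can be summarized as a small permission check. This is a conceptual sketch only: the `Role` and `ModifySetting` enums and the `can_modify_provider_settings` function are hypothetical names that model the documented behavior, not part of Datasaur's product or API.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    REVIEWER = "reviewer"
    LABELER = "labeler"


class ModifySetting(Enum):
    # Values mirror the two choices of "Modify service provider settings".
    ALL_ASSIGNEES = "All assignees"
    ADMIN_OR_REVIEWER_ONLY = "Admin or reviewer only"


def can_modify_provider_settings(role: Role, setting: ModifySetting) -> bool:
    """Return True if this role may change the ML-assisted labeling settings."""
    if setting is ModifySetting.ALL_ASSIGNEES:
        return True
    # Enforcement is on: only admins and reviewers may change the settings.
    return role in (Role.ADMIN, Role.REVIEWER)
```

With **Admin or reviewer only** selected, `can_modify_provider_settings(Role.LABELER, ModifySetting.ADMIN_OR_REVIEWER_ONLY)` is `False`, which corresponds to labelers being unable to switch service providers.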

{% hint style="info" %}
If the **Admin or reviewer only** option is chosen in an ongoing project, please make sure the labeler refreshes their page to sync with the latest settings.
{% endhint %}

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-e0607090ba35916104947ee126f855afac096ad3%2FExtension%20-%20ML-assisted%20Labeling%20-%20more%20menu%20(reviewer%20mode).png?alt=media" alt="Image of Enabling Admin or Reviewer ML Assisted Labeling Settings to Labeler"><figcaption><p>Limiting labelers’ ability to modify the service provider in <strong>ML-assisted labeling</strong></p></figcaption></figure>

