ML Assisted Labeling

The ML Assisted Labeling extension lets you call open-source models, LLMs, or your own model to return labels automatically and speed up your labeling.

Introduction

Datasaur's ML Assisted Labeling extension enhances data labeling efficiency and accuracy for NLP projects. It integrates open-source models, Large Language Models (LLMs), and custom models, providing automatic labeling for Span-based, Row-based, Bounding Box and Document-based projects. This tool streamlines the data labeling process, automating your labeling workflow to save time and improve data quality.

Service provider for ML-assisted Labeling

Key Features

Label multiple items at once: ML Assisted Labeling lets you label multiple items within a label set in one pass, eliminating the need to label each item individually. This streamlines span-based projects in particular.
Integration with popular models: Integrates with widely used libraries and tasks such as spaCy, Named Entity Recognition (NER), Part-of-Speech (POS) tagging, and Sentiment Analysis. It also supports integration with various LLMs, Hugging Face, and other model platforms.
Time-saving efficiency: Automating the labeling process saves time and resources. You can focus on critical tasks while quickly reviewing the automated labels to make sure you deliver good-quality data.

Quick Start Guide

So how do you set up the ML Assisted Labeling extension? It's as simple as three steps:

1. Go to Manage Extension and enable ML Assisted Labeling.
   - The ML Assisted Labeling extension should appear on the right side.
   Enabling ML Assisted Labeling
2. Select a service provider for assisted labeling.
   - Most of the providers don’t need any additional information.
3. Click Predict labels to apply labels to your document.

For Row-based projects, several additional options are available:

Select specific rows for prediction: Choose which rows to include in the prediction.
Target text: Pick which text column to use as the input for prediction.
Target question: Decide which output column to predict.
Faster prediction speed: Toggle this option to run predictions faster via the backend.
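
The way the row selection, target text, and target question options fit together can be pictured with a small sketch. This is only a conceptual illustration, not Datasaur's implementation: the function `predict_rows`, its parameter names, the column names, and the toy predictor are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


def predict_rows(
    rows: List[Row],
    target_text: str,
    target_question: str,
    predictor: Callable[[str], str],
    selected_rows: Optional[List[int]] = None,
) -> List[Row]:
    """Return copies of rows with target_question filled in for the selected rows."""
    # With no explicit selection, every row is included in the prediction.
    chosen = set(range(len(rows))) if selected_rows is None else set(selected_rows)
    result = []
    for index, row in enumerate(rows):
        updated = dict(row)  # copy so the input rows are left untouched
        if index in chosen:
            # target_text is the input column, target_question the output column.
            updated[target_question] = predictor(row[target_text])
        result.append(updated)
    return result


def toy_sentiment(text: str) -> str:
    """Stand-in predictor; a real service provider would be called here."""
    return "positive" if "good" in text.lower() else "negative"


rows = [
    {"review": "Good battery life", "sentiment": ""},
    {"review": "Screen cracked in a week", "sentiment": ""},
    {"review": "Really good value", "sentiment": ""},
]
# Only the first two rows are selected, so the third keeps its empty answer.
labeled = predict_rows(rows, "review", "sentiment", toy_sentiment, selected_rows=[0, 1])
for row in labeled:
    print(row)
```

Rows outside the selection are returned unchanged, which mirrors the idea of predicting only the rows you choose.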

Supported Model Providers

Row Labeling

Span Labeling

BBox Labeling

Doc Labeling

We group model providers into several key categories to help you better understand their capabilities and use cases.

Datasaur Hosted
- These models are hosted by Datasaur.
- They are ready-to-use models for common NLP tasks like Named Entity Recognition (NER), Sentiment Analysis, Part-of-Speech (POS) Tagging, and Dependency Parsing.
- Examples include spaCy, CoreNLP, and FewNERD, which provide pre-trained models for various NLP tasks.
Cloud Provider
- These models require access to external AI services hosted on various cloud providers.
- Users can leverage pretrained or fine-tuned models deployed on cloud platforms, providing scalability and access to large models.
- Examples include Hugging Face Inference API, Azure ML, Google Vertex AI, and Amazon SageMaker, where models can be hosted and queried via API.
LLM Assisted Labeling
- These models integrate directly with LLM providers and require the user to input an API key for access.
- They are typically used for generating labels, suggestions, or annotations to assist with labeling tasks.
- Examples include OpenAI (GPT models), Azure OpenAI, Anthropic (Claude models), Gemini (Google AI), Cohere, and custom models that can be connected via API.
LLM Labs
- These models are deployed through Datasaur’s LLM Labs, which provides access to 100+ LLM providers through an integrated endpoint.
- This allows users to switch between different models without manually configuring each provider separately.
Custom Model
- This option allows users to connect their own models via a custom REST API.
- The API must follow a specific request format to ensure compatibility with Datasaur.
- It offers flexibility for organizations that have internally trained models or want to use self-hosted LLMs.
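
As a rough idea of what a self-hosted endpoint for the Custom Model option might look like, here is a minimal sketch using only the Python standard library. The request body (`{"text": "..."}`), the response shape (`{"labels": [...]}`), the keyword table, and the port are all assumptions made for illustration. They are not Datasaur's required format, so check the request format Datasaur specifies before adapting this.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical keyword lookup standing in for an internally trained model.
KEYWORDS = {"Datasaur": "ORG", "Jakarta": "LOC"}


def predict_spans(text: str) -> list:
    """Return one span dict per keyword occurrence, ordered by start offset."""
    spans = []
    for keyword, label in KEYWORDS.items():
        start = text.find(keyword)
        while start != -1:
            spans.append({"start": start, "end": start + len(keyword), "label": label})
            start = text.find(keyword, start + len(keyword))
    return sorted(spans, key=lambda span: span["start"])


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumed request body: {"text": "..."} (illustrative schema only).
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            body = json.dumps({"labels": predict_spans(payload["text"])}).encode()
            status = 200
        except (ValueError, KeyError, TypeError, AttributeError):
            body = json.dumps({"error": "expected JSON with a 'text' string"}).encode()
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8000) -> None:
    """Serve predictions until interrupted; the port is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling `run()` starts the server. Keeping the model logic in `predict_spans` separate from the HTTP handler makes it easy to swap in a real model later.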

The models available under each provider category are:

Tree-diagram image of ML-assisted Labeling provider in Datasaur

- Datasaur Hosted: spaCy, CoreNLP, SparkNLP, NLTK, Sentiment Analysis, FewNERD
- Cloud Provider: Hugging Face, Azure ML, Google Vertex AI, Amazon SageMaker
- LLM Assisted Labeling: OpenAI, Azure OpenAI, Anthropic, Gemini, Cohere, Custom
- LLM Labs: 100+ LLM providers
- Custom Model: Depends on users’ internal API
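
If you need to refer to these categories in your own tooling, they can be captured in a simple lookup table. The sketch below is only an illustration built from the provider names listed in this section. The names `PROVIDERS_BY_CATEGORY` and `category_of` are hypothetical and are not part of any Datasaur API.

```python
from typing import Optional

# Provider names as listed in this section. LLM Labs and Custom Model are
# left out because they have no fixed list of providers.
PROVIDERS_BY_CATEGORY = {
    "Datasaur Hosted": ["spaCy", "CoreNLP", "SparkNLP", "NLTK", "Sentiment Analysis", "FewNERD"],
    "Cloud Provider": ["Hugging Face", "Azure ML", "Google Vertex AI", "Amazon SageMaker"],
    "LLM Assisted Labeling": ["OpenAI", "Azure OpenAI", "Anthropic", "Gemini", "Cohere", "Custom"],
}


def category_of(provider: str) -> Optional[str]:
    """Return the category a provider is listed under, ignoring letter case."""
    wanted = provider.casefold()
    for category, providers in PROVIDERS_BY_CATEGORY.items():
        if any(wanted == name.casefold() for name in providers):
            return category
    return None


print(category_of("spacy"))
print(category_of("Azure OpenAI"))
```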

Enforce ML Assisted Labeling Settings from Admin or Reviewer to Labeler

This feature ensures consistency by allowing Admins or Reviewers to enforce their ML Assisted Labeling settings for Labelers. When enabled, Labelers will follow the exact setup specified by the Admin or Reviewer, ensuring uniformity in results.

By clicking the three dots button next to the ML Assisted Labeling Extension, Admins or Reviewers can access the option to "Modify service provider settings." Selecting "All assignees" allows everyone in the project to modify their own ML Assisted Labeling settings. Choosing "Admin or reviewer only" enables the enforcement feature, restricting changes to the Admin or Reviewer.

Once activated, Labelers will not be able to switch to a different service provider. However, Admins or Reviewers retain the ability to modify the settings as needed.

If the "Admin or reviewer only" option is chosen in an ongoing project, please make sure the labeler refreshes their page to sync with the latest settings.
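
The effect of the two options can be summarized as a small permission check. This is a sketch of the rule as described above, not Datasaur's actual code. The enum and function names are hypothetical.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    REVIEWER = "reviewer"
    LABELER = "labeler"


class ModifyPolicy(Enum):
    ALL_ASSIGNEES = "All assignees"
    ADMIN_OR_REVIEWER_ONLY = "Admin or reviewer only"


def can_modify_settings(role: Role, policy: ModifyPolicy) -> bool:
    """Return True if this role may change the service provider settings."""
    if policy is ModifyPolicy.ALL_ASSIGNEES:
        return True
    # Enforcement is on: only Admins and Reviewers may change settings.
    return role in (Role.ADMIN, Role.REVIEWER)


def effective_provider(role: Role, policy: ModifyPolicy, own_choice: str, enforced_choice: str) -> str:
    """Pick the provider a user ends up with under the given policy."""
    return own_choice if can_modify_settings(role, policy) else enforced_choice


# A Labeler keeps their own choice only while "All assignees" is selected.
print(effective_provider(Role.LABELER, ModifyPolicy.ALL_ASSIGNEES, "OpenAI", "spaCy"))
print(effective_provider(Role.LABELER, ModifyPolicy.ADMIN_OR_REVIEWER_ONLY, "OpenAI", "spaCy"))
```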

Image of Enabling Admin or Reviewer ML Assisted Labeling Settings to Labeler
