
Datasaur Dinamic with Hugging Face

Introduction

Datasaur Dinamic with Hugging Face Auto Train helps you train and deploy models directly from labeled data in Datasaur. You can also use the trained model with ML-assisted labeling to generate predictions for unlabeled data.

This feature is available for both row labeling and span labeling projects.

Quick start guide

Follow these steps to get the best results from Datasaur Dinamic with the Hugging Face Auto Train provider:

  1. Create a custom project using labeled or unlabeled data.

  2. If the dataset is unlabeled, label part of the data first. For example, if the dataset has 20 rows, you can label 10–15 rows manually and automate the rest later.

  3. Enable the Datasaur Dinamic extension from the Manage extensions dialog.


Prepare Hugging Face credentials

Before training the model, prepare the following credentials:

  • Your username or organization name in Hugging Face.

  • An API access token.

When creating the API token, make sure all required permissions are enabled. To use the trained model with ML-assisted labeling, enable Inference Endpoints.

Permission options in Hugging Face API Token

Inference Endpoints are currently available only for paid Hugging Face accounts and require a valid payment method.
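
Before pasting the credentials into Datasaur, it can help to confirm that the token is valid and that it belongs to the username or organization you plan to enter. The following is a minimal sketch, not part of Datasaur itself. It assumes the Hugging Face Hub `whoami-v2` REST endpoint and assumes the response contains a `name` field and an `orgs` list of objects with a `name` field; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed Hugging Face Hub endpoint that describes the owner of a token.
WHOAMI_URL = "https://huggingface.co/api/whoami-v2"


def build_whoami_request(token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the whoami endpoint."""
    return urllib.request.Request(
        WHOAMI_URL,
        headers={"Authorization": f"Bearer {token}"},
    )


def token_matches_namespace(whoami: dict, namespace: str) -> bool:
    """Return True if namespace is the token's user or one of its organizations.

    The response shape (a "name" field plus an "orgs" list) is an assumption.
    """
    names = {whoami.get("name")}
    names.update(org.get("name") for org in whoami.get("orgs", []))
    return namespace in names


def check_credentials(token: str, namespace: str) -> bool:
    """Send the request and compare the result with the expected namespace."""
    with urllib.request.urlopen(build_whoami_request(token), timeout=10) as resp:
        return token_matches_namespace(json.load(resp), namespace)
```

Calling `check_credentials("<your token>", "<your username or organization>")` performs the network request. An HTTP error typically indicates an invalid token. This check does not verify individual token permissions such as Inference Endpoints.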

Train the model

  1. Open the Datasaur Dinamic extension.

  2. Enter your Hugging Face username or organization name and API token.

  3. For row labeling projects, select the input columns used for training and the target question.

  4. Enable the Deploy dedicated inference endpoint option.

  5. Click Train to start training the model.

After training starts, wait for the model to finish deploying. You can also monitor the model and dataset from your Hugging Face profile.

Once deployment is complete, the extension displays the model URL in the Model name field.

Trained model name shown in Datasaur
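
If you want to confirm outside Datasaur that the deployed model responds, you can send it a single test input. The sketch below makes several assumptions: the endpoint URL is a placeholder you would copy from your Hugging Face Inference Endpoints page, the endpoint accepts a JSON body with an `inputs` field, and a classification model returns a list of `label`/`score` objects. The helper names are hypothetical.

```python
import json
import urllib.request


def build_inference_request(endpoint_url: str, token: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one text input."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def top_prediction(response: list) -> dict:
    """Return the highest-scoring item from an assumed classification response.

    Handles both a flat list of {"label", "score"} items and the same list
    wrapped in one outer list.
    """
    items = response[0] if response and isinstance(response[0], list) else response
    return max(items, key=lambda item: item["score"])


def query_endpoint(endpoint_url: str, token: str, text: str) -> dict:
    """Send the request and return the top prediction."""
    request = build_inference_request(endpoint_url, token, text)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return top_prediction(json.load(resp))
```

An endpoint that is still starting up may return an error, so retry after the deployment has finished.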

Label remaining data with ML-assisted labeling

After the model is deployed, you can use it in the ML-assisted labeling extension with the Hugging Face provider.

  1. Copy the model name from the Model name field in the Datasaur Dinamic extension.

  2. Enable the ML-assisted labeling extension and select Hugging Face as your provider.

  3. Paste the model name into the Model name field in the ML-assisted labeling extension.

  4. Enter your API token and set the confidence score.

  5. Click Predict labels to generate labels for the remaining data.

This workflow helps reduce the amount of manual labeling needed for large datasets.
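
To build intuition for the confidence score setting, the sketch below shows one plausible way a threshold could be applied to model predictions. How Datasaur applies the score internally is not described here, so treat the rule (keep a prediction only when its score is at least the threshold) and the `label`/`score` field names as assumptions.

```python
def select_confident(predictions: list[dict], threshold: float) -> list[dict]:
    """Keep only predictions whose score is at least the threshold.

    Each prediction is assumed to be a dict with "label" and "score" keys.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [p for p in predictions if p["score"] >= threshold]


# Hypothetical predictions for three unlabeled rows.
predictions = [
    {"row": 1, "label": "positive", "score": 0.93},
    {"row": 2, "label": "negative", "score": 0.55},
    {"row": 3, "label": "positive", "score": 0.78},
]

accepted = select_confident(predictions, threshold=0.75)
print([p["row"] for p in accepted])
```

Under this assumption, a higher threshold leaves more rows for manual review but applies fewer uncertain labels, while a lower threshold labels more rows automatically at the cost of more corrections.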
