Pre-Labeled Project

Pre-labeled projects allow you to create a new labeling project using a file that already contains labeled data. This helps you start faster since you don’t have to label everything from scratch.

Use cases

Pre-labeled projects are especially useful in the following scenarios:

  • Streamlined onboarding: When starting a new project that shares the same labeling schema as a previous one, you can use a pre-labeled file to quickly set up the new project.

  • Consistency in labeling: When you need to apply a standard set of labels across multiple projects, pre-labeled projects help ensure uniformity.

  • Efficiency: Save time by using pre-defined labels for projects with known labeling requirements.

  • Data preparation: Import data with preliminary labels from external sources directly into Datasaur.

How to create pre-labeled projects

Span labeling

  1. Prepare the pre-labeled file. Supported formats include:

  2. Go to the Projects page and click Create project.

  3. In step 1, upload the pre-labeled file.

  4. Complete the remaining project setup steps, and your span labeling project will be ready with pre-labeled data.

Row labeling

  1. Prepare the pre-labeled file. Supported formats include: CSV, JSON Tabular, TSV, XLS and XLSX, JSON Lines, Datasaur Schema (.json)

    • Include a column in the file for the answers to the questions you will configure in step 3 of project creation.

  2. Go to the Projects page and click Create project.

  3. In step 1, upload the pre-labeled file.

  4. In step 3, create or upload the question set. Then link each answer column to its question by using Refer answer to table column in the More settings accordion.

  5. Complete the remaining project setup steps, and your row labeling project will be ready with pre-labeled data.
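
As a rough illustration of step 1, the sketch below builds a small pre-labeled CSV in which one column holds the data and another holds the answer to a question. The column names `text` and `sentiment`, the sample rows, and the helper `build_prelabeled_csv` are hypothetical; use whatever columns match the questions you plan to configure in step 3.

```python
import csv
import io

# Hypothetical rows: "text" holds the data to label, and "sentiment" holds
# the pre-labeled answer for a question configured in step 3.
rows = [
    {"text": "The delivery was fast.", "sentiment": "positive"},
    {"text": "The package arrived damaged.", "sentiment": "negative"},
]


def build_prelabeled_csv(rows, fieldnames):
    """Return CSV text with one header row and one line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_prelabeled_csv(rows, ["text", "sentiment"])
print(csv_text)
```

After uploading a file like this, the `sentiment` column would be the one to link to its question through Refer answer to table column.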

Document labeling

  1. Prepare your documents and pre-labeled answer files. Pre-labeling works as long as the media files are uploaded together with their answer files.

    • Notes:

      • Answer file: A JSON file that contains the answers to your question set. The filename must end with .answer.json. Example:

      • File naming: Each document and its answer file must share the same base filename. For example, if the document is named a.jpg, its answer file should be named a.answer.json.

      • Multiple documents: If you have multiple documents and multiple answer files, prepare them like this:

        • a.jpg, a.answer.json

        • b.jpg, b.answer.json

        • c.jpg, c.answer.json

  2. Go to the Projects page and click Create project.

  3. In step 1, upload the documents along with their answer files.

  4. In step 3, create or upload the question set. Ensure that the questions answered in the pre-labeled answer files are configured in this step.

  5. Complete the remaining project setup steps, and your document labeling project will be ready with pre-labeled data.
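
Because pre-labeling depends on each document being uploaded with its matching answer file, it can help to check the pairing before uploading. The following is a minimal sketch that works on a list of filenames only; the helper `find_unpaired` is hypothetical and is not part of Datasaur.

```python
from pathlib import Path

ANSWER_SUFFIX = ".answer.json"


def find_unpaired(filenames):
    """Return (documents without an answer file, answer files without a document)."""
    answers = {}
    documents = {}
    for name in filenames:
        if name.endswith(ANSWER_SUFFIX):
            # "a.answer.json" is keyed by its base name "a".
            answers[name[: -len(ANSWER_SUFFIX)]] = name
        else:
            # "a.jpg" is keyed by its base name "a".
            documents[Path(name).stem] = name
    docs_missing = sorted(documents[k] for k in documents.keys() - answers.keys())
    answers_orphan = sorted(answers[k] for k in answers.keys() - documents.keys())
    return docs_missing, answers_orphan


files = ["a.jpg", "a.answer.json", "b.jpg", "c.answer.json"]
missing, orphaned = find_unpaired(files)
print(missing)   # ['b.jpg']
print(orphaned)  # ['c.answer.json']
```

Both lists should be empty before you upload the documents and answer files in step 1.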

Bounding box labeling

  1. Prepare your documents and pre-labeled answer files. Pre-labeling works as long as the media files are uploaded together with their answer files.

    • Answer file format: Supported formats include YOLO (.txt), LabelMe (.xml), Pascal VOC (.xml), and Datasaur Schema (.json).

    • File naming: Each document and its answer file must share the same base filename. For example, if the document file is named a.jpg, its answer file in YOLO format should be named a.txt.

    • Multiple documents: If you have multiple documents and multiple answer files, prepare them like this:

      • a.jpg, a.txt

      • b.jpg, b.txt

      • c.jpg, c.txt

  2. Go to the Projects page and click Create project.

  3. In step 1, upload the documents along with their answer files.

    • Notes:

      • Labels in your pre-labeled file automatically match the labels in your label set.

      • Extra labels in the file that are not found in the label set will be added as new label classes.

  4. In step 3, create or upload the label set.

  5. Complete the remaining project setup steps, and your bounding box labeling project will be ready with pre-labeled data.
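
For the YOLO option, an answer file such as a.txt usually contains one line per box. The sketch below assumes the commonly used YOLO text convention (`class_id x_center y_center width height`, with coordinates normalized by the image size); this page does not spell out the exact variant Datasaur expects, so treat the layout as an assumption. The helper `to_yolo_line` and the sample numbers are hypothetical.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, image_width, image_height):
    """Convert a pixel-space box to one line of normalized YOLO text."""
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A box from (100, 50) to (300, 250) on a 400x500 image, class index 0.
line = to_yolo_line(0, 100, 50, 300, 250, image_width=400, image_height=500)
print(line)  # 0 0.500000 0.300000 0.500000 0.400000
```

Each line written this way would go into the answer file that shares the document's base filename, for example a.txt for a.jpg.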

Pre-labeled propagation

By default, pre-labeled data is treated as the labeler’s submission. Once applied in labeler mode, it instantly appears in reviewer mode. Depending on the project’s consensus settings, it may appear as either accepted or conflicted labels.

This behavior helps speed up labeling when the pre-labeled data is reliable. If the data is unreliable, turn on the Require approval for pre-labeled labels setting to ensure labelers review each label before it is applied.

Require approval for pre-labeled labels/answers

This project setting adds a quality control step for pre-labeled data. It is useful when the pre-labeled source is less reliable or when verification is required before the data can be accepted.

Instead of automatically accepting all pre-labeled labels or answers, this option requires labelers to manually accept or reject each one before it moves to the review stage. This ensures that only verified data is used.

This setting is useful in the following scenarios:

  • Untrusted external data: When importing labels from external sources, this setting prevents low-quality or inconsistent labels from automatically appearing in reviewer mode before verification.

  • Initial quality verification: Allows labelers to validate pre-labeled data before continuing with more complex labeling tasks, ensuring a reliable baseline.

  • Assisted labeling: Allows labelers to treat pre-labeled labels or answers as suggestions, approving or rejecting them based on project requirements.

Differences from the default behavior

| Feature | Default | Require approval |
| --- | --- | --- |
| Propagation | Automatic. Pre-labeled data is treated as the labeler’s submission and appears in reviewer mode. | Manual approval required. Pre-labeled data remains in the labeler’s copy and does not appear in reviewer mode until the labeler accepts it. |
| Labeler action | No action required. Labelers can modify or delete pre-labeled data as needed. | Manual review required. Labelers must accept or reject each pre-labeled label and answer. |
| State in reviewer mode | Immediately visible, with color depending on consensus rules. | Not visible in reviewer mode. They appear only after the labeler accepts them. |

You can configure this setting in step 5 of project creation by enabling the Require approval for pre-labeled labels toggle. This setting can only be defined during project creation and cannot be changed afterward.


This setting is currently available for the following project types:

  • Span labeling.

  • Row labeling.

Support for additional labeling types will be added in future releases.

Hovering on a pre-labeled span label requiring approval
Hovering on a pre-labeled row answer requiring approval

Labeler workflow and outcomes

When this setting is enabled, pre-labeled labels and answers appear in each labeler’s view with dashed outlines. Labelers must review them before proceeding.

Accept

When accepted, the pre-labeled data becomes the labeler’s submission. It is marked as accepted and appears in reviewer mode.

Labelers can accept pre-labeled labels or answers in the following ways:

  • Span labeling

    • Right-click a pre-labeled label and select Accept.

    • Apply the same label to the same span.

    • Use the Labels extension to:

      • Accept a single pre-labeled label.

      • Accept multiple pre-labeled labels using bulk actions.

  • Row labeling

    • Select the corresponding answer in the Row labeling extension and submit.

Reject

When rejected, the pre-labeled data is removed from the labeler’s view and does not appear in reviewer mode.

Labelers can reject pre-labeled labels or answers in the following ways:

  • Span labeling

    • Right-click a pre-labeled label and select Reject.

    • Use the Labels extension to:

      • Reject a single pre-labeled label.

      • Reject multiple pre-labeled labels using bulk actions.

  • Row labeling

    • Leave the answer unselected and submit.

This workflow ensures that all pre-labeled data is reviewed before reaching the review stage, improving accuracy and data quality.
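
The difference between the default behavior and the approval workflow can be summarized as a small state model. This is only an illustrative sketch of the rules described above, not Datasaur's implementation or API; the names `Status`, `PreLabel`, and `visible_in_reviewer_mode` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"    # shown with a dashed outline, awaiting review
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class PreLabel:
    text: str
    label: str
    status: Status = Status.PENDING


def visible_in_reviewer_mode(item, require_approval):
    """Apply the propagation rule for a single pre-labeled item."""
    if not require_approval:
        # Default: pre-labeled data counts as the labeler's submission.
        return True
    # With approval required, only accepted items reach reviewer mode.
    return item.status is Status.ACCEPTED


item = PreLabel(text="Jakarta", label="LOCATION")
print(visible_in_reviewer_mode(item, require_approval=False))  # True
print(visible_in_reviewer_mode(item, require_approval=True))   # False

item.status = Status.ACCEPTED
print(visible_in_reviewer_mode(item, require_approval=True))   # True
```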
