Label Sets / Question Sets

Span labeling

For span labeling, a label set is a single-column .csv file that follows the structure below.

Column 1

Label 1

Label 2

Label 3

etc...

NER label set

Color-coded labels

We provide twelve default colors that you can configure during project creation or in the Labels extension.

Label color palette

You can also create a .csv label set file with your desired label colors using HTML color codes. In the sample below, label,color is the header, indicating two columns. This will always be the first row in the .csv file.

Colored label set

Colored label sets are only supported in the .csv format.
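As a minimal sketch, a colored label set file could be generated as shown below. Only the label,color header comes from the description above. The helper toColoredLabelSetCsv, the ColoredLabel shape, and the sample labels and colors are hypothetical and not part of Datasaur.

```typescript
// Hypothetical shape for one label and its HTML color code.
interface ColoredLabel {
  label: string;
  color: string; // e.g. "#FF0000"
}

// Quote a CSV field only when it contains a comma, a quote, or a line break.
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Build the file content: the header row first, then one row per label.
function toColoredLabelSetCsv(labels: ColoredLabel[]): string {
  const rows = labels.map(
    ({ label, color }) => `${escapeCsvField(label)},${escapeCsvField(color)}`
  );
  return ["label,color", ...rows].join("\n");
}

// Illustrative labels and colors (assumptions, not product defaults).
const sampleCsv = toColoredLabelSetCsv([
  { label: "PERSON", color: "#FF0000" },
  { label: "LOCATION", color: "#00AA00" },
]);
```

The resulting string can be saved as a .csv file and uploaded as a label set.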

Limit selection to bottom-level labels only

In hierarchical label sets, some labels act as broad categories while others are more specific. This setting restricts selection to bottom-level labels—those without child labels—to improve precision and consistency.

When to use

Use this setting when your project requires detailed and specific classification. For example, consider the following hierarchy:

  • Fruit

    • Apple

    • Banana

  • Vegetable

    • Carrot

    • Spinach

When enabled, labelers can select Apple, Banana, Carrot, and Spinach, but not Fruit or Vegetable.
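The rule can be pictured as collecting every label that has no children. The sketch below is only an illustration of that idea: the LabelNode shape and the bottomLevelLabels function are hypothetical and do not reflect how Datasaur stores label sets.

```typescript
// Hypothetical in-memory representation of a hierarchical label set.
interface LabelNode {
  name: string;
  children?: LabelNode[];
}

// Collect the names of labels that have no children (bottom-level labels).
function bottomLevelLabels(nodes: LabelNode[]): string[] {
  const result: string[] = [];
  for (const node of nodes) {
    if (node.children && node.children.length > 0) {
      // A parent label is skipped; only its descendants can qualify.
      result.push(...bottomLevelLabels(node.children));
    } else {
      result.push(node.name);
    }
  }
  return result;
}

const labelSet: LabelNode[] = [
  { name: "Fruit", children: [{ name: "Apple" }, { name: "Banana" }] },
  { name: "Vegetable", children: [{ name: "Carrot" }, { name: "Spinach" }] },
];

// selectable is ["Apple", "Banana", "Carrot", "Spinach"]
const selectable = bottomLevelLabels(labelSet);
```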

How to configure

You can enable this setting during project creation, in an existing project, or on the Label management page.

During project creation

  1. Create a new project.

  2. Go to step 3 and select Span labeling.

  3. In the Label set section, click the triple-dot menu on the label set and enable the setting.

Within a project

  1. Open the Labels extension and select a label set.

  2. Click the triple-dot menu and choose one of the following:

    1. Add new label set

    2. Replace existing label set

    3. Edit label set

  3. Expand the Label set settings accordion and enable the setting.

Label management page

  1. Go to the Label management page.

  2. Select Add label set or click the Edit icon on an existing label set.

  3. Expand the Label set settings accordion and enable the setting.

Labeling behavior

When this setting is enabled:

  • Only bottom-level labels can be selected.

  • Parent labels (labels that have child labels) cannot be selected.

  • Keyboard shortcuts (numbers, arrow keys, and Enter) apply only to bottom-level labels.

Bounding box labeling

Label sets

You can use .csv, .tsv, or .json formats for bounding box label sets.

  • For .csv and .tsv files

    • Colors can be given as color names (example: red), hex values (example: #00FF00), or RGB values (example: rgb(0,0,255)).

    • You can also create a label set with only label names, as shown in the Datasaur sample - Bbox only name.csv file below.

    • Other properties, such as captionAllowed and captionRequired, will use default values if not specified.

  • For .json files

    • We support hex values and RGB formats only.

Text caption

The Allow text caption setting allows labelers to add a caption to a bounding box. If disabled, captions cannot be added.

You can require text for specific labels by enabling the Require caption option.

Row/Document labeling

For row labeling and document labeling projects, a question set is a .csv file with questions in the first column and answers in the subsequent columns.

Column 1   | Column 2 | Column 3 | Column 4 | ...
Question 1 | Answer 1 | Answer 2 | Answer 3 |
Question 2 | Answer 1 | Answer 2 |          |
Question 3 | Answer 1 | Answer 2 | Answer 3 | Answer 4 | Answer 5

Question set that contains only the dropdown question type
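To make the layout concrete, the sketch below reads such a file into question and answer pairs. It is a simplified illustration. It assumes the file has no header row and that no field contains a comma or quote. The parseQuestionSetCsv function, the DropdownQuestion shape, and the sample questions are hypothetical.

```typescript
interface DropdownQuestion {
  question: string;
  answers: string[];
}

// Parse a simple question set: the first field of each row is the question,
// the remaining non-empty fields are its answers.
// Assumes plain fields without commas or quotes; use a CSV library otherwise.
function parseQuestionSetCsv(csv: string): DropdownQuestion[] {
  return csv
    .split(/\r?\n/)
    .map((line) => line.split(",").map((field) => field.trim()))
    .filter((fields) => (fields[0] ?? "") !== "")
    .map((fields) => ({
      question: fields[0] ?? "",
      answers: fields.slice(1).filter((answer) => answer !== ""),
    }));
}

// Rows may have different numbers of answers.
const sampleQuestionSet = [
  "Sentiment,Positive,Negative,Neutral",
  "Is it spam?,Yes,No",
].join("\n");

const parsedQuestions = parseQuestionSetCsv(sampleQuestionSet);
```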

You can also use a .json file to create a question set with multiple question types.

Complex question set

Question hint

You can optionally add a hint to each question. Hints can include instructions or explanations to help labelers provide more accurate answers. Each hint can contain up to 65,000 characters.

You can configure question hints during project creation or in the Label management page.

Question hint in project creation
Question hint in the label management page

Supported format

The Document labeling and Row labeling extensions support Markdown syntax in question hints. This allows you to format text, create lists, add links, or emphasize content using standard Markdown.

Formatting | Syntax
Bold       | **your text**
Italic     | *your text*
Underline  | <u>your text</u>
Bullet     | dashes (-) or asterisks (*)
Numbering  | 1., 2., 3., etc.
Link       | [your text](https://example.com)


Markdown syntax counts toward the character limit (65,000 characters).
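Because the syntax characters count, a hint can be checked against the limit using its raw Markdown source rather than its rendered text. The sketch below is an illustration only. remainingHintCharacters is a hypothetical helper, and counting with String.length (UTF-16 code units) is an assumption about how characters are counted.

```typescript
// Limit stated above; Markdown syntax characters are counted too.
const MAX_HINT_LENGTH = 65000;

// Return how many characters remain, or a negative number if the hint is too long.
// The raw Markdown source is measured, not the rendered text.
function remainingHintCharacters(rawMarkdownHint: string): number {
  return MAX_HINT_LENGTH - rawMarkdownHint.length;
}

// "**a**" renders as one bold letter but uses five characters of the limit.
const remainingAfterBoldLetter = remainingHintCharacters("**a**");
```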

Best practices

Keep hints brief and focused on relevant information. Long hints may appear as large text blocks and clutter the UI. For more complex content, consider including links that labelers can open when needed.

Hint with markdown syntax
Rendered markdown in the extension

Question types

Here are the available question types for question sets.

1. Text field

Text field allows labelers to answer a question by entering a single line of free-form text.

You can add validation in the More settings accordion.

2. Text area

Text area allows labelers to answer a question by entering multi-line free-form text.

3. Dropdown

Dropdown requires labelers to answer a question by selecting one answer from a list.

You can upload a .csv file as an answer set for the dropdown options.

You can also allow multiple selections by enabling Allow multiple answers.

4. Hierarchical dropdown

Hierarchical dropdown allows labelers to answer a question by selecting from structured, multi-level options.

After creating the question, you can upload a hierarchical answer set in the format described in the Hierarchical label sets or dropdown options section.

5. True/False


Previously called Yes/No.

True/False allows labelers to answer a question by checking the checkbox. You can also add a hint.

6. Single choice


Previously called Radio button.

Single choice allows labelers to answer a question by selecting one answer from up to 25 options.

You can also add a hint.

7. Multiple choice

Multiple choice allows labelers to answer a question by selecting several answers from up to 25 options.

8. Date

Date allows labelers to answer a question by selecting a date from a calendar, ensuring valid date input. Labelers can still type it manually if needed.

You can set the current date as the default value by enabling the Use current date as default value setting in step 3 of project creation.

9. Time

Time allows labelers to answer a question by selecting a time from a clock, ensuring valid time input. Labelers can still type it manually if needed.

You can set the current time as the default value by enabling the Use current time as default value setting in step 3 of project creation.

10. Slider

Slider allows labelers to answer a question by selecting a numeric value within a defined range (example: from 1 to 10).

You can hide the value from labelers if needed (it remains visible in reviewer mode).

You can customize the slider color using predefined options or custom values (hex, color name, or RGB).

You can also use the interactive preview to test the slider.

When you change the minimum or maximum value of a slider question, existing answers outside the new range will trigger an error message in the Row labeling or Document labeling extension when users select the affected row or document.
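Before narrowing a range, it can help to check which existing answers would fall outside it. The following sketch shows one way to do that on exported data. The findOutOfRangeAnswers function, the SliderRange shape, and the row IDs are hypothetical and not part of Datasaur.

```typescript
interface SliderRange {
  min: number;
  max: number;
}

// Return the IDs of rows whose existing slider answer is outside the new range.
function findOutOfRangeAnswers(
  answersByRow: Record<string, number>,
  range: SliderRange
): string[] {
  return Object.keys(answersByRow).filter((rowId) => {
    const value = answersByRow[rowId];
    return value !== undefined && (value < range.min || value > range.max);
  });
}

// Existing answers were given on a 1 to 10 slider; the new range is 1 to 8.
const existingSliderAnswers = { "row-1": 3, "row-2": 9, "row-3": 10 };
const affectedRows = findOutOfRangeAnswers(existingSliderAnswers, { min: 1, max: 8 });
// affectedRows is ["row-2", "row-3"]
```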

11. URL

URL allows labelers to input links with validation.

12. Grouped attributes

Grouped attributes allows multiple related questions to be grouped together.

13. Script-generated questions


Available only for Row labeling projects.

Script-generated questions are generated dynamically for each row based on its data, allowing more flexible workflows. For more details, see the page on script-generated questions.

Advanced settings

In row labeling projects, you can use the Refer answer to table column option in Advanced settings to pre-fill answers based on your dataset. See the full guide for more details.

Answer validation script


This feature is available only in row labeling projects and is disabled by default. Please reach out to [email protected] if your team needs this feature.

The answer validation script is a TypeScript-based feature that validates answers in row labeling tasks. It allows you to define custom validation logic for complex scenarios, such as validating answers using other questions, comparing data across questions, or calling external APIs. If a submission fails validation, an error message is shown.

This feature improves control, accuracy, and consistency in the labeling process.

Key capabilities

  1. Row-specific validation: Validates data based on the current row.

  2. Cross-question validation: Compares answers across multiple questions.

  3. API-based validation: Uses external APIs or external business logic for validation.

Configure the validation script


Validation can only be configured after questions are set up. Only admins can access this feature in reviewer mode.

  1. Go to the Row labeling extension in a project.

  2. Click on the three-dot menu.

  3. Select Configure answer validation script.

When you open the Configure answer validation script dialog for the first time, a template is provided.


To provide feedback when a submission fails, return an object with an optional errorMessage property. When the submission passes, return an empty object.

When validating, you may need to access certain information to determine whether the answer is valid or needs adjustment before submission. You can access all required data from the function argument, as shown in the example function.

  • columns: TableColumn[] contains information about the column structure of the data being labeled.

  • row: Cell[] is an array of cells containing the data that is being labeled.

  • questions: Question[] is an array of questions in the project.

  • answers is an object where each key is a question ID and each value is its corresponding answer. Depending on the question type, the value can take one of these four formats:

    Question type      | multiple: false | multiple: true
    normal question    | string          | string[]
    grouped attributes | Answer          | Answer[]
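Since a normal question can yield either a string or a string array, validation logic is often simpler if both forms are normalized first. The sketch below covers only normal questions. The NormalAnswer alias and the toAnswerList helper are hypothetical names, and treating a missing answer as an empty list is an assumption.

```typescript
// Value of a normal (non-grouped) question: a string when multiple is false,
// a string array when multiple is true.
type NormalAnswer = string | string[];

// Normalize either form to an array so later checks handle a single shape.
// A missing answer is treated as an empty list (an assumption).
function toAnswerList(value: NormalAnswer | undefined): string[] {
  if (value === undefined) {
    return [];
  }
  return Array.isArray(value) ? value : [value];
}
```

For example, toAnswerList("Yes") and toAnswerList(["Yes"]) both produce a one-element array.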


We provide helper functions in the template to simplify common data access tasks, such as:

  • function getCellValueByColumnLabel(label: string): string;

    This function returns the value of a cell based on the column label.

  • function getAnswerByQuestionLabel(label: string, searchQuestions: Question[] = questions): Answers;

    This function returns the answer based on the question label.
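A cross-question check might look roughly like the sketch below. It is not the actual template. The validateDefectAnswers function, the AnswersByLabel shape, the getAnswer stand-in (a simplified substitute for getAnswerByQuestionLabel), and the question labels are all hypothetical. Only the return convention, an empty object on success and an object with errorMessage on failure, comes from the description above.

```typescript
// Return convention described above: {} on success, { errorMessage } on failure.
interface ValidationResult {
  errorMessage?: string;
}

// Simplified stand-in for the answers object: keyed by question label here,
// with plain string values (the real object is keyed by question ID).
type AnswersByLabel = Record<string, string | undefined>;

// Stand-in for the template's getAnswerByQuestionLabel helper.
function getAnswer(answers: AnswersByLabel, label: string): string {
  return answers[label] ?? "";
}

// Hypothetical rule: when "Is defective?" is "Yes", "Defect type" must be filled in.
function validateDefectAnswers(answers: AnswersByLabel): ValidationResult {
  const isDefective = getAnswer(answers, "Is defective?");
  const defectType = getAnswer(answers, "Defect type").trim();

  if (isDefective === "Yes" && defectType === "") {
    return { errorMessage: "Defect type is required when Is defective? is Yes." };
  }
  return {};
}
```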

Validating answer through an API call

You can include API requests in your validation logic using the Fetch API, enabling dynamic or third-party validations.


Examples

  • Validating answers between two questions

  • Validating an answer based on a cell value

  • Validating an answer through an API request using the Fetch API
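As a rough sketch of the API-based case, the code below checks an answer against an external service. Everything specific in it is an assumption: the endpoint URL, the { valid: boolean } response shape, the validateProductCode function, and the FetchLike type are hypothetical. The fetch function is passed in as a parameter so the logic can be exercised without a network call.

```typescript
// Return convention: {} on success, { errorMessage } on failure.
interface ValidationResult {
  errorMessage?: string;
}

// Minimal subset of a Fetch API response used by this sketch.
interface ResponseLike {
  ok: boolean;
  json(): Promise<unknown>;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

// Hypothetical endpoint that answers { "valid": boolean } for a product code.
const LOOKUP_URL = "https://example.com/api/product-codes/";

async function validateProductCode(
  code: string,
  fetchFn: FetchLike
): Promise<ValidationResult> {
  const response = await fetchFn(LOOKUP_URL + encodeURIComponent(code));
  if (!response.ok) {
    return { errorMessage: "Could not verify the product code. Please try again." };
  }
  const body = (await response.json()) as { valid?: boolean };
  return body.valid === true
    ? {}
    : { errorMessage: `Product code "${code}" was not found.` };
}
```

In an environment that provides the Fetch API, the global fetch function could be passed as fetchFn.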

FAQs

  1. Can I validate across multiple rows?

    No, the validation script is row-specific and operates on each row individually.

  2. What happens if there is an error in the script?

    Unhandled errors or exceptions will trigger a validation error and prevent submission. You can handle errors within the script to allow submission if needed.
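One way to handle errors inside the script is to wrap the validation logic in a try/catch block. The sketch below is illustrative only. allowSubmissionOnError is a hypothetical helper, and whether unexpected errors should allow or block submission is a decision for your team.

```typescript
// Return convention: {} on success, { errorMessage } on failure.
interface ValidationResult {
  errorMessage?: string;
}

// Wrap a validator so that an unexpected exception does not block submission.
function allowSubmissionOnError<T>(
  validator: (input: T) => ValidationResult
): (input: T) => ValidationResult {
  return (input: T) => {
    try {
      return validator(input);
    } catch (_error) {
      // An empty object means the submission passes under the convention above.
      return {};
    }
  };
}
```

Validation failures that the wrapped function returns normally are passed through unchanged. Only thrown exceptions are converted into a pass.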

Hierarchical label sets or dropdown options

You can upload multi-level hierarchical label sets for span labeling projects, and hierarchical dropdown options for row labeling or document labeling projects.

The following example shows a supported .csv format:

File structure

1. Header

The header id,label will always be the first row in the .csv file. The first label or option should have 1 as the ID, just like in the example above.

2. ID format

IDs follow a hierarchical numbering format similar to Microsoft Word:

  • Novel is the root level with ID 1.

  • Author is a second-level category under Novel with ID 1.1.

  • Name is a third-level item under Author with ID 1.1.1.
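The numbering scheme encodes both the parent and the depth of each item. As a small illustration, the sketch below derives them from the dotted IDs. The HierarchyRow shape and the parentIdOf and levelOf helpers are hypothetical names used only for this example.

```typescript
interface HierarchyRow {
  id: string; // dotted ID from the id column, e.g. "1.1.1"
  label: string;
}

// The parent of "1.1.1" is "1.1"; a root ID such as "1" has no parent.
function parentIdOf(id: string): string | null {
  const lastDot = id.lastIndexOf(".");
  return lastDot === -1 ? null : id.slice(0, lastDot);
}

// Level 1 is the root, level 2 its children, and so on.
function levelOf(id: string): number {
  return id.split(".").length;
}

const hierarchyRows: HierarchyRow[] = [
  { id: "1", label: "Novel" },
  { id: "1.1", label: "Author" },
  { id: "1.1.1", label: "Name" },
];

// Render an indented outline, two spaces per level below the root.
const outline = hierarchyRows
  .map((row) => "  ".repeat(levelOf(row.id) - 1) + row.label)
  .join("\n");
```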


Important notes

When importing data, the CSV format uses dots (.) to represent hierarchical relationships. However, these dots are automatically converted into a different ID structure in the JSON format because dots are reserved for path traversal operations in the system. This means dots must not be used in JSON IDs. Here's how it works:

  • CSV Input:

  • Will be converted to JSON as:

    In JSON format:

    • ✅ Correct: "id": "2"

    • ❌ Incorrect: "id": "1.1"

    Using dots for JSON IDs will cause incorrect path resolution when selecting items in the hierarchy.
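The idea of replacing dotted IDs with dot-free ones can be sketched as follows. This is an assumption-heavy illustration and not the actual conversion. Sequential numbering in row order, the parentId field, and the toDotFreeIds function are all hypothetical, and the real JSON schema may differ.

```typescript
interface CsvRow {
  id: string; // dotted ID as written in the .csv file
  label: string;
}

// Hypothetical flat node; the parentId field is an assumption for illustration.
interface FlatNode {
  id: string;
  parentId: string | null;
  label: string;
}

// Assign sequential, dot-free IDs in row order and keep the parent relationship.
function toDotFreeIds(rows: CsvRow[]): FlatNode[] {
  const newIdByCsvId = new Map<string, string>();
  rows.forEach((row, index) => newIdByCsvId.set(row.id, String(index + 1)));

  return rows.map((row) => {
    const lastDot = row.id.lastIndexOf(".");
    const csvParentId = lastDot === -1 ? null : row.id.slice(0, lastDot);
    return {
      id: newIdByCsvId.get(row.id) ?? "",
      parentId: csvParentId === null ? null : (newIdByCsvId.get(csvParentId) ?? null),
      label: row.label,
    };
  });
}

// With this numbering, CSV ID "1.1" (Author) becomes "2" and "1.1.1" (Name) becomes "3".
const converted = toDotFreeIds([
  { id: "1", label: "Novel" },
  { id: "1.1", label: "Author" },
  { id: "1.1.1", label: "Name" },
]);
```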

3. Hierarchical label set in span labeling projects

The hierarchy will be visible in the Labels extension and the label box. You can also use the same label name under different parent labels.

In the example above, although Java appears twice, each instance belongs to a different parent, making it contextually unique.

Using the same label name more than once under the same parent is not allowed. In the example below, the system flags an error because both Apple entries are under Fruit.
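A file can be checked for this problem before upload. The sketch below flags rows whose label repeats an earlier sibling. The findDuplicateSiblingIds function and the HierarchyRow shape are hypothetical, the parent names in the Java sample are made up, and exact (case-sensitive) comparison is an assumption about how the system compares names.

```typescript
interface HierarchyRow {
  id: string; // dotted ID from the .csv file, e.g. "1.2"
  label: string;
}

// Return the IDs of rows whose label repeats an earlier sibling's label.
function findDuplicateSiblingIds(rows: HierarchyRow[]): string[] {
  const seen = new Set<string>();
  const duplicateIds: string[] = [];
  for (const row of rows) {
    const lastDot = row.id.lastIndexOf(".");
    const parentId = lastDot === -1 ? "" : row.id.slice(0, lastDot);
    // Key on parent plus label so equal names under different parents are allowed.
    const key = JSON.stringify([parentId, row.label]);
    if (seen.has(key)) {
      duplicateIds.push(row.id);
    } else {
      seen.add(key);
    }
  }
  return duplicateIds;
}

// "Java" under two different parents is allowed, so nothing is flagged.
const allowedCase = findDuplicateSiblingIds([
  { id: "1", label: "Backend" },
  { id: "1.1", label: "Java" },
  { id: "2", label: "Android" },
  { id: "2.1", label: "Java" },
]);

// "Apple" twice under "Fruit" is flagged; the second occurrence is reported.
const flaggedCase = findDuplicateSiblingIds([
  { id: "1", label: "Fruit" },
  { id: "1.1", label: "Apple" },
  { id: "1.2", label: "Apple" },
]);
```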

4. Hierarchical dropdown options in row or document labeling projects

Choose Hierarchical dropdown as the question type when creating the project. The hierarchy will be displayed in the Row labeling and Document labeling extensions and in the answer column in the table.


In the dropdown menu:

  • Clicking the Home icon navigates to the top-level label.

  • You can search for bottom-level options globally.
