Label Sets / Question Sets

Span-based Labeling

For span-based labeling, a label set is a single-column .csv or .tsv with one label per row.

Datasaur provides twelve default colors that you can assign manually from the Labels extension. You can also create a label set that already includes your desired label colors, as in the sample file below. Any HTML color code is supported.

  • Note: label,color is the header and must always be the first row of the .csv.

label,color
Annabeth Chase,#df3920
Harry Potter,#ff8000
Hermione Granger,#4db34d
John Watson,#3399cc
Percy Jackson,#cc3399
Sherlock Holmes,#9933cc

Note: colored label sets only work for the .csv format.
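If you generate or edit colored label sets outside Datasaur, a quick local check can catch a missing header or a malformed color before you upload. The following is a minimal Python sketch, not part of Datasaur. The name read_colored_label_set is hypothetical, and the check accepts only hex codes like those in the sample, even though Datasaur accepts other HTML color codes as well.

```python
import csv
import io
import re

# Hex codes such as #df3920 or #fff. Assumption: this sketch accepts hex only,
# as in the sample above; Datasaur itself accepts other HTML color codes too.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}|#[0-9a-fA-F]{3}")


def read_colored_label_set(text: str) -> dict[str, str]:
    """Parse a colored span label set (.csv) into {label: color}."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # skip blank lines
    if not rows or [cell.strip() for cell in rows[0]] != ["label", "color"]:
        raise ValueError("the first row must be the header: label,color")
    labels: dict[str, str] = {}
    for row in rows[1:]:
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row}")
        label, color = row[0].strip(), row[1].strip()
        if not HEX_COLOR.fullmatch(color):
            raise ValueError(f"{label!r}: {color!r} is not a hex color")
        labels[label] = color
    return labels


sample = """label,color
Annabeth Chase,#df3920
Harry Potter,#ff8000
Sherlock Holmes,#9933cc
"""
print(read_colored_label_set(sample)["Harry Potter"])  # #ff8000
```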

Color-coded Labels

Datasaur supports HTML color codes. For reference, these are the twelve default colors Datasaur provides for clear viewing in your project:

  • #df3920

  • #ff8000

  • #ffc826

  • #91b34d

  • #4db34d

  • #33cc99

  • #3399cc

  • #3370cc

  • #3333cc

  • #7033cc

  • #9933cc

  • #cc3399
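To build a colored label set from the default palette, you could assign the colors above in order and wrap around after the twelfth. This Python sketch does exactly that. The helper build_label_set and the in-order assignment are illustrative assumptions, not a description of how Datasaur itself picks colors.

```python
import csv
import io
from itertools import cycle

# The twelve default colors listed above, in the order shown.
DEFAULT_COLORS = [
    "#df3920", "#ff8000", "#ffc826", "#91b34d", "#4db34d", "#33cc99",
    "#3399cc", "#3370cc", "#3333cc", "#7033cc", "#9933cc", "#cc3399",
]


def build_label_set(labels: list[str]) -> str:
    """Return label,color .csv text, cycling through the default palette."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["label", "color"])  # required header row
    # zip stops at the last label; cycle restarts the palette after 12 colors.
    for label, color in zip(labels, cycle(DEFAULT_COLORS)):
        writer.writerow([label, color])
    return buffer.getvalue()


print(build_label_set(["Percy Jackson", "John Watson"]), end="")
# label,color
# Percy Jackson,#df3920
# John Watson,#ff8000
```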

Bounding Box Labeling

Label Sets

You can use .csv, .tsv, or .json files for a bounding box label set.

  • For .csv/.tsv, we support color names (e.g., red), hex values (e.g., #00FF00), and RGB values (e.g., rgb(0,0,255)). You can also use a label set that contains only label names, as in the sample file Datasaur sample - Bbox only name.csv. In that case, other values such as captionAllowed and captionRequired use their default settings.

  • For .json, we support hex and RGB only.
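Because the accepted color notations differ by file format, a pre-upload check may help. This Python sketch classifies a value as a color name, hex code, or rgb() value and checks it against the file format. The regular expressions are assumptions about the accepted syntax, and any alphabetic word counts as a name here, since the list of supported color names is not given.

```python
import re
from typing import Optional

# Assumed syntax for the three notations; Datasaur's exact grammar may differ.
HEX = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})")
RGB = re.compile(r"rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)")
NAME = re.compile(r"[A-Za-z]+")

# Which notations each label set format accepts, per the list above.
ALLOWED = {
    ".csv": {"name", "hex", "rgb"},
    ".tsv": {"name", "hex", "rgb"},
    ".json": {"hex", "rgb"},
}


def color_kind(value: str) -> Optional[str]:
    """Classify a color value as "hex", "rgb", or "name" (None if unrecognized)."""
    value = value.strip()
    if HEX.fullmatch(value):
        return "hex"
    match = RGB.fullmatch(value)
    if match:
        # Each RGB channel must be in the range 0-255.
        return "rgb" if all(int(c) <= 255 for c in match.groups()) else None
    if NAME.fullmatch(value):
        return "name"  # not checked against a real list of color names
    return None


def is_color_allowed(value: str, file_format: str) -> bool:
    """True if the value's notation is accepted for the given file format."""
    kind = color_kind(value)
    return kind is not None and kind in ALLOWED[file_format]


print(is_color_allowed("red", ".csv"))   # True
print(is_color_allowed("red", ".json"))  # False
```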

Text Transcription

The Text Transcription setting allows the labeler to add corresponding text to a bounding box. When this setting is disabled, the labeler cannot add text.

Require caption

When the Text Transcription setting is turned on, the labeler can add text to a bounding box. You can also choose whether a specific label must have text: enable the Require caption checkbox to make the text mandatory for that label, or disable it to keep the text optional.
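The interaction between Text Transcription and Require caption can be modeled as a small validation rule. In this hypothetical Python sketch, BoxLabelSettings and caption_error are illustrative names, the settings are applied per label, and the two fields mirror captionAllowed and captionRequired. The default of False for both fields is an assumption, not Datasaur's documented default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoxLabelSettings:
    """Hypothetical per-label settings mirroring captionAllowed/captionRequired."""
    caption_allowed: bool = False   # Text Transcription turned on (default assumed)
    caption_required: bool = False  # Require caption checked (default assumed)


def caption_error(settings: BoxLabelSettings, caption: Optional[str]) -> Optional[str]:
    """Return why a bounding box's caption is invalid, or None if it is fine."""
    has_text = bool(caption and caption.strip())
    if not settings.caption_allowed:
        # With Text Transcription off, the labeler cannot add text at all.
        return "Text Transcription is disabled for this label" if has_text else None
    if settings.caption_required and not has_text:
        return "A caption is required for this label"
    return None


print(caption_error(BoxLabelSettings(caption_allowed=True, caption_required=True), ""))
# A caption is required for this label
```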

Row-based/Document-based Labeling

For row-based or document-based projects, a label set is a .csv with questions in the first column and answers in the subsequent columns.

You can also create a label set as a .json file, which supports multiple question types.
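To see how the questions-in-the-first-column layout maps to data, here is a minimal Python sketch that reads such a .csv into a question-to-answers mapping. It assumes there is no header row and that empty trailing cells can be ignored; read_question_set and the sample questions are hypothetical.

```python
import csv
import io


def read_question_set(text: str) -> dict[str, list[str]]:
    """Map each question (first column) to its answers (remaining columns)."""
    questions: dict[str, list[str]] = {}
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0]:
            continue  # skip blank rows
        questions[cells[0]] = [answer for answer in cells[1:] if answer]
    return questions


# Hypothetical question set; no header row is assumed.
sample = """Sentiment,positive,negative,neutral
Is the text sarcastic?,yes,no
"""
print(read_question_set(sample)["Sentiment"])  # ['positive', 'negative', 'neutral']
```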

Question Types

As mentioned above, label sets for row-based and document-based projects are sets of questions. The available question types are described below.

1. Text Field

Text Field allows the labeler to answer questions by typing free-form text on a single line.

You can also add validation by expanding Advanced Settings.

2. Text Area

Text Area allows the labeler to answer questions by typing free-form text. Unlike Text Field, it allows multiple-line answers.

3. Dropdown

Dropdown requires labelers to answer questions by picking one of several predefined options.

  • If you have a .csv with a pre-set list of answers, you can upload the .csv as an answer set.

  • You can also allow the labelers to select multiple answers by checking the box for Allow multiple choices.
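Here is a sketch of how an uploaded answer set and the Allow multiple choices option could constrain a Dropdown answer. It assumes the answer set .csv has one option per row in the first column with no header; load_answer_set and validate_selection are hypothetical names.

```python
import csv
import io


def load_answer_set(text: str) -> list[str]:
    """Read one answer option per row from the first column, skipping blank rows."""
    return [row[0].strip() for row in csv.reader(io.StringIO(text))
            if row and row[0].strip()]


def validate_selection(selected: list[str], options: list[str],
                       allow_multiple: bool) -> bool:
    """Check a Dropdown answer against the options and Allow multiple choices."""
    if len(selected) > 1 and not allow_multiple:
        return False
    # An empty selection is treated as "not answered yet", not as invalid.
    return all(choice in options for choice in selected)


options = load_answer_set("Fiction\nNon-fiction\n\nPoetry\n")
print(options)  # ['Fiction', 'Non-fiction', 'Poetry']
print(validate_selection(["Fiction", "Poetry"], options, allow_multiple=False))  # False
```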

4. Hierarchical Dropdown

Hierarchical dropdown allows the labeler to answer questions with hierarchically organized options.

  • Just like with the Dropdown type, you can upload an answer set once you have created the hierarchical question. The format for hierarchical label sets is described in Hierarchical Label Sets below.

5. Date

Date allows the labeler to answer the question in two ways:

  • Typing the date manually.

  • Clicking the calendar symbol, then selecting the date.

The key benefit of selecting Date is that it validates that a correct date has been entered.

To prefill date questions with the current date when the labeler opens the project, check the Use current date as default value box in Step 3.

6. Time

Time allows the labeler to answer the question in two ways:

  • Typing the time manually.

  • Clicking the clock symbol, then selecting the time.

The key benefit of selecting Time is that it validates that a correct time has been entered.

To prefill time questions with the current time when the labeler opens the project, check the Use current time as default value box in Step 3.
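Both Date and Time validate the labeler's input. The Python sketch below shows one way such validation and the current-date/current-time defaults could work, using the standard datetime module. The YYYY-MM-DD and 24-hour HH:MM formats are assumptions; the formats Datasaur displays are not specified here, and the function names are hypothetical.

```python
from datetime import date, datetime, time
from typing import Optional

# Assumed answer formats for this sketch.
DATE_FORMAT = "%Y-%m-%d"
TIME_FORMAT = "%H:%M"


def parse_date_answer(text: str) -> Optional[date]:
    """Return the date if text is a real calendar date, else None."""
    try:
        return datetime.strptime(text.strip(), DATE_FORMAT).date()
    except ValueError:
        return None


def parse_time_answer(text: str) -> Optional[time]:
    """Return the time if text is a valid 24-hour time, else None."""
    try:
        return datetime.strptime(text.strip(), TIME_FORMAT).time()
    except ValueError:
        return None


def default_answers(use_current_date: bool, use_current_time: bool) -> tuple[str, str]:
    """Prefill values computed when the labeler opens the project."""
    now = datetime.now()
    return (
        now.strftime(DATE_FORMAT) if use_current_date else "",
        now.strftime(TIME_FORMAT) if use_current_time else "",
    )


print(parse_date_answer("2023-02-29"))  # None: 2023 is not a leap year
print(parse_time_answer("14:05"))       # 14:05:00
```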

7. Slider

Slider allows the labeler to answer the question by moving a slider bar (e.g., from 1 to 10).

To avoid subjective measurement, you can hide the slider value from labelers in Step 3. Note that the value is still visible in reviewer mode.

You can customize the slider color. The default color for both “Start at” and “End to” is blue, and 11 other preset colors are available.

You can also enter a custom color as a hex code, color name, or RGB value; in that case, the dropdown is labeled “Custom”.

To preview how the colors will look, drag the slider thumb in the Preview.

Note that slider values must be numbers.
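Because only numbers are allowed, a slider answer can be checked by parsing it and comparing it against the slider's range. This Python sketch assumes the range includes both of its endpoints; parse_slider_value is a hypothetical helper, not a Datasaur function.

```python
from typing import Optional


def parse_slider_value(raw: str, start: float, end: float) -> Optional[float]:
    """Return raw as a number if it lies within [start, end], else None."""
    try:
        value = float(raw)
    except ValueError:
        return None  # only numbers are allowed as slider values
    # NaN fails every comparison, so it is rejected here as well.
    if not start <= value <= end:
        return None
    return value


print(parse_slider_value("7", 1, 10))     # 7.0
print(parse_slider_value("high", 1, 10))  # None
```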

8. Grouped Attributes

Grouped Attributes combines multiple questions that pertain to a single group.

9. Checkbox

Checkbox allows the labeler to answer the question by checking a box. You can also add a description.

10. URL

URL allows URL links to be entered as answers, with validation applied to them.
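What counts as a valid URL is not specified here. As an assumption, this Python sketch accepts only absolute http:// or https:// links that include a host, using urlparse from the standard library's urllib.parse; is_valid_url is a hypothetical name.

```python
from urllib.parse import urlparse


def is_valid_url(text: str) -> bool:
    """Assumed rule: an absolute http(s) URL that names a host."""
    parsed = urlparse(text.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


for candidate in ["https://example.com/docs?page=2", "example.com", "ftp://example.com"]:
    print(candidate, is_valid_url(candidate))
# https://example.com/docs?page=2 True
# example.com False
# ftp://example.com False
```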

11. Radio Button

Radio Button allows the labeler to answer questions by selecting one answer.

You can also add a hint that describes the Radio Button.

Advanced Settings

In row-based labeling projects, you can use the advanced setting “Refer answer to table column.”

Refer answer to table column

This feature links answers to specific table columns. A typical scenario is when you have a pre-labeled file and need to review the responses: instead of applying the answers from scratch, each bound question is prefilled from its column.

To enable this feature, navigate to Step 3 of the Project Creation Wizard and locate the Advanced Settings section. Here, you can choose the column headers for the questions you wish to bind.

Please note that this configuration can only be done during the project creation process.

After project creation is complete, open the project. The Document Labeling extension shows the binding result: each bound question is filled with the value from its bound column in the selected row.
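The binding behaves like a lookup: each bound question takes the value of its bound column in the selected row. The Python sketch below models that with plain dictionaries. The name prefill_answers, the column names, and the empty-string fallback for a missing column are hypothetical.

```python
from collections.abc import Mapping


def prefill_answers(bindings: Mapping[str, str],
                    row: Mapping[str, str]) -> dict[str, str]:
    """Fill each bound question with its bound column's value in the selected row.

    bindings: question -> column header (the choice made in Step 3)
    row: column header -> cell value for the row being labeled
    """
    # A missing column leaves the question empty (an assumption of this sketch).
    return {question: row.get(column, "") for question, column in bindings.items()}


row = {"text": "Great service!", "predicted_sentiment": "positive"}
print(prefill_answers({"Sentiment": "predicted_sentiment"}, row))
# {'Sentiment': 'positive'}
```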

Hierarchical Label Sets

You can upload multi-level hierarchical label sets as .csv files for token-based, row-based, and document-based projects. Here is a sample hierarchical label set:

id,label
1,Novel
1.1,Author
1.2,Title
2,Characters
2.1,Antagonist
2.2,Protagonist

💡 Let's break down the components of this file:

1. The header

id,label is the header and must always be the first row of the .csv. The first label has an id of 1, as in the example above.

2. id format

The id format is similar to Microsoft Word's multilevel numbering: a child label's id is its parent's id followed by a period and the child's own number. In the example above, Author is part of Novel, so its id is 1.1.

  • Novel: a root-level label, with id 1.

  • Author: a second-level label, with id 1.1.
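Because each id encodes its parent (drop the last ".n" segment), the tree can be rebuilt from the .csv alone. This Python sketch parses the sample above and prints it as an indented outline. It assumes parents are listed before their children, as in the sample; parse_hierarchy and outline are hypothetical names.

```python
import csv
import io


def parse_hierarchy(text: str) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Return ({id: label}, {parent id: [child ids]}) from an id,label .csv."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows or [cell.strip() for cell in rows[0]] != ["id", "label"]:
        raise ValueError("the first row must be the header: id,label")
    labels: dict[str, str] = {}
    children: dict[str, list[str]] = {"": []}  # "" stands for the implicit root
    for row in rows[1:]:
        node_id, label = row[0].strip(), row[1].strip()
        parent = node_id.rpartition(".")[0]  # "1.2" -> "1", "2" -> ""
        if parent not in children:
            raise ValueError(f"id {node_id} is listed before its parent {parent}")
        labels[node_id] = label
        children[parent].append(node_id)
        children[node_id] = []
    return labels, children


def outline(labels: dict[str, str], children: dict[str, list[str]],
            parent: str = "", depth: int = 0) -> list[str]:
    """Depth-first list of labels, indented two spaces per level."""
    lines: list[str] = []
    for node_id in children[parent]:
        lines.append("  " * depth + labels[node_id])
        lines.extend(outline(labels, children, node_id, depth + 1))
    return lines


sample = """id,label
1,Novel
1.1,Author
1.2,Title
2,Characters
2.1,Antagonist
2.2,Protagonist
"""
print("\n".join(outline(*parse_hierarchy(sample))))
```

Running it prints Novel, Author, Title, Characters, Antagonist, and Protagonist, with each second-level label indented under its parent.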

3. Hierarchical label sets in span-based projects

In span-based projects, the hierarchy will be visible in the Labels extension and in the label dropdown.

4. Hierarchical label sets in row-based or document-based projects

  • You have to choose Hierarchical Dropdown as the question type when creating the project using the Project Custom Wizard.

  • In these projects, hierarchical label sets are uploaded as answer sets.

💡 Pro Tip

  • Click the Home icon to go directly to the top-level labels.

  • You can search leaf nodes globally, across all levels of the hierarchy.
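A leaf node is a label whose id is not the parent of any other id. This Python sketch finds leaves whose names contain a search term at any depth, to illustrate what a global leaf search returns. The case-insensitive substring match and the name search_leaves are assumptions, not a description of Datasaur's search.

```python
def search_leaves(labels: dict[str, str], query: str) -> list[tuple[str, str]]:
    """Return (id, label) pairs for leaf labels containing query, at any depth."""
    # Every id's parent is the id with its last ".n" segment removed.
    parents = {node_id.rpartition(".")[0] for node_id in labels}
    needle = query.casefold()
    return [
        (node_id, label)
        for node_id, label in labels.items()
        if node_id not in parents and needle in label.casefold()
    ]


# The sample hierarchy from above, as {id: label}.
sample_labels = {
    "1": "Novel", "1.1": "Author", "1.2": "Title",
    "2": "Characters", "2.1": "Antagonist", "2.2": "Protagonist",
}
print(search_leaves(sample_labels, "ag"))
# [('2.1', 'Antagonist'), ('2.2', 'Protagonist')]
print(search_leaves(sample_labels, "novel"))  # [] because Novel is not a leaf
```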
