Label Sets / Question Sets
For span-based labeling, a label set is a single-column .csv or .tsv file following the structure below:
Label 1
Label 2
Label 3
etc...
We provide twelve colors that you can configure manually from the Labels extension. You can also create a label set that includes your desired label colors; a sample file is provided below. Note: any HTML color code is supported (as seen below).
Note: label,color is the header. This will always be the first row in the .csv.
Note: colored label sets only work for the .csv format.
Datasaur supports HTML color codes. For your reference, below are the default colors provided by Datasaur for better viewing clarity in your project.
#df3920
#ff8000
#ffc826
#91b34d
#4db34d
#33cc99
#3399cc
#3370cc
#3333cc
#7033cc
#9933cc
#cc3399
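As a minimal sketch (not an official Datasaur tool), a colored label set with the label,color header could be generated from a list of label names as shown here. The helper names buildColoredLabelSet and csvField and the sample labels are hypothetical; the colors are the twelve defaults listed above, reused cyclically. Quoting fields that contain commas follows the common CSV convention and is an assumption about what the importer accepts.

```typescript
// The twelve default colors listed above.
const DEFAULT_COLORS: string[] = [
  "#df3920", "#ff8000", "#ffc826", "#91b34d", "#4db34d", "#33cc99",
  "#3399cc", "#3370cc", "#3333cc", "#7033cc", "#9933cc", "#cc3399",
];

// Quote a CSV field only if it contains a comma, a quote, or a line break.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Hypothetical helper: builds the text of a colored label set .csv.
// The first row is always the "label,color" header.
function buildColoredLabelSet(labels: string[]): string {
  const rows = labels.map(
    (label, i) => `${csvField(label)},${DEFAULT_COLORS[i % DEFAULT_COLORS.length]}`,
  );
  return ["label,color", ...rows].join("\n");
}

// Illustrative label names, not taken from a real project.
const labelSetCsv = buildColoredLabelSet(["Person", "Organization", "Location"]);
```

Saving the resulting string as a .csv file gives one label per row, each paired with a color from the default palette.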
You can use .csv, .tsv, or .json formats for the bounding box label set.
For .csv/.tsv, we support color names (e.g., red), hex values (e.g., #00FF00), and RGB (e.g., rgb(0,0,255)). You can also use a label set with just names, as shown in the Datasaur sample - Bbox only name.csv below. Other values such as captionAllowed and captionRequired will use default settings.
For .json, we support hex and RGB only.
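The format rules above can be checked before upload with a small pre-flight function. This is only a sketch: isSupportedColor and the regular expressions are hypothetical and approximate the three color notations mentioned in the text; the exact parsing rules Datasaur applies are not documented here.

```typescript
type LabelSetFormat = "csv" | "tsv" | "json";

// Approximations of the notations named above (assumptions, not Datasaur's parser).
const HEX_COLOR = /^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;
const RGB_COLOR = /^rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)$/;
const COLOR_NAME = /^[a-zA-Z]+$/;

// Returns true when the notation is allowed for the given file format:
// hex and RGB everywhere, color names only in .csv/.tsv.
function isSupportedColor(value: string, format: LabelSetFormat): boolean {
  const v = value.trim();
  if (HEX_COLOR.test(v)) return true;
  const rgb = RGB_COLOR.exec(v);
  if (rgb) return rgb.slice(1).every((channel) => Number(channel) <= 255);
  // Only checks that the value looks like a name; it does not verify
  // that the name is a real HTML color.
  return format !== "json" && COLOR_NAME.test(v);
}
```

For example, isSupportedColor("red", "csv") passes while isSupportedColor("red", "json") does not, which mirrors the hex-and-RGB-only rule for .json.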
The Text Transcription setting allows the labeler to add corresponding text to a bounding box. When this setting is disabled, the labeler cannot add text.
When Text Transcription is turned on, you can choose whether a specific label must have text by enabling or disabling the Require caption checkbox.
For row-based or document-based projects, a label set is a .csv with questions in the first column and answers in subsequent columns:
Question 1 | Answer 1 | Answer 2 | Answer 3
Question 2 | Answer 1 | Answer 2
Question 3 | Answer 1 | Answer 2 | Answer 3 | Answer 4 | Answer 5
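The layout described above (question in the first column, answers in the following columns) could be read with a few lines of code. This is a sketch under a simplifying assumption: cells are plain comma-separated values without quoted commas. The names QuestionRow and parseQuestionSet are hypothetical.

```typescript
interface QuestionRow {
  question: string;
  answers: string[];
}

// Assumes plain comma-separated cells; quoted fields are not handled.
function parseQuestionSet(csv: string): QuestionRow[] {
  return csv
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [question = "", ...rest] = line.split(",").map((cell) => cell.trim());
      // Rows with fewer answers may leave trailing cells empty; drop those.
      return { question, answers: rest.filter((cell) => cell !== "") };
    });
}

const sampleQuestionSet = [
  "Question 1,Answer 1,Answer 2,Answer 3",
  "Question 2,Answer 1,Answer 2,",
].join("\n");
const parsedQuestionSet = parseQuestionSet(sampleQuestionSet);
```

Each parsed row keeps its own number of answers, so questions with different answer counts can share one file.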
You can also create a .json for a label set that has multiple question types.
You can optionally set hints for each question. A hint can include additional instructions or explanations that help labelers submit the answers most relevant to the task. A single question’s hint can hold up to around 65,000 characters.
Question hints can be set during project creation or when configuring a question set on the Label management page.
The Document and Row Labeling extensions can parse Markdown syntax in question hints. This gives you flexibility in formatting the text, enabling you to present lists, attach links to external sites, or emphasize certain parts of the hint, all using a familiar syntax. The following are examples of the supported formatting.
Bold: **your text**
Italic: *your text*
Underline: <u>your text</u>
Bullet: dashes (-) or asterisks (*)
Numbering: 1., 2., 3., etc.
Link: [your text](https://example.com)
Keep in mind, the markdown symbols will count towards the character limit.
We recommend keeping hints brief and focused on the relevant information for the labelers’ task. Longer hints may appear as large text blocks, which can clutter the UI. For more complex information or media, consider including links that labelers can easily click on.
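Because Markdown symbols count toward the limit, a hint's length is best measured on the raw text, symbols included. The sketch below assumes a limit of exactly 65,000 (the text only says "around 65,000") and assumes characters are counted as UTF-16 code units; both are assumptions, and isHintWithinLimit is a hypothetical helper.

```typescript
// Assumption: the documented limit is "around 65,000"; the exact value may differ.
const APPROX_HINT_LIMIT = 65000;

// Measures the raw hint, so Markdown symbols such as ** or [ ]( ) are included.
// Assumption: length is counted in UTF-16 code units (String.length).
function isHintWithinLimit(hint: string, limit: number = APPROX_HINT_LIMIT): boolean {
  return hint.length <= limit;
}

// Illustrative hint that uses bold, italic, a bullet list, and a link.
const sampleHint = [
  "**Choose the dominant sentiment.**",
  "- Use *Neutral* when no opinion is expressed.",
  "- See the [guidelines](https://example.com) for edge cases.",
].join("\n");

const sampleHintFits = isHintWithinLimit(sampleHint);
```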
As mentioned before, label sets for row-based and document-based projects are sets of questions. Let's take a look at the question types available below.
Text Field allows the labeler to answer questions by typing in free-form text, up to a single line at a time.
You can also add validation by expanding the Advanced Settings.
Text Area allows the labeler to answer questions by typing in free-form text. In contrast to Text Fields, this allows for multiple-line answers.
Dropdown requires labelers to answer questions by picking one of several multiple-choice answers.
If you have a .csv with a pre-set list of answers, you can upload the .csv as an answer set.
You can also allow the labelers to select multiple answers by checking the box for Allow multiple choices.
Hierarchical dropdown allows the labeler to answer questions with hierarchically organized options.
Just like with the Dropdown type, you can also upload an answer set once you have created the hierarchical question. The format for hierarchical label sets can be found below.
Previously known as Yes/No, this question type has been renamed to True/False.
True/False allows the labeler to answer the question by checking a box. You can also add a description.
Previously known as Radio Button, this question type has been renamed to Single Choice.
Single Choice allows the labeler to answer questions by selecting one answer.
You can configure up to 25 answer options for this question type.
You can also add a hint that describes the Single Choice question.
Multiple Choice allows the labeler to submit multiple answers by selecting more than one option from a list, or they can choose just one option if necessary.
The options are displayed as a staggered grid of checkboxes, making it more suitable for a smaller and simpler set of options. You can configure up to 25 answer options for this question type.
Date allows the labeler to answer the question in two ways:
Typing the date in manually.
Clicking on the calendar symbol, then selecting the date.
The key benefit of selecting Date is that this format validates that a correct date has been filled in.
If you want to fill date questions with the current timestamp at the time the labeler opens the project, you can check the Use current date as default value box on Step 3.
Time allows the labeler to answer the question in two ways:
Typing the time in manually.
Clicking on the clock symbol, then selecting the time.
The key benefit of selecting Time is that this format validates that a correct time has been filled in.
If you want to fill time questions with the current timestamp at the time the labeler opens the project, you can check the Use current time as default value box on Step 3.
Slider allows the labeler to answer the question by moving the sliding bar (e.g., from 1 to 10).
To avoid subjective measurement, you can also hide the value from labelers in Step 3. Please note that the value will be visible in the reviewer mode.
You have the flexibility to personalize the slider color according to your preferences. While the default color for “Start at” and “End to” is blue, we provide 11 alternative default color options for you to select from.
When it comes to colors, you have the choice of using hex codes, color names, or RGB values. If you opt for any of these choices, the dropdown will be labeled as “Custom”.
To get a glimpse of how the color will appear, simply drag the slider thumb on the Preview.
Please note that we only allow numbers as the slider value.
URL allows the labeler to enter a URL link and lets you apply validation to it.
Grouped Attributes allows the labeler to combine multiple questions that pertain to a single group.
Grouped Attributes is only supported in Row Labeling projects.
Script-Generated Questions is an advanced question type that dynamically generates different questions for each row based on its data. Unlike predefined question sets, this approach allows for flexible, on-the-fly question generation, making it ideal for scenarios where static question lists are insufficient. For more details, see this page here.
In Row labeling projects, you can use the advanced setting “Refer answer to table column.”
This feature is beneficial if you want to link answers to specific columns. A typical scenario for this is when you have a pre-labeled file and need to review the responses. Enabling this eliminates the need to apply the answers from scratch!
To enable this feature, navigate to Step 3 of the Project Creation Wizard and locate the Advanced Settings section. Here, you can choose the column headers for the questions you wish to bind.
Please note that this configuration can only be done during the project creation process.
After completing the project creation process, open the created project. You can now observe the binding result in the Document Labeling extension. The bound question is now filled with the answer from the bound column of the selected row.
This feature is only available in Row Labeling projects and is disabled by default. Please reach out to support@datasaur.ai if your team needs this feature, and we'll assist you!
The Answer Validation Script is a highly flexible feature powered by TypeScript designed to help validate the logic of answering a row in Row labeling tasks. With this feature, you can write validation scripts to handle complex scenarios, such as verifying labeled data using other answers, comparing data across questions, or using external APIs for dynamic validation. Once the script is configured, if labelers or reviewers attempt to submit a row that fails validation, an error message will be displayed.
This functionality enables better control, accuracy, and consistency in the labeling process.
Row-Specific Validation: Validates data based on the content of the current row.
Cross-Question Validation: Checks answers by comparing them with answers from other questions.
API-based Validation: Incorporates validations that rely on external APIs or external business logic.
Note
The validation cannot be configured if the questions have not been set up yet.
Only accessible by Admins in Reviewer mode.
Go to the Row Labeling Extension inside the project.
Click on the three-dot menu.
Select "Configure answer validation script…".
When opening the Answer Validation Script dialog for the first time, you will be prompted with this template:
To decide whether to pass or fail the submission, you can return an object with or without errorMessage as a property:
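As a minimal sketch of that convention, an object without errorMessage lets the submission pass, and an object with errorMessage fails it and supplies the text shown to the labeler. The type name ValidationResult, the function name validateSummary, and the "summary" key are hypothetical; the actual template's function name and argument shape may differ.

```typescript
// Hypothetical result shape: errorMessage present means the validation failed.
interface ValidationResult {
  errorMessage?: string;
}

// Hypothetical rule: the answer stored under the key "summary" must not be empty.
function validateSummary(answers: Record<string, unknown>): ValidationResult {
  const summary = answers["summary"];
  if (typeof summary !== "string" || summary.trim() === "") {
    // Returning errorMessage fails the submission.
    return { errorMessage: "Summary must not be empty." };
  }
  // Returning an object without errorMessage passes the submission.
  return {};
}
```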
When validating, you will likely need certain information to determine whether the answer is valid or requires adjustment before submission. All of this information is provided through the function argument, as demonstrated in the template function, so you can write the desired validation behavior.
columns: TableColumn[] holds information about the column structure of the data being labeled.
row: Cell[] is an array of cells containing the data that is being labeled.
questions: Question[] is an array of the project's questions.
answers is an object containing answers, with each question’s id as the key. Depending on the question type, a value can be one of 4 different types:
Normal question: string or string[]
Grouped attributes: Answer or Answer[]
The template also provides helper functions for the most basic data access, such as:
function getCellValueByColumnLabel(label: string): string;
This function helps you obtain data based on the column’s label.
function getAnswerByQuestionLabel(label: string, searchQuestions: Question[] = questions): Answers;
This function helps you obtain the answer value based on the question’s label.
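To illustrate how such a lookup supports cross-question validation, the sketch below runs outside Datasaur with simplified stand-ins. The Question and AnswerMap shapes, the lookupAnswer function (which only mimics getAnswerByQuestionLabel), and the question labels are all hypothetical; inside a real script you would use the helpers and types supplied by the template.

```typescript
// Hypothetical, simplified shapes; the template's real types carry more fields.
interface Question {
  id: string;
  label: string;
}
type AnswerValue = string | string[];
type AnswerMap = { [questionId: string]: AnswerValue | undefined };

// Local stand-in that mimics getAnswerByQuestionLabel for this sketch.
function lookupAnswer(label: string, questions: Question[], answers: AnswerMap): AnswerValue | undefined {
  const question = questions.find((q) => q.label === label);
  return question ? answers[question.id] : undefined;
}

// Normalizes a string or string[] answer into a list of non-empty strings.
function toList(value: AnswerValue | undefined): string[] {
  if (value === undefined) return [];
  return (Array.isArray(value) ? value : [value]).filter((v) => v.trim() !== "");
}

// Cross-question rule: "Yes" on "Is spam?" requires at least one "Spam category".
function validateSpamAnswers(questions: Question[], answers: AnswerMap): { errorMessage?: string } {
  const isSpam = toList(lookupAnswer("Is spam?", questions, answers)).includes("Yes");
  const categories = toList(lookupAnswer("Spam category", questions, answers));
  if (isSpam && categories.length === 0) {
    return { errorMessage: "Select a spam category when the row is marked as spam." };
  }
  return {};
}

// Illustrative questions, keyed by id in the answers object.
const demoQuestions: Question[] = [
  { id: "1", label: "Is spam?" },
  { id: "2", label: "Spam category" },
];
```

Normalizing every answer to a list first means the same rule works whether a question returns a single string or an array of strings.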
You can include API requests in your validation, enabling dynamic or third-party validations by using the Fetch API.
Disclaimer
We do not accept any responsibility for any API calls that are misrouted, improperly configured, or sent to unintended parties, which may lead to the exposure, leakage, or compromise of data confidentiality.
Users are fully responsible for ensuring the accuracy, security, and integrity of API configurations and transmissions. By using our services, you acknowledge and accept these responsibilities.
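A sketch of an API-based check is shown below, assuming the validation script is allowed to return a promise. The endpoint URL, the { exists: boolean } response shape, and the names validateSku and FetchLike are hypothetical. The fetch function is passed in as a parameter so the sketch can be exercised without a network; inside a real script you would pass the global fetch.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

interface ValidationOutcome {
  errorMessage?: string;
}

// Hypothetical endpoint that answers with { "exists": boolean }.
const SKU_ENDPOINT = "https://example.com/api/skus/";

async function validateSku(sku: string, fetchFn: FetchLike): Promise<ValidationOutcome> {
  try {
    const response = await fetchFn(SKU_ENDPOINT + encodeURIComponent(sku));
    if (!response.ok) {
      return { errorMessage: "SKU lookup failed; please try again." };
    }
    const body = (await response.json()) as { exists?: boolean };
    return body.exists === true ? {} : { errorMessage: `Unknown SKU: ${sku}` };
  } catch {
    // Design choice: if the API is unreachable, let the submission continue
    // instead of blocking the labeler.
    return {};
  }
}
```

Whether an unreachable API should block or allow submission depends on your workflow; returning an errorMessage in the catch branch would make the check strict instead.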
Can I validate across multiple rows?
No, the validation script is row-specific. It operates on individual rows being labeled.
What happens if there's an error in the script?
Unhandled exceptions or errors in the script will result in validation errors being shown and prevent the labeler from submitting their answers. You may choose to catch the error inside the script and let the submission continue if needed.
It is possible to upload multi-level hierarchical label sets in .csv for token-based, row-based, and document-based projects. Here is a sample of a hierarchical label set:
id,label is the header. This will always be the first row in the .csv. The first label will have 1 as the id, as in the example above.
The id format is similar to Microsoft Word's numbering format. In the example above, Author is a part of Novel and the id will be 1.1.
Novel: the root-level.
1: id for the root-level
Author: the second-level.
1.1: the second-level id.
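The id scheme can be made concrete with a short sketch that turns an id,label file into a tree by treating everything before the last dot as the parent id (so 1.1 belongs under 1). The names HierarchyNode and parseHierarchicalLabelSet are hypothetical, the sketch assumes parents are listed before their children, and the labels other than Novel and Author are illustrative.

```typescript
interface HierarchyNode {
  id: string;
  label: string;
  children: HierarchyNode[];
}

// Assumes the first row is the "id,label" header and parents precede children.
function parseHierarchicalLabelSet(csv: string): HierarchyNode[] {
  const [header, ...lines] = csv.trim().split(/\r?\n/);
  if ((header ?? "").trim() !== "id,label") {
    throw new Error('The first row must be the "id,label" header.');
  }
  const byId = new Map<string, HierarchyNode>();
  const roots: HierarchyNode[] = [];
  for (const line of lines) {
    const comma = line.indexOf(",");
    if (comma < 0) continue; // skip malformed rows
    const id = line.slice(0, comma).trim();
    const node: HierarchyNode = { id, label: line.slice(comma + 1).trim(), children: [] };
    byId.set(id, node);
    // "1.1" -> parent "1"; an id without a dot is a root-level label.
    const lastDot = id.lastIndexOf(".");
    const parent = lastDot < 0 ? undefined : byId.get(id.slice(0, lastDot));
    // A node whose parent id is unknown is kept at the root level.
    (parent ? parent.children : roots).push(node);
  }
  return roots;
}

const labelTree = parseHierarchicalLabelSet(
  ["id,label", "1,Novel", "1.1,Author", "1.2,Title", "2,Poem"].join("\n"),
);
```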
In span-based projects, the hierarchy will be visible in the Labels extension and in the label dropdown.
You have to choose hierarchical dropdown as the question type when creating projects using Project Custom Wizard.
Hierarchical label sets in these projects are uploaded as answer sets.
💡 Pro Tip
Clicking the Home icon will go directly to the top-level label
You can search leaf nodes globally