Creating a Project
After signing in to Datasaur, you will be automatically directed to your personal workspace. To switch to your team workspace, select your avatar in the top right corner, choose Switch workspace, then select your team workspace. You will arrive on the Projects page of the team workspace. Admins can create projects here. You can also see the list of projects that you are working on.

Create a project by clicking the Create project button. This allows you to create any type of project. You can also create a project by selecting one of the project template shortcuts. These templates contain preconfigured settings for specific use cases.
In this example, we are going to create a span labeling project. If you would like a tutorial on creating a row labeling, audio labeling, OCR labeling, bounding box labeling, or document labeling project, watch the corresponding YouTube videos.
Project Creation Wizard
Click the Create project button. The process has five basic steps:
Upload
Preview
Labeler's tasks
Assignment
Project settings
Step 1: Upload your files (Video Tutorial)

You can see a list of the file formats that are natively supported for each project type by expanding the Supported file types section. As an example, we will create a span labeling project by uploading several .txt files.
When uploading multiple files, ensure they are all in the same file format.

The maximum file size allowed is 50 MB. Files can be uploaded in three ways:
Drag and drop,
Browse files from your computer,
Fetch files from external object storage.
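The two upload constraints above (one file format per batch, 50 MB per file) can be checked locally before uploading. The following is a minimal sketch, not part of Datasaur: the function name `check_upload_batch` is hypothetical, and treating 50 MB as 50 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

# 50 MB limit taken from the text; interpreting it as binary megabytes is an assumption.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def check_upload_batch(files):
    """Return a list of problems for a batch of (file_name, size_in_bytes) pairs.

    An empty list means the batch satisfies both constraints.
    """
    problems = []

    # All files in one upload should share the same format (extension).
    extensions = {Path(name).suffix.lower() for name, _ in files}
    if len(extensions) > 1:
        problems.append(f"mixed file formats: {sorted(extensions)}")

    # Each file must stay within the size limit.
    for name, size in files:
        if size > MAX_FILE_SIZE_BYTES:
            problems.append(f"{name} exceeds 50 MB")

    return problems
```

For example, `check_upload_batch([("a.txt", 10), ("b.csv", 20)])` reports a mixed-format problem, while a batch of small .txt files returns an empty list.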
If you are interested in creating a project via the API, you can find the documentation here.
Add Project Tags
You can add one or more project tags by selecting existing tags or creating a new one.

If you forget to add tags at this step, you can add them later from the Projects page. For guidance, follow the steps to add project tags.
Step 2: Preview the uploaded file (Video Tutorial)
In this step, you choose how to divide your data into lines and how to tokenize it.
Line separator
The line separator determines how your data are split into separate rows in the labeling interface. There are two native options available:
New line – Divides your data by paragraph. Each row in the labeling interface will contain one paragraph from your original text.
Dot (.) – Divides your data by sentence. Each row will contain one sentence from your text.
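The difference between the two line separators can be illustrated with a short sketch. It only approximates the behavior described above; the function name `split_rows` is hypothetical, and keeping the trailing dot on each sentence is an assumption.

```python
import re


def split_rows(text, separator="newline"):
    """Split text into rows by line break ("newline") or by sentence ("dot")."""
    if separator == "newline":
        parts = text.split("\n")
    elif separator == "dot":
        # Take runs of non-dot characters, keeping the closing dot when present.
        parts = re.findall(r"[^.]+\.?", text)
    else:
        raise ValueError(f"unknown separator: {separator!r}")
    # Trim surrounding whitespace and drop empty rows.
    return [part.strip() for part in parts if part.strip()]
```

With the input `"First sentence. Second sentence.\nNew paragraph."`, the "newline" option yields two rows (one per paragraph) and the "dot" option yields three rows (one per sentence).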
Tokenizer
The tokenizer determines how your text is broken down into tokens for labeling. There are two native options available:
Wink tokenizer – separates certain punctuation from words.
Whitespace tokenizer – splits text at each space, keeping punctuation attached to words.
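To see how the two tokenization styles differ, consider the sketch below. The regular expression is a simplified stand-in for a punctuation-aware tokenizer and is not the Wink tokenizer itself; both function names are hypothetical.

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace only, so punctuation stays attached to words."""
    return text.split()


def punctuation_aware_tokenize(text):
    """Separate punctuation marks from words (a rough approximation only)."""
    # A token is either a word (allowing internal apostrophes) or a single
    # non-space, non-word character such as a comma or exclamation mark.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)
```

For the input `"Hello, world!"`, the whitespace version returns `["Hello,", "world!"]`, while the punctuation-aware version returns `["Hello", ",", "world", "!"]`. The choice matters for span labeling because it decides whether a label on "Hello" also covers the comma.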


Step 3: Labeler's tasks (Video Tutorial)
In this step, you choose the labeling type you want to work on. A detailed explanation of each task type can be found on the Labeling Task Types page.
Since we have previously uploaded .txt files, the available task types are span labeling and document labeling. In this example, we will choose span labeling. This means we need to provide labels for use in the project.
There are three ways to create or upload labels:
Create label set from scratch – Select Create your own to manually add your labels. You can also select the color for each label.
Upload label set from a file – Drag and drop your .csv label set file or select Browse files to upload. The CSV format is as follows:
Place your first label in cell A1.
Add subsequent labels down column A (A2, A3, A4, …).
Browse from library – In your team workspace, the Label management page lets you create, edit, and delete label sets. You can reuse saved label sets instead of re-uploading or recreating them for each new project.
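A label set file in the CSV layout described above (one label per row in column A, starting at A1) can be generated with Python's standard `csv` module. This is a minimal sketch; the helper name `write_label_set` and the example labels are hypothetical.

```python
import csv
import io


def write_label_set(labels, handle):
    """Write one label per row so that each label lands in column A."""
    writer = csv.writer(handle)
    for label in labels:
        writer.writerow([label])


# Example labels are placeholders; replace them with your own label names.
buffer = io.StringIO()
write_label_set(["PERSON", "ORGANIZATION", "LOCATION"], buffer)
```

To produce a file for upload instead of an in-memory buffer, pass a handle opened with `open("labels.csv", "w", newline="")`.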

Configuring span labeling settings
At the bottom of the page, you'll see a section called Span Labeling where you can configure the following options.

Limit selection to a span of 1 token is useful when you want to enforce that every token in the document must be labeled.
Spans should have at most one label prevents labelers from adding multiple labels to a single span.
Allow arrows to be drawn between labels allows you to draw arrows from one label to another to annotate relationships between words. This is useful for showing that an adjective is related to a noun, or a pronoun is referring to a person.
Default text selection allows you to choose between token selection and character selection. Some languages may require character selection, e.g. Mandarin, Korean, or Thai.
Step 4: Assignment (Video Tutorial)
In this step, you assign labelers and reviewers to the project. There are three roles available:
Labeler
Labeler & Reviewer
Reviewer
Workspace admins have only two options: Labeler & Reviewer, and Reviewer. Admins always have access to the reviewer mode for any project.

Conflict resolution
The Peer review consensus setting determines how many labelers must agree on a label for it to be automatically accepted. The slider allows you to set the threshold for automatic acceptance.
For highly sensitive projects where there is no room for error, you can require all assigned labelers to agree. For less sensitive projects where efficiency and cost are more important than accuracy, a majority vote may be enough. Any labels that don’t meet the threshold must be manually reviewed by reviewers or the project creator.
If you choose No consensus, all labelers’ labels are treated as conflicting.
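The consensus threshold can be thought of as a vote count per span. The sketch below is one possible interpretation of the behavior described above, not Datasaur's actual implementation; the function name `resolve_span` and the use of `None` to represent No consensus are assumptions.

```python
from collections import Counter


def resolve_span(labels_by_labeler, consensus):
    """Return the automatically accepted label, or None if manual review is needed.

    labels_by_labeler maps a labeler name to the label that labeler applied.
    consensus is the minimum number of agreeing labelers, or None to model
    the No consensus setting, where every label is treated as a conflict.
    """
    if consensus is None or not labels_by_labeler:
        return None

    votes = Counter(labels_by_labeler.values())
    # Labels that reach the threshold; accept only when exactly one does.
    winners = [label for label, count in votes.items() if count >= consensus]
    return winners[0] if len(winners) == 1 else None
```

With three labelers voting PER, PER, and ORG, a threshold of 2 accepts PER automatically, while a threshold of 3 (all labelers must agree) leaves the span for a reviewer.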
Dynamic review assignment
Enabling this option automatically assigns a reviewer when labelers have conflicts in a project. Detailed information can be found on the Dynamic Review Capabilities page.
Step 5: Configuring project settings (Video Tutorial)
In this step, you choose some final, advanced admin settings for the project.
Keep in mind that most of these choices are intended for advanced requirements.

Labeling settings
Label set modification – Allows labelers to add, edit, or remove labels in the project.
Text modification – Allows labelers to edit the text of the dataset.
Mask Personally Identifiable Information (PII) – Lets admins mask sensitive information with asterisks or random characters. You can also choose which types of information to mask, for example, addresses, social security numbers, company names, etc.
Allow marking unapplied label classes as N/A – Lets labelers mark unused labels as not applicable (N/A).
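As an illustration of what masking with asterisks means for the PII setting above, the sketch below replaces anything shaped like a US social security number with asterisks of the same length. It is only a conceptual example; the pattern and the function name `mask_with_asterisks` are assumptions, and Datasaur's own detection logic is not described in this guide.

```python
import re

# Hypothetical pattern for values shaped like 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_with_asterisks(text, pattern=SSN_PATTERN):
    """Replace every match of pattern with asterisks of equal length."""
    return pattern.sub(lambda match: "*" * len(match.group()), text)
```

Keeping the masked value the same length as the original preserves character offsets, which is convenient when labels are stored as positions in the text.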
Reviewing settings
Show labeler names in Review Mode – By default, names are shown. Uncheck this option to hide names and reduce bias.
Show rejected labels in Review Mode – Allows reviewers to see labels they have rejected.
Show labels from inactive label set in Review Mode – If your project has multiple label sets, this shows labels from all label sets at once.
Show original sentences in Review Mode – Shows the original sentences alongside any edits made by labelers.
Set notification for labeler's project completion – By default, reviewers are notified when all labelers mark their work as complete. Use this slider to set the number of completed labelers that will trigger the email notification.
Once these settings are configured, click the Launch project button to create the project. Happy labeling!