Creating a Project

After signing in to Datasaur, you are automatically directed to your personal workspace. To switch to your team workspace, select your avatar in the top-right corner, choose Switch workspace, and then select your team workspace. You will arrive on the Projects page of the team workspace, where admins can create projects and you can see the list of projects you are working on.

Create a project by clicking the Create project button. This allows you to create any type of project. You can also create a project by selecting one of the project template shortcuts. These templates contain preconfigured settings for specific use cases.

In this example, we create a span labeling project. For tutorials on creating row labeling, audio labeling, OCR labeling, bounding box labeling, or document labeling projects, watch the corresponding YouTube videos.

Project Creation Wizard

Click the Create project button. The process has five basic steps:

  1. Upload

  2. Preview

  3. Labeler's tasks

  4. Assignment

  5. Project settings

Step 1: Upload your files (Video Tutorial)

You can see a list of the file formats that are natively supported for each project type by expanding the Supported file types section. As an example, we will create a span labeling project by uploading several .txt files.

When uploading multiple files, ensure they are all in the same file format.

The maximum file size allowed is 50 MB. Files can be uploaded in three ways:

  • Drag and drop

  • Browse files from your computer

  • Fetch files from external object storage
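Before uploading, it can help to check a batch locally against the two rules above: all files share one format, and none exceeds 50 MB. The following Python sketch assumes that "same file format" can be approximated by the file extension and that 50 MB means 50 × 1024 × 1024 bytes; Datasaur may define either differently. The function name `check_upload_batch` is hypothetical and not part of any Datasaur tool.

```python
from pathlib import Path

# Assumption: 50 MB is taken as 50 * 1024 * 1024 bytes; Datasaur's exact limit may differ.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def check_upload_batch(paths):
    """Hypothetical pre-upload check; returns a list of problems (empty if none).

    Approximates "same file format" by comparing lowercase file extensions,
    and flags any file larger than MAX_FILE_SIZE_BYTES.
    """
    problems = []
    extensions = {Path(p).suffix.lower() for p in paths}
    if len(extensions) > 1:
        problems.append(f"mixed file formats: {sorted(extensions)}")
    for p in paths:
        size = Path(p).stat().st_size
        if size > MAX_FILE_SIZE_BYTES:
            problems.append(f"{p} is {size} bytes, over the 50 MB limit")
    return problems
```

For example, `check_upload_batch(["notes1.txt", "notes2.txt"])` returns an empty list when both files exist and are small, while mixing a `.txt` and a `.csv` file reports one "mixed file formats" problem.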

If you are interested in creating a project via the API, you can find the documentation here.

Add Project Tags

You can add one or more project tags by selecting existing tags or creating a new one.

If you forget to add tags at this step, you can add them later from the Projects page. For guidance, follow the steps to add project tags.

Step 2: Preview the uploaded file (Video Tutorial)

In this step, you choose how to divide your data into lines and how to tokenize it.

Line separator

The line separator determines how your data are split into separate rows in the labeling interface. There are two native options available:

  1. New line – Divides your data by paragraph. Each row in the labeling interface will contain one paragraph from your original text.

  2. Dot (.) – Divides your data by sentence. Each row will contain one sentence from your text.
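To make the difference between the two separators concrete, the following Python sketch splits raw text into rows in both ways. It is an illustration under stated assumptions, not Datasaur's implementation: the "dot" branch here splits only after each `.` character, so it also breaks on abbreviations such as "Dr." and decimals such as "3.5", and it does not additionally split on newlines. The function name `split_rows` is hypothetical.

```python
import re


def split_rows(text, separator):
    """Hypothetical sketch of the two line separators.

    separator="newline": one row per line (paragraph) of the original text.
    separator="dot": one row per sentence, splitting right after each '.'.
    """
    if separator == "newline":
        rows = text.split("\n")
    elif separator == "dot":
        # Zero-width split after every dot keeps the dot with its sentence.
        rows = re.split(r"(?<=\.)", text)
    else:
        raise ValueError(f"unknown separator: {separator}")
    # Trim surrounding whitespace and drop empty rows.
    return [row.strip() for row in rows if row.strip()]


sample = "Datasaur is a labeling tool. It supports spans.\nSecond paragraph here."
print(split_rows(sample, "newline"))  # 2 rows, one per paragraph
print(split_rows(sample, "dot"))      # 3 rows, one per sentence
```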

Tokenizer

The tokenizer determines how your text is broken down into tokens for labeling. There are two native options available:

  1. Wink tokenizer – separates certain punctuation from words.

  2. Whitespace tokenizer – splits text at each space, keeping punctuation attached to words.
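The contrast between the two tokenizers can be sketched in Python. The whitespace version follows the description above directly. The second function is only a rough regular-expression stand-in for a tokenizer that separates punctuation from words: it splits off every punctuation mark, which is cruder than wink-tokenizer's actual rules (for example, it breaks "It's" into three tokens). Both function names are hypothetical.

```python
import re


def whitespace_tokenize(text):
    # Split on runs of whitespace; punctuation stays attached to neighbouring words.
    return text.split()


def punctuation_aware_tokenize(text):
    # Rough stand-in, NOT wink-tokenizer's rule set: runs of word characters
    # become tokens, and every other non-space character is its own token.
    return re.findall(r"\w+|[^\w\s]", text)


text = "Hello, world! It's 5pm."
print(whitespace_tokenize(text))         # ['Hello,', 'world!', "It's", '5pm.']
print(punctuation_aware_tokenize(text))  # ['Hello', ',', 'world', '!', 'It', "'", 's', '5pm', '.']
```

With whitespace tokens, a label on "world!" necessarily includes the exclamation mark; with punctuation split off, the label can cover "world" alone.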

Step 3: Labeler's tasks (Video Tutorial)

In this step, you choose the labeling type you want to work on. A detailed explanation of each task type can be found on the Labeling Task Types page.

Because we uploaded .txt files in Step 1, the available task types are span labeling and document labeling. In this example, we choose span labeling, which requires a set of labels for the project.

There are three ways to create or upload labels:

  1. Create label set from scratch – Select Create your own to manually add your labels. You can also select the color for each label.

  2. Upload label set from a file – Drag and drop your .csv label set file or select Browse files to upload. The CSV format is as follows:

    1. Place your first label in cell A1.

    2. Add subsequent labels down column A (A2, A3, A4, …).

  3. Browse from library – In your team workspace, the Label management page lets you create, edit, and delete label sets. You can reuse saved label sets instead of re-uploading or recreating them for each new project.
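If your labels already live in a script or another tool, a label set file in the column-A format can be generated with Python's standard `csv` module. This sketch assumes no header row (the format puts the first label directly in A1) and that a trailing newline is acceptable; the function name `label_set_csv` and the example labels are hypothetical.

```python
import csv
import io


def label_set_csv(labels):
    """Build label-set CSV text: first label in A1, the rest down column A.

    No header row is written. Labels containing commas or quotes are
    quoted automatically by the csv module.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label in labels:
        writer.writerow([label])  # one label per row, column A only
    return buffer.getvalue()


content = label_set_csv(["PERSON", "ORGANIZATION", "LOCATION"])
# newline="" lets the csv-generated line endings pass through unchanged.
with open("labels.csv", "w", newline="", encoding="utf-8") as f:
    f.write(content)
```

The resulting `labels.csv` can then be dragged into the upload area or selected with Browse files.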

Configuring span labeling settings

At the bottom of the page, the Span Labeling section lets you configure the following options:

  • Limit selection to a span of 1 token is useful when you want to enforce that every token in the document must be labeled.

  • Spans should have at most one label prevents you from adding multiple labels to a single span.

  • Allow arrows to be drawn between labels lets you draw arrows from one label to another to annotate relationships between words. This is useful for showing that an adjective is related to a noun, or that a pronoun refers to a person.

  • Default text selection lets you choose between token selection and character selection. Some languages, such as Mandarin, Korean, or Thai, may require character selection.
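One reason character selection matters: languages such as Mandarin do not put spaces between words, so a whitespace-based split treats a whole phrase as one token. The Python sketch below illustrates this under the assumption that token selection uses whitespace-delimited tokens, as the Whitespace tokenizer does; Datasaur's other tokenizers may segment such text differently. Both function names are hypothetical.

```python
def whitespace_tokens(text):
    # Selectable units when tokens are delimited by spaces.
    return text.split()


def characters(text):
    # Selectable units under character selection: every non-space character.
    return [ch for ch in text if not ch.isspace()]


english = "I like learning"
mandarin = "我喜欢学习"  # "I like learning", written without spaces between words

print(whitespace_tokens(english))   # ['I', 'like', 'learning']
print(whitespace_tokens(mandarin))  # ['我喜欢学习'], a single unit
print(characters(mandarin))         # ['我', '喜', '欢', '学', '习']
```

With only one whitespace token available, a label for the verb 喜欢 ("like") cannot be placed; character selection makes it possible.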

Step 4: Assignment (Video Tutorial)

In this step, you assign labelers and reviewers to the project. There are three roles available:

  • Labeler

  • Labeler & Reviewer

  • Reviewer

Workspace admins have only two options: Labeler & Reviewer, and Reviewer. Admins always have access to the reviewer mode for any project.

Conflict resolution

The Peer review consensus setting determines how many labelers must agree on a label for it to be automatically accepted. The slider allows you to set the threshold for automatic acceptance.

For highly sensitive projects where there is no room for error, you can require all assigned labelers to agree. For less sensitive projects where efficiency and cost are more important than accuracy, a majority vote may be enough. Any labels that don’t meet the threshold must be manually reviewed by reviewers or the project creator.

If you choose No consensus, all labelers’ labels are treated as conflicting.
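The consensus rule can be modelled in a few lines of Python. This is a hypothetical sketch, not Datasaur's algorithm: it assumes the slider value is the number of labelers who must apply the same label to a span, models No consensus as `threshold=None`, and treats every label below the threshold as needing manual review. The function name `resolve` and the labeler names are illustrative only.

```python
from collections import Counter


def resolve(labels_by_labeler, threshold):
    """Hypothetical consensus check for a single span.

    labels_by_labeler: dict mapping labeler name -> label they applied.
    threshold: number of labelers who must agree (the slider value),
               or None to model the "No consensus" setting.
    Returns (accepted, conflicting) as sorted lists of labels.
    """
    votes = Counter(labels_by_labeler.values())
    if threshold is None:
        # No consensus: every label is treated as conflicting.
        return [], sorted(votes)
    accepted = sorted(label for label, n in votes.items() if n >= threshold)
    conflicting = sorted(label for label, n in votes.items() if n < threshold)
    return accepted, conflicting


span = {"ana": "PERSON", "ben": "PERSON", "cho": "ORG"}
print(resolve(span, 2))     # (['PERSON'], ['ORG'])         majority vote
print(resolve(span, 3))     # ([], ['ORG', 'PERSON'])       all must agree
print(resolve(span, None))  # ([], ['ORG', 'PERSON'])       no consensus
```

Raising the threshold moves more labels into the conflicting list, which is the trade-off described above: fewer automatic acceptances in exchange for more manual review.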

Dynamic review assignment

Enabling this option automatically assigns a reviewer when labelers have conflicts in a project. For details, see the Dynamic Review Capabilities page.

Step 5: Configuring project settings (Video Tutorial)

In this step, you choose the final, advanced admin settings for the project.

Keep in mind that most of these choices are intended for advanced requirements.

Labeling settings

  • Label set modification – Allows labelers to add, edit, or remove labels in the project.

  • Text modification – Allows labelers to edit the text of the dataset.

  • Mask Personally Identifiable Information (PII) – Lets admins mask sensitive information with asterisks or random characters. You can also choose which types of information to mask, such as addresses, social security numbers, and company names.

  • Allow marking unapplied label classes as N/A – Lets labelers mark unused labels as not applicable (N/A).

Reviewing settings

  • Show labeler names in Review Mode – By default, names are shown. Uncheck this option to hide names and reduce bias.

  • Show rejected labels in Review Mode – Allows reviewers to see labels they have rejected.

  • Show labels from inactive label set in Review Mode – If your project has multiple label sets, this shows labels from all label sets at once.

  • Show original sentences in Review Mode – Shows the original sentences alongside any edits made by labelers.

  • Set notification for labeler's project completion – By default, reviewers are notified when all labelers mark their work as complete. Use this slider to set how many labelers must complete their work before the email notification is sent.

Once these settings are configured, click the Launch project button to create the project. Happy labeling!
