# Create a Project

There are two ways to create a project:

* Create it from scratch by clicking the **Create project** button.
* Choose a project template shortcut with preconfigured settings for specific use cases.

<figure><img src="/files/gYW4yM2q8qOeysCEEBD5" alt=""><figcaption></figcaption></figure>

In this example, we are going to create a span labeling project. For tutorials on creating [row labeling](https://www.youtube.com/watch?v=KDRHJ7JVuDk\&t=43s), [audio labeling](https://www.youtube.com/watch?v=9IrY-xcta7w), [OCR labeling](https://www.youtube.com/watch?v=YsObDLKsewY), [bounding box labeling](https://youtu.be/5j4gV3tu1Ps), or [document labeling](https://www.youtube.com/watch?v=eylB9EaxzbI) projects, watch the corresponding YouTube videos.

There are five steps to create a project:

1. Upload
2. Preview
3. Labeler's tasks
4. Assignment
5. Project settings

## Step 1: Upload your files

<figure><img src="/files/Ua46AdtlhYXzqkwRI7wP" alt=""><figcaption></figcaption></figure>

You can see a list of the file formats that are natively supported for each project type by expanding the **Supported file types** section. As an example, we will create a span labeling project by uploading several `.txt` files.

{% hint style="info" %}
When uploading multiple files, ensure they are all in the same file format.
{% endhint %}

<figure><img src="/files/i3hanTFm59OcmKOhSoW6" alt=""><figcaption></figcaption></figure>

The maximum file size allowed is **50 MB**. Files can be uploaded in three ways:

* Drag and drop,
* Browse files from your computer,
* Fetch files from external object storage.
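The two upload constraints above (one file format per batch, 50 MB per file) can be checked locally before uploading. The following is a minimal sketch, not part of the product: the function name `check_upload_batch` is hypothetical, and it assumes "50 MB" means 50 × 1024 × 1024 bytes, which may differ from how the application counts it.

```python
from pathlib import Path

# Assumption: "50 MB" is read as 50 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def check_upload_batch(files: dict[str, int]) -> list[str]:
    """Return a list of problems for a batch given as {file name: size in bytes}."""
    problems = []

    # All files in one upload should share a single format (extension).
    extensions = {Path(name).suffix.lower() for name in files}
    if len(extensions) > 1:
        problems.append(f"mixed file formats: {sorted(extensions)}")

    # Each file must stay within the size limit.
    for name, size in files.items():
        if size > MAX_FILE_SIZE_BYTES:
            problems.append(f"{name} exceeds the 50 MB limit")

    return problems


if __name__ == "__main__":
    # Hypothetical batch: one oversized file and one file in a different format.
    batch = {"a.txt": 1_200, "b.txt": 60 * 1024 * 1024, "c.csv": 300}
    for problem in check_upload_batch(batch):
        print(problem)
```

An empty result means the batch satisfies both constraints as interpreted here.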

{% hint style="info" %}
If you are interested in creating a project via the API, see the [API documentation](https://datasaurai.gitbook.io/datasaur/api/create-new-project).
{% endhint %}

### Add project tags

You can add one or more project tags by selecting existing tags or creating a new one.

<figure><img src="/files/TrXanvc8I7dqekwVUkjh" alt=""><figcaption></figcaption></figure>

If you forget to add tags at this step, you can add them later from the **Projects** page. Follow the steps to [add project tags](https://docs.datasaur.ai/workspace-management/project-management#create-tags) for guidance.

## Step 2: Preview files

In this step, you choose how to divide your data into lines and how to tokenize it.

### Line separator

The line separator determines how your data are split into separate rows in the labeling interface. There are two native options available:

1. **New line** – Divides your data by paragraph. Each row in the labeling interface will contain one paragraph from your original text.
2. **Dot (.)** – Divides your data by sentence. Each row will contain one sentence from your text.
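To see the difference between the two options, the sketch below approximates them in Python. It is an illustration only, not the application's implementation: the function name `split_rows` is hypothetical, and this page does not state whether the dot itself is kept in each row (this sketch drops it) or how empty rows are handled (this sketch discards them).

```python
def split_rows(text: str, separator: str = "new line") -> list[str]:
    """Split text into rows, approximating the two line separator options."""
    if separator == "new line":
        parts = text.split("\n")
    elif separator == "dot":
        parts = text.split(".")
    else:
        raise ValueError(f"unknown separator: {separator!r}")
    # Assumption: empty rows are dropped and surrounding whitespace is trimmed.
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    sample = "First sentence. Second sentence.\nNew paragraph."
    print(split_rows(sample, "new line"))  # one row per paragraph
    print(split_rows(sample, "dot"))       # one row per sentence
```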

### Tokenizer

The tokenizer determines how your text is broken down into tokens for labeling. There are two native options available:

1. **Wink tokenizer** – separates certain punctuation from words.
2. **Whitespace tokenizer** – splits text at each space, keeping punctuation attached to words.

<figure><img src="/files/2MsJoleNK9AWpjcyfNAA" alt=""><figcaption><p>Wink tokenizer</p></figcaption></figure>

<figure><img src="/files/4y8bVGFwQHsudq91BqUB" alt=""><figcaption><p>Whitespace tokenizer</p></figcaption></figure>
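The contrast between the two tokenizers can be sketched in a few lines of Python. The whitespace version follows the description above directly. The punctuation-aware version is only a rough regular-expression stand-in for the Wink tokenizer, which applies its own rules about which punctuation to separate; both function names are hypothetical.

```python
import re


def whitespace_tokenize(text: str) -> list[str]:
    """Split at whitespace, so punctuation stays attached to words."""
    return text.split()


def punctuation_aware_tokenize(text: str) -> list[str]:
    """Emit runs of word characters and single punctuation marks as separate tokens.

    Assumption: this approximates, but does not reproduce, the Wink tokenizer.
    """
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    sample = "Hello, world!"
    print(whitespace_tokenize(sample))         # ['Hello,', 'world!']
    print(punctuation_aware_tokenize(sample))  # ['Hello', ',', 'world', '!']
```

The choice matters for span labeling: with whitespace tokenization, a label applied to `world!` includes the exclamation mark, whereas a punctuation-separating tokenizer lets you label `world` alone.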

## Step 3: Labeler's tasks

In this step, you choose the labeling type you want to work on. A detailed explanation of each task type can be found on the [Labeling Task Types](https://docs.datasaur.ai/nlp-projects/nlp-task-types) page.

Since we have previously uploaded `.txt` files, the available task types are span labeling and document labeling. In this example, we will choose span labeling. This means we need to provide labels for use in the project.

There are three ways to create or upload labels:

1. **Create label set from scratch**\
   Select **Create your own** to manually add your labels. You can also select the color for each label.
2. **Upload label set from a file**\
   Drag and drop your `.csv` label set file or select **Browse files** to upload. The CSV format is as follows:
   1. Place your first label in cell **A1**.
   2. Add subsequent labels down column A (**A2, A3, A4, …**).
3. **Browse from library**\
   In your team workspace, the **Label management** page lets you create, edit, and delete label sets. You can reuse saved label sets instead of re-uploading or recreating them for each new project.
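For the CSV upload option, a label set file with one label per row in column A can be generated with Python's standard `csv` module. This is a minimal sketch: the function name `build_label_set_csv` and the labels `PERSON`, `ORG`, and `LOC` are hypothetical examples, not values required by the application.

```python
import csv
import io


def build_label_set_csv(labels: list[str]) -> str:
    """Return CSV text with one label per row, all in the first column (column A)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label in labels:
        writer.writerow([label])  # A1, A2, A3, ... in spreadsheet terms
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical labels for a span labeling project.
    print(build_label_set_csv(["PERSON", "ORG", "LOC"]), end="")
```

Writing the returned text to a file with a `.csv` extension produces a file in the layout described in option 2.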

<figure><img src="/files/QvxFvnhmLhT48P961DWp" alt=""><figcaption></figcaption></figure>

### Span labeling settings

At the bottom of the page, the **Span labeling** section lets you configure the following options.

<figure><img src="/files/CwBXNbcDZKnzRARXI9tu" alt=""><figcaption></figcaption></figure>

* **Limit selection to a span of 1 token** restricts each selection to a single token. This is useful when you want to enforce that every token in the document is labeled individually.
* **Spans should have at most one label** prevents more than one label from being applied to a single span.
* **Allow arrows to be drawn between labels** allows you to draw arrows from one label to another to annotate relationships between words. This is useful for showing that an adjective is related to a noun, or a pronoun is referring to a person.
* **Default text selection** lets you choose between token selection and character selection. Some languages, e.g. Mandarin, Korean, or Thai, may require character selection.

## Step 4: Assignment

In this step, you assign labelers and reviewers to the project. There are three roles available:

* **Labeler**
* **Labeler & Reviewer**
* **Reviewer**

Workspace admins have only two options: **Labeler & Reviewer** and **Reviewer**. Admins always have access to the reviewer mode for any project.

<figure><img src="/files/doirPkpjRW8JKNYNiwhs" alt=""><figcaption></figcaption></figure>

### Conflict resolution

The **Peer review consensus** setting determines how many labelers must agree on a label for it to be automatically accepted. The slider allows you to set the threshold for automatic acceptance.

For highly sensitive projects where there is no room for error, you can require all assigned labelers to agree. For less sensitive projects where efficiency and cost are more important than accuracy, a majority vote may be enough. Any labels that don’t meet the threshold must be manually reviewed by reviewers or the project creator.

If you choose **No consensus**, all labelers’ labels are treated as conflicting.
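The threshold logic can be illustrated with a short sketch. It is an assumption-laden model of the behavior described above, not the application's implementation: the function name `resolve_by_consensus` is hypothetical, **No consensus** is represented as a threshold of `0`, and a span is treated as conflicting if more than one label reaches the threshold.

```python
from collections import Counter
from typing import Optional


def resolve_by_consensus(labels_by_labeler: dict[str, str], threshold: int) -> Optional[str]:
    """Return the automatically accepted label, or None if manual review is needed.

    labels_by_labeler maps each labeler to the label they applied to one span.
    threshold is the number of labelers who must agree.
    """
    # Assumption: threshold 0 models "No consensus", where everything conflicts.
    if threshold <= 0:
        return None
    counts = Counter(labels_by_labeler.values())
    winners = [label for label, votes in counts.items() if votes >= threshold]
    # Accept only when exactly one label reaches the threshold.
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    # Hypothetical labelers and labels for a single span.
    votes = {"ann": "PERSON", "bob": "PERSON", "cy": "ORG"}
    print(resolve_by_consensus(votes, 2))  # majority of three agrees
    print(resolve_by_consensus(votes, 3))  # all three must agree
```

With three labelers, a threshold of 2 corresponds to a majority vote and a threshold of 3 to requiring full agreement.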

### Dynamic review assignment

Enabling this option automatically assigns a reviewer when labelers **have conflicts** in a project. For details, see the [Dynamic Review Capabilities](https://datasaurai.gitbook.io/datasaur/getting-started/creating-a-project/dynamic-review-capabilities) page.

## Step 5: Project settings

In this step, you choose some final, advanced admin settings for the project.

Keep in mind that most of these choices are intended for advanced requirements.

<figure><img src="/files/FunpJLxBJlEed0rpT824" alt=""><figcaption></figcaption></figure>

### Labeling settings

* **Label set modification** – Allows labelers to add, edit, or remove labels in the project.
* **Text modification** – Allows labelers to edit the text of the dataset. Learn more in the [Text modification](#text-modification) section.
* **Mask Personally Identifiable Information (PII)** – Lets admins mask sensitive information with asterisks or random characters. Learn more in the [Mask Personally Identifiable Information (PII)](#mask-personally-identifiable-information-pii) section.
* **Allow marking unapplied label classes as N/A** – Lets labelers mark unused labels as not applicable (N/A).
* **Rapid labeling feedback** – Labelers receive ongoing feedback on the status of their submitted labels (e.g., accepted or rejected) while the project is in progress.
* **Require approval for pre-labeled labels** – Pre-labeled data must be reviewed and explicitly approved by labelers before it is counted as their work. This setting cannot be changed after the project is created.

#### Text modification

**Text modification** controls whether labelers and reviewers can edit the document text. Refer to the table below for detailed permissions.

<figure><img src="/files/7t87WUyvd1OiIoqNzus8" alt=""><figcaption></figcaption></figure>

#### Mask Personally Identifiable Information (PII)

After enabling the setting, click **Data masking settings** to define the masking method and the attributes to mask.

<figure><img src="/files/8LDpRHy9Or3wI0zDTt7o" alt=""><figcaption></figcaption></figure>

The **masking method** determines how the selected information attributes are anonymized. The following masking methods are available:

* **Random character**: Replace the personal information with random characters.
  * Example: `May 23rd, 2022` → `Pgh 78ax, 9928`
* **Asterisk (alphanumeric)**: Replace alphanumeric characters in the personal information with asterisks while keeping other characters (such as spaces or symbols).
  * Example: `John Doe` → `**** ***`
* **Asterisk (all characters)**: Replace all characters in the personal information with asterisks, including letters, numbers, spaces, and symbols.
  * Example: `John@gmail.com` → `**************`
* **Field name** *(Row Labeling projects only)*: Replace the personal information with its field or category name. This allows context-aware masking without revealing the original text length.
  * Example: `John Doe` → `[Name]`

To select the information attributes to mask, check or uncheck the **PII fields** in the table. You can also define a **regular expression rule** to specify which entities should be masked.
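The masking methods listed above can be modeled with a few small functions. This is a sketch for illustration rather than the application's code: all function names are hypothetical, and the random character method is assumed to replace each letter with a random letter of the same case and each digit with a random digit, which is consistent with the `May 23rd, 2022` example but not confirmed by this page.

```python
import random
import string


def mask_alphanumeric(text: str) -> str:
    """Replace letters and digits with asterisks; keep spaces and symbols."""
    return "".join("*" if ch.isalnum() else ch for ch in text)


def mask_all(text: str) -> str:
    """Replace every character, including spaces and symbols, with an asterisk."""
    return "*" * len(text)


def mask_random(text: str, rng: random.Random) -> str:
    """Replace each letter or digit with a random one of the same kind (assumed behavior)."""
    out = []
    for ch in text:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # spaces and symbols are left unchanged
    return "".join(out)


def mask_field_name(field: str) -> str:
    """Replace the value with its field name in brackets, hiding the original length."""
    return f"[{field}]"


if __name__ == "__main__":
    print(mask_alphanumeric("John Doe"))       # **** ***
    print(mask_all("John@gmail.com"))          # **************
    print(mask_random("May 23rd, 2022", random.Random()))
    print(mask_field_name("Name"))             # [Name]
```

Note the trade-off between the methods: the asterisk and random character methods preserve the length of the original value, while the field name method hides it.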

### Reviewing settings

* **Show labeler names in Review Mode** – By default, names are shown. Uncheck this option to hide names and reduce bias.
* **Allow reviewers to apply new labels in Review Mode** – When enabled, reviewers can create and apply new labels during review. When disabled, reviewers can only accept, reject, or replace labels created by labelers.
  * **Note:** Labels from ML-assisted labeling or Data Programming can still be applied.
* **Show rejected labels in Review Mode** – Allows reviewers to see labels they have rejected.
* **Show labels from inactive label set in Review Mode** – If your project has multiple label sets, this shows labels from all label sets at once.
* **Show original sentences in Review Mode** – Shows the original sentences alongside any edits made by labelers.
* **Set notification for labeler's project completion** – By default, reviewers are notified when all labelers mark their work as complete. Use this slider to set the number of completed labelers that will trigger the email notification.

Once these settings are configured, click the **Launch project** button to create the project.\
\
**Happy labeling!**


