# Creating a Project

After [signing in](https://app.datasaur.ai/sign-in) to Datasaur, you will be automatically directed to your **personal workspace**. To switch to your team workspace, select your avatar in the top right corner, choose **Switch workspace**, then select your team workspace.\
\
You will arrive on the **Projects** page of the team workspace. Admins can create projects here. You can also see the list of projects that you are working on.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-86c29dd3ce7c46b99fb83cd9a616b08010b76315%2FProjects%20-%20active%20projects.png?alt=media" alt=""><figcaption></figcaption></figure>

Create a project by clicking the **Create project** button. This allows you to create any type of project. You can also create a project by selecting one of the project template shortcuts. These templates contain preconfigured settings for specific use cases.

In this example, we are going to create a span labeling project. If you would like a tutorial on creating a project for [row labeling](https://www.youtube.com/watch?v=KDRHJ7JVuDk\&t=43s), [audio labeling](https://www.youtube.com/watch?v=9IrY-xcta7w), [OCR labeling](https://www.youtube.com/watch?v=YsObDLKsewY), [bounding box labeling](https://youtu.be/5j4gV3tu1Ps), or [document labeling](https://www.youtube.com/watch?v=eylB9EaxzbI), watch the corresponding YouTube video.

## Project Creation Wizard

Click the **Create project** button. The process has five basic steps:

1. Upload
2. Preview
3. Labeler's tasks
4. Assignment
5. Project settings

### Step 1: Upload your files ([Video Tutorial](https://www.youtube.com/watch?v=T_G7wWx7LRg))

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-ea13f3058d0fb7838f4a1bd3b6a188b8bb50f9ce%2FPCW%20-%20Step%201%20-%20initial.png?alt=media" alt=""><figcaption></figcaption></figure>

You can see a list of the file formats that are natively supported for each project type by expanding the **Supported file types** section. As an example, we will create a span labeling project by uploading several `.txt` files.

{% hint style="info" %}
When uploading multiple files, ensure they are all in the same file format.
{% endhint %}

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-1b54794eeef20a393fbe60f33db7151c3ddb1fce%2FPCW%20-%20Step%201%20-%20supported%20file%20types.png?alt=media" alt=""><figcaption></figcaption></figure>

The maximum file size allowed is **50 MB**. Files can be uploaded in three ways:

* Drag and drop
* Browse files from your computer
* Fetch files from external object storage
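Before uploading, it can help to check a batch of files locally against the two constraints mentioned above: a single file format per upload and the 50 MB size limit. The following is a minimal sketch, not part of Datasaur. The function name `check_upload_batch` is hypothetical, and it assumes the limit applies per file and that 1 MB means 1024 × 1024 bytes.

```python
from pathlib import Path

# Assumption: the 50 MB limit is per file and uses binary megabytes.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def check_upload_batch(files):
    """Return a list of problems for a batch of (file_name, size_in_bytes) pairs.

    An empty list means the batch passes both local checks.
    """
    problems = []

    # All files in one upload should share the same extension.
    extensions = {Path(name).suffix.lower() for name, _ in files}
    if len(extensions) > 1:
        problems.append(f"mixed file formats: {sorted(extensions)}")

    # Flag every file that is larger than the size limit.
    for name, size in files:
        if size > MAX_FILE_SIZE_BYTES:
            problems.append(f"{name} exceeds 50 MB")

    return problems
```

For files on disk, the sizes could be collected with `Path(p).stat().st_size` before calling the function.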

{% hint style="info" %}
If you are interested in creating a project via the API, you can find the documentation [here](https://datasaurai.gitbook.io/datasaur/api/create-new-project).
{% endhint %}

#### Add Project Tags

You can add one or more project tags by selecting existing tags or creating a new one.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-ab126e069acd1234f99c3a4eca85f86b22b3cb6d%2FPCW%20-%20Step%201%20-%20Tags%20dropdown%20expanded.png?alt=media" alt=""><figcaption></figcaption></figure>

If you forget to add tags at this step, you can add them later from the **Projects** page. Follow the steps to [add project tags](https://docs.datasaur.ai/workspace-management/project-management#create-tags) for guidance.

### Step 2: Preview the uploaded file ([Video Tutorial](https://youtu.be/T_G7wWx7LRg?feature=shared))

In this step, you choose how to divide your data into lines and how to tokenize it.

#### **Line separator**

The line separator determines how your data are split into separate rows in the labeling interface. There are two native options available:

1. **New line** – Divides your data by paragraph. Each row in the labeling interface will contain one paragraph from your original text.
2. **Dot (.)** – Divides your data by sentence. Each row will contain one sentence from your text.
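To illustrate the difference between the two options, here is a rough sketch of how the same text ends up as different rows. It is only an approximation for illustration. The function `split_rows` and the separator names are hypothetical, and Datasaur's actual splitting logic may treat edge cases (such as abbreviations or decimal numbers) differently.

```python
import re


def split_rows(text, separator="new_line"):
    """Split text into rows, roughly mimicking the two line separator options."""
    if separator == "new_line":
        # One row per line (paragraph) of the original text.
        parts = text.split("\n")
    elif separator == "dot":
        # One row per sentence: break on whitespace that follows a period.
        parts = re.split(r"(?<=\.)\s+", text)
    else:
        raise ValueError(f"unknown separator: {separator}")
    # Drop empty rows and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]


sample = "First sentence. Second one.\nNew paragraph."
print(split_rows(sample, "new_line"))  # 2 rows, one per paragraph
print(split_rows(sample, "dot"))       # 3 rows, one per sentence
```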

#### **Tokenizer**

The tokenizer determines how your text is broken down into tokens for labeling. There are two native options available:

1. **Wink tokenizer** – separates certain punctuation from words.
2. **Whitespace tokenizer** – splits text at each space, keeping punctuation attached to words.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-a6920c534fc6883bc29ba097a46fa25019d116b7%2FPCW%20-%20Step%202%20-%20Wink%20tokenizer.png?alt=media" alt=""><figcaption><p>Wink tokenizer</p></figcaption></figure>

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-e27522a1c80d3a321962b08e5767db8c7c6e5326%2FPCW%20-%20Step%202%20-%20White%20space.png?alt=media" alt=""><figcaption><p>Whitespace tokenizer</p></figcaption></figure>
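The difference between the two tokenizers can be approximated in a few lines. This sketch does not reproduce the Wink tokenizer's actual rules. It uses a simple regular expression that separates every punctuation character, whereas the real tokenizer separates only certain punctuation. Both function names are hypothetical.

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace only, so punctuation stays attached to words."""
    return text.split()


def punctuation_aware_tokenize(text):
    """Rough stand-in for a punctuation-separating tokenizer.

    Matches either a run of word characters or a single
    non-word, non-space character (punctuation).
    """
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "Hello, world!"
print(whitespace_tokenize(sentence))         # ['Hello,', 'world!']
print(punctuation_aware_tokenize(sentence))  # ['Hello', ',', 'world', '!']
```

In practice, the choice matters when labels should cover a word without its trailing punctuation: with whitespace tokenization, `world!` can only be labeled as a whole token.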

### Step 3: Labeler's tasks ([Video Tutorial](https://www.youtube.com/watch?v=T_G7wWx7LRg))

In this step, you choose the labeling type you want to work on. A detailed explanation of each task type can be found on the [Labeling Task Types](https://docs.datasaur.ai/nlp-projects/nlp-task-types) page.

Since we have previously uploaded `.txt` files, the available task types are span labeling and document labeling. In this example, we will choose span labeling. This means we need to provide labels for use in the project.

There are three ways to create or upload labels:

1. **Create label set from scratch**\
   Select **Create your own** to manually add your labels. You can also select the color for each label.
2. **Upload label set from a file**\
   Drag and drop your `.csv` label set file or select **Browse files** to upload. The CSV format is as follows:
   1. Place your first label in cell **A1**.
   2. Add subsequent labels down column A (**A2, A3, A4, …**).
3. **Browse from library**\
   In your team workspace, the **Label management** page lets you create, edit, and delete label sets. You can reuse saved label sets instead of re-uploading or recreating them for each new project.
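If your labels already exist in code, the single-column CSV described in option 2 can be generated rather than typed by hand. The sketch below assumes only what is stated above (one label per row in column A, starting at A1). The function name `label_set_csv` and the example labels are hypothetical.

```python
import csv
import io


def label_set_csv(labels):
    """Return CSV text with one label per row in the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label in labels:
        # Each row has a single cell, so labels land in A1, A2, A3, ...
        writer.writerow([label])
    return buffer.getvalue()


print(label_set_csv(["PERSON", "ORGANIZATION", "LOCATION"]))
```

The returned text can then be saved to a `.csv` file, for example with `pathlib.Path("labels.csv").write_text(...)`, and uploaded in this step.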

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-cfa5780a403bd152beae6afa8d17aa5ec7523c20%2FPCW%20-%20Step%203%20-%20span%20labeling.png?alt=media" alt=""><figcaption></figcaption></figure>

**Configuring span labeling settings**

At the bottom of the page, you'll see a section called **Span Labeling** where you can configure the following settings.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-dbda614de9704e7c885aa3b0e6ca5d6700a7e9c9%2FProject%20settings%20-%20Task%20settings%20-%20span%20labeling.png?alt=media" alt=""><figcaption></figcaption></figure>

* **Limit selection to a span of 1 token** is useful when you want to enforce that every token in the document must be labeled.
* **Spans should have at most one label** prevents multiple labels from being added to a single span.
* **Allow arrows to be drawn between labels** allows you to draw arrows from one label to another to annotate relationships between words. This is useful for showing that an adjective is related to a noun, or that a pronoun refers to a person.
* **Default text selection** lets you choose between token selection and character selection. Some languages, such as Mandarin, Korean, or Thai, may require character selection.
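A short example shows why character selection can be needed for languages that do not put spaces between words. It uses plain Python string splitting as a stand-in for tokenization, which is an assumption and not Datasaur's tokenizer.

```python
# Mandarin sentence meaning "I like natural language processing".
text = "我喜欢自然语言处理"

# Splitting on whitespace finds no word boundaries,
# so the whole sentence becomes a single token.
whitespace_tokens = text.split()

# Working at the character level makes each character selectable.
characters = list(text)

print(len(whitespace_tokens))  # 1
print(len(characters))         # 9
```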

### Step 4: Assignment ([Video Tutorial](https://www.youtube.com/watch?v=T_G7wWx7LRg))

In this step, you assign labelers and reviewers to the project. There are three roles available:

* **Labeler**
* **Labeler & Reviewer**
* **Reviewer**

Workspace admins have only two options: **Labeler & Reviewer** and **Reviewer**. Admins always have access to the reviewer mode for any project.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-f6aa2f04cc049faad79a444a5355ff7a9ea00855%2FPCW%20-%20Step%204%20-%20labelers%20and%20reviewers%20selected.png?alt=media" alt=""><figcaption></figcaption></figure>

#### Conflict resolution

The **Peer review consensus** setting determines how many labelers must agree on a label for it to be automatically accepted. The slider allows you to set the threshold for automatic acceptance.

For highly sensitive projects where there is no room for error, you can require all assigned labelers to agree. For less sensitive projects where efficiency and cost are more important than accuracy, a majority vote may be enough. Any labels that don’t meet the threshold must be manually reviewed by reviewers or the project creator.

If you choose **No consensus**, all labelers’ labels are treated as conflicting.
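The threshold idea can be sketched as a simple vote count. This is an illustration of the concept only, not Datasaur's implementation. The function name `resolve_by_consensus` is hypothetical, and treating a tie between two labels that both reach the threshold as a conflict is an assumption.

```python
from collections import Counter


def resolve_by_consensus(labels, threshold):
    """Return the automatically accepted label for one span, or None.

    labels: the label each assigned labeler gave to the same span.
    threshold: minimum number of agreeing labelers, or None for "No consensus".
    None as a result means the span needs manual review.
    """
    if threshold is None or not labels:
        # "No consensus": every label is treated as conflicting.
        return None

    counts = Counter(labels)
    winners = [label for label, votes in counts.items() if votes >= threshold]

    # Assumption: if several labels reach the threshold, it is still a conflict.
    return winners[0] if len(winners) == 1 else None


print(resolve_by_consensus(["PERSON", "PERSON", "ORG"], 2))  # PERSON
print(resolve_by_consensus(["PERSON", "PERSON", "ORG"], 3))  # None
```

With three labelers, a threshold of 3 corresponds to requiring full agreement, while a threshold of 2 corresponds to a majority vote.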

#### Dynamic review assignment

Enabling this option automatically assigns a reviewer when labelers **have conflicts** in a project. Detailed information can be found on the [Dynamic Review Capabilities](https://datasaurai.gitbook.io/datasaur/getting-started/creating-a-project/dynamic-review-capabilities) page.

### Step 5: Configuring project settings ([Video Tutorial](https://www.youtube.com/watch?v=T_G7wWx7LRg))

In this step, you choose some final, advanced admin settings for the project.

Keep in mind that most of these choices are intended for advanced requirements.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-adb7476956abce1b6ab00224f9def693b45a6c08%2FPCW%20-%20Step%205%20-%20span%20labeling.png?alt=media" alt=""><figcaption></figcaption></figure>

#### **Labeling settings**

* **Label set modification** – Allows labelers to add, edit, or remove labels in the project.
* **Text modification** – Allows labelers to edit the text of the dataset.
* **Mask Personally Identifiable Information (PII)** – Lets admins mask sensitive information with asterisks or random characters. You can also choose which types of information to mask, for example, addresses, social security numbers, company names, etc.
* **Allow marking unapplied label classes as N/A** – Lets labelers mark unused labels as not applicable (N/A).
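As a rough illustration of what masking with asterisks means, the sketch below replaces US social security numbers in a text with asterisks of the same length. It is not Datasaur's masking logic. The pattern and the function name `mask_ssn` are assumptions made for the example, and real PII detection covers many more formats and information types.

```python
import re

# Assumption: SSNs written in the common 3-2-4 digit format, e.g. 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_ssn(text):
    """Replace each matched SSN with asterisks of the same length."""
    return SSN_PATTERN.sub(lambda match: "*" * len(match.group()), text)


print(mask_ssn("SSN on file: 123-45-6789."))
```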

#### **Reviewing settings**

* **Show labeler names in Review Mode** – By default, names are shown. Uncheck this option to hide names and reduce bias.
* **Show rejected labels in Review Mode** – Allows reviewers to see labels they have rejected.
* **Show labels from inactive label set in Review Mode** – If your project has multiple label sets, this shows labels from all label sets at once.
* **Show original sentences in Review Mode** – Shows the original sentences alongside any edits made by labelers.
* **Set notification for labeler's project completion** – By default, reviewers are notified when all labelers mark their work as complete. Use this slider to set the number of completed labelers that will trigger the email notification.

Once these settings are configured, click the **Launch project** button to create the project.\
\
**Happy labeling!**
