# Search

The **Search** extension helps you quickly find specific words, phrases, or labeled spans in your data. It supports searching within a single document or across an entire project, including label-based search, regex search, and exact word matching. Results are shown in a clear list so you can review and work with large datasets more easily.

## Span labeling

In a span labeling project, two types of searches are available: **Basic** and **Advanced**.

<figure><img src="/files/2EYYpEmBcBYrDMEqMKiy" alt=""><figcaption></figcaption></figure>

### Basic search

**Basic search** lets you run simple searches on text and labels using keywords or regular expressions (regex). It matches your input against the text or labels in the project.

<figure><img src="/files/DXvuq4VSUkAUyW7Knjfz" alt=""><figcaption></figcaption></figure>

#### Search by text

Search for specific words or patterns in the project by entering keywords and selecting a word filter.

**Word filter**

Defines how your search keywords are matched to results. The available options are:

* **Contains any word:** Matches results that contain any of the specified words.
  * Example: Searching for <mark style="color:red;">`men`</mark> matches <mark style="color:red;">`men`</mark>, <mark style="color:red;">`mentioned`</mark>, and <mark style="color:red;">`abandonment`</mark>.
* **Exact word:** Displays only exact matches for the search keyword.
  * Example: Searching for <mark style="color:red;">`men`</mark> will match <mark style="color:red;">`men`</mark> but not <mark style="color:red;">`mentioned`</mark>.
* **Regex:** Search using regular expressions for advanced pattern matching.
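  * Example: Searching for <mark style="color:red;">`\bmen\w*`</mark> matches words starting with <mark style="color:red;">`men`</mark>, such as <mark style="color:red;">`mentioned`</mark>. In standard regex syntax, <mark style="color:red;">`men*`</mark> would instead match <mark style="color:red;">`me`</mark> followed by zero or more <mark style="color:red;">`n`</mark> characters.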

#### Search by label

Search for specific labels or categories in the data. This search mode always uses the **Contains any word** filter.

#### Search results

The search checks each span or label individually based on the search criteria you enter.

In reviewer mode, search results include reviewed labels (accepted and consensus), conflicted labels, and rejected labels.

<figure><img src="/files/Nxp3Sro4Lxbf47SmFXQ4" alt=""><figcaption></figcaption></figure>

To make search results easier to review, enable **Show only matching lines in the text viewer** from the three-dot menu in the top-right corner of the extension. This option hides lines that do not contain matching spans or labels, allowing you to focus only on relevant content.

### Advanced search

**Advanced search** allows you to create more detailed searches by combining multiple conditions. It supports complex queries using MongoDB query syntax.

<figure><img src="/files/i2kee4z7fuNsS2loR46x" alt=""><figcaption></figcaption></figure>

#### Configure conditions

There are two ways to configure the search conditions:

1. **Logic builder:** A visual interface for creating search conditions without writing queries manually.
2. **Query:** Directly input advanced queries for complex conditions.

#### Configure conditions via logic builder

<figure><img src="/files/13x4ZfdnqVpqD8sHi9rC" alt=""><figcaption></figcaption></figure>

You can create searches with multiple conditions, where each condition includes a **search target**, a **filter operation**, and a **keyword**. These conditions can be combined using **logical operators** such as `OR` or `AND` to define the relationship between the conditions.

**Search target:**

* <mark style="color:red;">`Text`</mark>: Matches words or content in the spans.
* <mark style="color:red;">`Label`</mark>: Matches the labels applied to the text.
* <mark style="color:red;">`Metadata`</mark>: Matches information attached to the line (as key-value pairs).

**Filter operation:**

* <mark style="color:red;">`is`</mark>: Matches search target that exactly matches the specified keyword.
* <mark style="color:red;">`is not`</mark>: Matches search target that explicitly does not match the specified keyword.
* <mark style="color:red;">`contains`</mark>: Matches search target that contains the specified keyword.
* <mark style="color:red;">`does not contain`</mark>: Matches search target that does not contain the specified keyword.
* <mark style="color:red;">`matches regex`</mark>: Matches search target that fits the regular expression pattern.

**Keyword:**

* For <mark style="color:red;">`Text`</mark> and <mark style="color:red;">`Label`</mark>, this is the word or phrase to match.
* For <mark style="color:red;">`Metadata`</mark>, this is the **key: value** pair used to filter information.

**Logical operator:**

* <mark style="color:red;">`OR`</mark>: Matches results that meet at least one condition.
* <mark style="color:red;">`AND`</mark>: Matches results that meet all conditions.

**Sub-condition:** An additional filter available when the <mark style="color:red;">`Label`</mark> search target is selected. It allows narrowing results based on who applied or reviewed the label.

{% hint style="info" %}
Sub-conditions are available **only in reviewer mode**.
{% endhint %}

* Sub-conditions can combine multiple filters using logical operators such as <mark style="color:red;">`AND`</mark> and <mark style="color:red;">`OR`</mark>, just like parent conditions.
* Sub-condition filter operations:
  * <mark style="color:red;">`is applied by`</mark>: Matches labels that were applied by the selected labeler.
  * <mark style="color:red;">`is not applied by`</mark>: Matches labels that were not applied by the selected labeler.
  * <mark style="color:red;">`is reviewed by`</mark>: Matches labels that were reviewed by the selected reviewer (both accepted and rejected).
  * <mark style="color:red;">`is not reviewed by`</mark>: Matches labels that were not reviewed by the selected reviewer (both accepted and rejected).
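
As a rough sketch of how such a configuration relates to a query, consider two conditions joined by `AND`: `Text` `contains` `France`, and `Label` `is` `GEO`. Assuming `contains` corresponds to an unanchored, case-insensitive `$regex` and `is` to an anchored one (the pattern used by the query examples on this page), the equivalent query might look like the following. The query the logic builder actually generates may differ.

```mongodb
{
  "$and": [
    {
      "cellFragment.content": {
        "$regex": "France",
        "$options": "i"
      }
    },
    {
      "spanLabels": {
        "$elemMatch": {
          "labelClassName": {
            "$regex": "^GEO$",
            "$options": "i"
          }
        }
      }
    }
  ]
}
```

Switching the logical operator to `OR` would correspond to replacing `$and` with `$or`.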

#### Configure conditions via query

<figure><img src="/files/YJ15Q1tCWmrJU5fJDDfC" alt=""><figcaption></figcaption></figure>

You can set up conditions using MongoDB queries. Datasaur supports a subset of the MongoDB [query selectors](https://www.mongodb.com/docs/manual/reference/operator/query/#query-selectors), which are listed below.

**Key Operators**

* <mark style="color:red;">`$regex`</mark> — Search for text patterns (combine with <mark style="color:red;">`$options`</mark> for behavior like case-insensitivity using <mark style="color:red;">`$options: "i"`</mark>).
* <mark style="color:red;">`$not`</mark> — Exclude matches.
* <mark style="color:red;">`$or`</mark> — Requires at least one condition to match.
* <mark style="color:red;">`$and`</mark> — Requires all conditions to match.

{% hint style="info" %}
<mark style="color:red;">`$or`</mark> and <mark style="color:red;">`$and`</mark> require exactly 2 conditions.
{% endhint %}

**Search conditions**

1. **Text condition** — Searches for text content.
   * **First example:** Find text containing <mark style="color:red;">`is`</mark>. This will match text like <mark style="color:red;">`This is used to train data.`</mark>

     ```mongodb
     {
       "cellFragment.content": {
         "$regex": "is",
         "$options": "i"
       }
     }
     ```
   * **Second example:** Find text not containing <mark style="color:red;">`is`</mark>. This will match text like <mark style="color:red;">`Labeling the data will be done in Datasaur.`</mark>

     ```mongodb
     {
       "cellFragment.content": {
         "$not": {
           "$regex": "is",
           "$options": "i"
         }
       }
     }
     ```
2. **Label condition** — Filters based on labeled spans.

   * **First example:** Find spans labeled with a label containing <mark style="color:red;">`GEO`</mark>. This will match labels like <mark style="color:red;">`GEO`</mark>, <mark style="color:red;">`Location Geo`</mark>, and <mark style="color:red;">`Geospatial Data`</mark>.

     ```mongodb
     {
       "spanLabels": {
         "$elemMatch": {
           "labelClassName": {
             "$regex": "GEO",
             "$options": "i"
           }
         }
       }
     }
     ```
   * **Second example:** Find spans labeled exactly with <mark style="color:red;">`GEO`</mark>. This will match labels like <mark style="color:red;">`GEO`</mark>, <mark style="color:red;">`geo`</mark>, or any other case variations, but the entire label must be <mark style="color:red;">`GEO`</mark> with no additional characters.

     ```mongodb
     {
       "spanLabels": {
         "$elemMatch": {
           "labelClassName": {
             "$regex": "^GEO$",
             "$options": "i"
           }
         }
       }
     }
     ```
   * **Third example:** Find spans labeled as <mark style="color:red;">`GEO`</mark> by the labeler with user ID <mark style="color:red;">`1`</mark>. The value <mark style="color:red;">`1`</mark> stands for the labeler's unique user ID. See the guide below for how to obtain a user ID.

     ```mongodb
     {
       "spanLabels": {
         "$elemMatch": {
           "labelClassName": {
             "$regex": "^GEO$",
             "$options": "i"
           },
           "appliedByUserIds": {
             "$in": [
               "1"
             ]
           }
         }
       }
     }
     ```
   * **Fourth example:** Find spans labeled as <mark style="color:red;">`GEO`</mark> that have been reviewed by the reviewer with user ID <mark style="color:red;">`2`</mark>. The value <mark style="color:red;">`2`</mark> stands for the reviewer's unique user ID. See the guide below for how to obtain a user ID.

     ```mongodb
     {
       "spanLabels": {
         "$elemMatch": {
           "labelClassName": {
             "$regex": "^GEO$",
             "$options": "i"
           },
           "reviewedByUserIds": {
             "$in": [
               "2"
             ]
           }
         }
       }
     }
     ```

   💡 To determine a contributor's user ID, use the **Logic builder**:

   * Navigate to the **Configure conditions via logic builder** section.
   * Add a sub-condition that filters by **Labeler** or **Reviewer**.
   * Select the desired user from the dropdown list.
   * Save the conditions. The corresponding user ID will be automatically populated in the generated query.
3. **Metadata condition** — Searches for key-value pairs in the metadata attached to each line.
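   * **Example:** Find metadata where the key contains <mark style="color:red;">`category`</mark> and the value contains <mark style="color:red;">`education`</mark> (case-insensitive). To require exact matches, anchor the patterns as in the label examples, for example <mark style="color:red;">`^category$`</mark>.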

     ```mongodb
     {
       "cellFragment.metadata": {
         "$elemMatch": {
           "key": {
             "$regex": "category",
             "$options": "i"
           },
           "value": {
             "$regex": "education",
             "$options": "i"
           }
         }
       }
     }
     ```
4. **Logical OR condition** — Matches if any of the conditions are true.
   * **Example:** Find text containing either <mark style="color:red;">`France`</mark> or <mark style="color:red;">`John`</mark>.

     ```mongodb
     {
       "$or": [
         {
           "cellFragment.content": {
             "$regex": "France",
             "$options": "i"
           }
         },
         {
           "cellFragment.content": {
             "$regex": "John",
             "$options": "i"
           }
         }
       ]
     }
     ```
5. **Logical AND condition** — Matches only if all conditions are true.
   * **Example:** Find text containing both <mark style="color:red;">`France`</mark> and <mark style="color:red;">`John`</mark>.

     ```mongodb
     {
       "$and": [
         {
           "cellFragment.content": {
             "$regex": "France",
             "$options": "i"
           }
         },
         {
           "cellFragment.content": {
             "$regex": "John",
             "$options": "i"
           }
         }
       ]
     }
     ```
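
Because `$or` and `$and` take exactly two conditions, combining three conditions requires nesting one logical operator inside another. The following is a minimal sketch that assumes nested operators are accepted, which this page does not state explicitly. It finds text containing `France` and `John` but not `Paris` (the keyword `Paris` is purely illustrative).

```mongodb
{
  "$and": [
    {
      "cellFragment.content": {
        "$regex": "France",
        "$options": "i"
      }
    },
    {
      "$and": [
        {
          "cellFragment.content": {
            "$regex": "John",
            "$options": "i"
          }
        },
        {
          "cellFragment.content": {
            "$not": {
              "$regex": "Paris",
              "$options": "i"
            }
          }
        }
      ]
    }
  ]
}
```

If nesting turns out not to be supported, the logic builder is the safer way to express conditions like this.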

#### Search results

The search operates at the **line level**, meaning it evaluates each line individually against the list of specified conditions.

In reviewer mode, search results include reviewed labels (accepted and consensus), conflicted labels, and rejected labels.

<figure><img src="/files/vzYlugU532skxzN3OQS9" alt=""><figcaption></figcaption></figure>

For conditions with negative operators (<mark style="color:red;">`is not`</mark>, <mark style="color:red;">`does not contain`</mark>), only the lines that meet the specified conditions will be displayed in the results.

<figure><img src="/files/xo11TXUcMa9C9wcqwkFc" alt=""><figcaption></figcaption></figure>

To make search results easier to review, enable **Show only matching lines in the text viewer** from the three-dot menu in the top-right corner of the extension. This option hides non-matching lines, allowing you to focus only on relevant content.

### Filter search results

After running a search, you can narrow down the results by applying filters. This is especially useful when working with large result sets.

<figure><img src="/files/VknYZgJfjlqpDHiHRT8D" alt=""><figcaption></figcaption></figure>

You can currently filter search results using the following criteria:

* **Label status** — Use this filter to show results based on their review status. Options include:
  * Accepted: The label is accepted by the reviewer or manually applied by the reviewer.
  * Conflicted: The label has unresolved disagreements.
  * Rejected: The label is rejected by the reviewer.

{% hint style="info" %}

* Status filters are available only in reviewer mode.
* The **Rejected** option appears only when the project setting **Show rejected labels in Review Mode** is enabled.
  {% endhint %}

- **Label class** — Use this filter to show results based on the label applied. Options include:
  * Unlabeled: The text does not have any label assigned.
  * Any label class that appears in the current search results.

### Saved search

Saved search enables storing custom search configurations for reuse. Once saved, the same search can be applied without the need to manually reconfigure the conditions.

#### Saving a search

{% hint style="info" %}
Saving a search can be done in both labeler mode and reviewer mode. A search saved in one mode will also be accessible in the other.
{% endhint %}

Once you have a search configured, open the **Search** extension menu. You should see an option called **Save search configuration**.

<figure><img src="/files/fpcpWMxhHJPhx994W4al" alt=""><figcaption></figcaption></figure>

Clicking this option opens a dialog where you can enter a name and description for the saved search. The dialog also shows a preview of the current search configuration.

<figure><img src="/files/FXqMcrEo52t1EIDOOLjf" alt=""><figcaption></figcaption></figure>

#### Using a saved search <a href="#using-a-saved-search" id="using-a-saved-search"></a>

There are two ways to access the saved search:

* Open the extension menu and select **Manage saved searches**.
* If the advanced search type is selected, open the **Configure** menu and select **Use existing saved search**.

<figure><img src="/files/mQhyODkO41IBD68Wi3qi" alt=""><figcaption></figcaption></figure>

Both methods open a dialog displaying all saved searches. Selecting one shows its details on the right, and searches can also be filtered by name using the search bar. Click **Use saved search** to apply the selected configuration to the current session.

<figure><img src="/files/cm2n0oqBS6BIElbzlKKx" alt=""><figcaption></figcaption></figure>

#### Edit or delete a saved search <a href="#editing-or-deleting-a-saved-search" id="editing-or-deleting-a-saved-search"></a>

In the **Manage saved searches** dialog, select the saved search you want to edit or delete.

**Edit**

{% hint style="info" %}
While editing a saved search, you cannot navigate to or preview another saved search.
{% endhint %}

To edit a saved search, click the edit icon. Update the name and/or description, then click **Save changes**.

<figure><img src="/files/hj2FjdAIC84BeoZXVo2v" alt=""><figcaption></figcaption></figure>

**Delete**

To delete a saved search, click the delete icon. Confirm the action to permanently delete the saved search.

<figure><img src="/files/Xdtr4phlkLGCqEKORaHr" alt=""><figcaption></figcaption></figure>

### Bulk labeling for span labels

The **Label all** button lets you quickly apply a label to all matching results in a project.

<figure><img src="/files/JdKA4WdjngMwW9o47tXR" alt=""><figcaption></figcaption></figure>

For example, searching for the text <mark style="color:red;">`Holmes`</mark> will show all its instances in the document. Selecting <mark style="color:red;">`Person`</mark> from the dropdown and clicking **Label all** applies the <mark style="color:red;">`Person`</mark> label to every occurrence of <mark style="color:red;">`Holmes`</mark>.

This feature helps speed up bulk labeling and improves consistency, especially in projects requiring detailed text analysis.

{% hint style="info" %}
**Tips & tricks:** Use the **Up** and **Down arrow** keys to move between results.
{% endhint %}

### Delete search result labels

The **Delete search result labels** button removes labels from all matching labeled results in a project.

<figure><img src="/files/GX1kHpJQzK2rDBf48p9n" alt=""><figcaption></figcaption></figure>

For example, searching for <mark style="color:red;">`Shakespeare`</mark> will display all instances in the document along with any labels applied to the text. Clicking **Delete search result labels** will remove all labels associated with <mark style="color:red;">`Shakespeare`</mark>.

This feature speeds up bulk label cleanup, which is especially helpful in projects with large datasets.

### Bulk answer line questions

{% hint style="info" %}

* Only available when line labeling is enabled in a span labeling project.
* Supported in advanced search only.
  {% endhint %}

The **Answer matching lines in line labeling extension** option lets you apply the same answer to all matching lines from the search results in a project.

<figure><img src="/files/RnM4DmDBGiDMCLIr8xz2" alt=""><figcaption></figcaption></figure>

Clicking it will automatically select all matching lines in the editor based on the search results and direct you to the **Line labeling** extension. You can then modify the answer, and when you click **Submit answers**, it will be applied to all selected lines.

If an answer for a question is not modified, its existing value will remain unchanged. For example:

**Before**

```
line 1
question: intent → answer: inquiry
question: priority → answer: low
question: status → answer: open

line 2
question: intent → answer: complaint
question: priority → answer: high
question: status → answer: open
```

If both lines are selected and only the **priority** question is updated with the answer **medium**, the results after submission will look like this:

```
line 1
question: intent → answer: inquiry
question: priority → answer: medium
question: status → answer: open

line 2
question: intent → answer: complaint
question: priority → answer: medium
question: status → answer: open
```

## Row labeling

In row labeling projects, the **Search** extension lets you search table data to quickly find specific information across multiple rows and columns by selecting a search target, word filter, and keyword.

<figure><img src="/files/mPiaURlodBIskbtO31kF" alt=""><figcaption></figcaption></figure>

**Search target**

Specify the focus of the search. The available options are:

* <mark style="color:red;">`Text`</mark>: Matches the words or content in the data column.
* <mark style="color:red;">`Label`</mark>: Matches the words or content in the answer column.

**Word filters**

Defines how your search keywords are matched to results. The available options are:

* **Contains any word:** Matches results that contain any of the specified words.
  * Example: Searching for <mark style="color:red;">`men`</mark> matches <mark style="color:red;">`men`</mark>, <mark style="color:red;">`mentioned`</mark>, and <mark style="color:red;">`abandonment`</mark>.
* **Exact word:** Displays only exact matches for the search keyword.
  * Example: Searching for <mark style="color:red;">`men`</mark> will match <mark style="color:red;">`men`</mark> but not <mark style="color:red;">`mentioned`</mark>.
* **Regex:** Search using regular expressions for advanced pattern matching.
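  * Example: Searching for <mark style="color:red;">`\bmen\w*`</mark> matches words starting with <mark style="color:red;">`men`</mark>, such as <mark style="color:red;">`mentioned`</mark>. In standard regex syntax, <mark style="color:red;">`men*`</mark> would instead match <mark style="color:red;">`me`</mark> followed by zero or more <mark style="color:red;">`n`</mark> characters.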

## Search all files

The **Search all files** option lets you search across all files within a project. When this option is **checked**, the search will include results from every file in the project. If the option is **unchecked**, the search will be limited to the current file only.

This is useful when performing a broad search across all files or focusing on a specific file within the project.

