Search

The Search extension helps users quickly find specific words, phrases, or labeled tokens within their data. It’s useful for navigating both individual documents and entire projects, with features like label-specific searches, regex searches, and exact word matching. Results are clearly displayed in a list, making it easier to analyze and work with large datasets efficiently.

Span Labeling

In a Span Labeling project, two types of searches are available: Standard and Advanced.

Standard Search

The Standard Search allows users to perform simple searches based on text and labels using keywords or regular expressions (regex). This search type is intuitive and provides quick access to relevant data by matching the input with the text or labels in the project.

Search Based on Text

Text-based search allows users to search for specific words or patterns within the data by specifying a word filter and entering a keyword to locate matching text in the project.

Word Filter

This option lets users define how their search keywords are matched to results. The available options are:

Contains any word: Matches results that contain any of the specified words.
- Example: Searching for men will match men, mentioned, and abandonment.
Exact word: Displays only exact matches for the search keyword.
- Example: Searching for men will match men but not mentioned.
Regex: Allows users to search using regular expressions for advanced pattern matching.
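- Example: Searching for \bmen\w* will match words starting with men, such as mentioned. Note that men* is not a wildcard in regex: it matches me followed by zero or more n characters.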
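The three word filters can be sketched in Python as follows. This is only an illustration of the matching behavior described above, not Datasaur's implementation: the function name `matches`, the filter names, and the case-insensitive comparison are all assumptions made for the example.

```python
import re

def matches(text: str, keyword: str, word_filter: str) -> bool:
    """Return True if `text` matches `keyword` under the given word filter.

    Hypothetical helper: filter names and case-insensitivity are assumptions.
    """
    if word_filter == "contains":
        # Substring match on any of the whitespace-separated keywords.
        return any(k.lower() in text.lower() for k in keyword.split())
    if word_filter == "exact":
        # Whole-word match: the keyword must sit between word boundaries.
        return re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE) is not None
    if word_filter == "regex":
        # The keyword is interpreted as a regular expression.
        return re.search(keyword, text, re.IGNORECASE) is not None
    raise ValueError(f"unknown word filter: {word_filter}")

print(matches("abandonment", "men", "contains"))   # substring match
print(matches("mentioned", "men", "exact"))        # not a whole word
print(matches("mentioned", r"\bmen\w*", "regex"))  # word starting with "men"
```

In standard regex syntax, `men*` means "me followed by zero or more n", so a pattern such as `\bmen\w*` is the safer way to express "words starting with men".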

Search Based on Label

Label-based search allows users to find specific labels or categories in the data, with the word filter set to “Contains any word.”

Advanced Search

The Advanced Search provides a more sophisticated way to search by allowing the combination of multiple conditions to refine results. This search type supports complex queries using MongoDB query syntax.

Configure Conditions

There are two ways to configure the search conditions:

Logic Builder: A user-friendly interface to create conditions visually.
Query: Directly input advanced queries for complex conditions.

Configure Conditions via Logic Builder

Users can create searches with multiple conditions, where each condition includes a search target, a filter operation, and a keyword. These conditions can be combined using logical operators such as "OR" or "AND" to define the relationship between the conditions.

Search target:

Text: Matches words or content in the spans.
Label: Matches the labels applied to the text.
Metadata: Matches information attached to the line (as key-value pairs).

Filter operation:

is: Matches a search target that is exactly equal to the specified keyword.
is not: Matches a search target that is not equal to the specified keyword.
contains: Matches a search target that contains the specified keyword.
does not contain: Matches a search target that does not contain the specified keyword.
matches regex: Matches a search target that fits the regular expression pattern.

Keyword:

For Text and Label, this is the word or phrase to match.
For Metadata, this is the key: value pair used to filter information.

Logical operator:

OR: Matches results that meet at least one condition.
AND: Matches results that meet all conditions.

Sub-condition: An additional filter available when the Label search target is selected. It allows users to narrow results based on who applied or reviewed the label.

Sub-conditions are available only in Reviewer Mode.

When using sub-conditions, users can combine multiple filters using logical operators such as AND and OR, just like parent conditions.
Sub-condition filter operations:
- is applied by: Matches labels that were applied by the selected user, or by any user if “any labeler” is selected.
- is not applied by: Matches labels that were not applied by the selected user, or by any user if “any labeler” is selected.
- is reviewed by: Matches labels that were reviewed by the selected user, or by any user if “any reviewer” is selected (both accepted and rejected).
- is not reviewed by: Matches labels that were not reviewed by the selected user, or by any user if “any reviewer” is selected (both accepted and rejected).

Configure Conditions via Query

You can set up conditions using MongoDB queries. Datasaur supports a subset of the MongoDB query selectors, which are listed below.

Key Operators

$regex — Searches for text patterns. Combine it with $options to change matching behavior, for example "$options": "i" for case-insensitive matching.
$not — Excludes matches.
$or — Requires at least one condition to match.
$and — Requires all conditions to match.

Note: $or and $and each require exactly 2 conditions.
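Because each $or or $and takes exactly two conditions, combining three or more conditions needs some form of nesting. The sketch below folds a list of conditions into nested two-element clauses. It assumes that Datasaur accepts nested $or and $and clauses, which this page does not confirm, and the helper name `combine` is hypothetical.

```python
import json
from functools import reduce

def combine(operator: str, conditions: list) -> dict:
    """Fold conditions into nested clauses that each hold exactly two operands.

    Assumption: nested "$and"/"$or" clauses are accepted by the query editor.
    """
    if operator not in ("$and", "$or"):
        raise ValueError(f"unsupported operator: {operator}")
    if not conditions:
        raise ValueError("at least one condition is required")
    # reduce() pairs the accumulated clause with the next condition.
    return reduce(lambda left, right: {operator: [left, right]}, conditions)

def text_contains(keyword: str) -> dict:
    """Build a case-insensitive text condition in the form used on this page."""
    return {"cellFragment.content": {"$regex": keyword, "$options": "i"}}

query = combine("$or", [text_contains("France"), text_contains("John"), text_contains("Paris")])
print(json.dumps(query, indent=2))
```

A single condition is returned unchanged, and two conditions produce the flat form shown in the OR and AND examples on this page.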

Search Condition

Text condition — Searches for text content.
1. First example: Find text containing "is". This will match text like "This is used to train data."
  { "cellFragment.content": { "$regex": "is", "$options": "i" } }
2. Second example: Find text not containing "is". This will match text like "Labeling the data will be done in Datasaur."
  { "cellFragment.content": { "$not": { "$regex": "is", "$options": "i" } } }
Label condition — Filters based on labeled spans.
1. First example: Find spans labeled with a label containing "GEO". This will match labels like “GEO”, “Location Geo”, and “Geospatial Data”.
  { "spanLabels": { "$elemMatch": { "labelClassName": { "$regex": "GEO", "$options": "i" } } } }
2. Second example: Find spans labeled exactly with "GEO". This will match labels like "GEO", "geo", or any other case variations, but the entire label must be "GEO" with no additional characters.
  { "spanLabels": { "$elemMatch": { "labelClassName": { "$regex": "^GEO$", "$options": "i" } } } }
3. Third example: Find spans labeled as “GEO” by the labeler with user ID 1. The user ID is a unique identifier for each label contributor and is used to construct the query condition. See the guide below for how to obtain the user ID.
  { "spanLabels": { "$elemMatch": { "labelClassName": { "$regex": "^GEO$", "$options": "i" }, "appliedByUserIds": { "$in": [ "1" ] } } } }
4. Fourth example: Find spans labeled as “GEO” that have been reviewed by the reviewer with user ID 2. The user ID is a unique identifier for each reviewer and is used to construct the query condition. See the guide below for how to obtain the user ID.
  { "spanLabels": { "$elemMatch": { "labelClassName": { "$regex": "^GEO$", "$options": "i" }, "reviewedByUserIds": { "$in": [ "2" ] } } } }
💡 To determine a contributor's user ID, use the Logic Builder:
- Navigate to the Configure Conditions via Logic Builder section.
- Add a sub-condition that filters by Labeler or Reviewer.
- Select the desired user from the dropdown list.
- Save the conditions. The corresponding user ID will be automatically populated in the generated query.
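When many label queries are needed, they can be generated rather than typed by hand. The following Python sketch builds the same query shapes as the label examples above. The function name `label_query` and its parameters are hypothetical; the field names (spanLabels, labelClassName, appliedByUserIds, reviewedByUserIds) are taken from the examples on this page.

```python
import json
import re
from typing import Optional

def label_query(label: str,
                applied_by: Optional[list] = None,
                reviewed_by: Optional[list] = None) -> dict:
    """Build an exact, case-insensitive label condition with optional user filters."""
    element = {
        # re.escape keeps characters such as "." or "+" in a label name literal.
        "labelClassName": {"$regex": f"^{re.escape(label)}$", "$options": "i"},
    }
    if applied_by:
        element["appliedByUserIds"] = {"$in": list(applied_by)}
    if reviewed_by:
        element["reviewedByUserIds"] = {"$in": list(reviewed_by)}
    return {"spanLabels": {"$elemMatch": element}}

# Same shape as the third example: "GEO" applied by the labeler with user ID 1.
print(json.dumps(label_query("GEO", applied_by=["1"])))
```

The printed JSON can be pasted into the Query editor. User IDs are passed as strings, as in the examples above.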

Metadata condition — Searches for key-value pairs in the metadata attached to each line.

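Example: Find metadata where the key contains "category" and the value contains "education" (case-insensitive). To require an exact match, anchor the patterns as ^category$ and ^education$.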

{
  "cellFragment.metadata": {
    "$elemMatch": {
      "key": {
        "$regex": "category",
        "$options": "i"
      },
      "value": {
        "$regex": "education",
        "$options": "i"
      }
    }
  }
}
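The behavior of $elemMatch in the metadata query can be emulated locally to check which lines a pattern would select. The sketch below assumes that a line's metadata is a list of entries with "key" and "value" fields, as the query above suggests. The function name `metadata_matches` is hypothetical.

```python
import re

def metadata_matches(metadata: list, key_pattern: str, value_pattern: str) -> bool:
    """Return True if one single entry matches both the key and the value pattern.

    Mirrors "$elemMatch": both patterns must match within the same entry.
    """
    return any(
        re.search(key_pattern, entry["key"], re.IGNORECASE) is not None
        and re.search(value_pattern, entry["value"], re.IGNORECASE) is not None
        for entry in metadata
    )

line = [{"key": "subcategory", "value": "Higher Education"}]
print(metadata_matches(line, "category", "education"))      # unanchored patterns
print(metadata_matches(line, "^category$", "^education$"))  # anchored patterns
```

Unanchored patterns behave like "contains", so "category" also selects a key such as "subcategory". Anchoring with ^ and $ restricts the match to the whole key or value.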

Logical OR condition — Matches if any of the conditions are true.

Example: Find text containing either "France" or "John".

{
  "$or": [
    {
      "cellFragment.content": {
        "$regex": "France",
        "$options": "i"
      }
    },
    {
      "cellFragment.content": {
        "$regex": "John",
        "$options": "i"
      }
    }
  ]
}

Logical AND condition — Matches only if all conditions are true.

Example: Find text containing both "France" and "John".

{
  "$and": [
    {
      "cellFragment.content": {
        "$regex": "France",
        "$options": "i"
      }
    },
    {
      "cellFragment.content": {
        "$regex": "John",
        "$options": "i"
      }
    }
  ]
}
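To check how the text queries above behave before running them, a small evaluator for the text-only subset ($regex, $options, $not, $or, $and) can be written in Python. This is a local approximation for experimentation, not Datasaur's query engine. The function names are hypothetical, and only the "i" option is handled.

```python
import re

def eval_field(condition: dict, content: str) -> bool:
    """Evaluate a "$regex" or "$not" condition against one line of text."""
    if "$not" in condition:
        return not eval_field(condition["$not"], content)
    flags = re.IGNORECASE if "i" in condition.get("$options", "") else 0
    return re.search(condition["$regex"], content, flags) is not None

def eval_query(query: dict, content: str) -> bool:
    """Evaluate a text-only query, recursing through "$or" and "$and"."""
    if "$or" in query:
        return any(eval_query(q, content) for q in query["$or"])
    if "$and" in query:
        return all(eval_query(q, content) for q in query["$and"])
    return eval_field(query["cellFragment.content"], content)

france = {"cellFragment.content": {"$regex": "France", "$options": "i"}}
john = {"cellFragment.content": {"$regex": "John", "$options": "i"}}

print(eval_query({"$and": [france, john]}, "John moved to France."))
print(eval_query({"$and": [france, john]}, "John stayed home."))
print(eval_query({"$or": [france, john]}, "John stayed home."))
```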

Search Result

The search operates at the line level, meaning it evaluates each line individually against the list of specified conditions.

In Reviewer Mode, search results include not only reviewed labels (accepted and consensus), but also conflicted and rejected labels.

💡 For conditions with negative operators (is not, does not contain), the results display only the lines that satisfy the negated condition.

To enhance readability, users can enable the "Show only matching lines in the text viewer" option. When activated, this option hides non-matching lines, allowing users to focus only on relevant results in the text viewer.
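Line-level evaluation has a practical consequence: with AND, all keywords must appear on the same line, and a negative condition selects every line where the keyword is absent. The sketch below illustrates this with plain Python predicates. The helper names are hypothetical and matching is assumed to be case-insensitive.

```python
import re
from typing import Callable, List

Condition = Callable[[str], bool]

def contains(keyword: str) -> Condition:
    """Condition that holds when the line contains the keyword."""
    return lambda line: re.search(re.escape(keyword), line, re.IGNORECASE) is not None

def does_not_contain(keyword: str) -> Condition:
    """Condition that holds when the line does not contain the keyword."""
    return lambda line: not contains(keyword)(line)

def search_lines(lines: List[str], conditions: List[Condition], operator: str = "AND") -> List[int]:
    """Return the zero-based indices of lines that satisfy the conditions."""
    combine = all if operator == "AND" else any
    # Each line is checked on its own; matches never span two lines.
    return [i for i, line in enumerate(lines) if combine(c(line) for c in conditions)]

lines = ["John lives in Paris.", "France is large.", "John visited France."]
print(search_lines(lines, [contains("John"), contains("France")], "AND"))
print(search_lines(lines, [does_not_contain("John")]))
```

The returned indices correspond to the lines that would remain visible with "Show only matching lines in the text viewer" enabled.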

Saved Search

Saved Search enables storing custom search configurations for reuse. Once saved, the same search can be applied without the need to manually reconfigure the conditions.

Saving a Search

Saving a search can be done in both Labeler and Reviewer modes. A search saved in one mode will also be accessible in the other.

Once you have a search configured, open the Search extension menu. You should see an option called “Save search configuration.”

Clicking this option will open a dialog where you can enter a name and description for the saved search, along with a preview of the current search configuration.

Using a Saved Search

There are two ways to access the saved search:

Open the extension menu and select “Manage saved searches.”
If the Advanced search type is selected, opening the “Configure” button’s split menu will display “Use existing saved search.”

Both methods open a dialog displaying all saved searches. Selecting one shows its details on the right, and searches can also be filtered by name using the search bar. Click the “Use saved search” button to apply the selected configuration to the current session.

Editing or Deleting a Saved Search

In the “Manage saved searches” dialog, select the saved search you want to edit or delete. Two buttons, “Edit” and “Delete”, will appear next to the saved search name.

Edit

While editing a saved search, you cannot navigate to or preview another saved search.

Clicking the “Edit” button will open an interface where you can update the name and description. Once the changes are made, click the “Save Changes” button to apply them.

Delete

Clicking the “Delete” button will open a confirmation dialog. Confirm the action to permanently delete the saved search.

Row Labeling

In a Row Labeling project, the Search extension lets users search within the table data and quickly find specific information across multiple rows and columns by specifying a search target, selecting a word filter, and entering a keyword.

Search target

This option allows users to specify the focus of the search. The available options are:

Text: Matches the words or content in the data column.
Label: Matches the words or content in the answer column.

Word Filter

This option lets users define how their search keywords are matched to results. The available options are:

Contains any word: Matches results that contain any of the specified words.
- Example: Searching for men will match men, mentioned, and abandonment.
Exact word: Displays only exact matches for the search keyword.
- Example: Searching for men will match men but not mentioned.
Regex: Allows users to search using regular expressions for advanced pattern matching.
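- Example: Searching for \bmen\w* will match words starting with men, such as mentioned. Note that men* is not a wildcard in regex: it matches me followed by zero or more n characters.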

Search All Files

The Search All Files option allows users to search across all files within a project. When this option is checked, the search will include results from every file in the project. If the option is unchecked, the search will be limited to the current file only.

This is useful for users who want to either perform a broad search across all files or focus on a specific file within the project.

Label All

Only available for Span Labeling projects.

The Label All feature allows users to quickly label all matching results in the project.

For example, searching for the text james will show the number of instances of james in the document. After selecting PER from the dropdown, pressing the Label All button will apply the PER label to all instances of james in the document.
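Conceptually, Label All finds every occurrence of the keyword and attaches the selected label to each one. The Python sketch below shows that idea on a single string. The `Span` type, the function name `label_all`, the character offsets, and the case-insensitive matching are assumptions for illustration, not Datasaur's data model.

```python
import re
from typing import List, NamedTuple

class Span(NamedTuple):
    start: int  # index of the first character of the match
    end: int    # index one past the last character of the match
    label: str

def label_all(text: str, keyword: str, label: str) -> List[Span]:
    """Return one labeled span for every occurrence of `keyword` in `text`."""
    # re.escape treats the keyword as literal text rather than a pattern.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [Span(m.start(), m.end(), label) for m in pattern.finditer(text)]

text = "james met James. Later james left."
for span in label_all(text, "james", "PER"):
    print(span, text[span.start:span.end])
```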

This feature is a useful tool for bulk labeling, making the process faster and more efficient. It is especially beneficial for projects that require detailed text analysis, enhancing accuracy and saving time.

Tips & Tricks

To make it easier to navigate through the results, you can use the Up Arrow and Down Arrow keys to move between results.
