Reviewing NLP Projects
The “Reviewer Mode” is designed to facilitate efficient and effective oversight of the labeling process. As a reviewer, your role involves ensuring the accuracy and consistency of labeled data while maintaining a smooth workflow for labelers. This mode provides you with the tools and insights you need to uphold the quality standards of your project.
You must have the Reviewer role to use Reviewer Mode. The roles available in Datasaur can be viewed at the following link.
You can see how conflicts appear in token labeling. There are three types of conflicts:
Contents conflict
Spans conflict
Arrows conflict
For more information about the difference between the three types of conflict, please refer to this link.
You can also hover over the conflicting label between two labelers and choose the best label answer by clicking the label.
You can also jump to the next or previous conflict from the Go toolbar, or by pressing Alt+Shift+Right to go to the next conflict and Alt+Shift+Left to go to the previous one.
We also differentiate the label color based on the label's status:
Gray: the label has already reached consensus among the labelers.
Yellow: the label was applied by the Assisted Labeling extension.
Blue: the labeler's and reviewer's answers differ; the label has been marked incorrect or rejected by the reviewer.
Purple: the label was applied by the reviewer.
Red: the label is unresolved or conflicts with another labeler's label.
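For quick reference, the legend above can be summarized as a simple status-to-color lookup. This is only an illustrative sketch; the status names are ours, not identifiers from Datasaur.

```python
# Illustrative sketch only: the status names below are ours, not Datasaur identifiers.
TOKEN_LABEL_COLORS = {
    "consensus_reached": "gray",      # label already agreed upon by the labelers
    "assisted_labeling": "yellow",    # label produced by the Assisted Labeling extension
    "rejected_by_reviewer": "blue",   # labeler and reviewer answers differ (incorrect/rejected)
    "applied_by_reviewer": "purple",  # label was applied by the reviewer
    "unresolved_conflict": "red",     # label is unresolved or conflicts with another labeler
}
```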
In token labeling, you can also see the number of token labels applied, the last labeled row, and the total solved rows in the lower-right corner of the table display.
Unlike token labeling, the reviewing process in row labeling involves accepting answers within the Document and Row Labeling extension. When reviewing a row labeling project, there are two primary things to pay attention to:
Line color
White color: Rows containing a consensus or those already resolved by the reviewer.
Red color: Rows without any consensus.
Yellow color: Rows containing both a consensus answer and a conflicting answer. This occurs when a question has multiple values enabled.
Blue color: Selected rows.
Answers in the table
Submitting answers in the Document and Row Labeling extension will trigger the display of answers in the table.
No consensus: the answers in the table are empty.
Meet consensus: the answers are displayed in the table.
Mix of consensus and conflict: only the consensus answers are displayed; the conflicting answer will appear after the reviewer resolves it.
Resolved answers by a reviewer:
Consensus rows: the answers are shown in blue-colored labels and are selected in the question field.
Mixed consensus and conflict rows: conflicting answers are shown in red-colored labels, while consensus answers are shown in blue-colored labels and are selected.
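The display rules above can be condensed into a small decision function. The following is a simplified sketch of the behavior described in this section, not Datasaur's actual implementation; the function name, return shape, and the majority-style consensus rule are all assumptions for illustration.

```python
from collections import Counter

def row_table_display(labeler_answers, reviewer_answer=None):
    """Sketch: decide which answers appear in the table for one row.

    labeler_answers: answers submitted by the labelers for this row.
    reviewer_answer: the answer chosen by the reviewer, if the row is resolved.
    """
    counts = Counter(labeler_answers)
    consensus = [a for a, n in counts.items() if n > 1]  # simplistic consensus rule
    conflicts = [a for a, n in counts.items() if n == 1]

    if reviewer_answer is not None:
        # Resolved rows: the reviewer's answer is shown (blue) and selected.
        return {"shown": [reviewer_answer], "status": "resolved by reviewer"}
    if consensus and conflicts:
        # Mixed rows: only consensus answers are shown until the reviewer resolves the rest.
        return {"shown": consensus, "status": "consensus shown, conflicts pending review"}
    if consensus:
        return {"shown": consensus, "status": "consensus reached"}
    return {"shown": [], "status": "no consensus, table cell stays empty"}

# Example: two labelers agree on "positive", one answered "negative".
print(row_table_display(["positive", "positive", "negative"]))
```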
In essence, the behavior is similar to row labeling, but let's delve into the specifics of the Document labeling extension. ✨
We are planning to enable the review feature for bounding box labeling in the near future.
In an LLM Evaluation project, the reviewer can input their own rating and answer by clicking the Edit button, editing the answer (and rating), and then clicking the Submit button at the bottom.
The reviewer can also select one of the labelers' answers, as shown in the image below. Click the Submit button after selecting a labeler's answer.
When you click “Mark Project as Complete” in an LLM Evaluation project in Reviewer mode, a window appears. This window includes the LLM Evaluation Report Card and a prompt to download the 'LLM Evaluation Format.csv'. The calculations within this report are derived from the reviewers' answers.
The LLM Evaluation Report Card consists of:
Average Rating Score (0.00 - 1.00): This score is the average of the ratings given to all prompts across all files. Datasaur rounds the LLM Evaluation Score: values equal to or greater than 0.5 are rounded up, while values less than 0.5 are rounded down. A minimal sketch of this calculation follows this list.
Rating Distribution Bar Chart: This chart visualizes the distribution of 1–5 star ratings and includes a section for unrated items.
Unrated Definition: “Unrated” refers to prompts that have not received any reviewers' answers.
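As a rough illustration of the rounding rule above, here is a minimal sketch of how such a score could be computed. The two-decimal precision and the assumption that ratings are already normalized to the 0.00 - 1.00 range are ours; the half-up rounding mirrors the rule stated above.

```python
from decimal import Decimal, ROUND_HALF_UP

def average_rating_score(ratings):
    """Average per-prompt scores (assumed to be on the 0.00-1.00 scale) and round half-up."""
    if not ratings:
        return Decimal("0.00")
    average = Decimal(str(sum(ratings))) / Decimal(len(ratings))
    # ROUND_HALF_UP: 0.5 and above at the rounding digit rounds up, below 0.5 rounds down.
    return average.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(average_rating_score([0.8, 0.6, 0.75]))  # -> 0.72
```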
When you export all files, a new file named 'llm-evaluation-report-card.csv' will be created, containing all of the information above.
For ranking-type questions, the reviewer can input their own ranking and then click the Submit button at the bottom.
The reviewer can also select one of the labelers' answers, as shown in the image below. Click the Submit button after selecting a labeler's answer.