OCR Labeling
OCR (Optical Character Recognition) labeling is part of span labeling, allowing you to label spans while viewing the original document as a reference. In OCR projects, you can also draw bounding boxes on the media file (.pdf, .tif, .jpg, .png, .gif, .docx, .pptx), link these bounding boxes to the transcription text, and view the text and the media file side-by-side.
OCR projects are helpful when working with image or document files that contain text. You can perform span labeling tasks like named entity recognition (NER) and part-of-speech (POS) tagging on the transcribed text. Additionally, you can annotate specific areas of the media file and link them to the corresponding text.

Interface
The layout of an OCR project is split into two sections:
Document viewer - Displays media files on the left.
Text viewer - Displays the transcription text on the right.
You can shrink or enlarge either viewer by clicking and dragging the resize handle between the two viewers. You can also hide a viewer by clicking one of the arrows on the resize handle. For example, you can hide the document viewer if you wish to focus on labeling the text.

You’ll also notice a control panel at the top of the document viewer. This control panel includes:
Zoom in and out buttons
Rotate counterclockwise or clockwise buttons
A page indicator showing the current page number
Draw bounding box button

Labeling in an OCR project
Applying span labels in the text viewer works the same way as in span labeling. What makes an OCR project different is the capability to draw bounding boxes, and link those boxes to spans of text.
How to link a bounding box to text
Click the rightmost icon in the control panel to enable drawing mode. The icon turns blue when drawing mode is enabled.

Click and drag in the document viewer to draw a bounding box. Once you release the mouse button, the bounding box appears, and a tooltip prompts you to select the corresponding text.

With the bounding box selected, highlight a span of text in the text viewer to link it.

To link the bounding box to different text, right-click the bounding box and select Edit sentence position. The box will be highlighted again, and you can select new corresponding text in the text viewer.

To adjust or resize a bounding box, click the bounding box to display the resize handles. Then drag the resize handles to adjust its size.
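To make the link between a bounding box and a span of text concrete, here is a minimal Python sketch of how such a link could be modeled. It is purely illustrative: the class names (TextSpan, BoundingBox), their fields, and the use of character offsets are assumptions made for this example and do not describe the application's actual data format or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextSpan:
    """Hypothetical span of transcription text, as character offsets."""
    start: int  # inclusive offset
    end: int    # exclusive offset


@dataclass
class BoundingBox:
    """Hypothetical box drawn on one page of the media file."""
    page: int
    x: float
    y: float
    width: float
    height: float
    linked_span: Optional[TextSpan] = None

    def link(self, span: TextSpan) -> None:
        # Linking again replaces the earlier link, mirroring
        # the "Edit sentence position" step described above.
        self.linked_span = span

    def resize(self, width: float, height: float) -> None:
        # Mirrors dragging the resize handles.
        if width <= 0 or height <= 0:
            raise ValueError("width and height must be positive")
        self.width, self.height = width, height


transcription = "Invoice No. 12345 issued by Acme Corp"

# Draw a box, then link it to the highlighted span.
box = BoundingBox(page=1, x=40.0, y=60.0, width=120.0, height=18.0)
box.link(TextSpan(12, 17))
linked_text = transcription[box.linked_span.start:box.linked_span.end]
print(linked_text)
```

In this sketch each box holds at most one linked span, so relinking simply overwrites the previous span. That is a simplifying assumption of the example, not a documented constraint.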

Reviewing labels
Similar to span labels, bounding boxes can also cause conflicts in reviewer mode.

Conflicting bounding box labels have red lines and cannot be resized. Right-click to accept or reject them.

Bounding box labels that have reached consensus have grey lines and can be resized.
Bounding box labels applied by reviewers have purple lines and can be resized.
Clicking a conflicting, consensus, or reviewer-applied bounding box will highlight the corresponding text. Likewise, clicking a labeled span of text will highlight the corresponding bounding box.
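The review rules above can be summarized as a small lookup. The following Python sketch is only an illustration: the names ReviewState, LINE_COLOR, and can_resize are hypothetical and are not part of the application.

```python
from enum import Enum


class ReviewState(Enum):
    """Hypothetical names for the three bounding box states in reviewer mode."""
    CONFLICT = "conflict"
    CONSENSUS = "consensus"
    REVIEWER = "reviewer"


# Line color shown for each state, as described in this section.
LINE_COLOR = {
    ReviewState.CONFLICT: "red",
    ReviewState.CONSENSUS: "grey",
    ReviewState.REVIEWER: "purple",
}


def can_resize(state: ReviewState) -> bool:
    # Only conflicting boxes are locked; they must be accepted or rejected.
    return state is not ReviewState.CONFLICT


for state in ReviewState:
    print(state.value, LINE_COLOR[state], can_resize(state))
```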