Import Transformer
By using an Import Transformer, you can import almost anything into Datasaur. Currently, we only accept files with the .csv, .txt, and .json extensions.
Your new import transformer will have this template:
The Import Transformer is a function that takes the fileContent as a string, decoded using UTF-8, and returns a SimpleDocument that Datasaur understands.
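As a minimal sketch of that shape, the function below turns each non-empty line of a text file into one single-column cell. The Cell and SimpleDocument interfaces are simplified local stand-ins declared only to keep the example self-contained, and both the string-array form of tokens and the whitespace tokenization are assumptions rather than Datasaur's documented behavior.

```typescript
// Simplified local stand-ins for Datasaur's types (assumption, for illustration only).
interface Cell {
  line: number; // zero-based row
  index: number; // zero-based column, always 0 for token-based projects
  content: string; // original text of the cell
  tokens: string[]; // tokenized content (assumed to be an array of strings)
}

interface SimpleDocument {
  cells: Cell[];
  labels: unknown[];
}

// Hypothetical transformer: one cell per non-empty line, tokens split on whitespace.
const importTransformer = (fileContent: string): SimpleDocument => {
  const cells: Cell[] = fileContent
    .split(/\r?\n/)
    .filter((text) => text.trim().length > 0)
    .map((text, line) => ({
      line,
      index: 0,
      content: text,
      tokens: text.trim().split(/\s+/),
    }));
  return { cells, labels: [] };
};
```

Blank lines are dropped before numbering, so the line values stay contiguous from 0.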
SimpleDocument is an object representation of a Document in Datasaur. It is a combined type that supports both token-based labeling and row-based labeling. Below is the structure of SimpleDocument:
cells: an array of cells. A Datasaur document is stored in a tabular structure, and each cell represents a single cell in that table. For token-based projects, only a single-column table is supported at this moment. Each row/line of the document must have the same number of columns.
line: A zero-based number indicating the row.
index: A zero-based number indicating the column. For token-based projects, this value can only be set to 0.
content: The original content of a cell
tokens: A tokenized version of the content. This field is used for token-based projects only.
metadata: an optional array of key-value data to be stored per cell. The structure and configuration options for metadata are as follows.
key: Identifier for the metadata item, represented as a string. Example: "author"
value: Content or data of the metadata item, represented as a string. Example: "John Doe"
type: Optional field indicating the type of the value, expressed as a MIME type.
Default: text/plain
Supported types:
text/plain: Displays metadata as plain text.
text/html: Displays metadata as HTML.
image/*: Displays metadata as an image. The supported image format will depend on the browser support.
audio/*: Displays metadata as an audio player. The supported audio format will depend on the browser support.
pinned: Boolean that specifies if the metadata should be visible at the top of each cell. Metadata that isn’t pinned will still be viewable through the Metadata Extension panel.
config: Customizes the appearance of text/plain metadata.
color: The text color of the metadata, as a string. Accepts any HTML color code or name.
backgroundColor: The background color of the metadata, as a string. Accepts any HTML color code or name.
borderColor: The border color of the metadata, as a string. Accepts any HTML color code or name.
labels: an array of labels
common fields
id: a unique number that identifies the label. Arrow labels refer to span labels by this id.
startCellLine: starting line position
startCellIndex: starting column position
startTokenIndex: starting token index position, relative to cell
startCharIndex: starting character index position relative to token
endCellLine: ending line position
endCellIndex: ending column position
endTokenIndex: ending token index position, relative to cell
endCharIndex: ending character index position, relative to token
type: type of the label. Accepts one of these values: "SPAN", "ARROW", "BOUNDING_BOX", "TIMESTAMP"
fields specific to each type:
"SPAN" or "ARROW"
labelSetIndex: replaces layer. Configures how the labelset items are grouped
labelName: replaces labelSetItemId. The text provided here will be displayed in web UI
"ARROW"
originId: id of a span label as the arrow's origin.
destinationId: id of a span label as the arrow's destination.
"BOUNDING_BOX"
pageIndex: page information for multiple page files, such as PDF and TIFF. Set field to 0 for common image formats, such as JPG, PNG, BMP, etc.
nodeCount: number of nodes. This field exists for future polygon support; for now, only 4 nodes forming a rectangle are supported.
x0: the first node's x value in screen coordinate system.
y0: the first node's y value in screen coordinate system.
x1: the second node's x value in screen coordinate system.
y1: the second node's y value in screen coordinate system.
x2: the third node's x value in screen coordinate system.
y2: the third node's y value in screen coordinate system.
x3: the fourth node's x value in screen coordinate system.
y3: the fourth node's y value in screen coordinate system.
"TIMESTAMP"
startTimestampMillis: the starting timestamp in milliseconds.
endTimestampMillis: the ending timestamp in milliseconds.
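To make the label fields above more concrete, here is a small sketch that builds two span labels and an arrow connecting them. The interfaces are local stand-ins derived from the field list, not Datasaur's published typings. The helper names spanOnToken and arrowBetween are hypothetical, and three details are assumptions: labelSetIndex is treated as a number, endCharIndex is treated as inclusive, and the arrow's position fields are copied from its origin's start and its destination's end.

```typescript
// Local stand-in types based on the field list above (assumption, for illustration only).
interface LabelPosition {
  id: number;
  startCellLine: number;
  startCellIndex: number;
  startTokenIndex: number;
  startCharIndex: number;
  endCellLine: number;
  endCellIndex: number;
  endTokenIndex: number;
  endCharIndex: number;
}

interface SpanLabel extends LabelPosition {
  type: "SPAN";
  labelSetIndex: number; // assumed to be numeric
  labelName: string;
}

interface ArrowLabel extends LabelPosition {
  type: "ARROW";
  labelSetIndex: number;
  labelName: string;
  originId: number;
  destinationId: number;
}

// Hypothetical helper: label one whole token in a single-column document.
const spanOnToken = (
  id: number,
  line: number,
  tokenIndex: number,
  token: string,
  labelName: string
): SpanLabel => ({
  id,
  type: "SPAN",
  labelSetIndex: 0,
  labelName,
  startCellLine: line,
  startCellIndex: 0,
  startTokenIndex: tokenIndex,
  startCharIndex: 0,
  endCellLine: line,
  endCellIndex: 0,
  endTokenIndex: tokenIndex,
  endCharIndex: token.length - 1, // assumes an inclusive end index
});

// Hypothetical helper: connect two span labels through their ids.
const arrowBetween = (
  id: number,
  origin: SpanLabel,
  destination: SpanLabel,
  labelName: string
): ArrowLabel => ({
  id,
  type: "ARROW",
  labelSetIndex: 0,
  labelName,
  originId: origin.id,
  destinationId: destination.id,
  // Assumption: the arrow runs from the origin's start to the destination's end.
  startCellLine: origin.startCellLine,
  startCellIndex: origin.startCellIndex,
  startTokenIndex: origin.startTokenIndex,
  startCharIndex: origin.startCharIndex,
  endCellLine: destination.endCellLine,
  endCellIndex: destination.endCellIndex,
  endTokenIndex: destination.endTokenIndex,
  endCharIndex: destination.endCharIndex,
});

// Example for a cell tokenized as ["Alice", "visited", "Paris"].
const person = spanOnToken(1, 0, 0, "Alice", "PER");
const place = spanOnToken(2, 0, 2, "Paris", "LOC");
const labels = [person, place, arrowBetween(3, person, place, "visited")];
```

Giving every label a distinct id matters because originId and destinationId are the only link between an arrow and its spans.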
Sample Case
We want to label a subtitle file in .srt format and show the timestamp as metadata. The file transformer will be shown below.
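A transformer for this case could look roughly like the following sketch. It is not Datasaur's official script: the interfaces are local stand-ins, the name subtitleTransformer and the "timestamp" metadata key are hypothetical, and it assumes a conventional .srt layout (a counter line, a timing line such as 00:00:01,000 --> 00:00:04,000, then one or more text lines, with blank lines between entries).

```typescript
// Local stand-in types (assumption, for illustration only).
interface Metadata {
  key: string;
  value: string;
  type?: string;
  pinned?: boolean;
}

interface Cell {
  line: number;
  index: number;
  content: string;
  tokens: string[];
  metadata?: Metadata[];
}

interface SimpleDocument {
  cells: Cell[];
  labels: unknown[];
}

// Matches an .srt timing line, e.g. "00:00:01,000 --> 00:00:04,000".
const TIMING = /^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}/;

// Hypothetical transformer: one cell per subtitle entry, timing stored as pinned metadata.
const subtitleTransformer = (fileContent: string): SimpleDocument => {
  const cells: Cell[] = [];
  // Normalize line endings, then split entries on blank lines.
  const blocks = fileContent.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block
      .split("\n")
      .map((l) => l.trim())
      .filter((l) => l.length > 0);
    const timingAt = lines.findIndex((l) => TIMING.test(l));
    if (timingAt === -1) continue; // skip anything that is not a subtitle entry
    const timing = lines[timingAt] ?? "";
    // Multi-line subtitles are joined into a single cell.
    const content = lines.slice(timingAt + 1).join(" ");
    if (content.length === 0) continue;
    cells.push({
      line: cells.length,
      index: 0,
      content,
      tokens: content.split(/\s+/),
      metadata: [{ key: "timestamp", value: timing, type: "text/plain", pinned: true }],
    });
  }
  return { cells, labels: [] };
};
```

Each subtitle entry becomes one row, so the timing shown next to a cell always belongs to the text in that cell.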
Let's try this out!
First, rename the file by adding the .txt extension. You can use the following sample file.
Click File Transformer, then copy and paste the script above.
After uploading the file, choose the Subtitle script from the Import File Transformer dropdown. Finish the project creation and launch the project.
Your project is ready!
Notes
You need to add the Metadata extension to the project.
If you want the metadata to be readable in the text editor, set pinned: true.
Use HTML color codes for the text color, border color, and background color.
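The two notes on pinning and colors can be combined in a single metadata item. The sketch below uses local stand-in types and a hypothetical helper, pinnedMetadata, which is deliberately stricter than the documented behavior: it only accepts hex color codes, even though the config fields also accept HTML color names.

```typescript
// Local stand-in types (assumption, for illustration only).
interface MetadataConfig {
  color?: string;
  backgroundColor?: string;
  borderColor?: string;
}

interface Metadata {
  key: string;
  value: string;
  type?: string;
  pinned?: boolean;
  config?: MetadataConfig;
}

// Accepts #rgb and #rrggbb only; a choice made for this sketch.
const HEX_COLOR = /^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;

// Hypothetical helper: build a pinned text/plain metadata item with a hex text color.
const pinnedMetadata = (key: string, value: string, color: string): Metadata => {
  if (!HEX_COLOR.test(color)) {
    throw new Error(`Not a hex color code: ${color}`);
  }
  return { key, value, type: "text/plain", pinned: true, config: { color } };
};

const author = pinnedMetadata("author", "John Doe", "#1a73e8");
```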
If you have any questions, please reach out to support@datasaur.ai.