
Create Projects

How it works

$ npm run start -- create-projects -h
Usage: robosaur create-projects [options] <configFile>

Create Datasaur projects based on the given config file

Options:
  --dry-run      Simulates what the script is doing without creating the projects
  --without-pcw  Use legacy Robosaur configuration (default: false)
  --use-pcw      Use the payload from Project Creation Wizard in Datasaur UI (default: true)
  -h, --help     display help for command

Robosaur creates a project for each folder inside the create.files directory.

For example, if quickstart/token-based/documents contains the structure below, Robosaur creates:

  • Project 1 with 1 document: lorem.txt

  • Project 2 with 1 document: ipsum.txt

The create.files attribute can point to either a local directory or any supported object storage provider.

$ ls -lR quickstart/token-based/documents
total 0
drwxr-xr-x  3 user  group  Project 1
drwxr-xr-x  3 user  group  Project 2

quickstart/token-based/documents/Project 1:
total 8
-rw-r--r--  1 user  group  lorem.txt

quickstart/token-based/documents/Project 2:
total 8
-rw-r--r--  1 user  group  ipsum.txt
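The folder-to-project mapping in the listing above can be pictured with a short Python sketch. It is only an illustration and not Robosaur's code: the function name plan_projects is hypothetical, and it assumes a local directory rather than object storage.

```python
import tempfile
from pathlib import Path


def plan_projects(files_dir):
    """Return {project name: [document names]} for each subfolder of files_dir.

    Hypothetical helper: each folder becomes one project, and the files
    directly inside it become that project's documents.
    """
    plan = {}
    for folder in sorted(Path(files_dir).iterdir()):
        if folder.is_dir():
            plan[folder.name] = sorted(
                p.name for p in folder.iterdir() if p.is_file()
            )
    return plan


# Recreate the layout from the listing in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    for project, doc in [("Project 1", "lorem.txt"), ("Project 2", "ipsum.txt")]:
        (Path(root) / project).mkdir()
        (Path(root) / project / doc).write_text("sample")
    print(plan_projects(root))
    # {'Project 1': ['lorem.txt'], 'Project 2': ['ipsum.txt']}
```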

All successfully created projects are tracked in the state file configured through the projectState attribute. When you run the same command again, Robosaur skips projects that were already created successfully to prevent duplication. Only new or previously failed projects are processed.
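The skip-on-rerun behavior can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the state file layout (a JSON object mapping project names to "CREATED" or "FAILED") and both function names are hypothetical and do not describe Robosaur's actual state format.

```python
import json
import tempfile
from pathlib import Path


def load_state(state_path):
    """Read the (hypothetical) state file, or return an empty state."""
    path = Path(state_path)
    return json.loads(path.read_text()) if path.exists() else {}


def record_result(name, succeeded, state_path):
    """Store the outcome of one project creation attempt."""
    state = load_state(state_path)
    state[name] = "CREATED" if succeeded else "FAILED"
    Path(state_path).write_text(json.dumps(state, indent=2))


def projects_to_process(project_names, state_path):
    """Keep new and previously failed projects; skip created ones."""
    state = load_state(state_path)
    return [name for name in project_names if state.get(name) != "CREATED"]


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"
    record_result("Project 1", True, state_file)   # created on the first run
    record_result("Project 2", False, state_file)  # failed on the first run
    # Second run: Project 3 is new, Project 2 is retried, Project 1 is skipped.
    print(projects_to_process(["Project 1", "Project 2", "Project 3"], state_file))
    # ['Project 2', 'Project 3']
```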

  1. Select a configuration example from the quickstart folder.

  2. Specify the create.files value. This attribute defines the data source for the projects.

  3. Go to Datasaur and select the workspace you want to use from the profile menu in the top-right corner.

  4. Click Create project.

  5. Configure the project settings you want to automate. Complete all steps in the wizard, including assigning labelers and reviewers.

  6. On the final step, click View script in the top-right corner.

  7. Copy the generated values.

  8. Paste the value into create.pcwPayload and make sure create.pcwPayloadSource is set to match. Learn more in PCW payload.

  9. Specify create.pcwAssignmentStrategy. The value can be ALL (default) or AUTO. Learn more in Distribution.

  10. Run the command.

PCW Payload

You can configure the PCW payload in two ways.

  • Inline payload.

  • External storage.

Inline payload

Paste the payload directly into the configuration file using create.pcwPayload. Make sure create.pcwPayloadSource is set to inline.

External storage

You can also store the payload in:

  • A local file.

  • Any supported cloud object storage provider.

The example below uses Google Cloud Storage (GCS). Paste the value into a JSON file in your bucket and set create.pcwPayload to the path of that file. You must also fill in create.pcwPayloadSource and credentials. For other supported object storage, see here.

Assignment

List of assignees

There are two ways to define the list of labelers and reviewers.

  1. Use assignees from the PCW payload (default). By default, Robosaur uses the labelers and reviewers already configured in the project creation wizard (PCW). No additional setup is required because the assignees are included automatically in the copied PCW payload.

  2. Specify assignees manually. Create a file and specify its path in the create.assignment attribute. Configuration rules:

    • If useTeamMemberId is true, fill both labelers and reviewers with teamMemberId.

    • If useTeamMemberId is false, fill both labelers and reviewers with email addresses.
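The two configuration rules can be checked with a small Python sketch before running Robosaur. It rests on assumptions: the assignment is represented here as a dictionary with top-level useTeamMemberId, labelers, and reviewers keys, the function name check_assignment is hypothetical, and "contains an @" is only a rough stand-in for email validation.

```python
def check_assignment(assignment):
    """Return a list of problems found in a (hypothetical) assignment dict."""
    use_ids = assignment.get("useTeamMemberId", False)
    problems = []
    for role in ("labelers", "reviewers"):
        for value in assignment.get(role, []):
            looks_like_email = "@" in str(value)
            if use_ids and looks_like_email:
                problems.append(
                    f"{role}: {value!r} looks like an email, expected a teamMemberId"
                )
            if not use_ids and not looks_like_email:
                problems.append(f"{role}: {value!r} is not an email address")
    return problems


# Placeholder values for illustration only.
by_email = {
    "useTeamMemberId": False,
    "labelers": ["labeler-1@example.com", "labeler-2@example.com"],
    "reviewers": ["reviewer-1@example.com"],
}
print(check_assignment(by_email))  # []
```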

Distribution

Robosaur currently supports two assignment distribution methods.

Across documents (default)

This approach distributes assignments across documents within a project. To use it, set the create.pcwAssignmentStrategy value. Supported strategies:

  • AUTO: Distributes documents to labelers using a round-robin algorithm. Each document is assigned to exactly one labeler.

  • ALL: Assigns all labelers to all documents.

Reviewers are always assigned to all projects and documents.
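The difference between AUTO and ALL can be illustrated with a short Python sketch. It is not Robosaur's implementation: the function name distribute_documents is hypothetical, and the exact order in which Robosaur walks documents and labelers is an assumption.

```python
def distribute_documents(documents, labelers, strategy="ALL"):
    """Map each document to its labelers under the given strategy."""
    if not labelers:
        raise ValueError("at least one labeler is required")
    if strategy == "ALL":
        # Every labeler works on every document.
        return {doc: list(labelers) for doc in documents}
    if strategy == "AUTO":
        # Round-robin: document i goes to labeler i modulo the labeler count.
        return {
            doc: [labelers[i % len(labelers)]]
            for i, doc in enumerate(documents)
        }
    raise ValueError(f"unsupported strategy: {strategy}")


docs = ["a.txt", "b.txt", "c.txt"]
people = ["alice", "bob"]
print(distribute_documents(docs, people, "AUTO"))
# {'a.txt': ['alice'], 'b.txt': ['bob'], 'c.txt': ['alice']}
print(distribute_documents(docs, people, "ALL"))
# {'a.txt': ['alice', 'bob'], 'b.txt': ['alice', 'bob'], 'c.txt': ['alice', 'bob']}
```

Reviewers are left out of the sketch because they are assigned everywhere regardless of the strategy.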

Across projects

This approach distributes assignments across projects instead of documents. To use this approach:

  1. Create a custom assignment file as described in the List of Assignees section.

  2. Specify the path to that file in create.assignment.

  3. Set the value of create.assignment.by to PROJECT.

  4. Set the value of create.assignment.strategy to AUTO or ALL.

    1. AUTO: Distributes both labelers and reviewers using a round-robin algorithm. Each project is assigned to exactly one labeler and one reviewer.

    2. ALL: Assigns all labelers and reviewers to every project.

  5. Remove the create.pcwAssignmentStrategy attribute, and remove the documentAssignments attribute from pcwPayload.
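To make the two project-level strategies concrete, here is a minimal Python sketch. The function name distribute_projects and the round-robin ordering are assumptions for illustration, not Robosaur's actual code.

```python
def distribute_projects(projects, labelers, reviewers, strategy):
    """Map each project to its labelers and reviewers."""
    if not labelers or not reviewers:
        raise ValueError("labelers and reviewers must not be empty")
    result = {}
    for i, project in enumerate(projects):
        if strategy == "AUTO":
            # Round-robin: exactly one labeler and one reviewer per project.
            result[project] = {
                "labelers": [labelers[i % len(labelers)]],
                "reviewers": [reviewers[i % len(reviewers)]],
            }
        elif strategy == "ALL":
            # Everyone is assigned to every project.
            result[project] = {
                "labelers": list(labelers),
                "reviewers": list(reviewers),
            }
        else:
            raise ValueError(f"unsupported strategy: {strategy}")
    return result


plan = distribute_projects(
    ["Project 1", "Project 2", "Project 3"],
    labelers=["alice", "bob"],
    reviewers=["carol"],
    strategy="AUTO",
)
print(plan["Project 3"])
# {'labelers': ['alice'], 'reviewers': ['carol']}
```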

Example:

Project tags

Robosaur can automatically apply tags to newly created projects.

If you pasted the PCW payload directly into the configuration file (the recommended approach from the previous section), add a new field called tagNames under create.pcwPayload.variables.input and list the tags for the projects. If a tag does not already exist, it is created automatically.

If the PCW payload is stored in an external file (local or in cloud storage), add the tagNames field under variables.input in that file and list the tags for the projects.
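For an external payload file, the edit amounts to adding one key under variables.input. The Python sketch below shows the idea on an in-memory payload. The helper name with_tags and the tag values are hypothetical, and the payload is reduced to an empty placeholder because the real one comes from View script.

```python
import copy
import json


def with_tags(pcw_payload, tag_names):
    """Return a copy of the payload with tagNames set under variables.input."""
    updated = copy.deepcopy(pcw_payload)
    # Create the nested objects if the placeholder payload lacks them.
    target = updated.setdefault("variables", {}).setdefault("input", {})
    target["tagNames"] = list(tag_names)
    return updated


# Placeholder payload: a real one contains many more fields.
payload = {"variables": {"input": {}}}
tagged = with_tags(payload, ["TAG 1", "TAG 2"])
print(json.dumps(tagged))
# {"variables": {"input": {"tagNames": ["TAG 1", "TAG 2"]}}}
```

The function copies the payload instead of mutating it, so the original value stays available for comparison.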

ML-assisted labeling

You can automate labeling for newly created projects using ML-assisted labeling.

Add the autoLabel field under create in the configuration file and fill in the required values. The target API requires the project to already have a label set configured.

After configuration, ML-assisted labeling automatically runs whenever a new project is created. Labels are applied based on the response from your custom API model.
