
Create Projects


How It Works

$ npm run start -- create-projects -h
Usage: robosaur create-projects [options] <configFile>

Create Datasaur projects based on the given config file

Options:
  --dry-run      Simulates what the script is doing without creating the projects
  --without-pcw  Use legacy Robosaur configuration (default: false)
  --use-pcw      Use the payload from Project Creation Wizard in Datasaur UI (default: true)
  -h, --help     display help for command
  • Robosaur will try to create a project for each folder inside the path configured by create.files. If the contents of quickstart/token-based/documents look like the example below, Robosaur will create two projects named Project 1 and Project 2, each containing a single document (lorem.txt and ipsum.txt respectively). This attribute can point to your local drive or to any supported object storage; see the Storage Options page for details.

    $ ls -lR quickstart/token-based/documents
    total 0
    drwxr-xr-x  3 user  group  Project 1
    drwxr-xr-x  3 user  group  Project 2
    
    quickstart/token-based/documents/Project 1:
    total 8
    -rw-r--r--  1 user  group  lorem.txt
    
    quickstart/token-based/documents/Project 2:
    total 8
    -rw-r--r--  1 user  group  ipsum.txt
  • Every successful project creation is recorded in the state configured by the projectState attribute, so the next time you run the same command there will be no duplicate projects: Robosaur only processes new projects and previously failed ones.
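
For reference, here is a minimal sketch of how these two attributes might be combined in a single config file. The source/path shape shown is an assumption modeled on the placeholder style used elsewhere on this page; check the quickstart configs for the authoritative schema.

{
  "projectState": {
    "source": "local",
    "path": "<path-to-project-state-file>"
  },
  "create": {
    "files": {
      "source": "local",
      "path": "quickstart/token-based/documents"
    },
    ...
  }
}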

Recommended Steps

  1. Select a configuration example from the quickstart folder.

  2. Specify the create.files value. As mentioned above, this attribute is the data source for the projects.

  3. Create a new project using the Project Creation Wizard (PCW) by clicking + Custom Project. Open the Datasaur app and select your preferred team by clicking your profile in the top-right corner, then configure the kind of projects you want to automate. Go through every step, including choosing labelers and reviewers, and click <> View Script in the top-right corner (see the walkthrough video for this step).

  4. Copy the values. Paste them directly into create.pcwPayload and make sure create.pcwPayloadSource is properly filled (see the PCW Payload section below). Also specify pcwAssignmentStrategy, which can be ALL (default) or AUTO (see the Distribution section below).

  5. Run the command.
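
To run it, pass your configuration file to the command shown in How It Works. A dry run first simulates the result without creating any projects; the config path below is a placeholder.

$ npm run start -- create-projects <path-to-config-file> --dry-run
$ npm run start -- create-projects <path-to-config-file>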

PCW Payload

There are two ways to provide the PCW payload.

  1. Directly in the configuration file (the recommended approach). Paste the payload into create.pcwPayload and make sure the value of create.pcwPayloadSource looks like the example below.

    {
      ...
      "create": {
        ...
        "pcwPayloadSource": { "source": "inline" },
        "pcwPayload": <paste the values from PCW>
      }
      ...
    }

  2. From external storage (a local file or any supported cloud storage). The example below uses GCS: paste the payload into a JSON file in your bucket and set create.pcwPayload to its path. The create.pcwPayloadSource and credentials attributes must also be filled; for other supported object storage providers, see the Storage Options page.

{
  ...
  "credentials": {
    "gcs": { "gcsCredentialJson": "<path-to-JSON-service-account-credential>" }
  },
  "create": {
    ...
    "pcwPayloadSource": {
      "source": "gcs",
      "bucketName": "my-bucket-name"
    },
    "pcwPayload": <path-to-the-payload-in-JSON-file>
  }
  ...
}

Assignment

List of Assignees (Labelers and Reviewers)

There are two ways to specify the list.

  1. Using the labelers and reviewers that are already assigned in the PCW. This is the default approach; you don't have to do anything because they are already included in the configuration when you paste it from the PCW.

  2. Specifying the list yourself. Create a file and set its path in the create.assignment attribute. The file should look like the example below.

    • If useTeamMemberId is true, fill both labelers and reviewers with teamMemberId values.

    • If useTeamMemberId is false, fill both labelers and reviewers with their emails.

    {
      "labelers": [...], // list of emails
      "reviewers": [...], // list of emails
      "useTeamMemberId": false
    }
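
If useTeamMemberId is true, the same file would instead contain teamMemberId values; the IDs below are placeholders.

    {
      "labelers": ["<team-member-id>", "<team-member-id>"], // list of teamMemberId values
      "reviewers": ["<team-member-id>"],
      "useTeamMemberId": true
    }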

Distribution

Currently, we support two assignment distributions.

  1. Across documents (the default approach). You only need to specify the create.pcwAssignmentStrategy value; a config sketch follows this list. The supported values are:

    • AUTO: distributes documents to labelers using a round-robin algorithm, i.e. each document is assigned to exactly one labeler.

    • ALL: every labeler is assigned to all documents.

    Please note that reviewers are assigned to all projects and documents.

  2. Across projects. To use this approach, you have to specify the labelers and reviewers list yourself, as described in the List of Assignees section above. Follow the steps below.

    1. Create the assignment file and specify it in create.assignment.

    2. Fill PROJECT as the value of the create.assignment.by attribute.

    3. Select the assignment strategy by filling create.assignment.strategy. Two strategies are supported.

      1. AUTO: distributes both labelers and reviewers using round-robin; only one labeler and one reviewer per project.

      2. ALL: all labelers and reviewers are assigned to every project.

    4. Remove the create.pcwAssignmentStrategy attribute and the documentAssignments attribute from pcwPayload.

    The configuration then looks like this:

    {
      ...
      "create": {
        ...
        "assignment": {
          "source": "local",
          "path": "quickstart/token-based/config/assignment.json",
          "by": "PROJECT",
          "strategy": "AUTO"
        },
        // remove pcwAssignmentStrategy
        // remove documentAssignments from pcwPayload
        ...
      }
    }
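
As noted in item 1 above, here is a minimal sketch of the across-documents configuration; everything except pcwAssignmentStrategy is elided, and its placement directly under create follows the create.pcwAssignmentStrategy attribute path referenced above.

{
  ...
  "create": {
    ...
    "pcwAssignmentStrategy": "AUTO", // or "ALL" (the default)
    ...
  }
  ...
}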

Tagging Projects

Newly created projects from Robosaur can be tagged automatically. Starting from the PCW payload you copied using the recommended approach from the previous section (directly in the config file), add a new field called tagNames under create.pcwPayload.variables.input and specify the tags for the projects. If the tags do not exist yet, they will be created for you.

{
  ...
  "create": {
    ...
    "pcwPayload": {
      ...
      "variables": {
        ...
        "input": {
          ...
          "tagNames": ["TAG 1", "TAG 2"]
        }
      }
    }
  }
  ...
}

Or, if the PCW payload is in an external file (local or in cloud storage), add the tagNames field under variables.input and specify the tags for the projects.

{
  ...
  "variables": {
    ...
    "input": {
      ...
      "tagNames": ["TAG 1", "TAG 2"]
    }
  }
}

ML-Assisted Labeling

Automate the labeling process on the newly created projects using ML-assisted labeling.

In the config file, add the autoLabel field under create and fill in the required fields. The target API requires the project to have a label set in order to work properly.

{
  ...
  "create": {
    ...
    "autoLabel": {
      "enableAutoLabel": true,
      "labelerEmail": "<EMAIL>", // use your Datasaur's account email
      "targetApiEndpoint": "<API_ENDPOINT>", // your custom API model
      "targetApiSecretKey": "<API_SECRET>", // if needed
      "numberOfFilesPerRequest": 1
    }
  }
  ...
}

With this, ML-assisted labeling is triggered every time a project is created, and labels are applied to the new project based on your custom API model's response.
