New mutation (createProject)

The API

We have updated our Python script example to send a request to the new mutation. The updated script is available as version v2.0.

Differences between the launchTextProjectAsync and createProject mutations

Besides the difference in input structure (LaunchProjectInput vs LaunchTextProjectInput), there is one major difference in how file uploads are handled. Previously, files were uploaded directly in the GraphQL mutation using file mapping. This made the mutation difficult to call from some clients, such as Postman, and imposed limits on file size and request timeout.

File upload

The new mutation does not handle file uploads at all. Instead, Datasaur provides a separate REST endpoint for file uploads at https://upload.datasaur.ai/api/static/proxy/upload. It requires the same Authorization header as the GraphQL endpoints. Here is a sample cURL request to upload a file, followed by the corresponding response:

$ curl --location 'https://upload.datasaur.ai/api/static/proxy/upload' \
--header 'Authorization: Bearer ' \
--form 'file=@"/path/to/file"'

{"objectKey": "temp/upload-proxy/<unique-id>.<file-extension>"}
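
The same upload can also be scripted. Below is a minimal sketch using only the Python standard library. It assumes the endpoint accepts a standard multipart/form-data body with a single field named `file` (which is what the cURL `--form` flag sends) and returns the JSON shown above. The names `build_upload_request` and `upload_file` are hypothetical helpers, and `token` stands in for your access token.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

UPLOAD_URL = "https://upload.datasaur.ai/api/static/proxy/upload"


def build_upload_request(path: str, token: str) -> urllib.request.Request:
    """Build a multipart/form-data POST with the file in a field named "file"."""
    boundary = uuid.uuid4().hex
    filename = os.path.basename(path)
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    with open(path, "rb") as f:
        data = f.read()
    # One form part: headers, blank line, raw file bytes, then the closing boundary.
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        UPLOAD_URL,
        data=body,
        method="POST",
        headers={
            # Same Authorization header as the GraphQL endpoints.
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def upload_file(path: str, token: str) -> str:
    """Upload the file and return the objectKey from the JSON response."""
    request = build_upload_request(path, token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["objectKey"]
```

Splitting request construction from sending keeps the multipart encoding easy to inspect without touching the network. If the `requests` package is available, `requests.post(UPLOAD_URL, headers=..., files={"file": f})` is a shorter equivalent.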

You can then use the objectKey as the value of DocumentDetailInput.objectKey in the GraphQL mutation. Below is an example document payload, followed by some explanations.

{
  "documents": [
    {
      "document": {
        "name": "sample.pdf",
        "objectKey": "temp/upload-proxy/some-sample-filename.pdf"
      },
      "extras": [
        {
          "name": "transcription.txt",
          "externalUrl": "https://gist.githubusercontent.com/ivanm-gdp/3c2eeef9c8a124b628b1ee42fcaa07a3/raw/9d2d3c1599634764e4139defc90cd9b4502566c0/sample.txt"
        }
      ]
    }
  ]
}

  1. documents is an array of CreateDocumentInput objects. The full reference for CreateDocumentInput is available here.

  2. document and extras use the same structure: DocumentDetailInput.

    1. Each DocumentDetailInput should have name populated, plus one of externalUrl or objectKey.

      1. To obtain an objectKey, upload the file as described in the previous section.

      2. externalUrl can be any reachable URL that serves the document directly. Good examples are GitHub's raw gist links for text files and signed URLs from object storage providers. Datasaur downloads the file and creates a copy to use in the created project.

    2. document is where you should generally put the main file to be labeled. Most of the time, only this field is populated. See our supported format page for more information on which file types are supported.

    3. extras is used for certain types of labeling. For example, in transcription-based projects such as ASR or OCR, the main document is the audio recording, image, or document, and the accompanying transcription file should be added to the extras array.
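
The structure described above can be assembled programmatically. The sketch below uses hypothetical helper names (`document_detail`, `create_document_input`). It assumes that "one of externalUrl or objectKey" means exactly one of the two is set, and that `extras` can simply be omitted when there are none; check both against the CreateDocumentInput reference.

```python
import json
from typing import Optional


def document_detail(name: str, object_key: Optional[str] = None,
                    external_url: Optional[str] = None) -> dict:
    """Build a DocumentDetailInput: a name plus exactly one of objectKey / externalUrl."""
    # Assumption: supplying both (or neither) is treated as an error here.
    if (object_key is None) == (external_url is None):
        raise ValueError("set exactly one of object_key or external_url")
    detail = {"name": name}
    if object_key is not None:
        detail["objectKey"] = object_key
    else:
        detail["externalUrl"] = external_url
    return detail


def create_document_input(document: dict, extras: Optional[list] = None) -> dict:
    """Build one CreateDocumentInput entry; extras is left out when empty (assumption)."""
    entry = {"document": document}
    if extras:
        entry["extras"] = extras
    return entry


# Usage: rebuild the example payload above (extras URL shortened for readability).
payload = {
    "documents": [
        create_document_input(
            document_detail("sample.pdf",
                            object_key="temp/upload-proxy/some-sample-filename.pdf"),
            extras=[document_detail("transcription.txt",
                                    external_url="https://example.com/sample.txt")],
        )
    ]
}
print(json.dumps(payload, indent=2))
```

Validating the objectKey/externalUrl choice before sending catches malformed documents locally instead of waiting for a GraphQL error.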

Migrating to the new mutation

Migrating to the new mutation can be as simple as running through our Project Creation Wizard again and clicking View Script at the last step, which shows the new payload structure.

You can then follow our Python example to create a project with the chosen settings.
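
If you prefer to send the request yourself, here is a minimal standard-library sketch of a GraphQL-over-HTTP POST. It does not reproduce the createProject mutation: paste the mutation text and variables generated by View Script into `query` and `variables`. The endpoint URL below is an assumption; use the one from the generated script. `build_graphql_request` and `run_graphql` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: replace with the GraphQL endpoint used by the generated script.
GRAPHQL_URL = "https://app.datasaur.ai/graphql"


def build_graphql_request(query: str, variables: dict, token: str,
                          url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Build a standard GraphQL POST whose JSON body carries the query and variables."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def run_graphql(query: str, variables: dict, token: str,
                url: str = GRAPHQL_URL) -> dict:
    """Send the request, raise if the response lists GraphQL errors, else return data."""
    with urllib.request.urlopen(build_graphql_request(query, variables, token, url)) as resp:
        result = json.loads(resp.read())
    if result.get("errors"):
        raise RuntimeError(result["errors"])
    return result["data"]
```

A typical flow is to upload each file first, place the returned objectKey values in the documents array of `variables`, and then call `run_graphql` once with the createProject mutation.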
