# Robosaur

Robosaur supports multiple commands. Each command reads its configuration from a JSON file. For example, project creation needs to know what kind of projects to create.

Robosaur also uses a state file to track previously executed commands. This lets a run resume safely without duplicating work. For example, projects that were already created successfully are skipped on the next run, while failed ones are retried and new ones are processed.
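
The resume behavior can be pictured with a minimal sketch. The `StateEntry` shape, the status values, and the `selectPending` helper are hypothetical names chosen for illustration; they are not taken from Robosaur's source, and the real state file may be structured differently.

```typescript
// Hypothetical shapes for illustration only; Robosaur's real state file may differ.
type RunStatus = "SUCCESS" | "FAILED";

interface StateEntry {
  name: string;
  status: RunStatus;
}

// Returns the items that still need processing: anything not yet recorded
// as successful, which covers both failed and brand-new items.
function selectPending(candidates: string[], state: StateEntry[]): string[] {
  const succeeded = new Set(
    state.filter((entry) => entry.status === "SUCCESS").map((entry) => entry.name),
  );
  return candidates.filter((name) => !succeeded.has(name));
}

const previousState: StateEntry[] = [
  { name: "project-a", status: "SUCCESS" },
  { name: "project-b", status: "FAILED" },
];

// project-a is skipped, project-b is retried, project-c is new.
const pending = selectPending(["project-a", "project-b", "project-c"], previousState);
console.log(pending); // [ 'project-b', 'project-c' ]
```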

## Installation

The [source code](https://github.com/datasaur-ai/robosaur) for Robosaur is publicly available as an open-source GitHub project. Use the following commands to clone the repository and install its dependencies.

```
git clone https://github.com/datasaur-ai/robosaur.git
cd robosaur
nvm use
npm ci
```

Robosaur is developed using TypeScript and Node.js. We highly recommend using [nvm](https://github.com/nvm-sh/nvm) to manage the following versions:

* [Node.js](https://nodejs.org/en/) v16.13.
* npm v8 (should be bundled with Node.js).

## First-time configuration

Before running any Robosaur commands, get familiar with our [project types](/data-studio-projects/nlp-task-types.md), and configure the following:

1. Open `/quickstart/{preferred-project-type}/config/config.json`.
2. [Generate OAuth credentials](/api/credentials.md#generate-oauth-credentials-menu) and replace `<DATASAUR_CLIENT_ID>` and `<DATASAUR_CLIENT_SECRET>`.
3. Open <https://app.datasaur.ai/projects>. Click your profile in the top right corner and select the team that you want to use. Copy the team ID from the URL (`https://app.datasaur.ai/teams/{team-id}/projects`) and replace the `<TEAM_ID>` values in your configuration.
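
If you prefer to pull the team ID out of the URL programmatically, a small helper along these lines may do. `extractTeamId` is a hypothetical function written for this page, not part of Robosaur, and the ID `123` is a made-up example value.

```typescript
// Extracts the team ID from a URL of the form
// https://app.datasaur.ai/teams/{team-id}/projects.
// Returns null when the URL does not match that form.
function extractTeamId(projectsUrl: string): string | null {
  const match = projectsUrl.match(/^https?:\/\/[^/]+\/teams\/([^/?#]+)\/projects(?:[/?#].*)?$/);
  return match?.[1] ?? null;
}

// "123" is a placeholder team ID used only for this example.
console.log(extractTeamId("https://app.datasaur.ai/teams/123/projects")); // 123
console.log(extractTeamId("https://app.datasaur.ai/projects")); // null
```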

## Supported commands

### Create multiple projects

```
npm run start -- create-projects <path-to-configuration-file>
```

### Export multiple projects

```
npm run start -- export-projects <path-to-configuration-file>
```

### Apply tags

```
npm run start -- apply-tags <path-to-configuration-file>
```

### Split document

```
npm run start -- split-document <path-to-configuration-file>
```

Each command uses a dedicated configuration file to define its behavior. See the next section for detailed configuration options.

## Configuration file

Robosaur requires a configuration file to define how each command behaves.

### Required fields (all commands)

These attributes are required for all commands.

* `datasaur`: specifies the `host`, `clientId`, and `clientSecret` used to authenticate each call.
* `projectState`: specifies where to store the state files. If Robosaur is used by multiple users, store the state in cloud object storage to keep it properly synced.
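
Before a first run, it can help to confirm that no `<PLACEHOLDER>` value from the quickstart configuration was left unreplaced. The sketch below is one way to check a parsed configuration object. `findPlaceholders` is a hypothetical helper, the `host` value is an assumption, and the contents of `projectState` are omitted because this page does not list them.

```typescript
// Recursively collects the paths of string values that still look like
// an unreplaced `<PLACEHOLDER>` token, such as "<DATASAUR_CLIENT_ID>".
function findPlaceholders(value: unknown, path: string = ""): string[] {
  if (typeof value === "string") {
    return /^<[A-Z0-9_]+>$/.test(value) ? [path] : [];
  }
  const found: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      found.push(...findPlaceholders(item, `${path}[${index}]`));
    });
  } else if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      found.push(...findPlaceholders(record[key], path ? `${path}.${key}` : key));
    }
  }
  return found;
}

// Example configuration; the host value is assumed and the secret is fake.
const exampleConfig = {
  datasaur: {
    host: "https://app.datasaur.ai",
    clientId: "<DATASAUR_CLIENT_ID>",
    clientSecret: "example-secret",
  },
  projectState: {}, // nested keys intentionally left out
};

console.log(findPlaceholders(exampleConfig)); // [ 'datasaur.clientId' ]
```

In practice you would pass the result of `JSON.parse` on your own `config.json` and stop if the returned list is not empty.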

### Command-specific fields

For each command, fill in the corresponding attribute.

* [create](/integrations/robosaur/commands/create-projects.md) specifies the settings of the projects to be created.
* [export](/integrations/robosaur/commands/export-projects.md) specifies the behavior for exporting projects.
* [applyTags](/integrations/robosaur/commands/apply-project-tags.md) specifies how tags are applied.
* [splitDocument](/integrations/robosaur/commands/split-document.md) specifies the behavior for splitting a file.

A full breakdown of each attribute is available as a TypeScript file in [`src/config/interfaces.ts`](https://github.com/datasaur-ai/robosaur/blob/master/src/config/interfaces.ts).

## Object storage

Each `source` attribute can be set to local storage, AWS S3, Google Cloud Storage, or Azure Blob Storage. See the [storage options guide](/integrations/robosaur/storage-options.md) for details.

