Rating

Overview

The Rating evaluation project helps you assess the quality of your LLM outputs using human judgment: you rate the generated completions and correct those that fall short.

Prerequisites

In Rating projects, you can evaluate two types of completions:

  1. Pre-generated completions

  2. Completions generated by models from Sandbox

Evaluate pre-generated completions

There are two CSV formats for pre-generated completions, as sketched below:

  1. Two-column CSV format: prompt and completion.

  2. Four-column CSV format: prompt_template, prompt, sources, completion.
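As a rough illustration, both formats can be produced with Python's standard csv module. This is a minimal sketch: the file names, example rows, and template placeholder syntax are hypothetical, while the header names match the formats above.

```python
import csv

# Two-column format: each row pairs a prompt with its pre-generated
# completion. File name and row contents are illustrative only.
with open("pregenerated_two_column.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt", "completion"])
    writer.writerow([
        "What is the capital of France?",
        "The capital of France is Paris.",
    ])

# Four-column format: also records the prompt template and the sources
# the completion was generated from. The {context} placeholder is an
# assumed template convention, not one mandated by the product.
with open("pregenerated_four_column.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt_template", "prompt", "sources", "completion"])
    writer.writerow([
        "Answer the question using the given context: {context}",
        "What is the capital of France?",
        "geography_notes.txt",
        "The capital of France is Paris.",
    ])
```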

Evaluate models from Sandbox

  1. Ensure the model is deployed or saved to the library.

  2. Prepare a dataset in a CSV file with one column: prompt (see the sketch below).
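A minimal sketch of such a single-column file, again using Python's csv module (the file name and prompts are hypothetical):

```python
import csv

# Single-column format for Sandbox evaluation: prompts only; the
# selected model generates the completions during the evaluation.
with open("sandbox_prompts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt"])
    writer.writerow(["Summarize the plot of Hamlet in one sentence."])
    writer.writerow(["Explain the difference between a list and a tuple in Python."])
```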

Create project

To create a Rating evaluation project:

  1. Navigate to the Evaluation page under the LLM Labs menu.

  2. Click Create evaluation project, select Rating, then Continue.

  3. Set up your project. Choose what you want to evaluate:

    1. Evaluate pre-generated completions

      1. Upload the dataset in a CSV file using one of the two supported formats described above.

    2. Evaluate models from Sandbox

      1. Upload the dataset in a CSV file with one column: prompt.

      2. Select the model that you want to use to generate completions. If you can’t find your model in the list, go to the Sandbox where the model was created and deploy it or save it to the library. You can only evaluate deployed or saved models.

  4. Click Create evaluation project.

Evaluate completions

Open the project to evaluate the generated completions. Rate each prompt's completion from 1 to 5 stars. A 5-star rating usually means the completion is already perfect, so there is no need to provide feedback or edit it.

Labeler mode

When the rating is below 5 stars, refine the completion by providing your expected version. Then submit your answer to move to the next prompt.

View evaluation results

After evaluating all completions, mark the evaluation as complete from the app bar: click the current status, Evaluation in progress, and change it to Evaluation completed.

After the evaluation is marked as complete, you can view a summary of the evaluation. When evaluating models from Sandbox, the summary shows:

  • Average cost and processing time for generating completions

  • Average evaluation score

  • Evaluation results in a table view

When evaluating pre-generated completions, the summary shows:

  • Average evaluation score

  • Evaluation results in a table view
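For reference, the average evaluation score is the arithmetic mean of the per-completion star ratings. A minimal sketch of the calculation (the ratings shown are hypothetical):

```python
# Hypothetical star ratings (1-5) assigned to five completions.
ratings = [5, 4, 3, 5, 4]

# Average evaluation score = sum of ratings / number of rated completions.
average_score = sum(ratings) / len(ratings)
print(f"Average evaluation score: {average_score:.2f}")  # -> 4.20
```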
