Rating
Overview
The Rating evaluation project helps you assess the quality of your LLM outputs using human judgment, by rating and correcting the generated completions.
Prerequisites
In Rating projects, you can evaluate two types of completions:
Pre-generated completions
Completions generated by models from Sandbox
Evaluate pre-generated completions
There are two CSV formats for pre-generated completions:
Two-column CSV format: prompt and completion.
Four-column CSV format: prompt_template, prompt, sources, and completion.
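As an illustration of the two layouts above, the following sketch builds one CSV of each kind with Python's standard csv module. Only the column names come from this page; the helper name to_csv and every cell value are made-up placeholders, and the content expected in the prompt_template and sources columns is an assumption.

```python
import csv
import io

# Column names as listed on this page.
TWO_COLUMN = ["prompt", "completion"]
FOUR_COLUMN = ["prompt_template", "prompt", "sources", "completion"]


def to_csv(rows, columns):
    """Serialize a list of dicts to CSV text with the given header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder rows: the cell values are illustrative only.
two_col_csv = to_csv(
    [{"prompt": "What is 2 + 2?", "completion": "4"}],
    TWO_COLUMN,
)
four_col_csv = to_csv(
    [
        {
            "prompt_template": "Answer using the sources provided.",
            "prompt": "What is the capital of France?",
            "sources": "Paris is the capital of France.",
            "completion": "Paris",
        }
    ],
    FOUR_COLUMN,
)

print(two_col_csv)
print(four_col_csv)
```

To produce a file for upload, write the returned text to disk, for example with open("dataset.csv", "w", newline="").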
Evaluate models from Sandbox
Ensure the model is deployed or saved to the library.
Prepare a dataset in a CSV file with one column: prompt.
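Before uploading, it can help to check locally that a file's header matches one of the layouts described in these prerequisites. The sketch below is a hypothetical pre-upload check, not part of the product: the function name detect_format and the returned labels are invented here, and it assumes the header must match the column names exactly and in the listed order, which this page does not state.

```python
import csv
import io

# Header layouts taken from this page; the labels are illustrative.
EXPECTED_HEADERS = {
    ("prompt",): "sandbox",
    ("prompt", "completion"): "pre-generated (two-column)",
    ("prompt_template", "prompt", "sources", "completion"): "pre-generated (four-column)",
}


def detect_format(csv_text):
    """Return a label for the dataset layout, or raise ValueError."""
    reader = csv.reader(io.StringIO(csv_text))
    # An empty file yields an empty header, which is rejected below.
    header = tuple(name.strip() for name in next(reader, []))
    if header not in EXPECTED_HEADERS:
        raise ValueError(f"Unexpected header: {header}")
    return EXPECTED_HEADERS[header]


print(detect_format("prompt\nSummarize this article.\n"))
print(detect_format("prompt,completion\nWhat is 2 + 2?,4\n"))
```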
Create project
To create Rating evaluation projects:
Navigate to the Evaluation page under the LLM Labs menu.
Click Create evaluation project, select Rating, then click Continue.

Set up your project. Choose what you want to evaluate:
Evaluate pre-generated completions
Upload the dataset in a CSV file with two columns: prompt and completion.
Evaluate models from Sandbox
Upload the dataset in a CSV file with one column: prompt.
Click Create evaluation project.
Evaluate completions
Open the project to evaluate the generated completions. You should rate each completion of a prompt from 1 to 5 stars. A 5-star rating usually means the completion is already perfect, so there is no need to provide feedback or edit the completion.

When the rating is below 5 stars, you must refine the completion by providing the completion you expected. Then submit your answer to move to the next prompt.
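The rating rule described above can be modeled as a small validation step. This is a sketch of the rule only, not the product's data model: the Review class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    """One reviewer answer for a single prompt (hypothetical structure)."""

    prompt: str
    completion: str
    stars: int
    expected_completion: Optional[str] = None

    def validate(self):
        """Raise ValueError if the answer breaks the rating rule."""
        if not 1 <= self.stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        # Below 5 stars, the reviewer has to supply the expected completion.
        if self.stars < 5 and not (self.expected_completion or "").strip():
            raise ValueError("an expected completion is required below 5 stars")


# A 5-star answer needs no correction.
Review("What is 2 + 2?", "4", stars=5).validate()

# A lower rating is valid once an expected completion is provided.
Review("What is 2 + 2?", "5", stars=2, expected_completion="4").validate()
```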

View evaluation results
After evaluating all completions, mark the evaluation as complete from the app bar. Click the current status Evaluation in progress and change it to Evaluation completed.

After the evaluation is marked as complete, you can view a summary of the evaluation. When evaluating models from Sandbox, the summary shows:
Average cost and processing time for generating completions
Average evaluation score
Evaluation results in a table view

When evaluating pre-generated completions, the summary shows:
Average evaluation score
Evaluation results in a table view
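To make the average evaluation score concrete, here is a minimal sketch that assumes the score is the arithmetic mean of the star ratings. This page does not say how the product aggregates or rounds the score, so both the formula and the function name average_score are assumptions.

```python
from statistics import mean


def average_score(ratings):
    """Arithmetic mean of star ratings, rounded to two decimals."""
    if not ratings:
        raise ValueError("no ratings to average")
    return round(mean(ratings), 2)


# Four hypothetical ratings: (5 + 4 + 3 + 5) / 4 = 4.25
print(average_score([5, 4, 3, 5]))
```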
