Rating
The Rating evaluation feature provides a streamlined way to assess the quality of your Large Language Model (LLM) outputs. By leveraging human judgment, you can gain valuable insights into your model's strengths and weaknesses, ultimately guiding its improvement.
Rating evaluation focuses on evaluating individual LLM outputs (completions) against predefined criteria. This involves assigning a score to each completion, reflecting its quality in relation to a specific prompt and ground truth.
To use Rating evaluation, you need to complete some prerequisites based on what you want to evaluate:
To evaluate pre-generated completion results:
Pre-generated completion evaluation supports two CSV formats for organizing and processing data:
Two-column CSV format: prompt and completion.
Four-column CSV format: prompt_template, prompt, sources, and completion.
To evaluate LLM applications:
Ensure the LLM application is deployed.
Prepare a dataset in a CSV file with one column: prompt. Sample files for each of these formats are sketched after this list.
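The sketch below shows one way to prepare these CSV files with Python's standard csv module. The column headers come from the formats above; the file names and example rows are hypothetical and only illustrate the expected shape of each file.

```python
# Minimal sketch of preparing the Rating evaluation datasets.
# Only the column headers (prompt, completion, prompt_template, sources)
# come from the documented formats; file names and rows are hypothetical.
import csv

# Two-column format for pre-generated completion results.
with open("pregenerated_two_column.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt", "completion"])
    writer.writerow(["What is the capital of France?", "The capital of France is Paris."])

# Four-column format for pre-generated completion results.
with open("pregenerated_four_column.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt_template", "prompt", "sources", "completion"])
    writer.writerow([
        "Answer the question using the sources: {question}",
        "What is the capital of France?",
        "Paris is the capital and largest city of France.",
        "The capital of France is Paris.",
    ])

# One-column format for evaluating a deployed LLM application.
with open("llm_application_prompts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt"])
    writer.writerow(["What is the capital of France?"])
```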
To begin using the Rating evaluation:
Navigate to the Evaluation page under the LLM Labs menu.
Click the Create evaluation project button and choose the Rating project type.
Set up your project. Choose what you want to evaluate with Rating:
Evaluate pre-generated completion results
Upload the dataset in a CSV file with two columns: prompt and completion.
Evaluate LLM applications
Upload the dataset in a CSV file with one column: prompt.
Select the LLM application that you want to use to generate the completions. If you can’t find your application in the list, go to the playground where your application is created and deploy it. You can only evaluate deployed LLM applications.
Click the Create evaluation project button.
In Rating evaluation, we support two user roles:
Labeler: As a labeler, you evaluate each completion by giving it a rating and, when needed, an expected completion. A labeler is typically a subject-matter expert who evaluates your LLM application's completions.
Reviewer: As a reviewer, you will need to review the labelers’ work.
As a labeler, you should rate each completion of a prompt from 1 to 5 stars. A 5-star rating usually means the completion is already perfect, so there is no need to provide feedback or edit the completion.
When the rating is below 5 stars, you have to refine the completion by providing your expected completion. After that, submit the answer to move to the next prompt.
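The sketch below captures this labeling rule in code, assuming a simple record shape; the field names are hypothetical and do not reflect the LLM Labs data format.

```python
# Minimal sketch of the labeling rule: ratings run from 1 to 5 stars,
# and any rating below 5 must include an expected completion.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LabelerAnswer:
    prompt: str
    completion: str
    rating: int                          # 1-5 stars
    expected_completion: Optional[str] = None

def validate_answer(answer: LabelerAnswer) -> None:
    """Raise if the answer violates the documented rating rule."""
    if not 1 <= answer.rating <= 5:
        raise ValueError("Rating must be between 1 and 5 stars.")
    if answer.rating < 5 and not answer.expected_completion:
        raise ValueError("Ratings below 5 stars must include an expected completion.")
```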
As a reviewer, you have to review the labelers' answers. When there are conflicts between labelers, you must choose the most accurate one. Alternatively, you can give your own rating and expected completion.
By default, when you create a Rating evaluation project in LLM Labs, the project creator is assigned both Labeler and Reviewer roles. You can update these roles and assign new labelers or reviewers by following these steps:
Open your Rating evaluation project.
Switch to Reviewer mode.
Open the project settings from File > Settings.
Navigate to the Assignment menu.
In the Assignment menu, you can change roles and add new members to your project. You can also configure conflict resolution and dynamic review assignment.