Rating

Overview

The Rating evaluation feature provides a streamlined way to assess the quality of your Large Language Model (LLM) outputs. By leveraging human judgment, you can gain valuable insights into your model's strengths and weaknesses, ultimately guiding its improvement.

Rating evaluation assesses individual LLM outputs (completions) against predefined criteria. Each completion receives a score that reflects its quality in relation to a specific prompt and ground truth.

Prerequisites

To use Rating evaluation, you need to complete some prerequisites based on what you want to evaluate:

To evaluate pre-generated completion results:

Pre-generated completions can be supplied in either of two CSV formats:

Two column CSV format: prompt and completion.
Four column CSV format: prompt_template, prompt, sources, completion.

Sample files:

Rating - with source.csv (2KB)
LLM Evaluation - Credit card.csv (1KB)
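
Before uploading, it can help to check which of the two formats a file matches. The following is a minimal sketch, not part of the product: the function name `detect_format` and the sample row are made up for illustration, and it assumes the header names must match exactly while column order does not matter (the documentation does not say either way).

```python
import csv
import io

# Column names taken from the two formats listed above.
TWO_COLUMN = {"prompt", "completion"}
FOUR_COLUMN = {"prompt_template", "prompt", "sources", "completion"}


def detect_format(csv_text: str) -> str:
    """Return 'two-column' or 'four-column' based on the CSV header row.

    Assumption: column order is not checked, only the set of names.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = {name.strip() for name in next(reader, [])}
    if header == TWO_COLUMN:
        return "two-column"
    if header == FOUR_COLUMN:
        return "four-column"
    raise ValueError(f"Unexpected header columns: {sorted(header)}")


# Hypothetical sample content, for illustration only.
sample = 'prompt,completion\n"What is the annual fee?","The annual fee is listed in your card agreement."\n'
print(detect_format(sample))
```

To check a file on disk, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text to `detect_format`.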

To evaluate LLM applications:

Ensure the LLM application is deployed.
Prepare a dataset in a CSV file with one column: prompt.

Sample file:

Evaluation with LLM Application.csv (365B)
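
A prompt-only dataset can be generated with Python's standard `csv` module, which handles quoting of prompts that contain commas or quotes. This is a small sketch under that assumption; `build_prompt_csv` is a hypothetical helper name, not part of LLM Labs.

```python
import csv
import io


def build_prompt_csv(prompts: list[str]) -> str:
    """Build CSV text with a single 'prompt' column, one row per prompt."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["prompt"])  # header row expected by the one-column format
    for prompt in prompts:
        writer.writerow([prompt])  # the csv module quotes the value if needed
    return buffer.getvalue()


csv_text = build_prompt_csv(["a", "b, c"])
print(csv_text)
```

Write the returned text to a `.csv` file (for example with `open(path, "w", newline="", encoding="utf-8")`) before uploading it.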

Getting started

To begin using the Rating evaluation:

Navigate to the Evaluation page under the LLM Labs menu.
Click the Create evaluation project button and choose the Rating project type.

Set up your project. Choose what you want to evaluate with Rating:
1. Evaluate pre-generated completion results
  1. Upload the dataset in a CSV file using one of the supported formats: two columns (prompt and completion) or four columns (prompt_template, prompt, sources, completion).
  (Screenshot: Rating evaluation project with pre-generated completion creation)
2. Evaluate LLM applications
  1. Upload the dataset in a CSV file with one column: prompt.
  2. Select the LLM application that you want to use to generate the completions. If you can't find your application in the list, go to the sandbox where your application was created and deploy it. You can only evaluate deployed LLM applications.
  (Screenshot: Rating evaluation project with LLM application creation)
Click the Create evaluation project button.

Evaluate the completions

Rating evaluation supports two user roles:

Labeler: As a labeler, you evaluate each completion by giving it a rating and, where needed, an expected completion. The labeler can be a subject-matter expert who evaluates your LLM application's completions.
Reviewer: As a reviewer, you review the labelers' work.

Labeler

As a labeler, you should rate each completion of a prompt from 1 to 5 stars. A 5-star rating usually means the completion is already perfect, so there is no need to provide feedback or edit the completion.

When the rating is below 5 stars, you have to refine the completion by providing your expected completion. After that, submit the answer to move to the next prompt.
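
The labeling rule above (1 to 5 stars, with an expected completion required below 5 stars) can be expressed as a small check. This is an illustrative sketch only: the `Answer` class and `validate_answer` function are hypothetical names, not part of the product, and the product enforces its own rules in the UI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Answer:
    rating: int  # number of stars, 1 to 5
    expected_completion: Optional[str] = None


def validate_answer(answer: Answer) -> list[str]:
    """Return a list of problems; an empty list means the answer can be submitted."""
    problems = []
    if not 1 <= answer.rating <= 5:
        problems.append("rating must be between 1 and 5 stars")
    elif answer.rating < 5 and not (answer.expected_completion or "").strip():
        # Below 5 stars, the labeler has to supply the expected completion.
        problems.append("expected completion is required when the rating is below 5 stars")
    return problems


print(validate_answer(Answer(rating=5)))
print(validate_answer(Answer(rating=3)))
```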

Reviewer

As a reviewer, you have to review the labelers' answers. When there are conflicts between labelers, you must choose the most accurate answer. Alternatively, you can give your own rating and expected completion.
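
To make the idea of a conflict concrete, the sketch below flags prompts where labelers gave different ratings. It rests on assumptions: the documentation does not define how conflicts are detected, so "different star ratings for the same prompt" is a simplification, and `find_conflicts`, the labeler names, and the tuple layout are hypothetical.

```python
from collections import defaultdict


def find_conflicts(answers):
    """Group ratings by prompt and return the prompts where labelers disagree.

    answers: iterable of (prompt, labeler, rating) tuples.
    Assumption: a conflict means at least two different ratings for one prompt.
    """
    ratings_by_prompt = defaultdict(dict)
    for prompt, labeler, rating in answers:
        ratings_by_prompt[prompt][labeler] = rating
    return {
        prompt: ratings
        for prompt, ratings in ratings_by_prompt.items()
        if len(set(ratings.values())) > 1
    }


# Made-up answers for illustration.
answers = [
    ("p1", "alice", 4),
    ("p1", "bob", 2),
    ("p2", "alice", 5),
    ("p2", "bob", 5),
]
print(find_conflicts(answers))
```

A reviewer would then look only at the flagged prompts and either pick one labeler's answer or supply a new one.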

Assignments

By default, when you create a Rating evaluation project in LLM Labs, the project creator is assigned both the Labeler and Reviewer roles. You can update roles and assign new labelers or reviewers by following these steps:

Open your Rating evaluation project.
Switch to Reviewer mode.

Open the project settings from File > Settings.
Navigate to the Assignment menu.
In the Assignment menu, you can change roles and add new members to your project. You can also configure conflict resolution and dynamic review assignment.
