Ranking (RLHF)
The LLM Evaluation Ranking feature in Datasaur's LLM Labs allows users to generate datasets for Reinforcement Learning from Human Feedback (RLHF). This feature is part of the LLM Labs Evaluation Module and provides a seamless way to generate and rank completions based on previously saved Ground Truth prompts.
To use LLM Ranking Evaluation, you need to complete some prerequisites based on what you want to evaluate:
To evaluate pre-generated completion results:
Prepare a dataset in a CSV file with several columns: prompt, completion_1, completion_2, completion_3, and so forth up to completion_xx (see the sketch after this list).
To evaluate LLM applications:
Ensure the LLM application is deployed.
Prepare a dataset in a CSV file with one column: prompt.
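Both dataset layouts are plain CSV files. As a minimal sketch, the Python below writes one file of each shape using the standard csv module; the file names and example rows are hypothetical, and only the column headers (prompt, completion_1, completion_2, ...) follow the format described above.

```python
import csv

# For evaluating pre-generated completions: prompt plus completion_1 ... completion_xx.
# Example rows are hypothetical; only the header format follows the docs above.
with open("pregenerated_completions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt", "completion_1", "completion_2", "completion_3"])
    writer.writerow([
        "Summarize the water cycle in one sentence.",
        "Water evaporates, condenses into clouds, and falls as precipitation.",
        "The water cycle moves water between oceans, air, and land.",
        "Rain comes from clouds.",
    ])

# For evaluating a deployed LLM application: a single prompt column.
with open("prompts_only.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt"])
    writer.writerow(["Summarize the water cycle in one sentence."])
```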
To begin using the Ranking evaluation:
Navigate to the Evaluation page under the LLM Labs menu.
Click the Create evaluation project button and choose the Ranking project type.
Set up your project. Choose what you want to evaluate with Ranking:
Evaluate pre-generated completions
Upload the dataset in a CSV file with several columns: prompt, completion_1, completion_2, completion_3, and so forth up to completion_xx.
Evaluate LLM applications
Upload the dataset in a CSV file with one column: prompt.
Select the LLM application that you want to use to generate the completions. If you can’t find your application in the list, go to the playground where the application was created and deploy it. You can only evaluate deployed LLM applications.
Click the Create evaluation project button.
In a Ranking evaluation project, we support two user roles:
Labeler: As a labeler, you will need to rank several completions for each prompt from best to worst. The labeler can be a subject-matter expert who will evaluate your LLM application's completions.
Reviewer: As a reviewer, you will need to review the labelers’ work.
Each prompt comes with a minimum of two completions. As a labeler, you have to rank the completions from best to worst by dragging them into order. After that, submit your answer to move to the next prompt.
As a reviewer, you have to review the labelers' answers. When labelers' rankings conflict, choose the most accurate one, or provide your own ranking instead.
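Because the submitted rankings are what ultimately become the RLHF dataset, it can help to see how a best-to-worst ranking maps onto training data. The sketch below expands one ranked list into pairwise chosen/rejected preference pairs, the shape most reward-model training pipelines expect; the function name and record fields are hypothetical and do not reflect LLM Labs' actual export format.

```python
from itertools import combinations

def ranking_to_preference_pairs(prompt, ranked_completions):
    """Expand a best-to-worst ranking into pairwise preference records.

    ranked_completions is ordered best first, so in every pair the
    earlier item is 'chosen' and the later item is 'rejected'.
    """
    return [
        {"prompt": prompt, "chosen": better, "rejected": worse}
        for better, worse in combinations(ranked_completions, 2)
    ]

# Hypothetical ranked output for one prompt (best to worst).
pairs = ranking_to_preference_pairs(
    "Summarize the water cycle in one sentence.",
    [
        "Water evaporates, condenses into clouds, and falls as precipitation.",
        "The water cycle moves water between oceans, air, and land.",
        "Rain comes from clouds.",
    ],
)
print(pairs)  # 3 ranked completions -> 3 chosen/rejected pairs
```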
By default, when you create a Ranking evaluation in LLM Labs, the project creator is assigned both Labeler and Reviewer roles. You can update the Ranking evaluation roles by following these steps:
Open your Ranking evaluation project.
Switch to Reviewer mode.
Open the project settings from File > Settings.
Navigate to the Assignment menu.
In the Assignment section, you can change roles and add new members to your project. You can also configure conflict resolution and dynamic review assignment.