Ranking (RLHF)
Overview
The Ranking evaluation project helps you assess the quality of your LLM completions using human judgment, by comparing multiple completions for the same prompt. You rank the completions from best to worst, providing insight into which outputs align most closely with your expectations.
Prerequisites
In Ranking projects, you can evaluate two types of completions:
Pre-generated completions
Completions generated by models from Sandbox
Evaluate pre-generated completions
Prepare a dataset in a CSV file with several columns: prompt, completion_1, completion_2, completion_3, and so forth up to completion_xx.
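The column layout described above can be produced with a short script. This is a minimal sketch, not an official tool: the function name write_pregenerated_csv and the sample prompts are hypothetical. Requiring the same number of completions on every row is a conservative choice made for this sketch, not a documented requirement.

```python
import csv
import io

def write_pregenerated_csv(rows, out):
    """Write (prompt, [completions]) pairs to `out` as prompt, completion_1, completion_2, ...

    Hypothetical helper for illustration only.
    """
    width = len(rows[0][1])
    for prompt, completions in rows:
        # Ranking needs something to compare, so require at least two completions.
        if len(completions) < 2:
            raise ValueError(f"prompt {prompt!r} has fewer than two completions")
        # Assumption of this sketch: every row carries the same number of completions.
        if len(completions) != width:
            raise ValueError(f"prompt {prompt!r} has {len(completions)} completions, expected {width}")
    header = ["prompt"] + [f"completion_{i}" for i in range(1, width + 1)]
    writer = csv.writer(out)  # quotes fields containing commas or newlines automatically
    writer.writerow(header)
    for prompt, completions in rows:
        writer.writerow([prompt] + list(completions))
    return header

sample = [
    ("What is the capital of France?", ["Paris.", "The capital is Paris.", "Lyon."]),
    ("Summarize: cats sleep a lot.", ["Cats sleep often.", "Cats are animals.", "Cats nap, a lot."]),
]
buffer = io.StringIO()
header = write_pregenerated_csv(sample, buffer)
csv_text = buffer.getvalue()
```

To write a real file instead of an in-memory buffer, pass a handle opened with open("dataset.csv", "w", newline="", encoding="utf-8"), where dataset.csv is a placeholder name.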
Evaluate models from Sandbox
Ensure the LLM application is deployed.
Prepare a dataset in a CSV file with one column: prompt.
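Before uploading, it can help to check that the file really has a single prompt column. The following is a minimal sketch under that assumption: read_prompts is a hypothetical helper, and skipping blank lines is a choice made here rather than documented platform behavior.

```python
import csv
import io

def read_prompts(csv_text):
    """Return the prompts from CSV text whose only column is `prompt`.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != ["prompt"]:
        raise ValueError(f"expected a single 'prompt' column, got {header}")
    prompts = []
    for line_number, row in enumerate(reader, start=2):
        if not row or not row[0].strip():
            continue  # skip blank lines (a choice of this sketch)
        if len(row) != 1:
            # Usually caused by an unquoted comma inside the prompt text.
            raise ValueError(f"row {line_number} has {len(row)} fields, expected 1")
        prompts.append(row[0])
    return prompts

prompts = read_prompts('prompt\nHello there\n"Compare apples, pears, and plums"\n')
```

Prompts that contain commas must be quoted, as in the second sample row. Otherwise the CSV parser splits them into several fields.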
Create project
To create Ranking evaluation projects:
Navigate to the Evaluation page under the LLM Labs menu.
Click Create evaluation project, select Ranking, then click Continue.
Set up your project. Choose what you want to evaluate with Ranking:
Evaluate pre-generated completions
Upload the dataset in a CSV file with several columns: prompt, completion_1, completion_2, completion_3, and so forth up to completion_xx.
Ranking evaluation project with pre-generated completion creation
Evaluate models from Sandbox
Upload the dataset in a CSV file with one column: prompt.
Ranking evaluation project with LLM application creation
Click Create evaluation project.
Evaluate completions
Open the project to start evaluating completions. Each prompt includes at least two completions. Rank them from best to worst by dragging to reorder, then submit your answer to move on to the next prompt.
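To make the ranking step concrete, here is a sketch of how a set of submitted rankings could be represented and summarized offline. The list-of-lists layout and the mean_ranks function are assumptions made for illustration. They are not the platform's export format, and mean rank is only one possible summary, not necessarily what the results table reports.

```python
from collections import defaultdict

def mean_ranks(rankings):
    """Average the position of each completion across rankings.

    rankings: list of lists, each ordered from best to worst.
    Position 1 is best, so a lower mean rank means a more preferred completion.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for order in rankings:
        for position, name in enumerate(order, start=1):
            totals[name] += position
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}

# Hypothetical rankings for three prompts, each with three completions.
rankings = [
    ["completion_2", "completion_1", "completion_3"],
    ["completion_2", "completion_3", "completion_1"],
    ["completion_1", "completion_2", "completion_3"],
]
summary = mean_ranks(rankings)
best = min(summary, key=summary.get)  # completion with the lowest mean rank
```

In this sample, completion_2 is ranked first twice and second once, so it has the lowest mean rank.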
View evaluation results
After evaluating all completions, mark the evaluation as complete from the app bar. Click the current status Evaluation in progress and change it to Evaluation completed.

After the evaluation is marked as complete, you can view its summary. For evaluations of models from Sandbox, you can see:
Average cost and processing time for generating completions
Evaluation results in a table view

For evaluations of pre-generated completions, you can also see the evaluation results in a table view.
