Ranking (RLHF)

Overview

The Ranking evaluation project helps you assess the quality of your LLM completions using human judgment by comparing multiple completions for the same prompt. You rank the completions from best to worst, showing which outputs align most closely with your expectations.

Prerequisites

In Ranking projects, you can evaluate two types of completions:

  1. Pre-generated completions

  2. Completions generated by models from Sandbox

Evaluate pre-generated completions

  1. Prepare a dataset in a CSV file with a prompt column followed by numbered completion columns: completion_1, completion_2, completion_3, and so on.
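
For illustration, a minimal pre-generated dataset with three completions per prompt might look like the following; all prompt and completion texts here are placeholder values:

```csv
prompt,completion_1,completion_2,completion_3
"Explain photosynthesis in one sentence.","Plants convert sunlight, water, and CO2 into glucose and oxygen.","Photosynthesis is how plants eat sunlight.","It is a process that happens in leaves."
"Write a haiku about autumn.","Crisp leaves drift and fall.","Autumn is a season of falling leaves.","Leaves fall in autumn."
```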

Evaluate models from Sandbox

  1. Ensure the LLM application you want to evaluate is deployed.

  2. Prepare a dataset in a CSV file with one column: prompt.
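
In this mode the dataset contains only the prompts; the completions are generated by the selected models. A minimal example, with placeholder prompts:

```csv
prompt
"Explain photosynthesis in one sentence."
"Write a haiku about autumn."
```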

Create project

To create a Ranking evaluation project:

  1. Navigate to the Evaluation page under the LLM Labs menu.

  2. Click Create evaluation project, select Ranking, then Continue.

  3. Set up your project. Choose what you want to evaluate with Ranking:

    1. Evaluate pre-generated completions

      1. Upload the dataset in a CSV file with a prompt column followed by numbered completion columns (completion_1, completion_2, completion_3, and so on). A sketch for sanity-checking the file before upload follows these steps.

      (Screenshot: creating a Ranking evaluation project with pre-generated completions)
    2. Evaluate models from Sandbox

      1. Upload the dataset in a CSV file with one column: prompt.

      2. Select the model that you want to use to generate completions. If you can’t find your model in the list, go to the Sandbox where the model was created and deploy it or save it to the library. Only deployed or saved models can be evaluated.

      (Screenshot: creating a Ranking evaluation project with an LLM application)
  4. Click Create evaluation project.
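
Before uploading, you may want to sanity-check the CSV locally. The sketch below (referenced in step 3 above) is a hypothetical helper, not part of LLM Labs; it only verifies the column requirements described in this guide: a prompt column and, for pre-generated datasets, at least two completion_N columns.

```python
import csv
import re
import sys

def check_dataset(path: str) -> None:
    """Sanity-check a Ranking dataset CSV before upload.

    Accepts either a single `prompt` column (models-from-Sandbox mode)
    or a `prompt` column plus at least two `completion_N` columns
    (pre-generated mode).
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        if "prompt" not in header:
            sys.exit(f"{path}: missing required 'prompt' column")
        completions = [c for c in header if re.fullmatch(r"completion_\d+", c)]
        if completions and len(completions) < 2:
            sys.exit(f"{path}: pre-generated datasets need at least two completion_N columns")
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            if not (row.get("prompt") or "").strip():
                sys.exit(f"{path}: empty prompt on line {line_no}")
    mode = "pre-generated" if completions else "Sandbox"
    print(f"{path}: OK ({mode} mode, {len(completions)} completion columns)")

if __name__ == "__main__":
    check_dataset(sys.argv[1])
```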

Evaluate completions

Open the project to start evaluating completions. Each prompt includes at least two completions. Rank them from best to worst by dragging to reorder, then submit your answer to move on to the next prompt.

View evaluation results

After evaluating all completions, mark the evaluation as complete from the app bar: click the current status, Evaluation in progress, and change it to Evaluation completed.

Once the evaluation is marked as complete, you can view its summary. For evaluations of models from Sandbox, the summary shows:

  • Average cost and processing time for generating completions

  • Evaluation results in a table view

For evaluations of pre-generated completions, the summary shows the evaluation results in the same table view.
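
Ranking data like this is commonly collected for RLHF, where each ranked list is expanded into pairwise preferences for reward-model training. The sketch below is not an LLM Labs feature; it only illustrates that standard conversion, assuming each ranking is available as an ordered list of completions, best first:

```python
from itertools import combinations

def ranking_to_pairs(prompt: str, ranked: list[str]) -> list[dict]:
    """Expand one ranked list (best first) into pairwise preferences.

    Each completion is preferred over every completion ranked below it,
    so n completions yield n * (n - 1) / 2 (chosen, rejected) pairs.
    """
    return [
        {"prompt": prompt, "chosen": better, "rejected": worse}
        for better, worse in combinations(ranked, 2)
    ]

# Example: a 3-way ranking yields 3 preference pairs.
for pair in ranking_to_pairs(
    "Explain photosynthesis in one sentence.",
    [
        "Plants convert sunlight, water, and CO2 into glucose and oxygen.",
        "Photosynthesis is how plants eat sunlight.",
        "It is a process that happens in leaves.",
    ],
):
    print(pair["chosen"], ">", pair["rejected"])
```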
