Multi-application evaluation

Overview

This feature lets you compare and evaluate the performance of multiple models using metrics from established evaluators such as Ragas, LangChain, and DeepEval. By streamlining the assessment process, you can gain insight into the strengths and weaknesses of different applications and make data-driven decisions to optimize your workflows.

Get started

To evaluate multiple models:

  1. Navigate to the Evaluation page under the LLM Labs menu.

  2. Click the Create evaluation project button, choose the Automated evaluation project type, then click Continue.

  3. Configure your evaluation by selecting the models to evaluate and choosing a dataset from the library. If you don’t have one, you can upload a dataset in CSV format containing two columns: prompt and expected completion (see the example after these steps).

If you can’t find your model in the list, go to the Sandbox where the model was created and deploy it or save it to the library. Only deployed or saved models can be evaluated.

  4. Select the metric, provider, and evaluator model you want to use for the evaluation (a conceptual sketch of how such metrics work follows these steps). Learn more about the evaluators and metrics.

  5. Click Create evaluation project and wait for the evaluation process to finish.
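
For reference, here is a minimal sketch of a dataset in the expected CSV format. The two column headers, prompt and expected completion, come from step 3 above; the rows themselves are invented examples.

```csv
prompt,expected completion
"Summarize the plot of Hamlet in one sentence.","Prince Hamlet avenges his father's murder by his uncle Claudius, at the cost of nearly everyone at court."
"Translate 'good morning' into French.","Bonjour"
```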
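
Conceptually, evaluators such as Ragas, LangChain, and DeepEval grade each generated completion against the expected completion, often by prompting a separate judge model. The sketch below illustrates that LLM-as-judge pattern only; it is not the API of any of these libraries, and judge_model is an assumed callable that sends a prompt to the evaluator model and returns its text reply.

```python
# Hypothetical sketch of an LLM-as-judge metric. judge_model is an
# assumed callable (prompt -> text reply); this is not the Ragas,
# LangChain, or DeepEval API.

def correctness_score(prompt: str, completion: str, expected: str,
                      judge_model) -> float:
    """Ask a judge model to grade a completion from 0 to 10; return 0-1."""
    grading_prompt = (
        "Rate how well the response answers the prompt compared to the "
        "reference, on a scale of 0 (wrong) to 10 (equivalent). "
        "Reply with the number only.\n\n"
        f"Prompt: {prompt}\nResponse: {completion}\nReference: {expected}"
    )
    reply = judge_model(grading_prompt)
    try:
        # Clamp to the stated scale, then normalize to 0-1.
        return max(0.0, min(10.0, float(reply.strip()))) / 10.0
    except ValueError:
        return 0.0  # an unparseable judge reply counts as a failed grade
```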

Analyze the evaluation results

Once the evaluation process finishes, you can analyze the results in the following views.

Summary of the evaluation

Here you can view the total cost, the time taken to generate completions, and the overall performance score given by the evaluator.
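
As a rough illustration of how these figures relate, the sketch below aggregates hypothetical per-completion records into a total cost, total generation time, and mean score. The field names are assumptions made for illustration, not the product's data model.

```python
from dataclasses import dataclass

@dataclass
class RowResult:
    # Hypothetical per-completion record; field names are assumed.
    cost_usd: float    # cost of generating this completion
    latency_s: float   # time taken to generate it
    score: float       # evaluator score, normalized to 0-1

def summarize(rows: list[RowResult]) -> dict:
    """Roll per-row results up into the kind of summary shown on this page."""
    return {
        "total_cost_usd": sum(r.cost_usd for r in rows),
        "total_time_s": sum(r.latency_s for r in rows),
        "overall_score": sum(r.score for r in rows) / len(rows) if rows else 0.0,
    }

print(summarize([RowResult(0.002, 1.4, 0.9), RowResult(0.003, 2.1, 0.7)]))
```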

Result and score from each model

Here you can view the quality, score, and processing time of the generated completions from each model.

Evaluation details

To view the evaluation details of a completion, click the More icon (three dots) at the far right of the row, then select View details.
