Multi-application evaluation
Last updated
This feature lets you compare and evaluate the performance of multiple LLM applications using metrics from established evaluation frameworks such as Ragas, LangChain, and DeepEval. By streamlining the assessment process, you can gain insight into the strengths and weaknesses of each application and make data-driven decisions to optimize your workflows.
To begin using multi-application Automated Evaluation:

1. Navigate to the Evaluation page under the LLM Labs menu.
2. Click the Create evaluation project button and choose the Automated evaluation project type.
3. Configure your evaluation by selecting the applications you want to evaluate and uploading a ground-truth dataset in CSV format with two columns: prompt and expected completion.
   If you can't find your application in the list, go to the sandbox where it was created and deploy it. You can also use an application you have already saved in your application library.
4. Select the Metric, Provider, and Evaluator model you want to use for the evaluation. Learn more about the evaluators and metrics.
5. Click the Create evaluation project button and wait for the evaluation process to finish.
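The ground-truth dataset in step 3 is a plain CSV file whose header row names the two required columns. A minimal sketch of producing one with Python's standard library (the rows here are illustrative placeholders, not real data):

```python
import csv

# Illustrative ground-truth rows; replace with your own prompts and answers.
rows = [
    {"prompt": "What is the capital of France?", "expected completion": "Paris"},
    {"prompt": "Who wrote 'Hamlet'?", "expected completion": "William Shakespeare"},
]

with open("ground_truth.csv", "w", newline="", encoding="utf-8") as f:
    # Header row must contain exactly the two expected column names.
    writer = csv.DictWriter(f, fieldnames=["prompt", "expected completion"])
    writer.writeheader()
    writer.writerows(rows)
```

Exporting a sheet from a spreadsheet tool works just as well, as long as the header row matches the expected column names.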
After the evaluation process is completed, you can analyze the results.
The summary view shows the total cost, the time taken to generate completions, and the overall performance score assigned by the evaluator.
The comparison table shows the quality, score, and processing time of the completions generated by each application.
To see the evaluation details for an application, click the three-dots icon on the right side of the table.
The detail view shows the full evaluation result for each application.
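Conceptually, the overall score and total processing time in the summary are aggregates of the evaluator's per-row results. A hypothetical sketch of that aggregation (the result structure, field names, and values below are assumptions for illustration, not the product's API):

```python
# Hypothetical per-row evaluator results for two applications.
results = {
    "app-a": [{"score": 0.9, "latency_s": 1.2}, {"score": 0.7, "latency_s": 0.8}],
    "app-b": [{"score": 0.6, "latency_s": 0.5}, {"score": 0.8, "latency_s": 0.7}],
}

def summarize(rows):
    """Average score and total processing time across all evaluated rows."""
    avg_score = sum(r["score"] for r in rows) / len(rows)
    total_time = sum(r["latency_s"] for r in rows)
    return {"overall_score": round(avg_score, 3), "total_time_s": round(total_time, 3)}

summary = {app: summarize(rows) for app, rows in results.items()}
# e.g. summary["app-a"] -> {"overall_score": 0.8, "total_time_s": 2.0}
```

Reading the summary this way makes the comparison table easier to interpret: a high overall score with a long total time may still lose to a slightly lower-scoring but much faster application, depending on your priorities.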