Evaluation

Overview

Datasaur supports three types of evaluation: automated evaluation, ranking, and rating. Choose the one that best suits your needs.

Evaluations

Automated evaluation

You can evaluate an LLM application or pre-generated completions by comparing them to a ground truth with a supported metric. Currently, we support the Answer Correctness metric from LangChain and Ragas. More metrics will be available soon!
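To illustrate what "comparing to a ground truth" means, here is a minimal sketch that scores each completion against its reference answer and averages the results. It is not the LangChain or Ragas Answer Correctness metric, which is more sophisticated. As an assumption for illustration only, it uses a simple token-overlap F1 as a stand-in metric. The row fields `completion` and `ground_truth` and the functions `token_f1` and `evaluate` are hypothetical names, not part of Datasaur's API.

```python
from collections import Counter


def token_f1(prediction: str, ground_truth: str) -> float:
    """Stand-in metric (NOT Answer Correctness): token-overlap F1 between two strings."""
    pred_tokens = prediction.lower().split()
    truth_tokens = ground_truth.lower().split()
    # Count tokens shared by both strings, respecting multiplicity.
    common = Counter(pred_tokens) & Counter(truth_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(rows, metric=token_f1) -> float:
    """Average the metric over hypothetical rows of completion/ground-truth pairs."""
    scores = [metric(row["completion"], row["ground_truth"]) for row in rows]
    return sum(scores) / len(scores)


rows = [
    {"completion": "Paris is the capital of France", "ground_truth": "The capital of France is Paris"},
    {"completion": "Berlin", "ground_truth": "Paris"},
]
print(evaluate(rows))  # 0.5: the first pair matches fully, the second not at all
```

Passing the metric as a parameter keeps the comparison loop independent of how each pair is scored, so a stronger metric could be swapped in without changing the rest.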

Ranking

Evaluate your model by ranking the completions generated for each prompt from best to worst. You can evaluate pre-generated completions or an LLM application; when evaluating an LLM application, you can set how many completions it generates for each prompt.
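Once completions have been ranked, one common way to summarize the results is each source's average position across prompts. The sketch below assumes, purely for illustration, that each prompt's ranking is a list of hypothetical completion-source labels ordered best to worst. The data layout and the `mean_rank` function are not part of Datasaur.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rankings: for each prompt, labels ordered from best to worst.
rankings = {
    "prompt-1": ["model-a", "model-b", "model-c"],
    "prompt-2": ["model-b", "model-a", "model-c"],
    "prompt-3": ["model-a", "model-c", "model-b"],
}


def mean_rank(rankings):
    """Return each label's average position across prompts (1 = best)."""
    positions = defaultdict(list)
    for ordered in rankings.values():
        for position, label in enumerate(ordered, start=1):
            positions[label].append(position)
    return {label: mean(ps) for label, ps in positions.items()}


summary = mean_rank(rankings)
for label in sorted(summary, key=summary.get):
    print(label, round(summary[label], 2))
```

A lower mean rank means a source was ranked nearer the top more often; here `model-a` comes first with an average position of about 1.33.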

Rating

Evaluate each completion for each prompt by rating it from 1 to 5 stars and providing the expected completion. You can evaluate pre-generated completions or an LLM application.
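The sketch below shows one possible shape for rating records, each holding a 1 to 5 star score and the expected completion, together with an average score per model. The record fields (`model`, `stars`, `expected`) and the `average_stars` function are hypothetical and used only for illustration. They do not describe Datasaur's internal format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rating records: one per rated completion.
ratings = [
    {"prompt": "p1", "model": "model-a", "stars": 5, "expected": "Paris"},
    {"prompt": "p2", "model": "model-a", "stars": 3, "expected": "Mount Everest"},
    {"prompt": "p1", "model": "model-b", "stars": 2, "expected": "Paris"},
]


def average_stars(ratings):
    """Average star rating per model, rejecting scores outside the 1-5 range."""
    by_model = defaultdict(list)
    for record in ratings:
        stars = record["stars"]
        if not 1 <= stars <= 5:
            raise ValueError(f"stars must be between 1 and 5, got {stars}")
        by_model[record["model"]].append(stars)
    return {model: mean(scores) for model, scores in by_model.items()}


print(average_stars(ratings))  # {'model-a': 4, 'model-b': 2}
```

Validating the range up front keeps a single bad record from silently skewing the averages.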
