

Datasaur supports three types of evaluation. Choose the one that best suits your needs.


Automated evaluation

You can evaluate an LLM application or pre-generated completions with your preferred metrics by comparing them against a ground truth. Currently, we support the Answer Correctness metric from LangChain and Ragas. More metrics are coming soon.
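
To make "comparing against a ground truth" more concrete, here is a minimal sketch of how an answer-correctness score in the style of Ragas can be computed. It is not Datasaur's implementation. It assumes the statement-level counts and the semantic-similarity value have already been obtained; Ragas derives these with an LLM judge and an embedding model. The names `StatementCounts`, `factual_f1`, and `answer_correctness` are hypothetical. The default 0.75/0.25 weighting mirrors Ragas' default and is not something this page documents.

```python
from dataclasses import dataclass


@dataclass
class StatementCounts:
    """Hypothetical statement-level comparison of one completion with its ground truth.

    Ragas obtains these counts from an LLM judge; in this sketch they are supplied by hand.
    """
    tp: int  # completion statements supported by the ground truth
    fp: int  # completion statements not found in the ground truth
    fn: int  # ground-truth statements missing from the completion


def factual_f1(c: StatementCounts) -> float:
    """F1 over statements, written as TP / (TP + 0.5 * (FP + FN))."""
    denom = c.tp + 0.5 * (c.fp + c.fn)
    return c.tp / denom if denom > 0 else 0.0


def answer_correctness(c: StatementCounts, semantic_similarity: float,
                       weights: tuple = (0.75, 0.25)) -> float:
    """Weighted average of factual F1 and semantic similarity (assumed Ragas-style weighting)."""
    w_fact, w_sem = weights
    return (w_fact * factual_f1(c) + w_sem * semantic_similarity) / (w_fact + w_sem)


# Example: 3 supported statements, 1 unsupported, 1 missing, embedding similarity 0.9.
score = answer_correctness(StatementCounts(tp=3, fp=1, fn=1), semantic_similarity=0.9)
print(round(score, 4))  # ≈ 0.7875
```

In this sketch, the factual term penalizes both hallucinated statements (FP) and omitted ones (FN). The similarity term gives partial credit when the wording differs but the meaning is close.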


Evaluate your model by ranking several completion results from each prompt from best to worst. You can evaluate pre-generated completions or an LLM application and determine how many results it needs to generate.


Evaluate each completion result from each prompt by rating them with 1 to 5 stars and providing your expected completions. You can evaluate pre-generated completions or an LLM application.
