Eval Module

Many generative AI applications focus on text generation, and it can be challenging to design metrics that give insight into expected performance when the output is unstructured text.

dbnl.eval is a module designed specifically for evaluating unstructured text. It currently includes:

  • Adaptive metric sets for generic text and RAG applications

  • 12+ simple statistical text metrics powered by local libraries

  • 15+ LLM-as-judge and embedding-powered text metrics

  • Support for user-defined custom LLM-as-judge metrics (see the sketch after this list)

  • LLM-as-judge metrics compatible with OpenAI and Azure OpenAI
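
The following is a minimal sketch of the LLM-as-judge pattern behind these metrics, built directly on the OpenAI Python client rather than on dbnl's own API. The metric name `judge_faithfulness`, the prompt wording, and the 1-5 scale are illustrative assumptions, not part of dbnl.eval.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical judge prompt: ask the model to grade answer faithfulness.
JUDGE_PROMPT = (
    "Rate how faithful the ANSWER is to the CONTEXT on a scale of 1 "
    "(contradicts the context) to 5 (fully supported by the context). "
    "Reply with the number only.\n\n"
    "CONTEXT:\n{context}\n\nANSWER:\n{answer}"
)

def judge_faithfulness(context: str, answer: str, model: str = "gpt-4o-mini") -> int:
    """Hypothetical LLM-as-judge metric: score answer faithfulness from 1 to 5."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep scoring as deterministic as possible
        messages=[
            {"role": "system", "content": "You are a strict evaluation judge."},
            {"role": "user", "content": JUDGE_PROMPT.format(context=context, answer=answer)},
        ],
    )
    return int(response.choices[0].message.content.strip())

score = judge_faithfulness(
    context="The Eiffel Tower is 330 metres tall.",
    answer="The Eiffel Tower stands roughly 330 metres high.",
)
print(score)  # expected: a high score such as 5
```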

Building dbnl tests on these evaluation metrics can then drive rich insights into an AI application's stability and performance.
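
As a sketch of that pattern in plain Python (dbnl's actual test API is not shown here), a simple statistical metric such as word count can be computed over a batch of generated outputs and checked against a threshold; the sample outputs, metric choice, and threshold below are illustrative assumptions.

```python
import statistics

def word_count(text: str) -> int:
    """One of the simple statistical text metrics: whitespace-delimited word count."""
    return len(text.split())

# Hypothetical generated outputs from an AI application under test.
outputs = [
    "Paris is the capital of France.",
    "The capital of France is Paris, a city on the Seine.",
    "Paris.",
]

lengths = [word_count(o) for o in outputs]

# A threshold-style check in the spirit of a dbnl test: flag a run whose
# median response length drifts below an expected floor.
assert statistics.median(lengths) >= 3, "responses are unexpectedly terse"
print(f"median word count: {statistics.median(lengths)}")
```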
