Eval Module

Many generative AI applications center on text generation, and it can be challenging to define metrics that give insight into expected performance when the outputs are unstructured text.

dbnl.eval is a module designed specifically for evaluating unstructured text. It currently includes:

  • Adaptive metric sets for generic text and RAG applications

  • 12+ text metrics powered by simple statistics and local libraries

  • 15+ text metrics powered by LLM-as-judge and embedding techniques

  • Support for user-defined custom LLM-as-judge metrics (see the sketch below)

  • LLM-as-judge metrics compatible with OpenAI and Azure OpenAI

Building dbnl tests on these evaluation metrics can then drive rich insights into an AI application's stability and performance.
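To make the LLM-as-judge idea concrete, here is a minimal sketch of a custom judge metric. It is not the dbnl.eval API: the `judge_faithfulness` function, the prompt wording, and the 1–5 scale are illustrative assumptions; only the OpenAI client usage is standard.

```python
# Illustrative LLM-as-judge metric; function name, prompt, and scale are assumptions,
# not part of dbnl.eval.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are grading an answer against its source context.
Rate how faithful the answer is to the context on a scale of 1 (contradicts
or invents facts) to 5 (fully supported). Reply with the number only.

Context:
{context}

Answer:
{answer}
"""

def judge_faithfulness(context: str, answer: str, model: str = "gpt-4o-mini") -> int:
    """Score answer faithfulness with an LLM judge; returns an integer from 1 to 5."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(context=context, answer=answer)}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())

score = judge_faithfulness(
    context="The Eiffel Tower is 330 metres tall and located in Paris.",
    answer="The Eiffel Tower, in Paris, stands roughly 330 metres high.",
)
print(score)  # a faithful answer should score near the top of the scale
```

A metric like this produces a per-response score column that dbnl tests can then assert on, in the same way as the built-in statistical and embedding-based metrics.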
