Eval Module
Many generative AI applications center on text generation, and it can be difficult to define metrics that give insight into expected performance when the output is unstructured text.
`dbnl.eval` is a module designed specifically for evaluating unstructured text. It currently includes:
* Adaptive metric sets for generic text and RAG applications
* 12+ simple statistical text metrics powered by local libraries
* 15+ LLM-as-judge and embedding-powered text metrics
* Support for user-defined custom LLM-as-judge metrics (see the sketch after this list)
* LLM-as-judge metrics compatible with OpenAI and Azure OpenAI
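To make the LLM-as-judge idea concrete, here is a minimal, illustrative sketch of such a metric written directly against the OpenAI client. It is not the `dbnl.eval` API; the `judge_faithfulness` helper, the prompt wording, the model name, and the 1-5 scale are assumptions chosen for demonstration only.

```python
# Illustrative only: a hand-rolled LLM-as-judge metric, not the dbnl.eval API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = (
    "You are grading a RAG system. Given a question, retrieved context, and an "
    "answer, rate how faithful the answer is to the context on a scale of 1-5. "
    "Respond with only the integer."
)

def judge_faithfulness(question: str, context: str, answer: str) -> int:
    """Score answer faithfulness with an LLM judge; returns an integer from 1 to 5."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-completions model can act as the judge
        temperature=0,        # deterministic grading
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {
                "role": "user",
                "content": f"Question: {question}\n\nContext: {context}\n\nAnswer: {answer}",
            },
        ],
    )
    return int(response.choices[0].message.content.strip())

score = judge_faithfulness(
    question="What is the capital of France?",
    context="France's capital and largest city is Paris.",
    answer="The capital of France is Paris.",
)
print(score)  # expected: 5
```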
Building dbnl tests on these evaluation metrics can then drive rich insights into an AI application's stability and performance.
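The pattern behind such tests is simple: compute a metric per row of application output, then assert on the metric's distribution. The sketch below illustrates that idea in plain Python; the column name and thresholds are assumptions, and this is not the dbnl test API.

```python
# Illustrative only: the general pattern of testing on evaluation metrics,
# not the dbnl test API. Values and thresholds are placeholders.
import statistics

# Per-row metric values produced by an eval step (e.g. faithfulness scores, 1-5).
faithfulness_scores = [5, 4, 5, 3, 5, 4]

# A "test" is then an assertion over the metric's distribution.
assert statistics.mean(faithfulness_scores) >= 4.0, "mean faithfulness regressed"
assert min(faithfulness_scores) >= 3, "at least one answer was badly unfaithful"
```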