Eval Module
Many generative AI applications center on text generation, and it can be difficult to define metrics that give insight into expected performance when the output is unstructured text.
`dbnl.eval` is a module designed specifically for evaluating unstructured text. It currently includes:
* Adaptive metric sets for generic text and RAG applications
* 12+ simple statistical text metrics powered by local libraries
* 15+ LLM-as-judge and embedding-powered text metrics
* Support for user-defined custom LLM-as-judge metrics (see the sketch after this list)
* LLM-as-judge metrics compatible with OpenAI and Azure OpenAI
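To make the LLM-as-judge idea concrete, here is a minimal, illustrative sketch of such a metric written directly against the OpenAI client. It is not the `dbnl.eval` API; the `judge_faithfulness` helper, the prompt wording, the model name, and the 1-5 scale are assumptions chosen for demonstration only.

```python
# Illustrative only: a hand-rolled LLM-as-judge metric, not the dbnl.eval API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = (
    "You are grading a RAG system. Given a question, retrieved context, and an "
    "answer, rate how faithful the answer is to the context on a scale of 1-5. "
    "Respond with only the integer."
)

def judge_faithfulness(question: str, context: str, answer: str) -> int:
    """Score answer faithfulness with an LLM judge; returns an integer from 1 to 5."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-completions model can act as the judge
        temperature=0,        # deterministic grading
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {
                "role": "user",
                "content": f"Question: {question}\n\nContext: {context}\n\nAnswer: {answer}",
            },
        ],
    )
    return int(response.choices[0].message.content.strip())

score = judge_faithfulness(
    question="What is the capital of France?",
    context="France's capital and largest city is Paris.",
    answer="The capital of France is Paris.",
)
print(score)  # expected: 5
```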
Building dbnl tests on these evaluation metrics can then drive rich insights into an AI application's stability and performance.
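The pattern behind such tests is simple: compute a metric per row of application output, then assert on the metric's distribution. The sketch below illustrates that idea in plain Python; the column name and thresholds are assumptions, and this is not the dbnl test API.

```python
# Illustrative only: the general pattern of testing on evaluation metrics,
# not the dbnl test API. Values and thresholds are placeholders.
import statistics

# Per-row metric values produced by an eval step (e.g. faithfulness scores, 1-5).
faithfulness_scores = [5, 4, 5, 3, 5, 4]

# A "test" is then an assertion over the metric's distribution.
assert statistics.mean(faithfulness_scores) >= 4.0, "mean faithfulness regressed"
assert min(faithfulness_scores) >= 3, "at least one answer was badly unfaithful"
```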