Eval Module

Many generative AI applications center on text generation, and it can be challenging to define metrics that give insight into expected performance when the output is unstructured text.

dbnl.eval is a module designed for evaluating unstructured text. It currently includes:

  • Adaptive metric sets for generic text and RAG applications

  • 12+ simple statistical text metrics powered by local libraries

  • 15+ text metrics powered by LLM-as-judge and embedding models

  • Support for user-defined custom LLM-as-judge metrics

  • LLM-as-judge metrics compatible with OpenAI and Azure OpenAI

Building DBNL tests on these evaluation metrics can then drive rich insights into an AI application's stability and performance.
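
The Quick Start and dbnl.eval.metrics pages that follow document the exact API. As a rough illustration of what an LLM-as-judge metric computes, the sketch below scores answer relevance on a 1–5 scale by calling the OpenAI client directly; the prompt, metric name, and scale here are illustrative assumptions, not dbnl's built-in definitions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative judge prompt; dbnl's built-in metrics define their own prompts.
JUDGE_PROMPT = """You are grading a generative AI application.
Question: {question}
Answer: {answer}
On a scale of 1 (irrelevant) to 5 (fully relevant), how relevant is the
answer to the question? Reply with a single integer."""


def answer_relevance(question: str, answer: str) -> int:
    """Hypothetical LLM-as-judge metric: relevance of an answer to its question."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any OpenAI or Azure OpenAI chat model works here
        messages=[
            {
                "role": "user",
                "content": JUDGE_PROMPT.format(question=question, answer=answer),
            }
        ],
    )
    return int(response.choices[0].message.content.strip())


print(answer_relevance("What is dbnl.eval?", "A module for evaluating unstructured text."))
```

In practice, scores like this are computed by dbnl.eval and reported as columns on a run, which is what allows DBNL tests to track them across runs.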
