Application Metric Sets
The metric set helpers return an adaptive list of metrics relevant to the application type. See the metric function reference for details on all the metric functions available in the eval SDK.
text_metrics()
Basic metrics for generic text comparison and monitoring
token_count
word_count
flesch_kincaid_grade
automated_readability_index
bleu
levenshtein
rouge1
rouge2
rougeL
rougeLsum
llm_text_toxicity_v0
llm_sentiment_assessment_v0
llm_reading_complexity_v0
llm_grammar_accuracy_v0
inner_product
llm_text_similarity_v0
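A minimal usage sketch follows. The import path, the keyword-argument names, and the pandas DataFrame shape are assumptions for illustration rather than the SDK's exact API.

```python
import pandas as pd

# Hypothetical import path -- adjust to the eval SDK's actual module layout.
from eval_sdk.metric_sets import text_metrics

df = pd.DataFrame(
    {
        "prediction": ["The cat sat on the mat."],
        "ground_truth": ["A cat was sitting on the mat."],
    }
)

# Because both a prediction and a ground_truth column are described, comparison
# metrics such as rougeL and bleu can be included alongside monitoring metrics
# like token_count and word_count. (Keyword names are assumptions.)
metrics = text_metrics(
    prediction_column="prediction",
    ground_truth_column="ground_truth",
)
```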
question_and_answer_metrics()
Basic metrics for RAG / question answering
llm_accuracy_v0
llm_completeness_v0
answer_similarity_v0
faithfulness_v0
mrr
context_hit
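A similar sketch for RAG / question answering; as above, the import path and keyword names (including the optional context column) are assumptions, not the SDK's exact signature.

```python
# Hypothetical import path and signature.
from eval_sdk.metric_sets import question_and_answer_metrics

# Describing a retrieved-context column lets retrieval metrics such as mrr and
# context_hit be included; omitting it would exclude them (see below).
qa_metrics = question_and_answer_metrics(
    prediction_column="answer",
    ground_truth_column="reference_answer",
    context_column="retrieved_context",  # hypothetical optional argument
)
```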
The metric set helpers are adaptive in that:
The metrics returned encode which columns of the dataframe are input to the metric computation; e.g., rougeL_prediction__ground_truth is the rougeL metric run with both the column named prediction and the column named ground_truth as input.
The metrics returned support any additional optional column info and LLM-as-judge or embedding model clients. If any of this optional info is not provided, the metric set excludes any metrics that depend on that information.
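The sketch below illustrates both points; the import path, the function signature, the metric-object `name` attribute, and the exact names printed are assumptions based on the naming scheme described above.

```python
from eval_sdk.metric_sets import text_metrics  # hypothetical import path

full_set = text_metrics(
    prediction_column="prediction",
    ground_truth_column="ground_truth",
)
minimal_set = text_metrics(prediction_column="prediction")

print([m.name for m in full_set])
# e.g. ['rougeL_prediction__ground_truth', 'bleu_prediction__ground_truth', ...]
#      comparison metric names encode both input columns

print([m.name for m in minimal_set])
# e.g. ['token_count_prediction', 'word_count_prediction', ...]
#      with no ground_truth column (and no LLM-as-judge or embedding client),
#      metrics that depend on them are excluded rather than failing at run time
```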
See the corresponding guides for concrete examples of adaptive text_metrics() and question_and_answer_metrics() usage.