Application Metric Sets
The metric set helpers return an adaptive list of metrics relevant to the application type. See the dbnl.eval.metrics reference for details on all the metric functions available in the eval SDK.
text_metrics()
Basic metrics for generic text comparison and monitoring
token_count, word_count, flesch_kincaid_grade, automated_readability_index, bleu, levenshtein, rouge1, rouge2, rougeL, rougeLsum, llm_text_toxicity_v0, llm_sentiment_assessment_v0, llm_reading_complexity_v0, llm_grammar_accuracy_v0, inner_product, llm_text_similarity_v0
question_and_answer_metrics()
Basic metrics for RAG / question answering
llm_accuracy_v0, llm_completeness_v0, answer_similarity_v0, faithfulness_v0, mrr, context_hit
The metric set helpers are adaptive in that:

- The metrics returned encode which columns of the dataframe are input to the metric computation, e.g., rougeL_prediction__ground_truth is the rougeL metric run with both the column named prediction and the column named ground_truth as input (see the sketch after this list).
- The metrics returned support any additional optional column info and LLM-as-judge or embedding model clients. If any of this optional info is not provided, the metric set will exclude any metrics that depend on that information.
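As a concrete illustration, here is a minimal sketch of the column-encoding behavior. The import path and the printed representation of each Metric are assumptions here; consult the dbnl.eval.metrics reference for the exact API.

```python
# Minimal sketch, assuming the metric set helpers are importable from
# dbnl.eval.metrics (adjust the import to match your SDK version).
from dbnl.eval.metrics import text_metrics

# Ask for text metrics computed over the "prediction" column, compared
# against the "ground_truth" column of the dataframe.
metrics = text_metrics(prediction="prediction", target="ground_truth")

# Per the naming scheme above, the returned list includes entries such as
# rougeL_prediction__ground_truth: the rougeL metric run over the
# prediction and ground_truth columns. How each Metric object renders its
# name is SDK-specific; printing is used here purely for illustration.
for metric in metrics:
    print(metric)
```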
def text_metrics(
    prediction: str,
    target: Optional[str] = None,
    eval_llm_client: Optional[LLMClient] = None,
    eval_embedding_client: Optional[EmbeddingClient] = None,
) -> list[Metric]:
    """
    Returns a set of metrics relevant for a generic text application

    :param prediction: prediction column name (i.e. generated text)
    :param target: target column name (i.e. expected text)
    :param eval_llm_client: LLM client used for LLM-as-judge metrics
    :param eval_embedding_client: embedding client used for embedding-based metrics
    :return: list of metrics
    """

See the How-To section for concrete examples of adaptive text_metrics() usage.
See the RAG example for question_and_answer_metrics() usage.
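To see the adaptive exclusion in action, the sketch below compares the metric sets returned with and without a target column. The import path is an assumption, and the exact metric counts depend on your SDK version.

```python
from dbnl.eval.metrics import text_metrics  # import path assumed; adjust as needed

# With only a prediction column and no eval clients, the helper returns
# just the prediction-only metrics (e.g. token_count, word_count, the
# readability scores).
prediction_only = text_metrics(prediction="prediction")

# Supplying a target column additionally enables the comparison metrics
# (bleu, levenshtein, the rouge family, ...), so the returned list grows.
with_target = text_metrics(prediction="prediction", target="ground_truth")

print(f"{len(prediction_only)} metrics without a target, {len(with_target)} with one")

# Metrics that depend on an eval client (e.g. the llm_*_v0 metrics and the
# embedding-based inner_product) are excluded in both calls because no
# eval_llm_client or eval_embedding_client was passed.
```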