Application Metric Sets
The metric set helpers return an adaptive list of metrics relevant to the application type. See the metric function reference for details on all the metric functions available in the eval SDK.
text_metrics()
Basic metrics for generic text comparison and monitoring
token_count
word_count
flesch_kincaid_grade
automated_readability_index
bleu
levenshtein
rouge1
rouge2
rougeL
rougeLsum
llm_text_toxicity_v0
llm_sentiment_assessment_v0
llm_reading_complexity_v0
llm_grammar_accuracy_v0
inner_product
llm_text_similarity_v0
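A minimal usage sketch follows. The import path, the keyword-argument names, and the pandas DataFrame shape are assumptions for illustration rather than the SDK's exact API.

```python
import pandas as pd

# Hypothetical import path -- adjust to the eval SDK's actual module layout.
from eval_sdk.metric_sets import text_metrics

df = pd.DataFrame(
    {
        "prediction": ["The cat sat on the mat."],
        "ground_truth": ["A cat was sitting on the mat."],
    }
)

# Because both a prediction and a ground_truth column are described, comparison
# metrics such as rougeL and bleu can be included alongside monitoring metrics
# like token_count and word_count. (Keyword names are assumptions.)
metrics = text_metrics(
    prediction_column="prediction",
    ground_truth_column="ground_truth",
)
```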
question_and_answer_metrics()
Basic metrics for RAG / question answering
llm_accuracy_v0
llm_completeness_v0
answer_similarity_v0
faithfulness_v0
mrr
context_hit
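A similar sketch for RAG / question answering; as above, the import path and keyword names (including the optional context column) are assumptions, not the SDK's exact signature.

```python
# Hypothetical import path and signature.
from eval_sdk.metric_sets import question_and_answer_metrics

# Describing a retrieved-context column lets retrieval metrics such as mrr and
# context_hit be included; omitting it would exclude them (see below).
qa_metrics = question_and_answer_metrics(
    prediction_column="answer",
    ground_truth_column="reference_answer",
    context_column="retrieved_context",  # hypothetical optional argument
)
```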
The metric set helpers are adaptive in that:
The metrics returned encode which columns of the dataframe are input to the metric computation; e.g., rougeL_prediction__ground_truth is the rougeL metric run with both the column named prediction and the column named ground_truth as input.
The metrics returned support any additional optional column info and LLM-as-judge or embedding model clients. If any of this optional info is not provided, the metric set excludes any metrics that depend on that information.
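The sketch below illustrates both points; the import path, the function signature, the metric-object `name` attribute, and the exact names printed are assumptions based on the naming scheme described above.

```python
from eval_sdk.metric_sets import text_metrics  # hypothetical import path

full_set = text_metrics(
    prediction_column="prediction",
    ground_truth_column="ground_truth",
)
minimal_set = text_metrics(prediction_column="prediction")

print([m.name for m in full_set])
# e.g. ['rougeL_prediction__ground_truth', 'bleu_prediction__ground_truth', ...]
#      comparison metric names encode both input columns

print([m.name for m in minimal_set])
# e.g. ['token_count_prediction', 'word_count_prediction', ...]
#      with no ground_truth column (and no LLM-as-judge or embedding client),
#      metrics that depend on them are excluded rather than failing at run time
```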
See the corresponding guides for concrete examples of adaptive text_metrics() and question_and_answer_metrics() usage.