Application Metric Sets

The metric set helpers return an adaptive list of metrics relevant to the application type:

text_metrics(): Basic metrics for generic text comparison and monitoring.

question_and_answer_metrics(): Basic metrics for RAG / question answering.

The metric set helpers are adaptive in that:

  1. The metrics returned encode which columns of the dataframe are inputs to the metric computation. For example, rougeL_prediction__ground_truth is the rougeL metric run with the prediction column and the ground_truth column as inputs.

  2. The metrics returned support additional optional column info as well as LLM-as-judge and embedding model clients. If any of this optional information is not provided, the metric set excludes the metrics that depend on it (see the usage sketch after the text_metrics() signature below).

def text_metrics(
    prediction: str,
    target: Optional[str] = None,
    eval_llm_client: Optional[LLMClient] = None,
    eval_embedding_client: Optional[EmbeddingClient] = None,
) -> list[Metric]:
    """
    Returns a set of metrics relevant for a generic text application.

    :param prediction: prediction column name (i.e. generated text)
    :param target: target column name (i.e. expected text)
    :param eval_llm_client: optional LLM client used for LLM-as-judge metrics
    :param eval_embedding_client: optional embedding model client used for embedding-based metrics
    :return: list of metrics
    """

See the How-To section for concrete examples of adaptive text_metrics() usage.

See the RAG example for question_and_answer_metrics() usage.
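Since the question_and_answer_metrics() signature is not shown in this section, the sketch below assumes it follows the same adaptive pattern as text_metrics(), with column-name parameters and optional evaluation clients; the parameter and column names here are assumptions for illustration.

# Assumes question_and_answer_metrics has been imported from the same package.
# The parameters below are assumed to mirror text_metrics(); see the RAG
# example for the actual signature.
qa_metrics = question_and_answer_metrics(
    prediction="answer",        # generated answer column (assumed parameter name)
    target="reference_answer",  # expected answer column (assumed parameter name)
    # eval_llm_client=...,       # optional LLM-as-judge client
    # eval_embedding_client=..., # optional embedding model client
)

# As with text_metrics(), metrics whose inputs are unavailable are excluded.
print(len(qa_metrics))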
