Application Metric Sets

The metric set helpers return an adaptive list of metrics relevant to the application type. See the dbnl.eval.metrics reference for details on all the metric functions available in the eval SDK.

text_metrics()

Basic metrics for generic text comparison and monitoring

  • token_count

  • word_count

  • flesch_kincaid_grade

  • automated_readability_index

  • bleu

  • levenshtein

  • rouge1

  • rouge2

  • rougeL

  • rougeLsum

  • llm_text_toxicity_v0

  • llm_sentiment_assessment_v0

  • llm_reading_complexity_v0

  • llm_grammar_accuracy_v0

  • inner_product

  • llm_text_similarity_v0

question_and_answer_metrics()

Basic metrics for RAG / question answering

  • llm_accuracy_v0

  • llm_completeness_v0

  • answer_similarity_v0

  • faithfulness_v0

  • mrr

  • context_hit
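
A rough sketch of how these metrics might be requested is shown below. The import path and the parameter names are assumptions modeled on the text_metrics() signature documented later on this page; consult the dbnl.eval.metrics reference for the actual question_and_answer_metrics() signature.

# NOTE: import path and parameter names are assumptions for illustration only;
# see the dbnl.eval.metrics reference for the real signature.
from dbnl.eval.metrics import question_and_answer_metrics

# Without an LLM client or retrieved-context column info, the llm_* and
# retrieval metrics listed above are excluded, per the adaptive behavior
# described below.
qa_metrics = question_and_answer_metrics(
    prediction="answer",        # column holding the generated answer (assumed name)
    target="expected_answer",   # column holding the reference answer (assumed name)
)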

The metric set helpers are adaptive in that:

  1. The metrics returned encode which columns of the dataframe are input to the metric computation. For example, rougeL_prediction__ground_truth is the rougeL metric computed with the column named prediction and the column named ground_truth as inputs.

  2. The metrics returned support additional optional column info and LLM-as-judge or embedding model clients. If any of this optional info is not provided, the metric set will exclude any metrics that depend on that information.

def text_metrics(
    prediction: str,
    target: Optional[str] = None,
    eval_llm_client: Optional[LLMClient] = None,
    eval_embedding_client: Optional[EmbeddingClient] = None,
) -> list[Metric]:
    """
    Returns a set of metrics relevant for a generic text application

    :param prediction: prediction column name (i.e. generated text)
    :param target: target column name (i.e. expected text)
    :param eval_llm_client: optional LLM client used for LLM-as-judge metrics
    :param eval_embedding_client: optional embedding client used for embedding-based metrics
    :return: list of metrics
    """

See the How-To section for concrete examples of adaptive text_metrics() usage.

See the RAG example for question_and_answer_metrics() usage.