Application Metric Sets


The metric set helpers return an adaptive list of metrics relevant to the application type.

text_metrics()

Basic metrics for generic text comparison and monitoring

question_and_answer_metrics()

Basic metrics for RAG / question answering

The metric set helpers are adaptive in that:

  1. The metrics returned encode which columns of the dataframe are input to the metric computation. For example, rougeL_prediction__ground_truth is the rougeL metric run with both the column named prediction and the column named ground_truth as input.

  2. The metrics returned support additional optional column info and LLM-as-judge or embedding model clients. If any of this optional information is not provided, the metric set will exclude the metrics that depend on it (see the usage sketch after the signature below).

def text_metrics(
    prediction: str,
    target: Optional[str] = None,
    eval_llm_client: Optional[LLMClient] = None,
    eval_embedding_client: Optional[EmbeddingClient] = None,
) -> list[Metric]:
    """
    Returns a set of metrics relevant for a generic text application

    :param prediction: prediction column name (i.e. generated text)
    :param target: target column name (i.e. expected text)
    :param eval_llm_client: LLM-as-judge client; if omitted, metrics that depend on it are excluded
    :param eval_embedding_client: embedding model client; if omitted, metrics that depend on it are excluded
    :return: list of metrics
    """

See the How-To section for concrete examples of adaptive usage.

See the RAG example for usage.
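The signature of question_and_answer_metrics() is not reproduced on this page; the sketch below assumes it accepts at least the same prediction and target column arguments as text_metrics() (both the import path and that carry-over are assumptions for illustration):

from dbnl.eval.metrics import question_and_answer_metrics  # import path is an assumption

# Assumed to mirror text_metrics(): column names in, list of metrics out.
qa_metrics = question_and_answer_metrics(
    prediction="prediction",   # generated answer column
    target="ground_truth",     # expected answer column
)

# As with text_metrics(), omitting optional column info and the LLM-as-judge /
# embedding clients excludes the metrics that depend on them (e.g. the
# retrieval-dependent mrr and context_hit).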
The individual metrics referenced by these metric sets include:

  • token_count
  • word_count
  • flesch_kincaid_grade
  • automated_readability_index
  • bleu
  • levenshtein
  • rouge1
  • rouge2
  • rougeL
  • rougeLsum
  • llm_text_toxicity_v0
  • llm_sentiment_assessment_v0
  • llm_reading_complexity_v0
  • llm_grammar_accuracy_v0
  • inner_product
  • llm_text_similarity_v0
  • llm_accuracy_v0
  • llm_completeness_v0
  • answer_similarity_v0
  • faithfulness_v0
  • mrr
  • context_hit