v0.23.x

© 2025 Distributional, Inc. All Rights Reserved.

Metrics

What are Metrics?

Metrics are measurable properties that quantify specific characteristics of your data. You can define your own metrics by providing a numeric column, computed from your source data, alongside your application data.
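As a minimal sketch, assuming your application data is already in a pandas DataFrame, a user-defined metric can be as simple as a numeric column derived from an existing text column. The column name `answer_char_length` is hypothetical; any numeric column computed from your data works the same way.

```python
import pandas as pd

# Application data: one row per example.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "answer": [
            "To be happy and fulfilled.",
            "It's a question of aerodynamics.",
            "Nineveh was the capital of Assyria.",
        ],
    }
)

# A user-defined metric is a numeric column computed from the source data.
# "answer_char_length" is a hypothetical name chosen for illustration.
df["answer_char_length"] = df["answer"].str.len()
```

The DataFrame with the extra column is what you would report as part of a run; see Reporting Runs for how run data is uploaded.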

Alternatively, the Distributional SDK offers a comprehensive set of metrics for evaluating text and LLM outputs. Computing metrics with Distributional's methods enables better data exploration and application stability monitoring.

The SDK documentation contains more details on metrics, including example usage.

Using Metrics

The SDK provides convenient functions for computing metrics from your data and reporting the results to Distributional:

import dbnl
import dbnl.eval
import pandas as pd

# Log in to Distributional
dbnl.login()
project = dbnl.create_project(name="Metrics Project")

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "question": [
            "What is the meaning of life?",
            "What is the airspeed velocity of an unladen swallow?",
            "What is the capital of Assyria?",
        ],
        "answer": [
            "To be happy and fulfilled.",
            "It's a question of aerodynamics.",
            "Nineveh was the capital of Assyria.",
        ],
        "expected_answer": [
            "42",
            "It's a question of aerodynamics.",
            "Nineveh was the capital of Assyria.",
        ],
    }
)

# Create individual metrics
metrics = [
    dbnl.eval.metrics.token_count("question"),
    dbnl.eval.metrics.word_count("question"),
    dbnl.eval.metrics.rouge1("answer", "expected_answer"),
]

# Compute metrics and report results to Distributional
run = dbnl.eval.report_run_with_results(
    project=project, column_data=df, metrics=metrics
)
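To build intuition for what a metric such as `rouge1("answer", "expected_answer")` measures, the sketch below computes a ROUGE-1 F1 score (clipped unigram overlap between a prediction and a target) and a simple word count in plain Python. It is an illustration under stated assumptions, not Distributional's implementation: the tokenization (lowercased runs of letters and digits) and the choice to report F1 rather than precision or recall are assumptions of this sketch, and the SDK's metrics may normalize text differently.

```python
import re
from collections import Counter


def simple_tokens(text: str) -> list[str]:
    # Assumption: lowercase and keep runs of letters/digits; real tokenizers may differ.
    return re.findall(r"[a-z0-9]+", text.lower())


def word_count(text: str) -> int:
    # Whitespace-separated words, a rough stand-in for a word-count metric.
    return len(text.split())


def rouge1_f1(prediction: str, target: str) -> float:
    pred = Counter(simple_tokens(prediction))
    ref = Counter(simple_tokens(target))
    # Clipped unigram overlap: each shared word counts at most min(pred, ref) times.
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


pairs = [
    ("To be happy and fulfilled.", "42"),
    ("It's a question of aerodynamics.", "It's a question of aerodynamics."),
    ("Nineveh was the capital of Assyria.", "Nineveh was the capital of Assyria."),
]
scores = [rouge1_f1(answer, expected) for answer, expected in pairs]
```

For the example data, the second and third answers match their expected answers exactly and score 1.0, while the first shares no words with "42" and scores 0.0.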

Convenience Functions

The SDK includes helper functions that create common groups of related metrics from a shared set of input columns.

import dbnl
import dbnl.eval
import pandas as pd

# Log in to Distributional
dbnl.login()
project = dbnl.create_project(name="Metrics Project")

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "question": [
            "What is the meaning of life?",
            "What is the airspeed velocity of an unladen swallow?",
            "What is the capital of Assyria?",
        ],
        "answer": [
            "To be happy and fulfilled.",
            "It's a question of aerodynamics.",
            "Nineveh was the capital of Assyria.",
        ],
        "expected_answer": [
            "42",
            "It's a question of aerodynamics.",
            "Nineveh was the capital of Assyria.",
        ],
    }
)

# Get standard text evaluation metrics
text_eval_metrics = dbnl.eval.metrics.text_metrics(
    prediction="answer", target="expected_answer"
)

# Get comprehensive QA evaluation metrics
qa_metrics = dbnl.eval.metrics.question_and_answer_metrics(
    prediction="answer",
    target="expected_answer",
    input="question",
)

# Compute metrics and report results to Distributional
run = dbnl.eval.report_run_with_results(
    project=project, column_data=df, metrics=(text_eval_metrics + qa_metrics)
)
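The example above suggests a simple pattern: each helper takes the relevant column names once and returns a collection of metrics, and two groups are combined with `+`. The sketch below illustrates that pattern with plain pandas. Every name in it (`MetricSpec`, `simple_text_metrics`, `compute_metrics`, and the generated column names) is hypothetical and not part of the dbnl SDK; deduplicating metrics by name when groups are combined is a design choice of this sketch, not documented SDK behavior.

```python
from dataclasses import dataclass
from typing import Callable

import pandas as pd


@dataclass(frozen=True)
class MetricSpec:
    """Hypothetical metric definition: an output column name plus a row-wise function."""

    name: str
    compute: Callable[[pd.DataFrame], pd.Series]


def word_count_metric(col: str) -> MetricSpec:
    # Number of whitespace-separated words in each row of `col`.
    return MetricSpec(f"word_count__{col}", lambda df: df[col].str.split().str.len())


def exact_match_metric(prediction: str, target: str) -> MetricSpec:
    # 1 if the prediction equals the target exactly, else 0.
    return MetricSpec(
        f"exact_match__{prediction}__{target}",
        lambda df: (df[prediction] == df[target]).astype(int),
    )


def simple_text_metrics(prediction: str, target: str) -> list[MetricSpec]:
    # Convenience helper: column names are given once and shared by every metric in the group.
    return [
        word_count_metric(prediction),
        word_count_metric(target),
        exact_match_metric(prediction, target),
    ]


def compute_metrics(df: pd.DataFrame, metrics: list[MetricSpec]) -> pd.DataFrame:
    # Keep one metric per name so overlapping groups do not compute the same column twice.
    unique = {m.name: m for m in metrics}
    return df.assign(**{name: m.compute(df) for name, m in unique.items()})


df = pd.DataFrame(
    {
        "answer": [
            "To be happy and fulfilled.",
            "It's a question of aerodynamics.",
            "Nineveh was the capital of Assyria.",
        ],
        "expected_answer": [
            "42",
            "It's a question of aerodynamics.",
            "Nineveh was the capital of Assyria.",
        ],
    }
)

# Combine a group with an individual metric; the duplicate word count is dropped.
metrics = simple_text_metrics("answer", "expected_answer") + [word_count_metric("answer")]
result = compute_metrics(df, metrics)
```

`compute_metrics` returns a new DataFrame with one extra column per unique metric and leaves the input DataFrame unchanged.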

See the SDK documentation for a more complete list and description of available metrics.

See the SDK documentation for a more complete list and description of available functions.

Figure: Summary statistics for the Run uploaded above