v0.22.x

© 2025 Distributional, Inc. All Rights Reserved.

Tests

Tests are the key tool within dbnl for asserting the behavior and consistency of Runs. Common testing goals include:

  • Asserting that your application, holistically or for a chosen column, behaves consistently compared to a baseline.

  • Asserting that a chosen column meets its minimum desired behavior (e.g., inference throughput).

  • Asserting that a chosen column has a distribution that roughly matches a baseline reference.

By default, your Project is pre-populated with a test for the first goal above: the "App Similarity Index" test, which gives you a quick understanding of whether your application's behavior has significantly deviated from a selected baseline.

What's in a Test?

At a high level, a Test is a statistic and an assertion. Generally, the statistic aggregates the data in one or more columns, and the assertion checks a condition on the aggregated value. The assertion may check the values from a single Run, or it may check how the values in a Run have changed compared to a baseline. Some basic examples:

  1. Assert the 95th percentile of app_latency_ms is less than or equal to 180

Test Spec JSON
{
    "name": "p95_app_latency_ms",
    "description": "Test the 95th percentile of latency in milliseconds",
    "statistic_name": "percentile",
    "statistic_params": {"percentage": 0.95},
    "assertion": {
        "name": "less_than_or_equal_to",
        "params": {
            "other": 180.0
        }
    },
    "statistic_inputs": [
        {
            "select_query_template": {
                "select": "{EXPERIMENT}.app_latency_ms"
            }
        }
    ]
}
  2. Assert the absolute difference between the median of positive_sentiment_score in the Run and in the baseline is close to 0

Test Spec JSON
{
    "name": "median_sentiment_similar",
    "description": "Test the absolute difference of median on sentiment",
    "statistic_name": "abs_diff_median",
    "statistic_params": {},
    "assertion": {
        "name": "close_to",
        "params": {
            "other": 0.0,
            "tolerance": 0.01
        }
    },
    "statistic_inputs": [
        {
            "select_query_template": {
                "select": "{EXPERIMENT}.positive_sentiment_score"
            }
        },
        {
            "select_query_template": {
                "select": "{BASELINE}.positive_sentiment_score"
            }
        }
    ]
}
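
To make the statistic-plus-assertion idea concrete, the following is a rough local sketch of what the two example tests check. It is not dbnl's implementation and does not call the dbnl SDK. The helper functions, the linear-interpolation percentile method, the inclusive tolerance comparison, and the sample column values are all assumptions made for illustration.

```python
import math
from statistics import median


def percentile(values, fraction):
    """Linearly interpolated percentile; fraction is in [0, 1].

    Assumption: dbnl's exact percentile method is not stated here,
    so this interpolation rule is only one plausible choice.
    """
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def less_than_or_equal_to(statistic, other):
    """Assertion: the statistic must not exceed the threshold."""
    return statistic <= other


def close_to(statistic, other, tolerance):
    """Assertion: the statistic must lie within tolerance of the target.

    Assumption: the comparison is treated as inclusive.
    """
    return abs(statistic - other) <= tolerance


# Hypothetical column values standing in for a Run and its baseline.
experiment_latency_ms = [120.0, 135.0, 150.0, 160.0, 175.0]
experiment_sentiment = [0.62, 0.70, 0.74, 0.81]
baseline_sentiment = [0.60, 0.71, 0.74, 0.80]

# Example 1: the statistic aggregates a single Run's column.
p95 = percentile(experiment_latency_ms, 0.95)
p95_passes = less_than_or_equal_to(p95, 180.0)

# Example 2: the statistic compares a Run's column against the baseline.
abs_diff_median = abs(median(experiment_sentiment) - median(baseline_sentiment))
median_passes = close_to(abs_diff_median, 0.0, 0.01)

print(f"p95_app_latency_ms passes: {p95_passes}")
print(f"median_sentiment_similar passes: {median_passes}")
```

The first check reads one column from a single Run, while the second needs the same column from both the Run and the baseline. This mirrors the one-entry and two-entry statistic_inputs lists in the specs above.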

In the next sections, we will explore the objects required for testing alongside the methods for creating tests, running tests, reviewing/analyzing tests, and some best practices.