Creating Tests

As you become more familiar with the behavior of your application, you may want to build on the default App Similarity Index test with tests that you define yourself. Let's walk through that process.

Designing a Test

The first step in coming up with a test is determining what behavior you're interested in. As described in the section on Runs, each Run of your application reports its behavior via its results, which are organized into columns (and scalars). Once you've identified the column or scalar you'd like to test, you need to determine what statistic you'd like to apply to it and the assertion you'd like to make on that statistic.

This might seem like a lot, but DBNL has your back! While you can define tests manually, DBNL has several ways of helping you identify which columns you might be interested in and letting you quickly define tests on them.

When creating a test, you can specify tags to apply to it. You can use these tags to filter which tests you want to include or exclude later when creating a Test Session. Some of the test creation shortcuts in the UI do not currently allow specifying tags, but you can edit the test and add tags after the fact.
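
If you create tests programmatically, tags may also be declared directly in the Test Spec that you pass to dbnl.experimental.create_test (shown later on this page). Below is a minimal sketch; the "tag_names" key is an assumption for illustration only, so check the Test Spec reference in the Python SDK documentation for the exact field name.

# Hypothetical Test Spec fragment showing where tags might live.
# The "tag_names" key is an assumed field name, not a confirmed part of
# the dbnl Test Spec; pass the full spec to
# dbnl.experimental.create_test(test_spec_dict=...) as shown below.
test_spec_fragment = {
    "name": "Word count stays consistent",
    "tag_names": ["chatbot", "verbosity"],  # assumed field name
}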

Context-Driven Test Creation

As you browse the DBNL UI, you will see "+" icons or "+ Add Test" buttons appear. These provide context-aware shortcuts for easily creating relevant tests.

At each of these locations, a test creation drawer will open on the right side of the page with several of the fields pre-populated based on the context of the button, alongside a history of the statistic, if relevant. Here are some of the best places to look for DBNL-assisted test creation:

Key Insights

When you're looking at a Test Session, DBNL will provide insights about which columns or metrics have demonstrated the most drift. These are great candidates to define tests on if you want to be specifically alerted about their behavior. You can click the "Add Test" button to create a test on the Similarity Index of the relevant column. The Similarity Index history graph can help guide you on choosing a threshold.

Column or Metric Details

When inspecting the details of a column or metric from a Test Session, there are several "Add Test" buttons provided to allow you to quickly create a test on a relevant statistic. The Statistic History graph can help guide you on choosing a threshold.

Summary Statistics Table

When viewing a Run, each entry in the summary statistics table can be used to seed creation of a test for that chosen statistic.

And More!

These shortcuts appear in several other places in the UI as well when you are inspecting your Runs and Test Sessions; keep an eye out for the "+"!

Templated Tests

Test templates are macros for basic test patterns recommended by Distributional. They let you quickly create tests from a builder in the UI. Distributional provides five classes of test templates:

  • Single Run: These are parametric statistics of a column.

  • Similarity of Statistics: These test whether the absolute difference of a statistic of a column between two runs is less than a threshold.

  • Similarity of Distributions: These test whether a column from two different runs is similarly distributed, using a nonparametric statistic.

  • Similarity of Results: These are tests on the row-wise absolute difference of results between two runs.

  • Difference of Statistics: These test the signed difference of a statistic of a column between two runs.

From the Test Configuration tab on your Project, click the dropdown next to "Add Test".

Select one of the five options. A Test Creation drawer will appear where you can edit the statistic, column, and assertion you desire. Note that each Test Template supports only a limited set of statistics.

Creating Tests Manually (Advanced)

If you have a good idea of what you want to test or just want to explore, you can create tests manually from either the UI or via the Python SDK.

Let's say you are building a Q&A chatbot, and you have a column for the length of your bot's responses, word_count. Perhaps you want to ensure that your bot never outputs more than 100 words; in that case, you'd choose:

  • The statistic max,

  • The assertion less than or equal to,

  • and the threshold 100.

But what if you're not opinionated about the specific length? You just want to ensure that your app is behaving consistently as it runs and doesn't suddenly start being unusually wordy or terse. DBNL makes it easy to test that as well; you might go with:

  • The statistic absolute difference of mean,

  • The assertion less than,

  • and the threshold 20.

Now you're ready to go and create that test, either via the UI or the SDK:

From your Project, click the "Test Configuration" tab.

Next to the "My Tests" header, you can click "Add Test" to open the test creation page, which will enable you to define your test through the dropdown menu on the left side of the window.

On the left side, you can configure your test by choosing a statistic and assertion. Note that you can use our builder or build a test spec with raw JSON (an example test spec is shown below). On the right, you can browse the data of recent Runs to help you figure out what statistics and thresholds are appropriate to define acceptable behavior.

Tests can also be created programmatically using the Python SDK. You must provide a JSON dictionary that adheres to the dbnl Test Spec; an example is provided below.

import dbnl

# Authenticate with the DBNL platform
dbnl.login()

proj = dbnl.get_or_create_project(name="My Project")

# Create a test asserting that the absolute difference of the mean
# word_count between the experiment Run and the baseline Run is < 20
dbnl.experimental.create_test(
    test_spec_dict={
        "project_id": proj.id,
        "name": "Word count difference",
        "description": "Test the absolute difference of mean on word_count",
        "statistic_name": "abs_diff_mean",
        "statistic_params": {},
        "assertion": {
            "name": "less_than",
            "params": {
                "other": 20.0,
            },
        },
        "statistic_inputs": [
            {
                "select_query_template": {
                    "select": "{EXPERIMENT}.word_count"
                }
            },
            {
                "select_query_template": {
                    "select": "{BASELINE}.word_count"
                }
            },
        ],
    }
)
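
For the first scenario above (responses capped at 100 words), a single-run version of the spec might look like the following sketch. The statistic and assertion names used here (max and less_than_or_equal_to) are assumptions; check Available Statistics and Assertions for the exact names supported.

import dbnl
dbnl.login()

proj = dbnl.get_or_create_project(name="My Project")

# Single-run test: the max of word_count in the experiment Run must be
# at most 100. "max" and "less_than_or_equal_to" are assumed names; see
# Available Statistics and Assertions for the supported values.
dbnl.experimental.create_test(
    test_spec_dict={
        "project_id": proj.id,
        "name": "Responses never exceed 100 words",
        "description": "Test the max of word_count in a single run",
        "statistic_name": "max",
        "statistic_params": {},
        "assertion": {
            "name": "less_than_or_equal_to",
            "params": {
                "other": 100.0,
            },
        },
        "statistic_inputs": [
            {
                "select_query_template": {
                    "select": "{EXPERIMENT}.word_count"
                }
            },
        ],
    }
)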

When creating a test manually, you can also specify filters to apply the test only to specific rows within your Runs. Check out Using Filters in Tests for more information.

You can see a full list with descriptions of available statistics and assertions in Available Statistics and Assertions.
