Tests

Tests are the key tool within DBNL for asserting the behavior and consistency of Runs. Possible goals during testing can include:

Asserting that your application, holistically or for a chosen column, behaves consistently compared to a baseline.
Asserting that a chosen column meets its minimum desired behavior (e.g., inference throughput);
Asserting that a chosen column has a distribution that roughly matches a baseline reference;

By default, your Project will be pre-populated with a test for the first goal above. This is the "App Similarity Index" test which gives you a quick understanding of whether your application's behavior has significantly deviated from a selected baseline.

What's in a Test?

At a high level, a Test is a statistic and an assertion. Generally, the statistic aggregates the data in a column or columns, and the assertion tests some truth about that aggregation. This assertion may check the values from a single Run, or it may check how the values in a Run have changed compared to a baseline. Some basic examples:

Assert the 95th percentile of app_latency_ms is less than or equal to 180

Test Spec JSON

{
    "name": "p95_app_latency_ms",
    "description": "Test the 95th percentile of latency in miliseconds",
    "statistic_name": "percentile",
    "statistic_params": {"percentage": 0.95},
    "assertion": {
        "name": "less_than_or_equal_to",
        "params": {
            "other": 180.0
        }
    },
    "statistic_inputs": [
        {
            "select_query_template": {
                "select": "{EXPERIMENT}.app_latency_ms"
            }
        }
    ]
}

Assert the absolute difference of median of positive_sentiment_score against the baseline is close to 0

Test Spec JSON

{
    "name": "median_sentiment_similar",
    "description": "Test the absolute difference of median on sentiment",
    "statistic_name": "abs_diff_median",
    "statistic_params": {},
    "assertion": {
        "name": "close_to",
        "params": {
            "other": 0.0,
            "tolerance": 0.01
        }
    },
    "statistic_inputs": [
        {
            "select_query_template": {
                "select": "{EXPERIMENT}.positive_sentiment_score"
            }
        },
        {
            "select_query_template": {
                "select": "{BASELINE}.positive_sentiment_score"
            }
        }
    ]
}

In the next sections, we will explore the objects required for testing alongside the methods for creating tests, running tests, reviewing/analyzing tests, and some best practices.

PreviousLLM Models NextCreating Tests

Was this helpful?