Test That Specific Results Have Matching Behavior

When the results of a run have unique identifiers, you can create a special type of test that checks matching behavior at the per-result level, pairing each experiment result with the baseline result that shares its identifier. For example, a test can assert that the mean of the per-result absolute differences does not exceed a threshold value.
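To make the statistic concrete, here is a minimal Python sketch of the computation, independent of any test framework. The function name `mean_abs_diff`, the dictionary layout keyed by result identifier, and the sample scores are illustrative assumptions, not part of the product's API.

```python
from statistics import mean


def mean_abs_diff(baseline, experiment, field):
    """Mean of |experiment - baseline| for `field`, over ids present in both runs."""
    # Only results whose identifier appears in both runs can be compared.
    shared_ids = baseline.keys() & experiment.keys()
    if not shared_ids:
        raise ValueError("no results share an identifier")
    return mean(
        abs(experiment[i][field] - baseline[i][field]) for i in shared_ids
    )


# Hypothetical sample data: results keyed by their unique identifier.
baseline = {
    "r1": {"toxicity_score": 0.10},
    "r2": {"toxicity_score": 0.40},
    "r3": {"toxicity_score": 0.25},
}
experiment = {
    "r1": {"toxicity_score": 0.12},
    "r2": {"toxicity_score": 0.37},
    "r3": {"toxicity_score": 0.25},
}

threshold = 0.05
value = mean_abs_diff(baseline, experiment, "toxicity_score")
passed = value <= threshold  # the test passes when the mean stays within the threshold
```

Because the differences are taken per result before averaging, shifts in opposite directions do not cancel out, which a comparison of the two run-level means would allow.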

Example Test Spec
{
    "name": "test_mean_abs_diff_sentiment",
    "description": "Test mean absolute difference of toxicity_score per result",
    "statistic_name": "mean",
    "statistic_params": {},
    "assertion": {
        "name": "less_than_or_equal_to",
        "params": {
            "other": 0.05
        }
    },
    "statistic_inputs": [
        {
            "select_query_template": {
                "select": "abs({EXPERIMENT}.toxicity_score - {BASELINE}.toxicity_score)"
            }
        }
    ]
}
Example Test on mean of absolute difference of toxicity_score
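The following Python sketch shows one plausible reading of how such a spec could be evaluated, using the standard-library `sqlite3` module. It assumes that `{EXPERIMENT}` and `{BASELINE}` resolve to table aliases and that rows are paired on a `result_id` column; the table names, the column name, the `evaluate_mean` helper, and the sample rows are all hypothetical and do not describe the product's actual implementation.

```python
import sqlite3

SELECT_TEMPLATE = "abs({EXPERIMENT}.toxicity_score - {BASELINE}.toxicity_score)"


def evaluate_mean(conn, template, threshold):
    """Return (mean, passed) for the templated per-result expression."""
    # Assumption: the placeholders become table aliases in a join on result_id.
    expr = template.format(EXPERIMENT="e", BASELINE="b")
    sql = (
        f"SELECT avg({expr}) "
        "FROM experiment_results AS e "
        "JOIN baseline_results AS b ON e.result_id = b.result_id"
    )
    (value,) = conn.execute(sql).fetchone()
    # avg() yields NULL (None) when no rows match; treat that as a failure.
    return value, value is not None and value <= threshold


# Hypothetical tables holding one row per result for each run.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE baseline_results (result_id TEXT PRIMARY KEY, toxicity_score REAL);
    CREATE TABLE experiment_results (result_id TEXT PRIMARY KEY, toxicity_score REAL);
    """
)
conn.executemany(
    "INSERT INTO baseline_results VALUES (?, ?)", [("r1", 0.10), ("r2", 0.40)]
)
conn.executemany(
    "INSERT INTO experiment_results VALUES (?, ?)", [("r1", 0.12), ("r2", 0.30)]
)

# "less_than_or_equal_to" with "other": 0.05 from the spec.
value, passed = evaluate_mean(conn, SELECT_TEMPLATE, 0.05)
```

With these sample rows the per-result differences are roughly 0.02 and 0.10, so the mean is about 0.06 and the `less_than_or_equal_to` assertion with `other` set to 0.05 would fail.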
