Test That Specific Results Have Matching Behavior
When the results from a run have unique identifiers, one can create a special type of tests for testing matching behavior at a per-result level. One example would be testing the mean of per-result absolute difference does not exceed a threshold value.
Example Test Spec
{
"name": "test_mean_abs_diff_sentiment",
"description": "Test mean absolute difference of negative sentiment per result",
"statistic_name": "mean",
"statistic_params": {},
"assertion": {
"name": "less_than_or_equal_to",
"params": {
"other": 0.05,
},
},
"statistic_inputs": [
{
"select_query_template": {
"select": "abs({EXPERIMENT}.toxicity_score - {BASELINE}.toxicity_score)"
}
},
],
}
Was this helpful?

