Another testing case is testing whether two distributions are not the same. Such a test involves the same statistics as a test of consistency, but a different assertion. One example could be to change the assertion from close_to to greater_than and thereby state that a passed test requires a difference bigger than some threshold.
Example Test Spec
{
"name": "test_lower_toxicity_score",
"description": "Test mean of the toxicity score is lower than baseline",
"statistic_name": "diff_mean",
"statistic_params": {},
"assertion": {
"name": "less_than",
"params": {
"other": -0.1,
},
},
"statistic_inputs": [
{
"select_query_template": {
"select": "{EXPERIMENT}.toxicity_score"
}
},
{
"select_query_template": {
"select": "{BASELINE}.toxicity_score"
}
},
],
}
Example Test on signed difference of mean of toxicity_score