Another testing case is testing whether two distributions are not the same. Such a test involves the same statistics as a test of consistency, but a different assertion. One example could be to change the assertion from close_to to greater_than and thereby state that a passed test requires a difference bigger than some threshold.
Example Test Spec
{"name":"test_lower_toxicity_score","description":"Test mean of the toxicity score is lower than baseline","statistic_name":"diff_mean","statistic_params":{},"assertion":{"name":"less_than","params":{"other":-0.1,},},"statistic_inputs":[{"select_query_template":{"select":"{EXPERIMENT}.toxicity_score"}},{"select_query_template":{"select":"{BASELINE}.toxicity_score"}},],}
Example Test on signed difference of mean of toxicity_score