Test That Columns Are Similarly Distributed

A general way to test whether two columns are similarly distributed is to use a nonparametric statistic. DBNL offers two such statistics: scaled_ks_stat for testing ordinal distributions and scaled_chi2_stat for testing nominal distributions.
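To give a feel for what a Kolmogorov-Smirnov style statistic measures, here is a minimal sketch in plain Python of the standard, unscaled two-sample KS statistic: the largest vertical gap between the empirical CDFs of two samples. It is an illustration only. The function name `ks_statistic` and the sample values are made up for this example, and the scaling that DBNL applies in scaled_ks_stat is not described here, so the numbers this sketch produces should not be assumed to match DBNL's output.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Unscaled two-sample KS statistic: max gap between the empirical CDFs.

    Hypothetical helper for illustration; not part of DBNL.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # Both empirical CDFs are step functions, so the gap between them
    # can only change at an observed value from either sample.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Made-up scores standing in for an experiment column and a baseline column.
experiment_scores = [0.61, 0.72, 0.75, 0.80, 0.88]
baseline_scores = [0.58, 0.70, 0.77, 0.81, 0.90]
print(ks_statistic(experiment_scores, baseline_scores))
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means the samples do not overlap at all, so a smaller value indicates more similar distributions.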

Example Test Spec
{
    "name": "discrepancy_of_text_coherence_score",
    "description": "Test the nonparametric discrepancy of the coherence score distributions",
    "statistic_name": "scaled_ks_stat",
    "statistic_params": {},
    "assertion": {
        "name": "less_than_or_equal_to",
        "params": {
            "other": 0.25
        }
    },
    "statistic_inputs": [
        {
            "select_query_template": {
                "select": "{EXPERIMENT}.coherence_score"
            }
        },
        {
            "select_query_template": {
                "select": "{BASELINE}.coherence_score"
            }
        }
    ]
}
Example test on the discrepancy between the coherence_score distributions
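For a nominal column, the same pattern could plausibly be used with scaled_chi2_stat. The sketch below, written as a Python dict, mirrors the structure of the spec above. It rests on assumptions: the column name `sentiment_label` is hypothetical, the threshold of 0.25 is simply carried over for illustration, and it assumes scaled_chi2_stat accepts the same two-input form as scaled_ks_stat.

```python
import json

# Hypothetical spec for a nominal (categorical) column.
# "sentiment_label" is an invented column name, and the 0.25 threshold
# is illustrative only; tune it for your own data.
nominal_test_spec = {
    "name": "discrepancy_of_sentiment_label",
    "description": "Test the nonparametric discrepancy of the sentiment label distributions",
    "statistic_name": "scaled_chi2_stat",
    "statistic_params": {},
    "assertion": {
        "name": "less_than_or_equal_to",
        "params": {"other": 0.25},
    },
    "statistic_inputs": [
        {"select_query_template": {"select": "{EXPERIMENT}.sentiment_label"}},
        {"select_query_template": {"select": "{BASELINE}.sentiment_label"}},
    ],
}

# Print the spec as JSON to inspect it.
print(json.dumps(nominal_test_spec, indent=4))
```

Only the statistic name and the selected column differ from the ordinal example; the assertion and the experiment/baseline inputs keep the same shape.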
