Using Filters

Distributional allows users to apply filters on run data they have uploaded. Applying a filter selects for only the rows that match the filter criteria. The filtered rows can then be visualized or used to create tests.

We will show how filters can be used to explore uploaded run data and to build filtered tests.

Filters in the Compare Page

Filters can be written at the top of the compare page, which is accessible from the project detail page. Users write filters to select only the rows they wish to visualize or inspect.

Below is a list of DBNL-defined functions that can be used in filter expressions:

| function name | aliases | description |
| --- | --- | --- |
| and | | Logical AND operation of two or more boolean columns |
| or | | Logical OR operation of two or more boolean columns |
| not | | Logical NOT operation of a boolean column |
| less_than | ['lt'] | Computes the element-wise less than comparison of two columns. input1 < input2 |
| less_than_or_equal_to | ['lte'] | Computes the element-wise less than or equal to comparison of two columns. input1 <= input2 |
| greater_than | ['gt'] | Computes the element-wise greater than comparison of two columns. input1 > input2 |
| greater_than_or_equal_to | ['gte'] | Computes the element-wise greater than or equal to comparison of two columns. input1 >= input2 |
| equal_to | ['eq'] | Computes the element-wise equality comparison of two columns. input1 == input2 |

Here is an example of a more complex filter that selects rows whose loc column equals the string 'NY' and whose churn_score is greater than 0.9:

and(gt({RUN}.churn_score, 0.9), equal_to({RUN}.loc, 'NY'))

Use single quotes (') around string values in filter expressions.
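To make the semantics of the example filter concrete, here is a minimal Python sketch that mimics it on an in-memory list of rows. It is an illustration only, not DBNL's implementation: the `COMPARISONS` mapping, the `select_rows` helper, and the sample rows are all hypothetical, and the logical `and` is modelled with Python's own `and`.

```python
import operator

# Hypothetical stand-ins for the comparison functions listed above,
# keyed by function name and by alias. Not DBNL's implementation.
COMPARISONS = {
    "less_than": operator.lt, "lt": operator.lt,
    "less_than_or_equal_to": operator.le, "lte": operator.le,
    "greater_than": operator.gt, "gt": operator.gt,
    "greater_than_or_equal_to": operator.ge, "gte": operator.ge,
    "equal_to": operator.eq, "eq": operator.eq,
}


def select_rows(rows):
    """Keep rows matching and(gt(churn_score, 0.9), equal_to(loc, 'NY'))."""
    gt = COMPARISONS["gt"]
    equal_to = COMPARISONS["equal_to"]
    return [
        row for row in rows
        if gt(row["churn_score"], 0.9) and equal_to(row["loc"], "NY")
    ]


# Made-up sample data for illustration.
rows = [
    {"loc": "NY", "churn_score": 0.95},
    {"loc": "NY", "churn_score": 0.40},
    {"loc": "CA", "churn_score": 0.99},
]

selected = select_rows(rows)
print(selected)
```

Only the first sample row satisfies both conditions, so it is the only one kept. The second row fails the churn_score condition and the third fails the loc condition.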

Filters in Tests

Filters can also be used to specify a sub-selection of rows in runs you would like to include in the test computation.

For example, our goal could be to create a test that asserts that, for rows where the loc column is ‘NY’, the absolute difference of means of the correct churn predictions is <= 0.2 between baseline and experiment runs.
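As a rough illustration of what such a test computes, the following Python sketch filters two made-up runs down to their 'NY' rows, takes the mean of a `pred_correct` column in each, and checks the absolute difference against 0.2. The helper names and the sample data are hypothetical. The column name `pred_correct` follows the test spec shown later on this page, and its 0/1 encoding is an assumption.

```python
from statistics import mean


def abs_diff_mean(baseline_values, experiment_values):
    """Absolute difference between the means of two value lists."""
    return abs(mean(baseline_values) - mean(experiment_values))


def filtered_column(rows, column, loc):
    """Return `column` for the rows whose loc equals `loc`."""
    return [row[column] for row in rows if row["loc"] == loc]


# Made-up runs: pred_correct is assumed to be 1 when the churn
# prediction was correct and 0 otherwise.
baseline = [
    {"loc": "NY", "pred_correct": 1},
    {"loc": "NY", "pred_correct": 1},
    {"loc": "NY", "pred_correct": 0},
    {"loc": "NY", "pred_correct": 1},
    {"loc": "CA", "pred_correct": 0},
]
experiment = [
    {"loc": "NY", "pred_correct": 1},
    {"loc": "NY", "pred_correct": 0},
    {"loc": "NY", "pred_correct": 0},
    {"loc": "NY", "pred_correct": 1},
    {"loc": "CA", "pred_correct": 1},
]

statistic = abs_diff_mean(
    filtered_column(baseline, "pred_correct", "NY"),
    filtered_column(experiment, "pred_correct", "NY"),
)
test_passes = statistic <= 0.2
print(statistic, test_passes)
```

With this sample data the 'NY' means are 0.75 and 0.5, so the statistic is 0.25 and the assertion fails. The 'CA' rows are excluded by the filter and do not affect the result.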

We will walk through how this can be accomplished:

1. Navigate to the Project Detail page and click on “Configure Tests”.

2. Click Add Test on the Test Configuration page. Don’t forget to also set a baseline run for automated test configuration.

3. Create the test with the filter specified on the baseline and experiment run.

Filter for the baseline Run:

equal_to({BASELINE}.loc, 'NY')

Filter for the experiment Run:

equal_to({EXPERIMENT}.loc, 'NY')

4. You can now see the new test in the Test Configuration Page. When new run data is uploaded, this test runs automatically: it compares the new run (as experiment) against the selected baseline run, using the defined filters to sub-select the rows whose loc column equals ‘NY’.

The full Test Spec in JSON format is shown below.

{
    "name": "abs diff of mean of correct churn preds of NY users is within 0.2",
    "statistic_name": "abs_diff_mean",
    "statistic_params": {},
    "assertions": [
        {
            "name": "less_than_or_equal_to",
            "params": {
                "other": 0.2
            }
        }
    ],
    "statistic_inputs": [
        {
            "select_query_template": {
                "select": "{BASELINE}.pred_correct",
                "filter": "equal_to({BASELINE}.loc, 'NY')"
            }
        },
        {
            "select_query_template": {
                "select": "{EXPERIMENT}.pred_correct",
                "filter": "equal_to({EXPERIMENT}.loc, 'NY')"
            }
        }
    ]
}
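If you need the same kind of spec for several columns or locations, one option is to generate it programmatically rather than hand-editing JSON. The Python sketch below builds a dictionary with the same shape as the spec above and serializes it with the standard `json` module, which guarantees syntactically valid output. The `build_filtered_test_spec` helper and its parameters are hypothetical, and how the resulting spec is submitted to Distributional is not shown here.

```python
import json


def build_filtered_test_spec(column, loc, threshold):
    """Build a test spec dict shaped like the JSON example above."""

    def statistic_input(run):
        # `run` is a template placeholder such as "{BASELINE}".
        return {
            "select_query_template": {
                "select": f"{run}.{column}",
                "filter": f"equal_to({run}.loc, '{loc}')",
            }
        }

    return {
        "name": f"abs diff of mean of {column} for {loc} rows is within {threshold}",
        "statistic_name": "abs_diff_mean",
        "statistic_params": {},
        "assertions": [
            {"name": "less_than_or_equal_to", "params": {"other": threshold}}
        ],
        "statistic_inputs": [
            statistic_input("{BASELINE}"),
            statistic_input("{EXPERIMENT}"),
        ],
    }


spec = build_filtered_test_spec("pred_correct", "NY", 0.2)
spec_json = json.dumps(spec, indent=4)
print(spec_json)
```

The same filter expression is applied to both inputs, with only the run placeholder changing, which keeps the baseline and experiment selections consistent.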
