Run-Level Data
Overview
The Scalars feature allows for the upload, storage, and retrieval of individual values, i.e. scalars, for every Run. This is in contrast to the Columns feature, which allows for the upload of tabular data via Results.
Example Use Case
Your production testing workflow involves uploading results to Distributional in the form of model inputs, outputs, and expected outcomes for a machine learning model. From these results you can calculate aggregate metrics, such as an F1 score. The Scalars feature allows you to upload the aggregate F1 score that was calculated over the entire set of results.
Uploading Scalars
Uploading Scalars is similar to uploading Results. First, you must define the set of Scalars to upload in your RunConfig, for example:
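A minimal sketch of what this RunConfig might look like; the exact argument names and type strings are assumptions, so check your SDK version for the precise signature:

```python
import dbnl

# Hypothetical sketch: a RunConfig that declares both Columns (tabular
# Results) and Scalars (single per-Run values). Field names are assumed.
run_config = dbnl.create_run_config(
    columns=[
        {"name": "model_input", "type": "string"},
        {"name": "model_output", "type": "string"},
        {"name": "expected_outcome", "type": "string"},
    ],
    scalars=[
        {"name": "f1_score", "type": "float"},  # the aggregate metric
    ],
)
```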
Next, create a run and upload results:
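A sketch of the run creation and upload step. `dbnl.report_scalars` is the documented entry point; the surrounding calls and their signatures are assumptions based on typical dbnl SDK usage and may differ in your version:

```python
import dbnl
import pandas as pd

# Assumed helpers: create_run and report_results; verify against your SDK.
run = dbnl.create_run(project=proj, run_config=run_config)

# Upload the tabular Results for the Run.
dbnl.report_results(run=run, data=results_df)

# Upload the Scalars. A dictionary works here; a single-row
# pandas DataFrame such as pd.DataFrame([{"f1_score": 0.92}]) does too.
dbnl.report_scalars(run=run, data={"f1_score": 0.92})
```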
Note: dbnl.report_scalars accepts both a dictionary and a single-row Pandas DataFrame for the data argument.
Finally, you must close the run.
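For example, assuming a `close_run` helper along the lines of the other SDK calls (name and signature may vary by version):

```python
# Closing the run finalizes the uploaded Results and Scalars.
dbnl.close_run(run=run)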
Navigate to the Distributional app in your browser to view the uploaded Run with Scalars.
Creating Tests on Scalars
Tests can be defined on Scalars in the same way that tests are defined on Columns. For example, say that you would like to ensure some minimum acceptable performance criteria like "the F1 score must always be greater than 0.8". This test can be defined in the Distributional platform as follows:
assert scalar({EXPERIMENT}.f1_score) > 0.8
You can also test against a baseline Run. For example, we can write the test "the F1 score must always be greater than or equal to the baseline Run's F1 score" in Distributional as:
assert scalar({EXPERIMENT}.f1_score - {BASELINE}.f1_score) >= 0
Viewing and Downloading Scalars
You can view all of the Scalars that were uploaded to a Run by visiting the Run details page in the Distributional app. The Scalars will be visible in a table near the bottom of that page.
Scalars can also be downloaded using the Distributional SDK.
Scalars downloaded via the SDK are single-row Pandas DataFrames.
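A sketch of the download step. The getter name `get_scalars` is a hypothetical placeholder, since the exact SDK function is not shown here; consult the SDK reference for the real name:

```python
# Hypothetical: retrieve the Scalars for a Run as a single-row DataFrame.
scalars_df = dbnl.get_scalars(run=run)
print(scalars_df)  # one row, one column per uploaded Scalar
```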
Scalar Broadcasting
The Distributional expression language supports Scalars, as shown in the examples above. Scalars can be used anywhere Columns can in the expression language. When you define an expression that combines Columns and Scalars, the Scalars are broadcast to each row. Consider a Run with the following data:
When the expression {RUN}.column_data + {RUN}.scalar_data is applied to this Run, the result will be calculated as follows:
| column_data | scalar_data | column_data + scalar_data |
| ----------- | ----------- | ------------------------- |
| 1           | 10          | 11                        |
| 2           | 10          | 12                        |
| 3           | 10          | 13                        |
| 4           | 10          | 14                        |
| 5           | 10          | 15                        |
Scalar broadcasting can be used to implement tests and filters that operate on both Columns and Scalars.
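The same broadcasting rule can be illustrated with plain pandas (this is just an analogy, not the Distributional expression language itself):

```python
import pandas as pd

# A column of per-row values and a single per-Run Scalar.
run_data = pd.DataFrame({"column_data": [1, 2, 3, 4, 5]})
scalar_data = 10

# The scalar is broadcast to every row of the column.
run_data["column_data + scalar_data"] = run_data["column_data"] + scalar_data
print(run_data["column_data + scalar_data"].tolist())  # [11, 12, 13, 14, 15]
```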
Statistics with Scalars
Tests are defined in Distributional as an assertion on a single value. The single value for the assertion comes from a Statistic.
Distributional has a special "scalar" Statistic for defining tests on Scalars, as demonstrated in the examples above. The "scalar" Statistic should only be used with a single expression input whose result is a single value; it will fail if the provided input resolves to multiple values.
Any other Statistic will reduce the input collection to a single value. In this case Distributional will treat a Scalar as a collection with a single value when computing the Statistic. For example, computing max({RUN}.my_statistic) is equivalent to scalar({RUN}.my_statistic), because the maximum of a single value is the value itself.
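This equivalence can be sketched in plain Python, treating an uploaded Scalar as a one-element collection (the value 0.85 is an illustrative assumption):

```python
# A Scalar, viewed as a one-element collection for Statistic computation.
scalar_as_collection = [0.85]

# A reducing Statistic like max() collapses the collection to its single
# value, which is exactly what the "scalar" Statistic returns.
print(max(scalar_as_collection))  # 0.85
```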