Run-Level Data

Overview

The Scalars feature allows you to upload, store, and retrieve individual values, i.e. scalars, for every Run. This is in contrast to the Columns feature, which is used to upload tabular data via Results.

Example Use Case

Suppose your production testing workflow uploads results to Distributional in the form of model inputs, outputs, and expected outcomes for a machine learning model. From these results you can calculate aggregate metrics, such as an F1 score. The Scalars feature lets you upload the aggregate F1 score calculated over the entire set of results.
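
For instance, you might compute the aggregate metric from your results before uploading it. The sketch below is one way to do this, assuming scikit-learn and hypothetical "expected" and "predicted" columns in a results DataFrame:

from sklearn.metrics import f1_score

# Hypothetical column names; adapt to your own results schema.
f1 = f1_score(my_dataframe["expected"], my_dataframe["predicted"], average="macro")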

Uploading Scalars

Uploading Scalars is similar to uploading Results. First, define the set of Scalars to upload in your RunConfig, for example:

import dbnl

# Assumes you have already authenticated, e.g. with dbnl.login().
project = dbnl.get_project(name="My Project")

run_config = dbnl.create_run_config(
  project=project,
  display_name="Classifier Regression",
  columns=[...],
  scalars=[{
    "name": "f1_score",
    "description": "Aggregate F1 score over all results",
    "type": "float",
  }],
)

Next, create a run and upload results:

run = dbnl.create_run(project=project, run_config=run_config)

dbnl.report_column_results(run=run, data=my_dataframe)
dbnl.report_scalar_results(run=run, data={"f1_score": 0.95})

Note: dbnl.report_scalar_results accepts either a dictionary or a single-row Pandas DataFrame for the data argument.
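
For example, the call above could equivalently pass a single-row DataFrame (a minimal sketch, assuming pandas is available):

import pandas as pd

scalar_df = pd.DataFrame([{"f1_score": 0.95}])  # one row; one column per Scalar
dbnl.report_scalar_results(run=run, data=scalar_df)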

Finally, close the run:

dbnl.close_run(run=run)

Navigate to the Distributional app in your browser to view the uploaded Run with Scalars.

Creating Tests on Scalars

Tests can be defined on Scalars in the same way that tests are defined on Columns. For example, say you would like to enforce a minimum acceptable performance criterion such as "the F1 score must always be greater than 0.8". This test can be defined in the Distributional platform as follows:

assert scalar({EXPERIMENT}.f1_score) > 0.8

You can also test against a baseline Run. For example, we can write the test "the F1 score must always be greater than or equal to the baseline Run's F1 score" in Distributional as:

assert scalar({EXPERIMENT}.f1_score - {BASELINE}.f1_score) >= 0

Viewing and Downloading Scalars

You can view all of the Scalars that were uploaded to a Run by visiting the Run details page in the Distributional app. The Scalars will be visible in a table near the bottom of that page.

Scalars can also be downloaded using the Distributional SDK.

run = dbnl.get_run(run_id="run_xyz")
scalars = dbnl.get_scalar_results(run=run)
print(scalars)

Scalars downloaded via the SDK are single-row Pandas DataFrames.
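
To work with an individual value, index into the DataFrame as usual (a sketch, assuming the f1_score Scalar from the earlier examples):

f1 = scalars["f1_score"].iloc[0]
print(f"F1 score: {f1}")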

Scalar Broadcasting

The Distributional expression language supports Scalars, as shown in the examples above. Scalars behave just like Columns in the expression language: when an expression combines Columns and Scalars, each Scalar is broadcast to every row. Consider a Run with the following data:

column_data: [1, 2, 3, 4, 5]
scalar_data: 10

When the expression {RUN}.column_data + {RUN}.scalar_data is applied to this Run, the result will be calculated as follows:

column_data | scalar_data | column_data + scalar_data
1           | 10          | 11
2           | 10          | 12
3           | 10          | 13
4           | 10          | 14
5           | 10          | 15

Scalar broadcasting can be used to implement tests and filters that operate on both Columns and Scalars.
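
For example, broadcasting lets a test compare every row against a Run-level aggregate. The assertion below is a sketch that reuses the hypothetical column_data and scalar_data names from the table above; it passes because no row of column_data exceeds scalar_data:

assert max({RUN}.column_data - {RUN}.scalar_data) <= 0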

Statistics with Scalars

Tests are defined in Distributional as an assertion on a single value. The single value for the assertion comes from a Statistic.

Distributional has a special "scalar" Statistic for defining tests on Scalars, as demonstrated in the examples above. The "scalar" Statistic expects a single expression input that resolves to a single value, and it will fail if the input resolves to multiple values. For instance, given the Run above, scalar({RUN}.scalar_data) resolves to 10, while scalar({RUN}.column_data) fails because column_data contains five values.

Any other Statistic reduces its input collection to a single value. When computing such a Statistic on a Scalar, Distributional treats the Scalar as a collection containing a single value. For example, max({RUN}.my_statistic) is equivalent to scalar({RUN}.my_statistic), because the maximum of a single value is the value itself.

[Figure: Writing a test for minimum performance using the Distributional app.]
[Figure: Writing a regression test using the Distributional app.]
[Figure: Scalar data viewable from the Run page in the Distributional app.]