Reporting Runs


The full process of reporting a Run ultimately breaks down into three steps:

  1. Creating the Run, which includes defining its structure and any relevant metadata

  2. Reporting the results of the Run, which include columnar data and scalars

  3. Closing the Run to mark it as complete once reporting is finished

Each of these steps can be done separately via our SDK, but they can also be done conveniently with a single SDK function call, dbnl.report_run_with_results, which is the recommended approach. See Putting it All Together below.

We also have an eval library available that lets you generate useful metrics on your columns and report them to DBNL alongside the Run results. Check out dbnl.eval.report_run_with_results in the SDK reference for more information.

Creating a Run

The important parts of creating a run are providing identifying information — in the form of a name and metadata — and defining the structure of the data you'll be reporting to it. As mentioned in the previous section, this structure is called the Run Schema.

Run Schema

In older versions of DBNL, the job of the schema was done by something called the "Run Config". The Run Config has been fully deprecated, so you should check the SDK reference and update any code that still uses it.

A Run schema defines four aspects of the Run's structure:

  • Columns (the data each row in your results will contain)

  • Scalars (any Run-level data you want to report)

  • Index (which column or columns uniquely identify rows in your results)

  • Components (functional groups to organize the reported results in the form of a graph)

Columns

Columns are the only required part of a schema and are core to reporting Runs, as they define the shape your results will take. You report your column schema as a list of objects, each of which contains the following fields:

  • name: The name of the column

  • type: The type of the column, e.g. int. For a list of available types, see the SDK reference.

  • description: A descriptive blurb about what the column is

  • component: Which part of your application the column belongs to (see Components below)

Example Columns JSON
[
    { 
        "name": "error_type",
        "type": "category",
        "component": "classifier"
    },
    {
        "name": "email",
        "type": "string",
        "description": "raw email text content from source",
        "component": "input"
    },
    { 
        "name": "spam-pred",
        "type": "boolean",
        "component": "classifier"
    },
    {
        "name": "email_id",
        "type": "string",
        "description": "unique id for each email"
    }
]
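
Since columns are the only required part of the schema, a minimal schema needs nothing more than a column list. Below is a sketch using the dbnl.create_run_schema call shown in Putting it All Together, with the column definitions reused from the example above:

import dbnl

dbnl.login()

# Minimal schema: only columns are required. Scalars, an index, and a
# components DAG can be added to the same call as your reporting grows.
run_schema = dbnl.create_run_schema(
    columns=[
        {"name": "email_id", "type": "string", "description": "unique id for each email"},
        {"name": "spam-pred", "type": "boolean", "component": "classifier"},
        {"name": "error_type", "type": "category", "component": "classifier"},
    ],
)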

Scalars

Scalars represent any data that live at the Run level; that is, they represent single data points that apply to your entire Run. For example, you may want to calculate an F1 score for the entirety of a result set for your model. The scalar schema is also a list of objects, and takes on the same fields as the column schema above.

Example Scalars JSON
[
    {
        "name": "model_F1",
        "type": "float",
        "description": "F1 Score",
        "component": "classifier"
    },
    { 
        "name": "model_recall",
        "type": "float",
        "description": "Model Recall",
        "component": "classifier"
    }
]

Index

Using the index field within the schema, you can designate Unique Identifiers: specific columns that uniquely identify matching results between Runs. Adding this information enables more direct comparisons when testing your application's behavior and makes it easier to explore your data.

Example Index JSON
["email_id"]

Components

Components are defined within the components_dag field of the schema. This defines the topological structure of your app as a Directed Acyclic Graph (DAG). Using this, you can tell DBNL which part of your application different columns correspond to, enabling a more granular understanding of your app's behavior.

Example Components JSON
// Each key defines a component, and the corresponding list defines the
// components downstream from it in your DAG
{
    "input": ["classifier"]
    "classifier": [],
}

Note that if you do not provide a schema when you report a Run, DBNL will infer one from the structure of the results you've uploaded. You can still provide an index parameter directly to the report_run_with_results function.

You can learn more about creating a Run schema in the SDK reference for dbnl.create_run_schema. There is also a function to create a Run, but we recommend the method shown in Putting it All Together below.
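
For example, here is a minimal sketch of reporting with an inferred schema, passing the index directly (the project name, display name, and data below are illustrative; the call and parameters are the same ones used in Putting it All Together):

import dbnl
import pandas as pd

dbnl.login()
proj = dbnl.get_or_create_project(name="My Project")

# No run_schema argument: DBNL infers column types from the DataFrame.
# The index is passed directly so rows can still be matched across Runs.
run = dbnl.report_run_with_results(
    project=proj,
    display_name="Schema-inferred run",
    column_data=pd.DataFrame({
        "email_id": ["1", "2"],
        "spam-pred": [False, True],
        "error_type": ["none", "none"],
    }),
    index=["email_id"],
)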

Reporting Run Results

Once you've defined the structure of your run, you can upload data to DBNL to report the results of that run. As mentioned above, there are two kinds of results from your run:

  • The row-level column results (these each represent the data of a single "usage" of your application)

  • The Run-level scalar results (these represent data that apply to all usages in your Run as a whole)

DBNL expects you to upload your results data in the form of a pandas DataFrame. Note that scalars can be uploaded either as a single-row DataFrame or as a dictionary of values.

Check out the section on metrics to see how DBNL can supplement your results with more useful data.

There are functions in the SDK to upload column results and scalar results individually, but, again, we recommend the method in the Putting it All Together section below!

Example Results
import pandas as pd

column_results = pd.DataFrame({
    "error_type": ["none", "none", "none", "none"],
    "email": [
        "Hello, I am interested in your product. Please send me more information.",
        "Congratulations! You've won a lottery. Click here to claim your prize.",
        "Hi, can we schedule a meeting for next week?",
        "Don't miss out on this limited time offer! Buy now and save 50%."
    ],
    "spam-pred": [False, True, False, True],
    "email_id": ["1", "2", "3", "4"]
})

scalar_results = pd.DataFrame({
    "model_F1": [0.8],
    "model_recall": [0.74]
})
# The above is equivalent to:
scalar_results = {
    "model_F1": 0.8,
    "model_recall": 0.74 
}

Closing a Run

Once you're finished uploading results to DBNL for your Run, the Run should be closed to mark it as ready for use in Test Sessions. Note that reporting results to a Run will overwrite any existing results, and that once a Run is closed, no further results can be uploaded to it. If you need to close a Run, there is an SDK function for it, or you can close an open Run from its page in the UI.
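
If you create and report a Run step by step rather than with dbnl.report_run_with_results (which closes the Run for you), the final step is a close call. A minimal sketch, assuming the close function is dbnl.close_run; verify the exact name and signature in the SDK reference:

import dbnl

def finish_run(run) -> None:
    """Mark an open Run as complete once all of its results are uploaded."""
    # dbnl.close_run is assumed here; check the SDK reference for the exact
    # call. After closing, no further results can be reported to this Run.
    dbnl.close_run(run)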

Putting it All Together

Now that you understand each step, you can easily integrate all of this into your codebase with a few simple function calls via our SDK:

import dbnl
import pandas as pd
dbnl.login()


proj = dbnl.get_or_create_project(name="My Project")
run_schema = dbnl.create_run_schema(
    columns=[
        {"name": "error_type", "type": "category", "component": "classifier"},
        {"name": "email", "type": "string", "description": "raw email text content from source", "component": "input"},
        {"name": "spam-pred", "type": "boolean", "component": "classifier"},
        {"name": "email_id", "type": "string", "description": "unique id for each email"},
    ],
    scalars=[
        {
            "name": "model_F1",
            "type": "float",
            "description": "F1 Score",
            "component": "classifier"
        },
        { 
            "name": "model_recall",
            "type": "float",
            "description": "Model Recall"
        }
    ],
    index=["email_id"],
    components_dag={
        "input": ["classifier"]
        "classifier": [],
    }
)
# Creates the run, reports results, and closes the run.
run = dbnl.report_run_with_results(
    project=proj,
    display_name="Run 1 of Email Classifier",
    run_schema=run_schema,
    column_data=pd.DataFrame({
        "error_type": ["none", "none", "none", "none"],
        "email": [
            "Hello, I am interested in your product. Please send me more information.",
            "Congratulations! You've won a lottery. Click here to claim your prize.",
            "Hi, can we schedule a meeting for next week?",
            "Don't miss out on this limited time offer! Buy now and save 50%."
        ],
        "spam-pred": [False, True, False, True],
        "email_id": ["1", "2", "3", "4"]
    }),
    scalar_data={
        "model_F1": 0.8,
        "model_recall": 0.74 
    }
)

