Reporting Runs

The full process of reporting a Run ultimately breaks down into three steps:

  1. Creating the Run, which includes defining its structure and any relevant metadata

  2. Reporting the results of the Run, which include columnar data and scalars

  3. Closing the Run to mark it as complete once reporting is finished

Each of these steps can be done separately via our SDK, but we recommend doing all three with a single SDK function call: dbnl.report_run_with_results. See Putting it All Together below.

We also have an eval library available that lets you generate useful metrics on your columns and report them to DBNL alongside the Run results. Check out dbnl.eval.report_run_with_results for more information.

Creating a Run

The important parts of creating a run are providing identifying information — in the form of a name and metadata — and defining the structure of the data you'll be reporting to it. As mentioned in the previous section, this structure is called the Run Schema.

Run Schema

A Run schema defines four aspects of the Run's structure:

  • Columns (the data each row in your results will contain)

  • Scalars (any Run-level data you want to report)

  • Index (which column or columns uniquely identify rows in your results)

  • Components (functional groups to organize the reported results in the form of a graph)

Columns

Columns are the only required part of a schema and are core to reporting Runs, as they define the shape your results will take. You report your column schema as a list of objects, which contain the following fields:

  • name: The name of the column

  • type: The type of the column, e.g. int. For a list of available types, see the SDK reference

  • description: A descriptive blurb about what the column is

  • component: Which part of your application the column belongs to (see Components below)

Example Columns JSON
[
    { 
        "name": "error_type",
        "type": "category",
        "component": "classifier"
    },
    {
        "name": "email",
        "type": "string",
        "description": "raw email text content from source",
        "component": "input"
    },
    { 
        "name": "spam-pred",
        "type": "boolean",
        "component": "classifier"
    },
    {
        "name": "email_id",
        "type": "string",
        "description": "unique id for each email"
    }
]
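Before reporting, it can help to confirm that the column schema and your data agree on column names. The sketch below is not part of the DBNL SDK: schema_mismatches is a hypothetical helper written only for illustration, and it compares plain lists of names (with a DataFrame you could pass list(df.columns)).

```python
# Column schema from the example above (descriptions omitted for brevity).
columns = [
    {"name": "error_type", "type": "category", "component": "classifier"},
    {"name": "email", "type": "string", "component": "input"},
    {"name": "spam-pred", "type": "boolean", "component": "classifier"},
    {"name": "email_id", "type": "string"},
]


def schema_mismatches(columns, data_columns):
    """Hypothetical helper: compare schema column names with data column names."""
    declared = {col["name"] for col in columns}
    present = set(data_columns)
    return {
        "missing_from_data": sorted(declared - present),
        "missing_from_schema": sorted(present - declared),
    }


# Column names of a results table that forgot to include "email_id".
data_columns = ["error_type", "email", "spam-pred"]

mismatches = schema_mismatches(columns, data_columns)
print(mismatches)
```

An empty list under both keys means the schema and the data name the same columns.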

Scalars

Scalars represent any data that live at the Run level; that is, they represent single data points that apply to your entire Run. For example, you may want to calculate an F1 score for the entirety of a result set for your model. The scalar schema is also a list of objects, and takes on the same fields as the column schema above.

Example Scalars JSON
[
    {
        "name": "model_F1",
        "type": "float",
        "description": "F1 Score",
        "component": "classifier"
    },
    { 
        "name": "model_recall",
        "type": "float",
        "description": "Model Recall",
        "component": "classifier"
    }
]
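As an illustration of where scalar values like these might come from, here is a minimal sketch that computes recall and F1 over a whole result set in plain Python. The labels list is hypothetical ground truth invented for this example, and recall_and_f1 is an illustrative helper, not an SDK function.

```python
def recall_and_f1(labels, predictions):
    """Compute recall and F1 for boolean labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # true positives
    fp = sum(1 for y, p in pairs if not y and p)      # false positives
    fn = sum(1 for y, p in pairs if y and not p)      # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, f1


labels = [False, True, True, True]        # hypothetical ground truth
predictions = [False, True, False, True]  # e.g. the "spam-pred" column

recall, f1 = recall_and_f1(labels, predictions)

# One value per scalar, keyed by the names declared in the scalar schema.
scalar_results = {"model_F1": f1, "model_recall": recall}
print(scalar_results)
```

Each scalar is a single number for the entire Run, in contrast to columns, which hold one value per row.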

Index

Using the index field within the schema, you have the ability to designate Unique Identifiers – specific columns which uniquely identify matching results between Runs. Adding this information facilitates more direct comparisons when testing your application's behavior and makes it easier to explore your data.

Example Index JSON
["email_id"]
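Because index columns are meant to uniquely identify rows, a quick local check with pandas can catch duplicates before you report. This is a sketch under that assumption; index_is_unique is a hypothetical helper, not an SDK function.

```python
import pandas as pd


def index_is_unique(df, index):
    """Return True if no two rows share the same values in the index columns."""
    return not df.duplicated(subset=index).any()


index = ["email_id"]

good = pd.DataFrame({"email_id": ["1", "2", "3", "4"],
                     "spam-pred": [False, True, False, True]})
bad = pd.DataFrame({"email_id": ["1", "2", "2", "4"],
                    "spam-pred": [False, True, False, True]})

good_ok = index_is_unique(good, index)
bad_ok = index_is_unique(bad, index)
print(good_ok, bad_ok)
```

The same helper works for a multi-column index, since subset accepts a list of column names.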

Components

Components are defined within the components_dag field of the schema. This defines the topological structure of your app as a Directed Acyclic Graph (DAG). Using this, you can tell DBNL which part of your application different columns correspond to, enabling a more granular understanding of your app's behavior.

Example Components JSON
// Each key defines a component, and the corresponding list defines the
// components downstream from it in your DAG
{
    "input": ["classifier"],
    "classifier": []
}
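Since the components must form a Directed Acyclic Graph, you may want to check locally for cycles. The sketch below uses Python's standard graphlib module (Python 3.9+); upstream_first_order is a hypothetical helper for illustration and is not part of the SDK.

```python
from graphlib import CycleError, TopologicalSorter

components_dag = {
    "input": ["classifier"],
    "classifier": [],
}


def upstream_first_order(dag):
    """Return components ordered upstream to downstream; raise on a cycle."""
    try:
        # TopologicalSorter reads each list as a node's predecessors, while
        # our lists hold downstream components, so the order it produces is
        # downstream-first.
        order = list(TopologicalSorter(dag).static_order())
    except CycleError as exc:
        raise ValueError(f"components_dag contains a cycle: {exc.args[1]}") from exc
    # Reverse to get an upstream-first ordering.
    return order[::-1]


order = upstream_first_order(components_dag)
print(order)
```

A mapping such as {"a": ["b"], "b": ["a"]} would raise ValueError here, because neither component can come first.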

You can learn more about creating a Run schema in the SDK reference for dbnl.create_run_schema. There is also a function to create a Run, but we recommend the method shown in Putting it All Together below.

Note that if you do not provide a schema when you report a run, DBNL will infer one from the structure of the results you've uploaded. You can additionally still provide an index parameter directly to the report_run_with_results function.

Reporting Run Results

Check out the section on metrics to see how DBNL can supplement your results with more useful data.

Once you've defined the structure of your run, you can upload data to DBNL to report the results of that run. As mentioned above, there are two kinds of results from your run:

  • The row-level column results (these each represent the data of a single "usage" of your application)

  • The Run-level scalar results (these represent data that apply to all usages in your Run as a whole)

DBNL expects you to upload your results data in the form of a pandas DataFrame. Note that scalars can be uploaded as a single-row DataFrame or as a dictionary of values.

Example Results
import pandas as pd

column_results = pd.DataFrame({
    "error_type": ["none", "none", "none", "none"],
    "email": [
        "Hello, I am interested in your product. Please send me more information.",
        "Congratulations! You've won a lottery. Click here to claim your prize.",
        "Hi, can we schedule a meeting for next week?",
        "Don't miss out on this limited time offer! Buy now and save 50%."
    ],
    "spam-pred": [False, True, False, True],
    "email_id": ["1", "2", "3", "4"]
})

scalar_results = pd.DataFrame({
    "model_F1": [0.8],
    "model_recall": [0.74]
})
# The above is equivalent to:
scalar_results = {
    "model_F1": 0.8,
    "model_recall": 0.74 
}
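To make the equivalence between the two scalar formats concrete, the following sketch converts a single-row DataFrame into the dictionary form using standard pandas calls. The scalars_to_dict helper is hypothetical and only illustrates the relationship; it is not an SDK function.

```python
import pandas as pd


def scalars_to_dict(scalar_df):
    """Convert a single-row scalar DataFrame into a {name: value} dictionary."""
    if len(scalar_df) != 1:
        raise ValueError("scalar results must contain exactly one row")
    # Take the only row and map each column name to its value.
    return scalar_df.iloc[0].to_dict()


scalar_df = pd.DataFrame({
    "model_F1": [0.8],
    "model_recall": [0.74],
})

scalar_dict = scalars_to_dict(scalar_df)
print(scalar_dict)
```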

There are functions to upload column results and scalar results in the SDK, but, again, we recommend the method shown in Putting it All Together below.

Closing a Run

Once you're finished uploading results to DBNL for your Run, close the Run to mark it as ready to be used in Test Sessions. Note that reporting results to a Run will overwrite any existing results, and once a Run is closed, no further results can be uploaded to it. To close a Run manually, use the SDK function for it, or close an open Run from its page in the UI.

Putting it All Together

Now that you understand each step, you can easily integrate all of this into your codebase with a few simple function calls via our SDK:

import dbnl
import pandas as pd
dbnl.login()


proj = dbnl.get_or_create_project(name="My Project")
run_schema = dbnl.create_run_schema(
    columns=[
        {"name": "error_type", "type": "category", "component": "classifier"},
        {"name": "email", "type": "string", "description": "raw email text content from source", "component": "input"},
        {"name": "spam-pred", "type": "boolean", "component": "classifier"},
        {"name": "email_id", "type": "string", "description": "unique id for each email"},
    ],
    scalars=[
        {
            "name": "model_F1",
            "type": "float",
            "description": "F1 Score",
            "component": "classifier"
        },
        { 
            "name": "model_recall",
            "type": "float",
            "description": "Model Recall"
        }
    ],
    index=["email_id"],
    components_dag={
        "input": ["classifier"],
        "classifier": [],
    }
)
# Creates the run, reports results, and closes the run.
run = dbnl.report_run_with_results(
    project=proj,
    display_name="Run 1 of Email Classifier",
    run_schema=run_schema,
    column_data=pd.DataFrame({
        "error_type": ["none", "none", "none", "none"],
        "email": [
            "Hello, I am interested in your product. Please send me more information.",
            "Congratulations! You've won a lottery. Click here to claim your prize.",
            "Hi, can we schedule a meeting for next week?",
            "Don't miss out on this limited time offer! Buy now and save 50%."
        ],
        "spam-pred": [False, True, False, True],
        "email_id": ["1", "2", "3", "4"]
    }),
    scalar_data={
        "model_F1": 0.8,
        "model_recall": 0.74 
    }
)
