Functions that interact with a dbnl Project
Click through to view all the SDK functions.
Below is a basic working example that highlights the SDK workflow. If you have not yet installed the SDK, follow these instructions.
The primary mechanism for submitting data to Distributional is through our Python SDK. This section contains information about how to set up the SDK and use it for your AI testing purposes.
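A minimal sketch of that workflow, using the functions documented on this page. The project name, column names, and values are illustrative; check exact signatures against your installed SDK version.

```python
import pandas as pd

import dbnl

# Authenticate against the Distributional API (see dbnl.login below).
dbnl.login(api_token="<DBNL_API_TOKEN>")

# Get or create the Project that will hold this Run.
proj = dbnl.get_or_create_project(name="fraud-prediction-demo")

# One row per prediction made by the app being tested.
column_data = pd.DataFrame({
    "pred_proba": [0.12, 0.87, 0.45],
    "decision": [False, True, False],
})

# Report the results and close the Run in a single convenience call.
run = dbnl.report_run_with_results(
    project=proj,
    column_data=column_data,
    display_name="nightly-eval",
)
```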
The name for the dbnl Project. Project names must be unique; an error will be raised if there already exists a Project with the same name.
Functions related to dbnl RunConfig
Functions related to Column and Scalar data uploaded within a Run.
As a convenience for reporting results and creating a Run, you can also check out report_run_with_results
Retrieve the specified dbnl Project or create a new one if it does not exist
name
description
An optional description for the dbnl Project, defaults to None. Description is limited to 255 characters.
Description cannot be updated with this function.
A new Project will be created with the specified name if there does not exist a Project with this name already. If there does exist a project with the name, the pre-existing Project will be returned.
Create a new dbnl RunConfig
project
columns
A list of column schema specs for the uploaded data. Required keys: name and type; optional keys: component, description, and greater_is_better. type can be int, float, category, boolean, or string. component is a string that indicates the source of the data, e.g. "component": "sentiment-classifier" or "component": "fraud-predictor". Specified components must be present in the components_dag dictionary. greater_is_better is a boolean that indicates whether larger values are better than smaller ones; False indicates smaller values are better; None indicates no preference.
Example:
columns=[{"name": "pred_proba", "type": "float", "component": "fraud-predictor"}, {"name": "decision", "type": "boolean", "component": "threshold-decision"}, {"name": "requests", "type": "string", "description": "curl request response msg"}]
scalars
NOTE: scalars is available in SDK v0.0.15 and above.
A list of scalar schema specs for the uploaded data. Required keys: name and type; optional keys: component, description, and greater_is_better. type can be int, float, category, boolean, or string. component is a string that indicates the source of the data, e.g. "component": "sentiment-classifier" or "component": "fraud-predictor". Specified components must be present in the components_dag dictionary. greater_is_better is a boolean that indicates whether larger values are better than smaller ones; False indicates smaller values are better; None indicates no preference. An example RunConfig scalars: scalars=[{"name": "accuracy", "type": "float", "component": "fraud-predictor"}, {"name": "error_type", "type": "category"}]
Scalar schema is identical to column schema.
description
An optional description of the RunConfig, defaults to None. Descriptions are limited to 255 characters.
display_name
An optional display name of the RunConfig, defaults to None. Display names do not have to be unique.
row_id
An optional list of the column names that can be used as unique identifiers, defaults to None.
components_dag
Column names can only be alphanumeric characters and underscores.
The following types are supported as type in the column schema:
float
int
boolean
string
Any arbitrary string values. Raw string type columns do not produce any histogram or scatterplot on the web UI.
category
list
Currently only supports list of string values. List type columns do not produce any histogram or scatterplot on the web UI.
The optional component key is for specifying the source of the data column in relation to the AI/ML app subcomponents. Components are used in visualizing the components DAG.
The components_dag dictionary specifies the topological layout of the AI/ML app. For each key-value pair, the key represents the source component, and the value is a list of the leaf components. The following code snippet describes the DAG shown above.
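A sketch of such a snippet, reusing the example components from this page ("fraud-predictor" feeding into the leaf component "threshold-decision"):

```python
# "fraud-predictor" feeds into "threshold-decision"; "threshold-decision" is a leaf.
components_dag = {
    "fraud-predictor": ["threshold-decision"],
    "threshold-decision": [],
}
```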
A new dbnl RunConfig
RunConfig with scalars
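A sketch of a dbnl.create_run_config call that includes scalars, assembled from the columns, scalars, and components_dag examples above (assuming proj is a Project returned by get_or_create_project):

```python
run_config = dbnl.create_run_config(
    project=proj,
    columns=[
        {"name": "pred_proba", "type": "float", "component": "fraud-predictor"},
        {"name": "decision", "type": "boolean", "component": "threshold-decision"},
        {"name": "requests", "type": "string", "description": "curl request response msg"},
    ],
    scalars=[
        {"name": "accuracy", "type": "float", "component": "fraud-predictor"},
        {"name": "error_type", "type": "category"},
    ],
    components_dag={
        "fraud-predictor": ["threshold-decision"],
        "threshold-decision": [],
    },
)
```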
Authenticate dbnl SDK
Set up the dbnl SDK to make authenticated requests. After login is run successfully, the dbnl client will be able to issue secure and authenticated requests against hosted endpoints of the dbnl service.
dbnl.login must be run before any other functions in the DBNL workflow.
api_token
namespace_id
Namespace ID to use for the session; available namespaces can be found with get_my_namespaces().
api_url
The base url of the Distributional API. For SaaS users, set this variable to api.dbnl.com. For other users, please contact your sys admin.
app_url
An optional base url of the Distributional app. If this variable is not set, the app url is inferred from the DBNL_API_URL variable. For on-prem users, please contact your sys admin if you cannot reach the Distributional UI.
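A sketch of a typical login call; the token and URL values are placeholders:

```python
import dbnl

dbnl.login(
    api_token="<YOUR_DBNL_API_TOKEN>",  # or omit and rely on the DBNL_API_TOKEN env var
    api_url="https://api.dbnl.com",     # SaaS endpoint; on-prem users should ask their sys admin
)
```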
Retrieve the column results from a dbnl Run.
run
pandas.DataFrame
You can only call get_column_results after the run is closed.
Retrieve the results from a dbnl Run.
run
ResultData
You can only call get_results after the run is closed.
Report all scalar results to dbnl
Report all column results to dbnl
Report all results to dbnl
report_results is the equivalent of calling both report_column_results and report_scalar_results.
All data should be reported to dbnl at once. Calling dbnl.report_results more than once will overwrite the previously uploaded data.
Once a Run is closed, you can no longer call report_results to send data to DBNL.
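A sketch of reporting to an open Run and then finalizing it. proj and run_config are assumed to come from the earlier examples, the data values are illustrative, and dbnl.close_run is taken to be the finalizing call whose run parameter is documented below.

```python
import pandas as pd

run = dbnl.create_run(project=proj, run_config=run_config)

# All data must be reported at once; a second report_results call would
# overwrite this upload.
dbnl.report_results(
    run=run,
    column_data=pd.DataFrame({
        "pred_proba": [0.12, 0.87],
        "decision": [False, True],
        "requests": ["ok", "ok"],
    }),
    scalar_data={"accuracy": 0.91, "error_type": "none"},
)

# Finalize the Run so results appear in the UI and Test Sessions can start.
dbnl.close_run(run=run)
```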
Functions interacting with dbnl
The Project to export as JSON.
The name for the dbnl Project. A new Project will be created with this name if there does not already exist a Project with this name. If a Project with this name does exist, the pre-existing Project will be returned.
JSON object representing the Project, generally based on a Project previously exported as JSON. Example:
The Project this RunConfig is associated with.
The Project the most recent Run is associated with.
The ID of the dbnl RunConfig. RunConfig IDs start with the prefix runcfg_ and can be found on the Run detail page or Project detail page. An error will be raised if there does not exist a RunConfig with the given run_config_id.
The Project this RunConfig is associated with.
See the section below for more information.
An optional dictionary representing the directed acyclic graph (DAG) of the specified components, defaults to None. Every component listed in the columns schema must be present in the components_dag. Example: components_dag={"fraud-predictor": ["threshold-decision"], "threshold-decision": []}
See the section below for more information.
Equivalent of the pandas categorical dtype. Currently only supports categories of string values.
The API token used to authenticate your DBNL account. You can generate your API token in the DBNL UI. If none is provided, the DBNL_API_TOKEN environment variable will be used by default.
The name of the existing dbnl Project. An error will be raised if there is no Project with the given name.
The dbnl Run from which to retrieve the results.
A pandas DataFrame of the column data for the particular Run.
The dbnl Run from which to retrieve the results.
A named tuple comprising columns and scalars fields. These are the pandas DataFrames of the data for the particular Run.
project
The Project to copy.
name
The name for the new dbnl Project. Project names must be unique; an error will be raised if there exists a Project with the same name.
description
An optional description for the dbnl Project, defaults to None
. Description is limited to 255 characters.
The newly created dbnl Project.
run_id
The ID of the dbnl Run. Run ID starts with the prefix run_
. Run ID can be found at the Run detail page. An error will be raised if there does not exist a Run with the given run_id
.
The dbnl Run with the given ID.
project
The dbnl Project that this Run will be associated with.
column_data
A pandas DataFrame with all the column results to report to dbnl. If run_config_id is provided, the columns of the DataFrame must match the columns described in the RunConfig.
scalar_data
A dict or pandas DataFrame with all the scalar results to report to dbnl. If run_config_id is provided, the keys of the dict must match the scalars described in the RunConfig.
display_name
An optional display name for the Run. Display names do not have to be unique.
row_id
An optional list of the column names that can be used as unique identifiers.
run_config_id
ID of the RunConfig to use for the Run, defaults to None. If provided, the RunConfig is used as is and the results are validated against it. If not provided, a new RunConfig is inferred from the column_data.
metadata
Any additional key-value pairs information the user wants to track.
The closed Run with the uploaded data.
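A sketch of this convenience call with both column and scalar results, plus a row_id and metadata; the values are illustrative and proj is a Project from get_or_create_project.

```python
import pandas as pd

run = dbnl.report_run_with_results(
    project=proj,
    column_data=pd.DataFrame({
        "id": ["a1", "a2"],
        "pred_proba": [0.12, 0.87],
        "decision": [False, True],
    }),
    scalar_data={"accuracy": 0.91},
    row_id=["id"],                      # unique identifier column(s)
    metadata={"git_sha": "<commit-sha>"},
)
```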
project
The dbnl Project that this Run will be associated with.
run_config
The dbnl RunConfig that this Run will be associated with. Two Runs with the same RunConfig can be compared in the web UI and associated in Tests. The associated RunConfig must be from the same Project.
display_name
An optional display name for the Run. Display names do not have to be unique.
metadata
Any additional key-value pairs information the user wants to track.
A new dbnl Run for reporting results.
run
The dbnl Run that the results will be reported to.
data
A dict or a single-row pandas DataFrame with all the scalar values to report to dbnl.
run
The dbnl Run from which to retrieve the results.
pandas.DataFrame
A pandas DataFrame of the uploaded scalar data for the particular Run.
run
The dbnl Run that the results will be reported to.
data
A pandas DataFrame with all the results to report to dbnl. The columns of the DataFrame must match the columns described in the RunConfig associated with the Run.
run
The DBNL Run that the results will be reported to.
column_data
A pandas DataFrame with all the column results to report to DBNL. The columns of the DataFrame must match the columns described in the RunConfig associated with the Run.
scalar_data
A dict or pandas DataFrame with all the scalar results to report to DBNL. The keys of the dict must match the scalars described in the RunConfig associated with the Run.
run
The dbnl Run to be finalized.
Functions that interact with dbnl TestSession
Functions that interact with dbnl Baseline concept
Create a TestSession
Start evaluating Tests associated with a Run. Typically, the Run you just completed will be the "Experiment" and you'll compare it to some earlier "Baseline Run".
The Run must already have results reported and be closed before a Test Session can begin.
A Run must be closed for all results to be shown on the UI.
Suppose we have the following Tests with the associated Tags in our Project
Test1 with tags ["A", "B"]
Test2 with tags ["A"]
Test3 with tags ["B"]
dbnl.create_test_session(..., include_tags=["A", "B"]) will trigger Tests 1, 2, and 3 to be executed.
dbnl.create_test_session(..., require_tags=["A", "B"]) will only trigger Test 1.
dbnl.create_test_session(..., exclude_tags=["A"]) will trigger Test 3.
dbnl.create_test_session(..., include_tags=["A"], exclude_tags=["B"]) will trigger Test 2.
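A sketch of kicking off a Test Session against a just-closed Run and waiting for it to finish. The tag names come from the example above, and the wait helper used here is the one whose test_session and timeout_s parameters are documented later on this page.

```python
test_session = dbnl.create_test_session(
    experiment_run=run,
    include_tags=["A"],
    exclude_tags=["B"],
)

# Block until all triggered Tests have been evaluated (up to 5 minutes).
test_session = dbnl.wait_for_test_session(test_session=test_session, timeout_s=300)
```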
The dbnl Project this RunQuery will be associated with.
A new dbnl RunQuery, typically used for finding a Dynamic Baseline for a Test Session.
id
str
The ID of the RunConfig. RunConfig ID starts with the prefix runcfg_
project_id
str
The ID of the Project this RunConfig is associated with.
columns
list[dict[str, str]]
A list of column schema specs for the uploaded data. Required keys: name and type; optional keys: component and description. Example:
columns=[{"name": "pred_proba", "type": "float", "component": "fraud-predictor"}, {"name": "decision", "type": "boolean", "component": "threshold-decision"}, {"name": "requests", "type": "string", "description": "curl request response msg"}]
See the column schema section on the dbnl.create_run_config page for more information.
scalars
list[dict[str, str]]
An optional list of scalar schema specs for the uploaded scalar data. Required keys: name and type; optional keys: component, description, and greater_is_better. type can be int, float, category, boolean, or string. component is a string that indicates the source of the data, e.g. "component": "sentiment-classifier" or "component": "fraud-predictor". Specified components must be present in the components_dag dictionary. greater_is_better is a boolean that indicates whether larger values are better than smaller ones; False indicates smaller values are better; None indicates no preference. An example RunConfig scalars: scalars=[{"name": "accuracy", "type": "float", "component": "fraud-predictor"}, {"name": "error_type", "type": "category"}]
Scalar schema is identical to column schema.
description
str
An optional description of the RunConfig. Descriptions are limited to 255 characters.
display_name
str
An optional display name of the RunConfig.
row_id
list[str]
An optional list of the column names that are used as unique identifiers.
components_dag
dict[str, list[str]]
An optional dictionary representing the directed acyclic graph (DAG) of the specified components. Every component listed in the columns schema is present in components_dag.
test_spec_dict
A dictionary of the expected Test Spec schema
Dict[str, Any]
The JSON dict of the created Test Spec object. The returned JSON will contain the id of the Test Spec.
id
str
The ID of the Run. Run ID starts with the prefix run_
project_id
str
The ID of the Project this Run is associated with.
run_config_id
str
The ID of the RunConfig this Run is associated with. Runs with the same RunConfig can be compared in the UI.
display_name
str
An optional display name of the Run.
metadata
Dict[str, str]
Any additional key-value pairs information the user wants to track.
run_config
The RunConfig associated with this Run object.
id
str
The ID of the Project. Project ID starts with the prefix proj_
name
str
The name of the Project. Project names have to be unique.
description
str
An optional description of the Project. Descriptions are limited to 255 characters.
id
str
The ID of the Run. Run ID starts with the prefix run_
project_id
str
The ID of the Project this RunQuery is associated with.
name
str
The name of the RunQuery.
query
dict[str, Any]
The dbnl RunQuery, typically used for finding a Dynamic Baseline for a Test Session.
experiment_run
The dbnl Run to create the TestSession for.
baseline
include_tags
An optional list of Test Tags to be included. All Tests with any of the tags in this list will be run after the run is complete.
exclude_tags
An optional list of Test Tags to be excluded. All Tests with any of the tags in this list will be skipped after the run is complete.
require_tags
An optional list of Test Tags that are required. Only Tests with all the tags in this list will be run after the run is complete.
SDK experimental functions are in early development and are not fully tested. These functions are light-weight wrappers of the API endpoints, and are mostly interacting via JSONs/dicts.
Function names and signatures are expected to change when they are elevated to standard SDK functions in the future.
Many generative AI applications focus on text generation. It can be challenging to create metrics for insights into expected performance when dealing with unstructured text.
dbnl.eval is a special module designed for evaluating unstructured text. This module currently includes:
Adaptive metric sets for generic text and RAG applications
12+ simple statistical text metrics powered by local libraries
15+ LLM-as-judge and embedding powered text metrics
Support for user-defined custom LLM-as-judge metrics
LLM-as-judge metrics compatible with OpenAI and Azure OpenAI
Building dbnl tests on these evaluation metrics can then drive rich insights into an AI application's stability and performance.
If a TestSession has completed and some of the generated Tests had incorrect outcomes, you can tell Distributional that these tests should have had different results.
Only applies to Tests generated by Distributional, not custom, user-defined Tests.
test_session
The TestSession the user wants to recalibrate
feedback
either "PASS"
or "FAIL"
, to redefine the Test outcomes.
test_ids
List of Tests to apply the feedback to. If None, then all generated Tests will be set with the given feedback.
TestRecalibrationSession
If some generated Tests failed when they should have passed, and some passed when they should have failed, the user will need to submit two separate calls, one for each feedback value.
Retrieve the completed TestSession
test_session
The dbnl TestSession to wait for completion.
timeout_s
An optional timeout (in seconds) parameter for waiting for the TestSession to complete. An exception will be raised if the TestSession did not complete before timeout_s has elapsed.
TestSession
The completed TestSession
test_recalibration_session
The dbnl TestRecalibrationSession to wait for completion.
timeout_s
An optional timeout (in seconds) parameter for waiting for the TestRecalibrationSession to complete. An exception will be raised if the TestRecalibrationSession did not complete before timeout_s has elapsed.
TestRecalibrationSession
The completed TestRecalibrationSession
Formats an incomplete Test Spec JSON
If a TestSession has completed and some of the generated Tests had incorrect outcomes, you can tell Distributional that these tests should have had different results.
Only applies to Tests generated by Distributional, not custom Tests
The following 2 examples are equivalent:
To generate tests for all columns, leave columns out:
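A sketch of what those calls look like. This reference documents only the run and columns parameters, so the function name and import path below are assumptions for illustration.

```python
# Assumed import path and name for the test generation call described on this page.
from dbnl.experimental import create_test_generation_session

# Two equivalent ways of restricting test generation to specific columns:
session = create_test_generation_session(run=run, columns=["pred_proba", "decision"])
session = create_test_generation_session(run=run, columns=[{"name": "pred_proba"}, {"name": "decision"}])

# To generate tests for all columns, leave `columns` out:
session = create_test_generation_session(run=run)
```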
To use dbnl.eval, you will need to install the extra 'eval' package as described in the installation instructions.
Create a client to power LLM-as-judge text metrics [optional]
Generate a list of metrics suitable for comparing text_A to reference text_B
Use dbnl.eval to compute the list of metrics.
Publish the augmented dataframe and new metric quantities to DBNL
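A sketch of those steps, assuming the text_metrics helper and the evaluate function documented in the dbnl.eval listings below; the import paths and data are illustrative, and proj is a Project from get_or_create_project.

```python
import pandas as pd

import dbnl
from dbnl.eval import evaluate               # evaluation helper documented below (path assumed)
from dbnl.eval.metrics import text_metrics   # adaptive metric set helper

eval_df = pd.DataFrame({
    "prediction": ["France has no capital", "Paris is the capital"],
    "ground_truth": ["The capital of France is Paris", "The capital of France is Paris"],
})

# 1. (Optional) build an eval_llm_client to unlock LLM-as-judge metrics; when it
#    is omitted, the metric set simply excludes the metrics that need it.

# 2. Generate an adaptive list of metrics comparing `prediction` to `ground_truth`.
metrics = text_metrics(prediction="prediction", target="ground_truth")

# 3. Compute the metrics; each metric adds one column to the returned dataframe.
aug_eval_df = evaluate(eval_df, metrics)

# 4. Publish the augmented dataframe (and any scalars) to Distributional.
run = dbnl.report_run_with_results(project=proj, column_data=aug_eval_df)
```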
You can inspect a subset of the aug_eval_df rows and, for example, one of the columns created by one of the metrics in the text_metrics list: llm_text_similarity_v0.
The values of llm_text_similarity_v0 qualitatively match our expectations of semantic similarity between the prediction and ground_truth.
The column names of the metrics in the returned dataframe include the metric name and the columns that were used in that metric's computation. For example, the metric named llm_text_similarity_v0 becomes llm_text_similarity_v0__prediction__ground_truth because it takes as input both the column named prediction and the column named ground_truth.
The TestRecalibrationSession object that was created. The status can be checked to determine when recalibration is complete.
The evaluation call takes a dataframe and a metric list as input and returns a dataframe with extra columns; each new column holds the value of a metric computation for that row.
run
The Run to use when generating tests.
columns
Either a list of column names or a list of dictionaries with only one key, "name". These names must refer to the columns in the Run that dbnl will use to generate Tests. If None, then all columns in the Run will be used.
TestGenerationSession
The TestGenerationSession object that was created. The status
can be checked to determine when test generation is complete.
Example aug_eval_df rows (idx | prediction | ground_truth | llm_text_similarity_v0__prediction__ground_truth):
0 | France has no capital | The capital of France is Paris | 1
1 | The capital of France is Toronto | The capital of France is Paris | 1
2 | Paris is the capital | The capital of France is Paris | 5
The metric set helpers return an adaptive list of metrics, relevant to the application type
text_metrics()
Basic metrics for generic text comparison and monitoring
question_and_answer_metrics()
Basic metrics for RAG / question answering
The metric set helpers are adaptive in that:
The metrics returned encode which columns of the dataframe are input to the metric computation, e.g., rougeL__prediction__ground_truth is the rougeL metric run with both the column named prediction and the column named ground_truth as input.
The metrics returned support any additional optional column info and LLM-as-judge or embedding model clients. If any of this optional info is not provided, the metric set will exclude any metrics that depend on that information
See the How-To section for concrete examples of adaptive text_metrics() usage.
See the RAG example for question_and_answer_metrics() usage.
eval.metrics
Retrieve the specified dbnl Test Tag or create a new one if it does not exist
project_id
The dbnl Project that this Test Tag is associated with.
name
The name of the Test Tag to be retrieved. If a Tag with the name does not exist, it will create a new Tag. Tag names must be unique.
description
An optional description for the Tag. Descriptions are limited to 255 characters.
Dict[str, Any]
Test Tag JSON
Functions in the dbnl.eval module.
Evaluates a set of metrics on a dataframe, returning an augmented dataframe.
Parameters:
df – input dataframe
metrics – metrics to compute
inplace – whether to modify the input dataframe in place
Returns: input dataframe augmented with metrics
Get the run config column schemas for a dataframe that was augmented with a list of metrics.
Parameters:
df – Dataframe to get column schemas from
metrics – list of metrics added to the dataframe
Returns: list of columns schemas for dataframe and metrics
Get the run config column schemas from a list of metrics.
Parameters: metrics – list of metrics to get column schemas from
Returns: list of column schemas for metrics
No problem, just don't include an eval_llm_client or an eval_embedding_client argument in the call(s) to the evaluation helpers. The helpers will automatically exclude any metrics that depend on them.
No problem. You can simply remove the target argument from the helper. The metric set helper will automatically exclude any metrics that depend on the target column being specified.
There is an additional helper that can generate a list of generic metrics appropriate for “monitoring” unstructured text columns: text_monitor_metrics(). Simply provide a list of text column names and, optionally, an eval_llm_client for LLM-as-judge metrics.
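A sketch, assuming text_monitor_metrics lives alongside the other metric set helpers; the column names and data are illustrative.

```python
import pandas as pd

from dbnl.eval import evaluate               # path assumed, as in the earlier sketch
from dbnl.eval.metrics import text_monitor_metrics

df = pd.DataFrame({
    "user_query": ["How do I reset my password?"],
    "generated_answer": ["Click 'Forgot password' on the sign-in page."],
})

# Generic monitoring metrics for two free-text columns; also pass
# eval_llm_client=... to include LLM-as-judge metrics.
metrics = text_monitor_metrics(["user_query", "generated_answer"])
aug_df = evaluate(df, metrics)
```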
You can write your own LLM-as-judge metric that uses your custom prompt. The example below defines a custom LLM-as-judge metric and runs it on an example dataframe.
You can also write a metric that includes only the prediction column specified and reference only {prediction}
in the custom prompt. An example is below:
In RAG (retrieval-augmented generation or "question and answer") applications, the high level goal is:
Given a question, generate an answer that adheres to knowledge in some corpus
However, this is easier said than done. Data is often collected at various steps in the RAG process to help evaluate which steps might be performing poorly or not as expected. This data can help understand the following:
What question was asked?
Which documents / chunks (ids) were retrieved?
What was the text of those retrieved documents / chunks?
From the retrieved documents, what was the top-ranked document and its id?
What is the expected answer?
What is the expected document id and text that contains the answer to the question?
What was the generated answer?
Having data that answers some or all of these questions allows for evaluations to run, producing metrics that can highlight what part of the RAG system is performing in unexpected ways.
The short example below demonstrates what a dataframe with rich contextual data would look like for a RAG application and how to use dbnl.eval to generate relevant metrics.
You can inspect a subset of the aug_eval_df rows and examine, for example, the metrics related to retrieval and answer similarity.
We can see the first result (idx = 0) represents a complete failure of the RAG system. The relevant documents were not retrieved (mrr = 0.0) and the generated answer is very dissimilar from the expected answer (answer_similarity = 1).
The second result (idx = 1) represents a better response from the RAG system. The relevant document was retrieved, but ranked lower (mrr = 0.33333), and the answer is somewhat similar to the expected answer (answer_similarity = 3).
The final result (idx = 2) represents a strong response from the RAG system. The relevant document was retrieved and top ranked (mrr = 1.0) and the generated answer is very similar to the expected answer (answer_similarity = 5).
The signature for question_and_answer_metrics() highlights its adaptability. Again, the optional arguments are not required, and the helper will intelligently return only the metrics that depend on the info that is provided.
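Based on the parameter listing later on this page, a full call takes roughly this shape; the column names are illustrative, and any argument you omit simply removes the metrics that depend on it.

```python
from dbnl.eval.metrics import question_and_answer_metrics

# Pass eval_llm_client= and/or eval_embedding_client= as well to enable
# LLM-as-judge and embedding-based metrics.
metrics = question_and_answer_metrics(
    prediction="generated_answer",                # generated answer
    target="expected_answer",                     # expected answer
    input="question",                             # question
    context="retrieved_context",                  # retrieved document(s)
    ground_truth_document_id="gt_doc_id",
    retrieved_document_ids="retrieved_doc_ids",
    ground_truth_document_text="gt_doc_text",
    top_retrieved_document_text="top_doc_text",
)
```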
Wait for a Test Generation Session to finish
test_generation_session
The dbnl TestGenerationSession to wait for completion.
timeout_s
An optional timeout (in seconds) parameter for waiting for the TestGenerationSession to complete. An exception will be raised if the TestGenerationSession did not complete before timeout_s has elapsed.
TestGenerationSession
The completed TestGenerationSession
A common strategy for evaluating unstructured text applications is to use other LLMs and text embedding models to drive metrics of interest.
The LLM-as-judge metrics in dbnl.eval support OpenAI, Azure OpenAI, and any other third-party LLM / embedding model provider that is compatible with the OpenAI python client. Specifically, third-party endpoints should (mostly) adhere to the schema of:
the /v1/chat/completions endpoint for LLMs
the /v1/embeddings endpoint for embedding models
The following examples show how to initialize an llm_eval_client and an eval_embedding_client under different providers.
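A sketch of initializing the underlying OpenAI-compatible clients; the keys, endpoint, and API version are placeholders. How these clients are wrapped into the eval_llm_client / eval_embedding_client objects expected by dbnl.eval depends on your installed version's client helpers, so treat that step as an assumption.

```python
from openai import AzureOpenAI, OpenAI

# OpenAI
oai_client = OpenAI(api_key="<OPENAI_API_KEY>")

# Azure OpenAI
azure_client = AzureOpenAI(
    api_key="<AZURE_OPENAI_API_KEY>",
    api_version="2024-02-01",
    azure_endpoint="https://<your-resource>.openai.azure.com",
)

# These OpenAI-compatible clients are then wrapped into the eval_llm_client /
# eval_embedding_client objects used by the dbnl.eval metric helpers.
```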
It is possible for some of the LLM-as-judge metrics to occasionally return values that are unable to be parsed. These metric values will surface as None. Distributional is able to accept dataframes including None values; the platform will intelligently filter them when applicable.
LLM service providers often impose request rate limits and token throughput caps. Some example errors that one might encounter are shown below:
In the event you experience these errors, please work with your LLM service provider to adjust your limits. Additionally, feel free to reach out to Distributional support with the issue you are seeing.
Example aug_eval_df rows (idx | mrr | answer_similarity):
0 | 0.0 | 1
1 | 0.33333 | 3
2 | 1.0 | 5
Classes and methods in dbnl.eval.metrics.
Returns the column schema for the metric to be used in a run config.
Returns: Column schema for the metric.
Return type: ColumnSchema
Returns the description of the metric.
Returns: Description of the metric.
Evaluates the metric over the provided dataframe.
Parameters: df – Input data from which to compute metric.
Returns: Metric values.
Returns the expression representing the metric (e.g. rouge1(prediction, target)).
Returns: Metric expression.
If true, larger values are assumed to be directionally better than smaller ones. If false, smaller values are assumed to be directionally better than larger ones. If None, assumes nothing.
Returns: True if greater is better, False if smaller is better, otherwise None.
Returns: Metric name (e.g. rouge1).
Returns the fully qualified name of the metric (e.g. rouge1__prediction__target).
Returns: Metric name.
Returns the type of the metric (e.g. float)
Returns: Metric type.
Computes the accuracy of the answer by evaluating the accuracy score of the answer using a language model.
This metric is generated by an LLM using a specific prompt named llm_accuracy available in dbnl.eval.metrics.prompts.
Parameters:
input – input column name
context – context column name
prediction – prediction column name
eval_llm_client – eval_llm_client
Returns: accuracy metric
Returns the answer correctness metric.
This metric is generated by an LLM using a specific prompt named llm_answer_correctness available in dbnl.eval.metrics.prompts.
Parameters:
input – input column name
prediction – prediction column name
target – target column name
eval_llm_client – eval LLM client
Returns: answer correctness metric
Returns answer similarity metric.
This metric is generated by an LLM using a specific prompt named llm_answer_similarity available in dbnl.eval.metrics.prompts.
Parameters:
input – input column name
prediction – prediction column name
target – target column name
eval_llm_client – eval_llm_client
Returns: answer similarity metric
Computes the coherence of the answer by evaluating the coherence score of the answer using a language model.
This metric is generated by an LLM using a specific prompt named llm_coherence available in dbnl.eval.metrics.prompts.
Parameters:
prediction – prediction column name
eval_llm_client – eval_llm_client
Returns: coherence metric
Computes the commital of the answer by evaluating the commital score of the answer using a language model.
This metric is generated by an LLM using a specific prompt named llm_commital available in dbnl.eval.metrics.prompts.
Parameters:
prediction – prediction column name
eval_llm_client – eval_llm_client
Returns: commital metric
Computes the completeness of the answer by evaluating the completeness score of the answer using a language model.
This metric is generated by an LLM using a specific prompt named llm_completeness available in dbnl.eval.metrics.prompts.
Parameters:
input – input column name
prediction – prediction column
eval_llm_client – eval_llm_client
Returns: completeness metric
Computes the contextual relevance of the answer by evaluating the contextual relevance score of the answer using a language model.
This metric is generated by an LLM using a specific prompt named llm_contextual_relevance available in dbnl.eval.metrics.prompts.
Parameters:
input – input column name
context – context column name
eval_llm_client – eval_llm_client
Returns: contextual relevance metric
Returns the faithfulness metric.
This metric is generated by an LLM using a specific prompt named llm_faithfulness available in dbnl.eval.metrics.prompts.
Parameters:
input – input column name
context – context column name
prediction – prediction column name
eval_llm_client – eval_llm_client
Returns: faithfulness metric
Computes the grammar accuracy of the answer by evaluating the grammar accuracy score of the answer using a language model.
This metric is generated by an LLM using a specific prompt named llm_grammar_accuracy available in dbnl.eval.metrics.prompts.
Parameters:
prediction – prediction column name
eval_llm_client – eval_llm_client
Returns: grammar accuracy metric
Returns a set of metrics which evaluate the quality of the generated answer. This does not include metrics that require a ground truth.
Parameters:
input – input column name (i.e. question)
prediction – prediction column name (i.e. generated answer)
context – context column name (i.e. document or set of documents retrieved)
eval_llm_client – eval_llm_client
Returns: list of metrics
Computes the originality of the answer by evaluating the originality score of the answer using a language model.
This metric is generated by an LLM using a specific prompt named llm_originality available in dbnl.eval.metrics.prompts.
Parameters:
prediction – prediction column name
eval_llm_client – eval_llm_client
Returns: originality metric
Returns relevance metric with context.
This metric is generated by an LLM using a specific prompt named llm_relevance available in dbnl.eval.metrics.prompts.
Parameters:
input – input column name
context – context column name
prediction – prediction column name
eval_llm_client – eval_llm_client
Returns: answer relevance metric with context
Returns a list of metrics relevant for a question and answer task.
Parameters:
prediction – prediction column name (i.e. generated answer)
eval_llm_client – eval_llm_client
Returns: list of metrics
Computes the reading complexity of the answer by evaluating the reading complexity score of the answer using a language model.
This metric is generated by an LLM using a specific prompt named llm_reading_complexity available in dbnl.eval.metrics.prompts.
Parameters:
prediction – prediction column name
eval_llm_client – eval_llm_client
Returns: reading complexity metric
Computes the sentiment of the answer by evaluating the sentiment assessment score of the answer using a language model.
This metric is generated by an LLM using a specific prompt named llm_sentiment_assessment available in dbnl.eval.metrics.prompts.
Parameters:
prediction – prediction column name
eval_llm_client – eval_llm_client
Returns: sentiment assessment metric
Computes the text fluency of the answer by evaluating the perplexity of the answer using a language model.
This metric is generated by an LLM using a specific prompt named llm_text_fluency available in dbnl.eval.metrics.prompts.
Parameters:
prediction – prediction column name
eval_llm_client – eval_llm_client
Returns: text fluency metric
Computes the toxicity of the answer by evaluating the toxicity score of the answer using a language model.
This metric is generated by an LLM using a specific prompt named llm_text_toxicity available in dbnl.eval.metrics.prompts.
Parameters:
prediction – prediction column name
eval_llm_client – eval_llm_client
Returns: toxicity metric
Returns the Automated Readability Index metric for the text_col_name column.
Calculates the Automated Readability Index (ARI) for a given text. ARI is a readability metric that estimates the U.S. school grade level necessary to understand the text, based on the number of characters per word and words per sentence.
Parameters: text_col_name – text column name
Returns: automated_readability_index metric
Returns the bleu metric between the prediction and target columns.
The BLEU score is a metric for evaluating a generated sentence to a reference sentence. The BLEU score is a number between 0 and 1, where 1 means that the generated sentence is identical to the reference sentence.
Parameters:
prediction – prediction column name
target – target column name
Returns: bleu metric
Returns the character count metric for the text_col_name column.
Parameters: text_col_name – text column name
Returns: character_count metric
Returns the context hit metric.
This boolean-valued metric is used to evaluate whether the ground truth document is present in the list of retrieved documents. The context hit metric is 1 if the ground truth document is present in the list of retrieved documents, and 0 otherwise.
Parameters:
ground_truth_document_id – ground_truth_document_id column name
retrieved_document_ids – retrieved_document_ids column name
Returns: context hit metric
Returns a set of metrics relevant for a question and answer task.
Parameters: text_col_name – text column name
Returns: list of metrics
Returns the Flesch-Kincaid Grade metric for the text_col_name column.
Calculates the Flesch-Kincaid Grade Level for a given text. The Flesch-Kincaid Grade Level is a readability metric that estimates the U.S. school grade level required to understand the text. It is based on the average number of syllables per word and words per sentence.
Parameters: text_col_name – text column name
Returns: flesch_kincaid_grade metric
Returns a set of metrics relevant for a question and answer task.
Parameters:
prediction – prediction column name (i.e. generated answer)
target – target column name (i.e. expected answer)
Returns: list of metrics
Returns a set of metrics relevant for a question and answer task.
Parameters:
ground_truth_document_id – ground_truth_document_id column name
retrieved_document_ids – retrieved_document_ids column name
Returns: list of metrics
Returns the inner product metric between the ground_truth_document_text and top_retrieved_document_text columns.
This metric is used to evaluate the similarity between the ground truth document and the top retrieved document using the inner product of their embeddings. The embedding client is used to retrieve the embeddings for the ground truth document and the top retrieved document. An embedding is a high-dimensional vector representation of a string of text.
Parameters:
ground_truth_document_text – ground_truth_document_text column name
top_retrieved_document_text – top_retrieved_document_text column name
embedding_client – embedding client
Returns: inner product metric
Returns the inner product metric between the prediction and target columns.
This metric is used to evaluate the similarity between the prediction and target columns using the inner product of their embeddings. The embedding client is used to retrieve the embeddings for the prediction and target columns. An embedding is a high-dimensional vector representation of a string of text.
Parameters:
prediction – prediction column name
target – target column name
embedding_client – embedding client
Returns: inner product metric
Returns the levenshtein metric between the prediction and target columns.
The Levenshtein distance is a metric for evaluating the similarity between two strings. The Levenshtein distance is an integer value, where 0 means that the two strings are identical, and a higher value returns the number of edits required to transform one string into the other.
Parameters:
prediction – prediction column name
target – target column name
Returns: levenshtein metric
Returns the mean reciprocal rank (MRR) metric.
This metric is used to evaluate the quality of a ranked list of documents. The MRR score is a number between 0 and 1, where 1 means that the ground truth document is ranked first in the list. The MRR score is calculated by taking the reciprocal of the rank of the first relevant document in the list.
Parameters:
ground_truth_document_id – ground_truth_document_id column name
retrieved_document_ids – retrieved_document_ids column name
Returns: mrr metric
Returns a set of metrics relevant for a question and answer task.
Parameters: prediction – prediction column name (i.e. generated answer)
Returns: list of metrics
Returns a set of metrics relevant for a question and answer task.
Parameters:
prediction – prediction column name (i.e. generated answer)
target – target column name (i.e. expected answer)
input – input column name (i.e. question)
context – context column name (i.e. document or set of documents retrieved)
ground_truth_document_id – ground_truth_document_id containing the information in the target
retrieved_document_ids – retrieved_document_ids containing the full context
ground_truth_document_text – text containing the information in the target (ideal is for this to be the top retrieved document)
top_retrieved_document_text – text of the top retrieved document
eval_llm_client – eval_llm_client
eval_embedding_client – eval_embedding_client
Returns: list of metrics
Returns a set of all metrics relevant for a question and answer task.
Parameters:
prediction – prediction column name (i.e. generated answer)
target – target column name (i.e. expected answer)
input – input column name (i.e. question)
context – context column name (i.e. document or set of documents retrieved)
ground_truth_document_id – ground_truth_document_id containing the information in the target
retrieved_document_ids – retrieved_document_ids containing the full context
ground_truth_document_text – text containing the information in the target (ideal is for this to be the top retrieved document)
top_retrieved_document_text – text of the top retrieved document
eval_llm_client – eval_llm_client
eval_embedding_client – eval_embedding_client
Returns: list of metrics
Returns a set of metrics relevant for generic text applications.
Parameters:
prediction – prediction column name (i.e. generated answer)
target – target column name (i.e. expected answer)
eval_llm_client – eval_llm_client
eval_embedding_client – eval_embedding_client
Returns: list of metrics
Returns the rouge1 metric between the prediction and target columns.
ROUGE-1 is a recall-oriented metric that calculates the overlap of unigrams (individual words) between the predicted/generated summary and the reference summary. It measures how many single words from the reference summary appear in the predicted summary. ROUGE-1 focuses on basic word-level similarity and is used to evaluate the content coverage.
Parameters:
prediction – prediction column name
target – target column name
Returns: rouge1 metric
Returns the rouge2 metric between the prediction and target columns.
ROUGE-2 is a recall-oriented metric that calculates the overlap of bigrams (pairs of words) between the predicted/generated summary and the reference summary. It measures how many pairs of words from the reference summary appear in the predicted summary. ROUGE-2 focuses on word-level similarity and is used to evaluate the content coverage.
Parameters:
prediction – prediction column name
target – target column name
Returns: rouge2 metric
Returns the rougeL metric between the prediction and target columns.
ROUGE-L is a recall-oriented metric based on the Longest Common Subsequence (LCS) between the reference and generated summaries. It measures how well the generated summary captures the longest sequences of words that appear in the same order in the reference summary. This metric accounts for sentence-level structure and coherence.
Parameters:
prediction – prediction column name
target – target column name
Returns: rougeL metric
Returns the rougeLsum metric between the prediction and target columns.
ROUGE-LSum is a variant of ROUGE-L that applies the Longest Common Subsequence (LCS) at the sentence level for summarization tasks. It evaluates how well the generated summary captures the overall sentence structure and important elements of the reference summary by computing the LCS for each sentence in the document.
Parameters:
prediction – prediction column name
target – target column name
Returns: rougeLsum metric
Returns all rouge metrics between the prediction and target columns.
Parameters:
prediction – prediction column name
target – target column name
Returns: list of rouge metrics
Returns the sentence count metric for the text_col_name column.
Parameters: text_col_name – text column name
Returns: sentence_count metric
Returns a set of metrics relevant for a summarization task.
Parameters:
prediction – prediction column name (i.e. generated summary)
target – target column name (i.e. expected summary)
Returns: list of metrics
Returns the token count metric for the text_col_name column.
A token is a sequence of characters that represents a single unit of meaning, such as a word or punctuation mark. The token count metric calculates the total number of tokens in the text. Different languages may have different tokenization rules. This function is implemented using the nltk library.
Parameters: text_col_name – text column name
Returns: token_count metric
Returns the word count metric for the text_col_name column.
Parameters: text_col_name – text column name
Returns: word_count metric
Computes the similarity of the prediction and target text by evaluating using a language model.
This metric is generated by an LLM using a specific prompt named llm_accuracy available in dbnl.eval.metrics.prompts.
Parameters:
prediction – prediction column name
target – target (expected value) column name
eval_llm_client – Eval LLM client
Returns: text similarity metric