# Quick Start

To use `dbnl.eval`, you will need to install the `eval` extra package as described in [these instructions](https://docs.dbnl.com/v0.19.x/getting-started#installing-distributional).
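
If the package follows the usual Python extras convention, installation is a one-liner like the sketch below. The `dbnl` distribution name here is an assumption based on the import name; treat the linked instructions as authoritative.

```bash
# Assumed command; confirm against the linked installation instructions
pip install "dbnl[eval]"
```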

1. Create a client to power LLM-as-judge text metrics \[optional]
2. Generate a list of metrics suitable for comparing text\_A to reference text\_B
3. Use `dbnl.eval`'s `evaluate` function to compute those metrics on a dataframe
4. Publish the augmented dataframe and new metric quantities to DBNL

```python
import dbnl
import os
import pandas as pd
from openai import OpenAI
from dbnl.eval.llm import OpenAILLMClient
from dbnl.eval import evaluate

# 1. create client to power LLM-as-judge metrics
base_oai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
oai_client = OpenAILLMClient.from_existing_client(base_oai_client, llm_model="gpt-3.5-turbo-0125")

# toy evaluation data: three prediction / ground_truth pairs, repeated 4x
eval_df = pd.DataFrame(
    [
        {
            "prediction": "France has no capital",
            "ground_truth": "The capital of France is Paris",
        },
        {
            "prediction": "The capital of France is Toronto",
            "ground_truth": "The capital of France is Paris",
        },
        {
            "prediction": "Paris is the capital",
            "ground_truth": "The capital of France is Paris",
        },
    ]
    * 4
)

# 2. get text metrics that use target (ground_truth) and LLM-as-judge metrics
text_metrics = dbnl.eval.metrics.text_metrics(
    prediction="prediction", target="ground_truth", eval_llm_client=oai_client
)
# 3. run text metrics that use target (ground_truth) and LLM-as-judge metrics
aug_eval_df = evaluate(eval_df, text_metrics)

# 4. publish to DBNL
dbnl.login(api_token=os.environ["DBNL_API_TOKEN"])
project = dbnl.get_or_create_project(name="DEAL_testing")
cols = dbnl.experimental.get_column_schemas_from_dataframe(aug_eval_df)
run_config = dbnl.create_run_config(project=project, columns=cols)
run = dbnl.create_run(project=project, run_config=run_config)
dbnl.report_results(run=run, data=aug_eval_df)
dbnl.close_run(run=run)
```

You can inspect a subset of the `aug_eval_df` rows and, for example, one of the columns created by a metric in the `text_metrics` list: `llm_text_similarity_v0`.
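
Selecting the relevant columns with plain pandas, for example:

```python
# Show the inputs next to one generated metric column; the long column
# name follows the <metric>__<inputs> convention explained below.
print(
    aug_eval_df[
        ["prediction", "ground_truth", "llm_text_similarity_v0__prediction__ground_truth"]
    ].head(3)
)
```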

| idx | prediction | ground_truth | `llm_text_similarity_v0__prediction__ground_truth` |
| --- | --- | --- | --- |
| 0 | France has no capital | The capital of France is Paris | 1 |
| 1 | The capital of France is Toronto | The capital of France is Paris | 1 |
| 2 | Paris is the capital | The capital of France is Paris | 5 |

The values of `llm_text_similarity_v0` qualitatively match our expectations of semantic similarity between `prediction` and `ground_truth`: the two incorrect predictions score low (1), while the correct paraphrase scores high (5).

The call to [`evaluate()`](https://docs.dbnl.com/v0.19.x/using-distributional/python-sdk/eval-module-functions/eval#dbnl.eval.evaluate-df-dataframe-metrics-sequence-metric-inplace-bool-false-dataframe) takes a dataframe and a list of metrics as input and returns a dataframe with extra columns. Each new column holds the value of one metric computed on that row.

```python
def evaluate(df: pd.DataFrame, metrics: Sequence[Metric], inplace: bool = False) -> pd.DataFrame:
    """
    Evaluates a set of metrics on a dataframe, returning an augmented dataframe.

    :param df: input dataframe
    :param metrics: metrics to compute
    :param inplace: whether to modify the input dataframe in place
    :return: input dataframe augmented with metrics
    """
```
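
By default `evaluate()` leaves the input dataframe untouched and returns an augmented copy; per the docstring, passing `inplace=True` should instead write the metric columns onto the input dataframe itself. A minimal sketch:

```python
# Default: eval_df is unchanged and an augmented copy is returned
aug_eval_df = evaluate(eval_df, text_metrics)
assert "llm_text_similarity_v0__prediction__ground_truth" not in eval_df.columns

# inplace=True adds the metric columns to eval_df directly
evaluate(eval_df, text_metrics, inplace=True)
assert "llm_text_similarity_v0__prediction__ground_truth" in eval_df.columns
```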

The column names of the metrics in the returned dataframe include the metric name and the columns that were used in that metric's computation.

For example, the metric named `llm_text_similarity_v0` becomes `llm_text_similarity_v0__prediction__ground_truth` because it takes as input both the `prediction` column and the `ground_truth` column.
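
One way to see the convention at a glance is to list every column the evaluation added, e.g.:

```python
# Every generated column is named <metric name>__<input column>__<input column>
metric_cols = [c for c in aug_eval_df.columns if c not in ("prediction", "ground_truth")]
for col in sorted(metric_cols):
    print(col)  # e.g. llm_text_similarity_v0__prediction__ground_truth
```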
