Quick Start
To use `dbnl.eval`, you will need to install the extra `eval` package as described in these instructions.
1. Create a client to power LLM-as-judge text metrics (optional).
2. Generate a list of metrics suitable for comparing `text_A` to reference `text_B`.
3. Use `dbnl.eval` to compute the list of metrics.
4. Publish the augmented dataframe and new metric quantities to DBNL.
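To make the flow concrete, here is a toy stand-in for `evaluate()` in plain Python. It mimics the behavior described in this guide (appending one new column per metric, named after the metric and its input columns), but it is not the actual `dbnl.eval` implementation, and the word-overlap scorer is only a crude placeholder for `llm_text_similarity_v0`:

```python
def evaluate(rows, metrics):
    """Toy stand-in for dbnl.eval's evaluate(): append one column per metric.

    `rows` is a list of dicts (one per dataframe row); each metric is a
    (name, input_columns, fn) triple. The real evaluate() operates on a
    dataframe and a list of metric objects.
    """
    augmented = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        for name, cols, fn in metrics:
            # New column name: metric name + input column names, "__"-joined
            row["__".join([name, *cols])] = fn(*(row[c] for c in cols))
        augmented.append(row)
    return augmented

def word_overlap_score(prediction, ground_truth):
    """Crude 0-5 similarity score standing in for an LLM-as-judge metric."""
    a = set(prediction.lower().split())
    b = set(ground_truth.lower().split())
    return round(5 * len(a & b) / len(a | b))

# Metric list analogous to text_metrics in this guide
text_metrics = [
    ("llm_text_similarity_v0", ("prediction", "ground_truth"), word_overlap_score),
]

aug_eval_rows = evaluate(
    [{"prediction": "Paris is the capital",
      "ground_truth": "The capital of France is Paris"}],
    text_metrics,
)
print(aug_eval_rows[0]["llm_text_similarity_v0__prediction__ground_truth"])
```

The real metric is LLM-judged, so its scores will not match this word-overlap placeholder; the sketch only illustrates the input/output shape of the workflow.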
You can inspect a subset of the `aug_eval_df` rows and, for example, one of the columns created by a metric in the `text_metrics` list: `llm_text_similarity_v0`.
|   | prediction | ground_truth | llm_text_similarity_v0 |
|---|---|---|---|
| 0 | France has no capital | The capital of France is Paris | 1 |
| 1 | The capital of France is Toronto | The capital of France is Paris | 1 |
| 2 | Paris is the capital | The capital of France is Paris | 5 |
The values of `llm_text_similarity_v0` qualitatively match our expectations for semantic similarity between the `prediction` and `ground_truth` columns.
The call to `evaluate()` takes a dataframe and a metric list as input and returns a dataframe with extra columns; each new column holds the value of one metric computation for that row. The column names of the metrics in the returned dataframe include the metric name and the columns that were used in that metric's computation.
For example, the metric named `llm_text_similarity_v0` becomes `llm_text_similarity_v0__prediction__ground_truth` because it takes as input both the column named `prediction` and the column named `ground_truth`.
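This naming rule can be sketched as a small helper (hypothetical; `dbnl.eval` applies it internally when building the returned dataframe):

```python
def metric_column_name(metric_name: str, input_columns: list[str]) -> str:
    # Join the metric name and its input column names with double underscores.
    return "__".join([metric_name, *input_columns])

print(metric_column_name("llm_text_similarity_v0", ["prediction", "ground_truth"]))
# llm_text_similarity_v0__prediction__ground_truth
```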