How-To / FAQ
What if I do not have an LLM service to run LLM-as-judge metrics?
No problem. Simply omit the eval_llm_client (and eval_embedding_client) argument from the call(s) to the evaluation helpers; the helpers will automatically exclude any metrics that depend on them.
# BEFORE: default text metrics, including those requiring target (ground_truth) and LLM-as-judge
text_metrics = dbnl.eval.metrics.text_metrics(
    prediction="prediction", target="ground_truth", eval_llm_client=oai_client
)

# AFTER: remove the eval_llm_client to exclude LLM-as-judge metrics
text_metrics = dbnl.eval.metrics.text_metrics(
    prediction="prediction", target="ground_truth"
)
aug_eval_df = evaluate(eval_df, text_metrics)

What if I do not have ground-truth available?
No problem. Simply omit the target argument from the helper; the helper will automatically exclude any metrics that depend on the target column.
# BEFORE: default text metrics, including those requiring target (ground_truth) and LLM-as-judge
text_metrics = dbnl.eval.metrics.text_metrics(
    prediction="prediction", target="ground_truth", eval_llm_client=oai_client
)

# AFTER: remove the target to exclude metrics that depend on that value being specified
text_metrics = dbnl.eval.metrics.text_metrics(
    prediction="prediction", eval_llm_client=oai_client
)
aug_eval_df = evaluate(eval_df, text_metrics)

There is an additional helper, text_monitor_metrics(), that generates a list of generic metrics appropriate for "monitoring" unstructured text columns. Simply provide a list of text column names and, optionally, an eval_llm_client for LLM-as-judge metrics, as in the sketch below.
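For example (a minimal sketch: the column names are placeholders, and the call signature is assumed from the description above; check the SDK reference for the exact parameters):

monitor_metrics = dbnl.eval.metrics.text_monitor_metrics(
    ["prediction", "retrieved_context"],  # hypothetical text column names
    eval_llm_client=oai_client,  # optional; omit to skip LLM-as-judge metrics
)
aug_eval_df = evaluate(eval_df, monitor_metrics)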
How do I create a custom LLM-as-judge metric?
You can write your own LLM-as-judge metric that uses a custom prompt of your choosing. The example below defines a custom LLM-as-judge metric and runs it on an example dataframe.
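The sketch below illustrates the general pattern in plain pandas rather than the SDK's own custom-metric interface, which is not shown here. It assumes an OpenAI-compatible oai_client; the prompt text, model name, output column, and 1-5 scoring scale are all placeholders.

import pandas as pd

# Placeholder judge prompt; {prediction} and {target} are filled in per row
JUDGE_PROMPT = (
    "Rate how well the prediction matches the ground truth on a 1-5 scale. "
    "Respond with a single integer.\n\n"
    "Prediction: {prediction}\n"
    "Ground truth: {target}"
)

def match_judge(df: pd.DataFrame) -> pd.Series:
    """Custom LLM-as-judge metric: asks the LLM to score each row with JUDGE_PROMPT."""
    scores = []
    for _, row in df.iterrows():
        response = oai_client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{
                "role": "user",
                "content": JUDGE_PROMPT.format(
                    prediction=row["prediction"], target=row["ground_truth"]
                ),
            }],
        )
        scores.append(int(response.choices[0].message.content.strip()))
    return pd.Series(scores, index=df.index)

# Append the judge scores to the evaluation dataframe
eval_df["custom_judge_score"] = match_judge(eval_df)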
You can also write a metric that uses only the prediction column, referencing only {prediction} in the custom prompt. An example is below:
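Again a hedged sketch following the same pattern as above (oai_client, the prompt, and the model name are assumptions), this time with a prompt that references only {prediction}:

# Placeholder prompt that references only the prediction column
TONE_PROMPT = (
    "On a 1-5 scale, how professional is the tone of the following text? "
    "Respond with a single integer.\n\n"
    "Text: {prediction}"
)

def tone_judge(df: pd.DataFrame) -> pd.Series:
    """Custom LLM-as-judge metric that needs only the prediction column."""
    scores = []
    for _, row in df.iterrows():
        response = oai_client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{
                "role": "user",
                "content": TONE_PROMPT.format(prediction=row["prediction"]),
            }],
        )
        scores.append(int(response.choices[0].message.content.strip()))
    return pd.Series(scores, index=df.index)

eval_df["tone_score"] = tone_judge(eval_df)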