Metrics
Codify signals to track behavior that matters
A Metric is a mapping from Columns into meaningful numeric values representing cost, quality, performance, or other behavioral characteristics. Metrics are computed for every ingested log or trace as part of the DBNL Data Pipeline and show up in the Logs view, Explorer pages, and Metrics Dashboard.
DBNL comes with many built-in metrics and templates that can be customized. Every Metric is one of two types:
LLM-as-judge Metrics: Evals and judges that require an LLM to compute a score or classification based on a prompt.
Standard Metrics: Functions that can be computed using non-LLM methods like traditional Natural Language Processing (NLP) metrics, statistical operations, and other common mapping functions.
Default Metrics
Every product contains the following metrics by default, computed using the required input and output fields of the DBNL Semantic Convention and the default Model Connection for the Project:
answer_relevancy: Determines if the input is relevant to the output. See template.
user_frustration: Assesses the level of frustration of the input based on tone, word choice, and other properties. See template.
topic: Classifies the conversation into a topic based on the input and output. This Metric is created after topics are automatically generated from the first 7 days of ingested data. Topics can be manually adjusted by editing the template.
conversation_summary (immutable): A summary of the input and output, used as part of topic generation.
summary_embedding (immutable): An embedding of the conversation_summary, used as part of topic generation.
Creating a Metric
Metrics can be created by clicking on the "+ Create New Metric" button on the Metrics page.

When to Create a Metric
Create custom metrics when you need to:
Track specific business KPIs: Cost per conversation, resolution rate, escalation frequency
Monitor quality signals: Response accuracy, hallucination detection, safety violations
Measure performance: Response time, token efficiency, context utilization
Validate against requirements: Brand tone compliance, length constraints, format adherence
Debug recurring issues: Track patterns identified in Insights or Logs exploration
Good metrics are:
Actionable: The metric should inform decisions or trigger alerts
Measurable: Clear numeric or categorical output for every log
Relevant: Tied to product quality, user experience, or business outcomes
Consistent: Produces reliable results across similar inputs
When to Use Standard vs LLM-as-Judge Metrics
Use Standard Metrics when: You need fast, deterministic calculations (word counts, text length, keyword matching, readability scores)
Use LLM-as-Judge Metrics when: You need semantic understanding (relevance, tone, quality, groundedness)
Standard Metrics are faster and cheaper to compute, so prefer them when possible.
LLM-as-Judge Metrics
LLM-as-Judge Metrics can be customized from the built-in LLM-as-Judge Metric Templates. Each of these Metrics is one of two types:
Classifier Metric: Outputs a categorical value equal to one of a predefined set of classes. Example: llm_answer_groundedness.
Scorer Metric: Outputs an integer in the range [1, 2, 3, 4, 5]. Example: llm_text_frustration.
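The difference between the two output types can be sketched as a pair of checks on a judge's raw text response. This is a minimal illustration in plain Python, not DBNL code: the function names and the class labels `grounded` and `ungrounded` are hypothetical and chosen only for the example.

```python
# Hypothetical class set for a Classifier Metric; real class names
# come from the metric's template, not from this sketch.
GROUNDEDNESS_CLASSES = frozenset({"grounded", "ungrounded"})


def parse_classifier_output(raw: str, classes: frozenset) -> str:
    """Return the label if it is one of the predefined classes."""
    label = raw.strip().lower()
    if label not in classes:
        raise ValueError(f"{label!r} is not one of {sorted(classes)}")
    return label


def parse_scorer_output(raw: str) -> int:
    """Return the score if it is an integer from 1 to 5."""
    score = int(raw.strip())  # raises ValueError for non-integer text
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"score {score} is outside [1, 2, 3, 4, 5]")
    return score


print(parse_classifier_output("Grounded\n", GROUNDEDNESS_CLASSES))
print(parse_scorer_output(" 4 "))
```

A Classifier Metric therefore yields a value suited to grouping and filtering, while a Scorer Metric yields a small ordinal number.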
Standard Metrics
Standard Metrics are functions that can be computed using non-LLM methods. They can be built using the Functions available in the DBNL Query Language.
Creating Standard Metrics
Standard Metrics use query language expressions to compute values from your log columns. Here are common examples:
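For illustration only, the kinds of deterministic calculations named earlier (word counts, text length, keyword matching) can be sketched in plain Python. This is not DBNL Query Language syntax; the function names, the metric names, and the `input`/`output` row keys are assumptions made for the example.

```python
import re


def word_count(text: str) -> int:
    """Number of whitespace-separated tokens in the text."""
    return len(text.split())


def text_length(text: str) -> int:
    """Number of characters in the text."""
    return len(text)


def contains_keyword(text: str, keywords: list[str]) -> bool:
    """True if any keyword appears as a whole word, ignoring case."""
    return any(
        re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE)
        for keyword in keywords
    )


def compute_metrics(row: dict) -> dict:
    """Map one log row (hypothetical shape) to a dict of metric values."""
    output = row["output"]
    return {
        "output_word_count": word_count(output),
        "output_length": text_length(output),
        "mentions_refund": contains_keyword(output, ["refund"]),
    }


row = {"input": "Where is my money?", "output": "Your refund is on the way."}
print(compute_metrics(row))
```

Each function is deterministic and needs no model call, which is why calculations of this kind are cheap to run on every log.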
Troubleshooting Metrics