# Metrics

A Metric is a mapping from [Columns](https://docs.dbnl.com/v0.27.x/configuration/data-pipeline#columns) into meaningful numeric or categorical values representing cost, quality, performance, or other behavioral characteristics. Metrics are computed for every ingested log or trace as part of the [DBNL Data Pipeline](https://docs.dbnl.com/v0.27.x/configuration/data-pipeline) and show up in the [Logs](https://docs.dbnl.com/v0.27.x/workflow/logs) view, [Explorer](https://docs.dbnl.com/v0.27.x/workflow/explorer) pages, and [Metrics Dashboard](https://docs.dbnl.com/v0.27.x/dashboards#metrics-dashboard).

DBNL comes with many built-in metrics and templates that can be customized. Fundamentally, every Metric is one of two types:

* [**LLM-as-judge Metrics**](#llm-as-judge-metrics): Evals and judges that require an LLM to compute a score or classification based on a prompt.
* [**Standard Metrics**](#standard-metrics): Functions that can be computed using non-LLM methods like traditional Natural Language Processing (NLP) metrics, statistical operations, and other common mapping [functions](https://docs.dbnl.com/v0.27.x/reference/query-language/functions).

### Default Metrics

Every product contains the following metrics by default, computed using the required `input` and `output` fields of the [DBNL Semantic Convention](https://docs.dbnl.com/v0.27.x/configuration/dbnl-semantic-convention) and the default [Model Connection](https://docs.dbnl.com/v0.27.x/configuration/model-connections) for the [Project](https://docs.dbnl.com/v0.27.x/workflow/projects):

* `answer_relevancy`: Determines if the `input` is relevant to the `output`. See [template](https://docs.dbnl.com/v0.27.x/workflow/llm-as-judge-metric-templates#llm_answer_relevancy).
* `user_frustration`: Assesses the level of frustration of the `input` based on tone, word choice, and other properties. See [template](https://docs.dbnl.com/v0.27.x/workflow/llm-as-judge-metric-templates#llm_text_frustration).
* `topic`: Classifies the conversation into a topic based on the `input` and `output`. This Metric is created after topics are automatically generated from the first 7 days of ingested data. Topics can be manually adjusted by editing the [template](https://docs.dbnl.com/v0.27.x/workflow/llm-as-judge-metric-templates#topic).
* `conversation_summary` (immutable): A summary of the `input` and `output`, used as part of `topic` generation.
* `summary_embedding` (immutable): An embedding of the `conversation_summary`, used as part of `topic` generation.

### Creating a Metric

Metrics can be created by clicking on the "+ Create New Metric" button on the Metrics page.

<figure><img src="https://content.gitbook.com/content/8N8zzLtIch6ZiTSwWtXD/blobs/wgP0QXh5uSg1EIXca70Z/image.png" alt=""><figcaption></figcaption></figure>

### When to Create a Metric

Create custom metrics when you need to:

* **Track specific business KPIs**: Cost per conversation, resolution rate, escalation frequency
* **Monitor quality signals**: Response accuracy, hallucination detection, safety violations
* **Measure performance**: Response time, token efficiency, context utilization
* **Validate against requirements**: Brand tone compliance, length constraints, format adherence
* **Debug recurring issues**: Track patterns identified in Insights or Logs exploration

**Good metrics are:**

* **Actionable**: The metric should inform decisions or trigger alerts
* **Measurable**: Clear numeric or categorical output for every log
* **Relevant**: Tied to product quality, user experience, or business outcomes
* **Consistent**: Produces reliable results across similar inputs

{% hint style="info" %}
Start with DBNL's default metrics and templates. Only create custom metrics after you've identified specific signals through the [Explorer](https://docs.dbnl.com/v0.27.x/workflow/explorer) or [Insights](https://docs.dbnl.com/v0.27.x/workflow/insights) that aren't covered by existing metrics.
{% endhint %}

### When to Use Standard vs LLM-as-Judge Metrics

* **Use Standard Metrics when:** You need fast, deterministic calculations (word counts, text length, keyword matching, readability scores)
* **Use LLM-as-Judge Metrics when:** You need semantic understanding (relevance, tone, quality, groundedness)

Standard Metrics are faster and cheaper to compute, so prefer them when possible.

### LLM-as-Judge Metrics

LLM-as-Judge Metrics can be customized from the built-in [LLM-as-Judge Metric Templates](https://docs.dbnl.com/v0.27.x/workflow/metrics/llm-as-judge-metric-templates). Each of these Metrics is one of two types:

* Classifier Metric: Outputs a categorical value drawn from a predefined set of classes. Example: [`llm_answer_groundedness`](https://docs.dbnl.com/v0.27.x/workflow/llm-as-judge-metric-templates#llm_answer_groundedness).
* Scorer Metric: Outputs an integer score from `1` to `5`. Example: [`llm_text_frustration`](https://docs.dbnl.com/v0.27.x/workflow/llm-as-judge-metric-templates#llm_text_frustration).
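As an illustration only (this is not DBNL's internal implementation, and the class labels below are assumptions), the two output contracts can be sketched in Python:

```python
# Illustrative sketch: DBNL computes judge outputs server-side. These
# helper functions and the example class labels are hypothetical.

def validate_classifier(value: str, classes: set) -> str:
    # A Classifier Metric must return one of a predefined set of classes
    if value not in classes:
        raise ValueError(f"unexpected class: {value!r}")
    return value

def validate_scorer(value: int) -> int:
    # A Scorer Metric must return an integer from 1 to 5
    if value not in range(1, 6):
        raise ValueError(f"score out of range: {value}")
    return value

# Example usage with assumed groundedness labels
labels = {"grounded", "partially_grounded", "ungrounded"}
validate_classifier("grounded", labels)
validate_scorer(4)
```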

### Standard Metrics

Standard Metrics are functions that can be computed using non-LLM methods. They can be built using the [Functions](https://docs.dbnl.com/v0.27.x/reference/query-language/functions) available in the [DBNL Query Language](https://docs.dbnl.com/v0.27.x/reference/query-language).

#### Creating Standard Metrics

Standard Metrics use query language expressions to compute values from your log columns. Here are common examples:

<details>

<summary>Example 1: Calculate Response Length</summary>

Track the word count of AI responses:

* **Metric Name:** `response_word_count`
* **Type:** Standard Metric
* **Formula:** `word_count(output)`

</details>

<details>

<summary>Example 2: Detect Refusal Keywords</summary>

Identify when the AI refuses to answer:

* **Metric Name:** `contains_refusal`
* **Type:** Standard Metric
* **Formula:** `contains(lower(output), 'sorry') or contains(lower(output), 'cannot') or contains(lower(output), 'unable')`

</details>

<details>

<summary>Example 3: Calculate Input Complexity</summary>

Measure how complex user prompts are:

* **Metric Name:** `input_reading_level`
* **Type:** Standard Metric
* **Formula:** `flesch_kincaid_grade(input)`

</details>

<details>

<summary>Example 4: Detect Question Marks</summary>

Check if input is a question:

* **Metric Name:** `is_question`
* **Type:** Standard Metric
* **Formula:** `contains(input, '?')`

</details>

<details>

<summary>Example 5: Compare String Similarity</summary>

Measure how similar input and output are (useful for detecting parroting):

* **Metric Name:** `input_output_similarity`
* **Type:** Standard Metric
* **Formula:** `1.0 - (levenshtein(input, output) / max(len(input), len(output)))`

</details>
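To make the arithmetic in Example 5 concrete, here is the same computation sketched in plain Python, assuming `levenshtein` is the standard edit distance and `len` is string length (this is an illustration, not DBNL query syntax):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def input_output_similarity(inp: str, out: str) -> float:
    # 1.0 means identical strings (likely parroting); 0.0 means maximally different
    longest = max(len(inp), len(out))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(inp, out) / longest
```

Dividing by the length of the longer string normalizes the raw edit distance into `[0, 1]`, so the metric is comparable across logs of very different lengths.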

### Troubleshooting Metrics

<details>

<summary>Metric Not Appearing in Logs or Dashboard</summary>

**Possible causes:**

* The metric was created after logs were ingested; metrics are only computed for data ingested after they are created
* The pipeline run failed during the Enrich step - check the [Status page](https://docs.dbnl.com/v0.27.x/workflow/status)
* The metric references a column that doesn't exist in your data

**Solution**: Check Status page for errors, verify column names, and wait for the next pipeline run.

</details>

<details>

<summary>LLM-as-Judge Metric Returns Unexpected Values</summary>

**Possible causes:**

* The Model Connection is using a different model than expected
* The evaluation prompt is ambiguous or unclear
* The column placeholders (e.g., `{input}`, `{output}`) are incorrect

**Solution**: Test your Model Connection using the "Validate" button, review example logs to check if columns have expected values, and refine the evaluation prompt for clarity.
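Column placeholders behave like string substitution: a mistyped placeholder references a column that does not exist. This Python sketch (using a hypothetical prompt, not an actual DBNL template) shows how the failure surfaces:

```python
# Hypothetical judge prompt; real prompts live in the LLM-as-Judge
# Metric Templates. This only illustrates placeholder substitution.
template = (
    "Rate how relevant the answer is to the question on a 1-5 scale.\n"
    "Question: {input}\nAnswer: {output}"
)

row = {"input": "What does DBNL do?", "output": "DBNL monitors AI products."}

prompt = template.format(**row)  # works: placeholders match column names

bad_template = "Question: {inpt}"  # typo in the placeholder name
try:
    bad_template.format(**row)
except KeyError as e:
    print(f"missing column: {e}")  # the typo surfaces as a missing key
```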

</details>

<details>

<summary>Standard Metric Formula Errors</summary>

**Common errors:**

```
# Error: Column doesn't exist
word_count(ouput)  # Typo - should be 'output'

# Error: Wrong function name
wordcount(output)  # Should be 'word_count'

# Error: Type mismatch
word_count(total_token_count)  # Can't count words in a number

# Error: Division by zero
divide(output_tokens, input_tokens)  # Fails if input_tokens is 0
```

**Solution**: Use the [Query Language Functions](https://docs.dbnl.com/v0.27.x/reference/query-language/functions) reference to verify syntax, check column names match your data exactly, and add null/zero checks with conditionals.
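The division-by-zero case above can be guarded with a conditional. In plain Python terms (a sketch of the guard logic, not DBNL query syntax):

```python
def tokens_ratio(output_tokens: int, input_tokens: int):
    # Guard the denominator: return None (a null metric value)
    # instead of letting the computation fail
    if input_tokens == 0:
        return None
    return output_tokens / input_tokens
```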

</details>

<details>

<summary>Metric Computation is Slow</summary>

**Possible causes:**

* LLM-as-Judge metrics are inherently slower (require Model Connection calls for each log)
* Your Model Connection has high latency or rate limits
* Large log volume

**Solution**: Use Standard Metrics where possible, consider a faster Model Connection (like local NVIDIA NIM), or increase pipeline timeout settings.

</details>

<details>

<summary>Metric Values Are All Null</summary>

**Possible causes:**

* Required columns are missing from your logs
* Formula syntax error causing computation to fail silently
* Model Connection is unreachable or returning errors

**Solution**: Check logs to verify required columns exist, test formula on a small subset, validate Model Connection, and check Status page for pipeline errors.

</details>

{% hint style="info" %}
**Need more help?** Contact <support@distributional.com> or visit [distributional.com/contact](https://distributional.com/contact). Include your metric definition and any error messages from the Status page.
{% endhint %}
