# dbnl.eval.metrics

### *class* dbnl.eval.metrics.Metric

#### column\_schema() → RunSchemaColumnSchemaDict

Returns the column schema for the metric to be used in a run schema.

* **Returns:**\
  Column schema for the metric.

#### component() → str | None

#### description() → str | None

Returns the description of the metric.

* **Returns:**\
  Description of the metric.

#### *abstract* evaluate(df: pd.DataFrame) → pd.Series\[Any]

Evaluates the metric over the provided dataframe.

* **Parameters:df** – Input data from which to compute metric.
* **Returns:**\
  Metric values.

#### *abstract* expression() → str

Returns the expression representing the metric (e.g. rouge1(prediction, target)).

* **Returns:**\
  Metric expression.

#### greater\_is\_better() → bool | None

If True, larger values are assumed to be directionally better than smaller ones. If False,\
smaller values are assumed to be directionally better than larger ones. If None, no\
direction is assumed.

* **Returns:**\
  True if greater is better, False if smaller is better, otherwise None.
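The direction returned by `greater_is_better()` determines how a change in a metric value should be read. The sketch below uses a hypothetical helper, `is_improvement`, which is not part of dbnl, to illustrate the three cases.

```python
from typing import Optional


def is_improvement(
    baseline: float, candidate: float, greater_is_better: Optional[bool]
) -> Optional[bool]:
    """Hypothetical helper (not part of dbnl) interpreting Metric.greater_is_better()."""
    if greater_is_better is None:
        # No directional assumption: a change cannot be classified as better or worse.
        return None
    if greater_is_better:
        # Larger values are better (e.g. a similarity score).
        return candidate > baseline
    # Smaller values are better (e.g. an edit distance).
    return candidate < baseline


print(is_improvement(0.5, 0.7, True))   # True
print(is_improvement(3, 1, False))      # True
print(is_improvement(1.0, 2.0, None))   # None
```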

#### *abstract* inputs() → list\[str]

Returns the input column names required to compute the metric.

* **Returns:**\
  Input column names.

#### *abstract* metric() → str

Returns the metric name (e.g. rouge1).

* **Returns:**\
  Metric name.

#### *abstract* name() → str

Returns the fully qualified name of the metric (e.g. rouge1\_\_prediction\_\_target).

* **Returns:**\
  Metric name.

#### run\_schema\_column() → RunSchemaColumnSchema

Returns the column schema for the metric to be used in a run schema.

* **Returns:**\
  Column schema for the metric.

#### *abstract* type() → Literal\['boolean', 'int', 'long', 'float', 'double', 'string', 'category']

Returns the type of the metric (e.g. float).

* **Returns:**\
  Metric type.

### *class* dbnl.eval.metrics.RougeScoreType(value)

Enumeration of the ROUGE score types (precision, recall, or F-measure) that the ROUGE metrics can report.

#### FMEASURE *= 'fmeasure'*

#### PRECISION *= 'precision'*

#### RECALL *= 'recall'*

### answer\_quality\_llm\_accuracy

```python
dbnl.eval.metrics.answer_quality_llm_accuracy(input: str, context: str, prediction: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the accuracy of the answer using a language model.

This metric is generated by an LLM using a specific prompt named llm\_accuracy, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **input** – input column name
  * **context** – context column name
  * **prediction** – prediction column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  accuracy metric

### answer\_quality\_llm\_alignment\_fidelity

```python
dbnl.eval.metrics.answer_quality_llm_alignment_fidelity(input: str, context: str, prediction: str, target: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the alignment fidelity of the answer using a language model.

This metric is generated by an LLM using a specific prompt named llm\_alignment\_fidelity, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **input** – input column name
  * **context** – context column name
  * **prediction** – prediction column name
  * **target** – target column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  alignment fidelity metric

### answer\_quality\_llm\_answer\_fitness

```python
dbnl.eval.metrics.answer_quality_llm_answer_fitness(input: str, context: str, prediction: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the fitness of the answer using a language model.

This metric is generated by an LLM using a specific prompt named llm\_answer\_fitness, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **input** – input column name
  * **context** – context column name
  * **prediction** – prediction column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  answer fitness metric

### answer\_quality\_llm\_coherence

```python
dbnl.eval.metrics.answer_quality_llm_coherence(prediction: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the coherence of the answer using a language model.

This metric is generated by an LLM using a specific prompt named llm\_coherence, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **prediction** – prediction column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  coherence metric

### answer\_quality\_llm\_commital

```python
dbnl.eval.metrics.answer_quality_llm_commital(prediction: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the commital score of the answer (how committal the answer is) using a language model.

This metric is generated by an LLM using a specific prompt named llm\_commital, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **prediction** – prediction column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  commital metric

### answer\_quality\_llm\_completeness

```python
dbnl.eval.metrics.answer_quality_llm_completeness(input: str, prediction: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the completeness of the answer using a language model.

This metric is generated by an LLM using a specific prompt named llm\_completeness, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **input** – input column name
  * **prediction** – prediction column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  completeness metric

### answer\_quality\_llm\_contextual\_relevance

```python
dbnl.eval.metrics.answer_quality_llm_contextual_relevance(input: str, context: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the contextual relevance score using a language model, i.e. how relevant the retrieved context is to the input.

This metric is generated by an LLM using a specific prompt named llm\_contextual\_relevance, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **input** – input column name
  * **context** – context column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  contextual relevance metric

### answer\_quality\_llm\_grammar\_accuracy

```python
dbnl.eval.metrics.answer_quality_llm_grammar_accuracy(prediction: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the grammar accuracy of the answer using a language model.

This metric is generated by an LLM using a specific prompt named llm\_grammar\_accuracy, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **prediction** – prediction column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  grammar accuracy metric

### answer\_quality\_llm\_metrics

```python
dbnl.eval.metrics.answer_quality_llm_metrics(*, input: str | None, prediction: str, context: str | None, target: str | None, eval_llm_client: LLMClient) → list[[Metric](#dbnl.eval.metrics.Metric)]
```

Returns a set of metrics which evaluate the quality of the generated answer. This does not include metrics that require a ground truth.

* **Parameters:**
  * **input** – input column name (i.e. question)
  * **prediction** – prediction column name (i.e. generated answer)
  * **context** – context column name (i.e. document or set of documents retrieved)
  * **target** – target column name (i.e. expected answer)
  * **eval\_llm\_client** – LLM client used to compute the metrics
* **Returns:**\
  list of metrics

### answer\_quality\_llm\_originality

```python
dbnl.eval.metrics.answer_quality_llm_originality(prediction: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the originality of the answer using a language model.

This metric is generated by an LLM using a specific prompt named llm\_originality, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **prediction** – prediction column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  originality metric

### answer\_viability\_llm\_metrics

```python
dbnl.eval.metrics.answer_viability_llm_metrics(*, prediction: str, eval_llm_client: LLMClient) → list[[Metric](#dbnl.eval.metrics.Metric)]
```

Returns a list of LLM-based metrics that evaluate the viability of the generated answer.

* **Parameters:**
  * **prediction** – prediction column name (i.e. generated answer)
  * **eval\_llm\_client** – LLM client used to compute the metrics
* **Returns:**\
  list of metrics

### answer\_viability\_llm\_reading\_complexity

```python
dbnl.eval.metrics.answer_viability_llm_reading_complexity(prediction: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the reading complexity of the answer using a language model.

This metric is generated by an LLM using a specific prompt named llm\_reading\_complexity, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **prediction** – prediction column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  reading complexity metric

### answer\_viability\_llm\_sentiment\_assessment

```python
dbnl.eval.metrics.answer_viability_llm_sentiment_assessment(prediction: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the sentiment of the answer using a language model.

This metric is generated by an LLM using a specific prompt named llm\_sentiment\_assessment, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **prediction** – prediction column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  sentiment assessment metric

### answer\_viability\_llm\_text\_fluency

```python
dbnl.eval.metrics.answer_viability_llm_text_fluency(prediction: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the text fluency of the answer by using a language model to evaluate its perplexity.

This metric is generated by an LLM using a specific prompt named llm\_text\_fluency, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **prediction** – prediction column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  text fluency metric

### answer\_viability\_llm\_text\_toxicity

```python
dbnl.eval.metrics.answer_viability_llm_text_toxicity(prediction: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the toxicity of the answer using a language model.

This metric is generated by an LLM using a specific prompt named llm\_text\_toxicity, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **prediction** – prediction column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  toxicity metric

### automated\_readability\_index

```python
dbnl.eval.metrics.automated_readability_index(text_col_name: str) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the Automated Readability Index metric for the text\_col\_name column.

Calculates the Automated Readability Index (ARI) for a given text. ARI is a readability metric that estimates the U.S. school grade level necessary to understand the text, based on the number of characters per word and words per sentence.

* **Parameters:text\_col\_name** – text column name
* **Returns:**\
  automated\_readability\_index metric
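The standard ARI formula is `4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43`. The following is a minimal sketch of that formula; the regex-based word and sentence splitting is an assumption made for illustration, and dbnl's own tokenization may produce different counts.

```python
import re


def automated_readability_index(text: str) -> float:
    """Sketch of ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    # Assumption: a word is a run of letters or digits; punctuation is not counted as characters.
    words = re.findall(r"[A-Za-z0-9]+", text)
    # Assumption: sentences end at '.', '!' or '?'; empty fragments are discarded.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0  # arbitrary fallback for empty text in this sketch
    characters = sum(len(word) for word in words)
    return 4.71 * (characters / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43


score = automated_readability_index("The cat sat on the mat.")
print(score)
```

Very short sentences made of short words can yield a negative grade level, as this example does.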

### bleu

```python
dbnl.eval.metrics.bleu(prediction: str, target: str) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the bleu metric between the prediction and target columns.

The BLEU score is a metric for evaluating a generated sentence against a reference sentence. The score is a number between 0 and 1, where 1 means that the generated sentence is identical to the reference sentence.

* **Parameters:**
  * **prediction** – prediction column name
  * **target** – target column name
* **Returns:**\
  bleu metric
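BLEU combines clipped (modified) n-gram precisions, typically for n = 1 to 4, through a geometric mean, and multiplies the result by a brevity penalty that discourages overly short predictions. The sketch below is an unsmoothed, single-reference sentence BLEU with whitespace tokenization; these choices are assumptions for illustration, and the library's implementation (tokenization, smoothing, n-gram order) may differ.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(prediction: str, target: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU sketch with whitespace tokenization."""
    pred, ref = prediction.split(), target.split()
    if not pred:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        pred_ngrams = ngram_counts(pred, n)
        ref_ngrams = ngram_counts(ref, n)
        total = sum(pred_ngrams.values())
        # Clip each predicted n-gram count by its count in the reference.
        clipped = sum((pred_ngrams & ref_ngrams).values())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: 1 if the prediction is longer than the reference, else exp(1 - r / c).
    bp = 1.0 if len(pred) > len(ref) else math.exp(1 - len(ref) / len(pred))
    return bp * math.exp(log_precision_sum / max_n)


print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(sentence_bleu("the cat sat on the", "the cat sat on the mat"))
```

In the second call every n-gram precision is 1, so the score is reduced only by the brevity penalty.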

### character\_count

```python
dbnl.eval.metrics.character_count(text_col_name: str) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the character count metric for the text\_col\_name column.

* **Parameters:text\_col\_name** – text column name
* **Returns:**\
  character\_count metric

### context\_hit

```python
dbnl.eval.metrics.context_hit(ground_truth_document_id: str, retrieved_document_ids: str) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the context hit metric.

This boolean-valued metric evaluates whether the ground truth document is present in the list of retrieved documents. It is true (1) if the ground truth document is present in the list of retrieved documents, and false (0) otherwise.

* **Parameters:**
  * **ground\_truth\_document\_id** – ground\_truth\_document\_id column name
  * **retrieved\_document\_ids** – retrieved\_document\_ids column name
* **Returns:**\
  context hit metric
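Row by row, the check reduces to a membership test. The sketch below assumes that each value in the retrieved\_document\_ids column can be treated as a list of document IDs (how dbnl parses that column is not specified here); the IDs themselves are hypothetical.

```python
def context_hit(ground_truth_document_id: str, retrieved_document_ids: list[str]) -> bool:
    """True if the ground truth document appears anywhere in the retrieved list."""
    return ground_truth_document_id in retrieved_document_ids


# Hypothetical rows: (ground truth ID, retrieved IDs in rank order).
rows = [
    ("doc-1", ["doc-3", "doc-1", "doc-7"]),  # hit: doc-1 was retrieved (at rank 2)
    ("doc-2", ["doc-3", "doc-4"]),           # miss: doc-2 was not retrieved
]
print([context_hit(gt, ids) for gt, ids in rows])  # [True, False]
```

Note that the rank of the hit does not matter for this metric; MRR is the rank-sensitive counterpart.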

### count\_metrics

```python
dbnl.eval.metrics.count_metrics(*, text_col_name: str) → list[[Metric](#dbnl.eval.metrics.Metric)]
```

Returns a set of count-based metrics for the text\_col\_name column.

* **Parameters:text\_col\_name** – text column name
* **Returns:**\
  list of metrics

### flesch\_kincaid\_grade

```python
dbnl.eval.metrics.flesch_kincaid_grade(text_col_name: str) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the Flesch-Kincaid Grade metric for the text\_col\_name column.

Calculates the Flesch-Kincaid Grade Level for a given text. The Flesch-Kincaid Grade Level is a readability metric that estimates the U.S. school grade level required to understand the text. It is based on the average number of syllables per word and words per sentence.

* **Parameters:text\_col\_name** – text column name
* **Returns:**\
  flesch\_kincaid\_grade metric
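The standard Flesch-Kincaid Grade Level formula is `0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59`. Syllable counting is the hard part; the minimal sketch below uses a vowel-group heuristic and regex-based splitting, both assumptions for illustration rather than the library's method, so its scores can differ from dbnl's.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic: count vowel groups, dropping a silent trailing 'e' (but not '-le')."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Sketch of FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    words = re.findall(r"[A-Za-z]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0  # arbitrary fallback for empty text in this sketch
    syllables = sum(count_syllables(word) for word in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59


print(count_syllables("table"), count_syllables("make"))  # 2 1
print(flesch_kincaid_grade("The cat sat on the mat."))
```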

### ground\_truth\_non\_llm\_answer\_metrics

```python
dbnl.eval.metrics.ground_truth_non_llm_answer_metrics(*, prediction: str, target: str) → list[[Metric](#dbnl.eval.metrics.Metric)]
```

Returns a set of non-LLM metrics for a question and answer task that compare the generated answer against the expected answer.

* **Parameters:**
  * **prediction** – prediction column name (i.e. generated answer)
  * **target** – target column name (i.e. expected answer)
* **Returns:**\
  list of metrics

### ground\_truth\_non\_llm\_retrieval\_metrics

```python
dbnl.eval.metrics.ground_truth_non_llm_retrieval_metrics(*, ground_truth_document_id: str, retrieved_document_ids: str) → list[[Metric](#dbnl.eval.metrics.Metric)]
```

Returns a set of non-LLM metrics for a question and answer task that evaluate the retrieved documents against the ground truth document.

* **Parameters:**
  * **ground\_truth\_document\_id** – ground\_truth\_document\_id column name
  * **retrieved\_document\_ids** – retrieved\_document\_ids column name
* **Returns:**\
  list of metrics

### inner\_product\_retrieval

```python
dbnl.eval.metrics.inner_product_retrieval(ground_truth_document_text: str, top_retrieved_document_text: str, eval_embedding_client: EmbeddingClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the inner product metric between the ground\_truth\_document\_text and top\_retrieved\_document\_text columns.

This metric is used to evaluate the similarity between the ground truth document and the top retrieved document using the inner product of their embeddings. The embedding client is used to retrieve the embeddings for the ground truth document and the top retrieved document. An embedding is a high-dimensional vector representation of a string of text.

* **Parameters:**
  * **ground\_truth\_document\_text** – ground\_truth\_document\_text column name
  * **top\_retrieved\_document\_text** – top\_retrieved\_document\_text column name
  * **eval\_embedding\_client** – embedding client used to compute the embeddings
* **Returns:**\
  inner product metric
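Once each text has been embedded, the metric is the dot product of the two vectors. The sketch below uses small hand-written vectors as hypothetical stand-ins for real embeddings (in dbnl they come from eval\_embedding\_client). For unit-normalized embeddings, the inner product equals the cosine similarity, which the `normalize` helper illustrates.

```python
import math


def inner_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so that the inner product becomes cosine similarity."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]


# Hypothetical stand-ins for the embeddings of the two documents.
ground_truth_embedding = [1.0, 2.0, 3.0]
top_retrieved_embedding = [4.0, 5.0, 6.0]
print(inner_product(ground_truth_embedding, top_retrieved_embedding))  # 32.0
print(inner_product(normalize(ground_truth_embedding), normalize(top_retrieved_embedding)))
```

Because the raw inner product grows with vector magnitude, scores are only comparable across rows if the embedding client returns vectors of consistent scale.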

### inner\_product\_target\_prediction

```python
dbnl.eval.metrics.inner_product_target_prediction(prediction: str, target: str, eval_embedding_client: EmbeddingClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the inner product metric between the prediction and target columns.

This metric is used to evaluate the similarity between the prediction and target columns using the inner product of their embeddings. The embedding client is used to retrieve the embeddings for the prediction and target columns. An embedding is a high-dimensional vector representation of a string of text.

* **Parameters:**
  * **prediction** – prediction column name
  * **target** – target column name
  * **eval\_embedding\_client** – embedding client used to compute the embeddings
* **Returns:**\
  inner product metric

### levenshtein

```python
dbnl.eval.metrics.levenshtein(prediction: str, target: str) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the levenshtein metric between the prediction and target columns.

The Levenshtein distance is a metric for evaluating the similarity between two strings. It is a non-negative integer: 0 means that the two strings are identical, and in general the value is the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into the other.

* **Parameters:**
  * **prediction** – prediction column name
  * **target** – target column name
* **Returns:**\
  levenshtein metric
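The distance is usually computed with dynamic programming over prefixes of the two strings. The sketch below is the standard two-row Wagner-Fischer algorithm with unit cost for insertions, deletions, and substitutions; whether dbnl normalizes the text (e.g. case folding) before comparing is not stated here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    # previous[j] holds the distance between the current prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,                       # delete char_a
                    current[j - 1] + 1,                    # insert char_b
                    previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("same", "same"))       # 0
```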

### mrr

```python
dbnl.eval.metrics.mrr(ground_truth_document_id: str, retrieved_document_ids: str) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the mean reciprocal rank (MRR) metric.

This metric is used to evaluate the quality of a ranked list of documents. The MRR score is a number between 0 and 1, where 1 means that the ground truth document is ranked first in the list. The MRR score is calculated by taking the reciprocal of the rank of the first relevant document in the list.

* **Parameters:**
  * **ground\_truth\_document\_id** – ground\_truth\_document\_id column name
  * **retrieved\_document\_ids** – retrieved\_document\_ids column name
* **Returns:**\
  mrr metric
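Per row, the reciprocal rank is 1 divided by the 1-based rank of the ground truth document in the retrieved list (0 if it is absent), and the MRR is the mean over rows. The sketch below assumes the retrieved IDs are available as an ordered list per row, and the IDs are hypothetical. Whether dbnl reports per-row reciprocal ranks and averages them at aggregation time is not stated on this page.

```python
def reciprocal_rank(ground_truth_document_id: str, retrieved_document_ids: list[str]) -> float:
    """1 / (1-based rank of the ground truth document), or 0.0 if it was not retrieved."""
    for rank, document_id in enumerate(retrieved_document_ids, start=1):
        if document_id == ground_truth_document_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rows: list[tuple[str, list[str]]]) -> float:
    """Average the per-row reciprocal ranks."""
    return sum(reciprocal_rank(gt, ids) for gt, ids in rows) / len(rows)


# Hypothetical rows: (ground truth ID, retrieved IDs in rank order).
rows = [
    ("doc-1", ["doc-1", "doc-2"]),  # ranked first  -> 1.0
    ("doc-3", ["doc-2", "doc-3"]),  # ranked second -> 0.5
    ("doc-9", ["doc-1", "doc-2"]),  # not retrieved -> 0.0
]
print(mean_reciprocal_rank(rows))  # 0.5
```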

### non\_llm\_non\_ground\_truth\_metrics

```python
dbnl.eval.metrics.non_llm_non_ground_truth_metrics(*, prediction: str) → list[[Metric](#dbnl.eval.metrics.Metric)]
```

Returns a set of metrics for a question and answer task that require neither an LLM nor a ground truth target.

* **Parameters:prediction** – prediction column name (i.e. generated answer)
* **Returns:**\
  list of metrics

### quality\_llm\_text\_similarity

```python
dbnl.eval.metrics.quality_llm_text_similarity(prediction: str, target: str, eval_llm_client: LLMClient) → [Metric](#dbnl.eval.metrics.Metric)
```

Computes the similarity of the prediction and target text using a language model.

This metric is generated by an LLM using a specific prompt named llm\_text\_similarity, available in dbnl.eval.metrics.prompts.

* **Parameters:**
  * **prediction** – prediction column name
  * **target** – target column name
  * **eval\_llm\_client** – LLM client used to compute the metric
* **Returns:**\
  similarity metric

### question\_and\_answer\_metrics

```python
dbnl.eval.metrics.question_and_answer_metrics(*, prediction: str, target: str | None = None, input: str | None = None, context: str | None = None, ground_truth_document_id: str | None = None, retrieved_document_ids: str | None = None, ground_truth_document_text: str | None = None, top_retrieved_document_text: str | None = None, eval_llm_client: LLMClient | None = None, eval_embedding_client: EmbeddingClient | None = None) → list[[Metric](#dbnl.eval.metrics.Metric)]
```

Returns a set of metrics relevant for a question and answer task.

* **Parameters:**
  * **prediction** – prediction column name (i.e. generated answer)
  * **target** – target column name (i.e. expected answer)
  * **input** – input column name (i.e. question)
  * **context** – context column name (i.e. document or set of documents retrieved)
  * **ground\_truth\_document\_id** – column name of the ID of the document containing the information in the target
  * **retrieved\_document\_ids** – column name of the IDs of the retrieved documents that make up the full context
  * **ground\_truth\_document\_text** – column name of the text containing the information in the target (ideally, this is also the top retrieved document)
  * **top\_retrieved\_document\_text** – column name of the text of the top retrieved document
  * **eval\_llm\_client** – LLM client used to compute LLM-based metrics
  * **eval\_embedding\_client** – embedding client used to compute embedding-based metrics
* **Returns:**\
  list of metrics

### question\_and\_answer\_metrics\_extended

```python
dbnl.eval.metrics.question_and_answer_metrics_extended(*, prediction: str, target: str | None = None, input: str | None = None, context: str | None = None, ground_truth_document_id: str | None = None, retrieved_document_ids: str | None = None, ground_truth_document_text: str | None = None, top_retrieved_document_text: str | None = None, eval_llm_client: LLMClient | None = None, eval_embedding_client: EmbeddingClient | None = None) → list[[Metric](#dbnl.eval.metrics.Metric)]
```

Returns a set of all metrics relevant for a question and answer task.

* **Parameters:**
  * **prediction** – prediction column name (i.e. generated answer)
  * **target** – target column name (i.e. expected answer)
  * **input** – input column name (i.e. question)
  * **context** – context column name (i.e. document or set of documents retrieved)
  * **ground\_truth\_document\_id** – column name of the ID of the document containing the information in the target
  * **retrieved\_document\_ids** – column name of the IDs of the retrieved documents that make up the full context
  * **ground\_truth\_document\_text** – column name of the text containing the information in the target (ideally, this is also the top retrieved document)
  * **top\_retrieved\_document\_text** – column name of the text of the top retrieved document
  * **eval\_llm\_client** – LLM client used to compute LLM-based metrics
  * **eval\_embedding\_client** – embedding client used to compute embedding-based metrics
* **Returns:**\
  list of metrics

### rouge1

```python
dbnl.eval.metrics.rouge1(prediction: str, target: str, score_type: [RougeScoreType](#dbnl.eval.metrics.RougeScoreType) = RougeScoreType.FMEASURE) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the rouge1 metric between the prediction and target columns.

ROUGE-1 is a recall-oriented metric that calculates the overlap of unigrams (individual words) between the predicted/generated summary and the reference summary. It measures how many single words from the reference summary appear in the predicted summary. ROUGE-1 focuses on basic word-level similarity and is used to evaluate the content coverage.

* **Parameters:**
  * **prediction** – prediction column name
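  * **target** – target column name
  * **score\_type** – ROUGE score type to compute (precision, recall, or F-measure); defaults to RougeScoreType.FMEASURE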
* **Returns:**\
  rouge1 metric
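The three RougeScoreType values correspond to the usual ROUGE-N quantities: precision is the clipped n-gram overlap divided by the number of n-grams in the prediction, recall divides the same overlap by the number of n-grams in the target, and the F-measure is their harmonic mean. The sketch below computes all three for n = 1 (ROUGE-1) and n = 2 (ROUGE-2) using lowercase whitespace tokenization; that tokenization is an assumption, and dbnl's implementation may tokenize or stem differently.

```python
from collections import Counter


def rouge_n(prediction: str, target: str, n: int = 1) -> dict[str, float]:
    """ROUGE-N precision, recall, and F-measure with lowercase whitespace tokens."""

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))

    pred, ref = ngrams(prediction), ngrams(target)
    overlap = sum((pred & ref).values())  # each n-gram counted at most min(pred, ref) times
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    fmeasure = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


print(rouge_n("the cat sat", "the cat sat on the mat", n=1))
print(rouge_n("the cat sat", "the cat sat on the mat", n=2))
```

Here every predicted word appears in the target (precision 1.0) but only half of the target's words are covered (recall 0.5), which is why the default F-measure balances the two.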

### rouge2

```python
dbnl.eval.metrics.rouge2(prediction: str, target: str, score_type: [RougeScoreType](#dbnl.eval.metrics.RougeScoreType) = RougeScoreType.FMEASURE) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the rouge2 metric between the prediction and target columns.

ROUGE-2 is a recall-oriented metric that calculates the overlap of bigrams (pairs of words) between the predicted/generated summary and the reference summary. It measures how many pairs of words from the reference summary appear in the predicted summary. ROUGE-2 focuses on word-level similarity and is used to evaluate the content coverage.

* **Parameters:**
  * **prediction** – prediction column name
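  * **target** – target column name
  * **score\_type** – ROUGE score type to compute (precision, recall, or F-measure); defaults to RougeScoreType.FMEASURE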
* **Returns:**\
  rouge2 metric

### rougeL

```python
dbnl.eval.metrics.rougeL(prediction: str, target: str, score_type: [RougeScoreType](#dbnl.eval.metrics.RougeScoreType) = RougeScoreType.FMEASURE) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the rougeL metric between the prediction and target columns.

ROUGE-L is a recall-oriented metric based on the Longest Common Subsequence (LCS) between the reference and generated summaries. It measures how well the generated summary captures the longest sequences of words that appear in the same order in the reference summary. This metric accounts for sentence-level structure and coherence.

* **Parameters:**
  * **prediction** – prediction column name
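  * **target** – target column name
  * **score\_type** – ROUGE score type to compute (precision, recall, or F-measure); defaults to RougeScoreType.FMEASURE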
* **Returns:**\
  rougeL metric
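ROUGE-L replaces n-gram overlap with the length of the longest common subsequence (LCS): precision is the LCS length divided by the prediction length, recall is the LCS length divided by the target length, and the F-measure is their harmonic mean. A minimal sketch, assuming lowercase whitespace tokenization (not necessarily what dbnl uses):

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l(prediction: str, target: str) -> dict[str, float]:
    """ROUGE-L precision, recall, and F-measure based on the token-level LCS."""
    pred, ref = prediction.lower().split(), target.lower().split()
    lcs = lcs_length(pred, ref)
    precision = lcs / len(pred) if pred else 0.0
    recall = lcs / len(ref) if ref else 0.0
    total = precision + recall
    fmeasure = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


# "the cat on the mat" is an in-order (non-contiguous) subsequence of the target.
print(rouge_l("the cat on the mat", "the cat sat on the mat"))
```

Unlike ROUGE-2, the LCS does not require the matched words to be adjacent, only in the same order.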

### rougeLsum

```python
dbnl.eval.metrics.rougeLsum(prediction: str, target: str, score_type: [RougeScoreType](#dbnl.eval.metrics.RougeScoreType) = RougeScoreType.FMEASURE) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the rougeLsum metric between the prediction and target columns.

ROUGE-LSum is a variant of ROUGE-L that applies the Longest Common Subsequence (LCS) at the sentence level for summarization tasks. It evaluates how well the generated summary captures the overall sentence structure and important elements of the reference summary by computing the LCS for each sentence in the document.

* **Parameters:**
  * **prediction** – prediction column name
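  * **target** – target column name
  * **score\_type** – ROUGE score type to compute (precision, recall, or F-measure); defaults to RougeScoreType.FMEASURE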
* **Returns:**\
  rougeLsum metric

### rouge\_metrics

```python
dbnl.eval.metrics.rouge_metrics(*, prediction: str, target: str) → list[[Metric](#dbnl.eval.metrics.Metric)]
```

Returns all rouge metrics between the prediction and target columns.

* **Parameters:**
  * **prediction** – prediction column name
  * **target** – target column name
* **Returns:**\
  list of rouge metrics

### sentence\_count

```python
dbnl.eval.metrics.sentence_count(text_col_name: str) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the sentence count metric for the text\_col\_name column.

* **Parameters:text\_col\_name** – text column name
* **Returns:**\
  sentence\_count metric

### summarization\_metrics

```python
dbnl.eval.metrics.summarization_metrics(*, prediction: str, target: str | None = None, eval_embedding_client: EmbeddingClient | None = None) → list[[Metric](#dbnl.eval.metrics.Metric)]
```

Returns a set of metrics relevant for a summarization task.

* **Parameters:**
  * **prediction** – prediction column name (i.e. generated summary)
  * **target** – target column name (i.e. expected summary)
  * **eval\_embedding\_client** – embedding client used to compute embedding-based metrics
* **Returns:**\
  list of metrics

### text\_metrics

```python
dbnl.eval.metrics.text_metrics(*, prediction: str, target: str | None = None, eval_llm_client: LLMClient | None = None, eval_embedding_client: EmbeddingClient | None = None) → list[[Metric](#dbnl.eval.metrics.Metric)]
```

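Returns a set of metrics relevant for a generic text application.

* **Parameters:**
  * **prediction** – prediction column name (i.e. generated text)
  * **target** – target column name (i.e. expected text)
  * **eval\_llm\_client** – LLM client used to compute LLM-based metrics
  * **eval\_embedding\_client** – embedding client used to compute embedding-based metrics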
* **Returns:**\
  list of metrics

### text\_monitor\_metrics

```python
dbnl.eval.metrics.text_monitor_metrics(*, columns: list[str], eval_llm_client: LLMClient | None = None) → list[[Metric](#dbnl.eval.metrics.Metric)]
```

### token\_count

```python
dbnl.eval.metrics.token_count(text_col_name: str) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the token count metric for the text\_col\_name column.

A token is a sequence of characters that represents a single unit of meaning, such as a word or punctuation mark. The token count metric calculates the total number of tokens in the text. Different languages may have different tokenization rules. This function is implemented using the spaCy library.

* **Parameters:text\_col\_name** – text column name
* **Returns:**\
  token\_count metric

### word\_count

```python
dbnl.eval.metrics.word_count(text_col_name: str) → [Metric](#dbnl.eval.metrics.Metric)
```

Returns the word count metric for the text\_col\_name column.

* **Parameters:text\_col\_name** – text column name
* **Returns:**\
  word\_count metric


