LLM-as-Judge Metric Templates
Pre-built templates to customize LLM-as-Judge Metrics
Custom Metric Templates
Templates for creating entirely new LLM-as-Judge Metrics:
Custom Classifier Metric
Evaluation Prompt:
You are a classifier that classifies the given input according to predefined labels. Carefully read the reasoning for each label, then assign exactly one. Do not include any explanation or extra text.
## Input to be classified:
{your_column_name_here}
## Possible Labels:
<your_label_here>: <your reasoning here>
<your_label_here>: <your reasoning here>
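As an illustration only, here is a minimal sketch of exercising a filled-in classifier template directly against a judge model, assuming the OpenAI Python SDK as the backend. The column name (user_feedback), the labels, and the model name are hypothetical stand-ins, not part of the template itself.

```python
# Minimal sketch: fill the classifier template with one row's column value
# and ask a judge model for exactly one label.
# Labels, column name, and model below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

CLASSIFIER_PROMPT = """You are a classifier that classifies the given input according to predefined labels. Carefully read the reasoning for each label, then assign exactly one. Do not include any explanation or extra text.
## Input to be classified:
{user_feedback}
## Possible Labels:
complaint: The text expresses dissatisfaction with the product or service.
praise: The text expresses satisfaction or a compliment.
other: The text fits neither label above."""

def classify(user_feedback: str) -> str:
    """Render the template for one input and return the predicted label."""
    prompt = CLASSIFIER_PROMPT.format(user_feedback=user_feedback)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(classify("The checkout page keeps timing out and I lost my order."))
```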
Custom Scorer Metric
Evaluation Prompt:
You are an evaluator that assigns a score to the given input, based on the reasoning defined below.
## Input to be scored:
{your_column_name_here}
## How to score:
<your reasoning here, make sure it only returns a score from [1, 2, 3, 4, 5]>
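Similarly, a minimal sketch of the scorer template in use, again assuming the OpenAI Python SDK; the support_reply column, the scoring rubric, and the model name are invented for the example. Because the prompt asks for a bare score, the response can be parsed as a single integer.

```python
# Minimal sketch: fill the scorer template and parse the 1-5 score it returns.
# Column name, rubric, and model are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

SCORER_PROMPT = """You are an evaluator that assigns a score to the given input, based on the reasoning defined below.
## Input to be scored:
{support_reply}
## How to score:
Rate the politeness of the reply. Return only a score from [1, 2, 3, 4, 5], where 1 is rude and 5 is very polite."""

def score(support_reply: str) -> int:
    prompt = SCORER_PROMPT.format(support_reply=support_reply)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    raw = response.choices[0].message.content.strip()
    value = int(raw)  # the prompt asks for a bare score, so expect a single integer
    if value not in {1, 2, 3, 4, 5}:
        raise ValueError(f"Judge returned an out-of-range score: {raw!r}")
    return value

print(score("Thanks for reaching out! I'm sorry about the delay; here's what we can do."))
```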
Default Metric Templates
Built-in LLM-as-Judge Metrics that can be customized by the user:
topic
Description: Classifies the conversation into a topic based on the input and output. This Metric is created after topics are automatically generated from the first 7 days of ingested data.
Type: classify
Classes: Topics are automatically generated based on your data
When to Use:
You need to categorize conversations by subject matter for reporting or routing
You want to understand the distribution of topics users are asking about
You need to track trends in specific subject areas over time
You want to segment analysis by conversation topic
Required Columns: input, output
Evaluation Prompt:
The following is a conversation between an AI assistant and a user:
<messages>
<message>user: {input}</message>
<message>assistant: {output}</message>
</messages>
# Task
Your job is to classify the conversation into one of the following topics.
Use both user and assistant messages in your decision.
Carefully consider each topic and choose the most appropriate one.
If you do not think the conversation is about any of the named topics, classify it as "other".
# List of topics
- topic1
- topic2
- topic3
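To make the substitution concrete, the sketch below renders this prompt for a single conversation turn. The topic names are invented placeholders; in practice they are generated from your ingested data, and the static topic list is swapped for a {topic_list} placeholder purely for illustration.

```python
# Sketch of how the topic prompt might be rendered for one conversation turn.
# Topic names here are invented examples; in practice they come from your data.
TOPIC_PROMPT = """The following is a conversation between an AI assistant and a user:
<messages>
<message>user: {input}</message>
<message>assistant: {output}</message>
</messages>
# Task
Your job is to classify the conversation into one of the following topics.
Use both user and assistant messages in your decision.
Carefully consider each topic and choose the most appropriate one.
If you do not think the conversation is about any of the named topics, classify it as "other".
# List of topics
{topic_list}"""

topics = ["billing", "account access", "shipping"]  # placeholder topics
rendered = TOPIC_PROMPT.format(
    input="Why was I charged twice this month?",
    output="I can see a duplicate charge on your account; I've issued a refund.",
    topic_list="\n".join(f"- {t}" for t in topics),
)
print(rendered)  # this string would then be sent to the judge model
```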
llm_answer_groundedness
Description: Judge if the answer adheres to the context
Type: classify
Inputs: answer, context
Classes: grounded, ungrounded
Prompt:
You are an expert evaluator of text properties and characteristics.
Your task is to grade or label the input text or texts based on the provided definition, a detailed set of steps, and a grading rubric.
You must use the grading rubric to assign a score or label.
# Definition
Given a list of Contexts and an Answer, groundedness refers to the Answer being consistent with the Contexts.
The Answer either contains information that is supported by the Contexts or assumes information that is available in the Contexts.
Use a step-by-step thinking process to ensure high-quality consideration of the grading criteria before reaching the conclusion.
# Steps
1. Analyze the content of the Answer and the Contexts.
2. Determine if the Answer contains false information or makes assumptions not supported by the Contexts.
3. Categorize the alignment of the Answer with the Contexts as one of the following grades: grounded if the Answer is consistent with the Contexts, ungrounded otherwise.
# Grading Criteria
- grounded: The Answer is grounded in the given contexts.
- ungrounded: The Answer is not grounded in the given contexts.
# Output Format
Only output the final evaluation score or label. Do not reveal the reasoning steps or any intermediate thoughts.
The response should be a valid JSON object with at least the following fields: "output".
The output format for the value should be a string that is one of the following classes: grounded, ungrounded.
# Examples
**Input**
Context: Paris is the capital and the largest city in France.
Answer: The capital of France is Paris.
**Internal Reasoning**
The Answer is consistent with the Context. Paris is the capital of France.
**Output**
{
"output": "grounded"
}
**Input**
Context: The Denver Nuggets defeated the Miami Heat in five games, winning the NBA championship in 2023.
Answer: Joel Embiid was voted MVP of the NBA in 2023.
**Internal Reasoning**
The Answer is not consistent with the Context. The Context does not state any information about Joel Embiid being MVP of the NBA in 2023.
**Output**
{
"output": "ungrounded"
}
# Notes
- Always aim to provide a fair and balanced assessment.
- Consider both explicit statements and implicit tone.
- Consistency in labeling similar messages is crucial.
- Ensure the reasoning clearly justifies the assigned label based on the steps taken.
Context: {context}
Answer: {output}
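Because this prompt asks the judge for a JSON object with an "output" field, a caller needs to parse that response. The sketch below shows one way this might look, assuming the OpenAI Python SDK; the judge_groundedness helper and the model name are hypothetical, not a built-in API.

```python
# Sketch: invoke the groundedness prompt above and parse the JSON label it returns.
# The judge_groundedness helper and model name are illustrative only.
import json
from openai import OpenAI

client = OpenAI()

def judge_groundedness(prompt_template: str, context: str, answer: str) -> str:
    # Use str.replace rather than str.format so the literal braces in the
    # prompt's JSON examples are left untouched.
    prompt = prompt_template.replace("{context}", context).replace("{output}", answer)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    parsed = json.loads(response.choices[0].message.content)
    label = parsed["output"]  # the prompt requires at least an "output" field
    if label not in {"grounded", "ungrounded"}:
        raise ValueError(f"Unexpected label from judge: {label!r}")
    return label
```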
llm_answer_refusal
Description: Judge if the answer is a refusal to answer the question
Type: classify
Inputs: answer
Classes: refused, not_refused
Prompt:
llm_answer_relevancy
Description: Judge if the answer is relevant to the question
Type: classify
Inputs: question, answer
Classes: relevant, irrelevant
Prompt:
llm_context_relevancy
Description: Judge if the contexts are relevant to the question
Type: classify
Inputs: question, context
Classes: relevant, irrelevant
Prompt:
llm_summarization
Description: Summarize the input and output of a conversational system.
Type: text
Inputs: input, output
Prompt:
llm_text_frustration
Description: Judge the frustration of a text (defaults to input) on a scale of 1 to 5.
Type: score
Inputs: text
Prompt:
llm_text_sentiment
Description: Judge the sentiment of a text as positive, negative, or neutral.
Type: classify
Inputs: text
Classes: negative, neutral, positive
Prompt: