LLM-as-Judge Metric Templates
Pre-built templates for customizing LLM-as-Judge Metrics
Custom Metric Templates
Templates for creating entirely new LLM-as-Judge Metrics:
Custom Classifier Metric
Evaluation Prompt:
You are a classifier that classifies the given input according to predefined labels. Carefully read the reasoning for each label, then assign exactly one. Do not include any explanation or extra text.
## Input to be classified:
{your_column_name_here}
## Possible Labels:
<your_label_here>: <your reasoning here>
<your_label_here>: <your reasoning here>
Custom Scorer Metric
Evaluation Prompt:
You are an evaluator that assigns a score to the given input, based on the reasoning defined below.
## Input to be scored:
{your_column_name_here}
## How to score:
<your reasoning here, make sure it only returns a score from [1, 2, 3, 4, 5]>
Default Metric Templates
Built-in LLM-as-Judge Metrics that can be customized by the user:
topic
Description: Classifies the conversation into a topic based on the input and output. This Metric is created after topics are automatically generated from the first 7 days of ingested data.
Type: Classifier
Evaluation Prompt:
The following is a conversation between an AI assistant and a user:
<messages>
<message>user: {input}</message>
<message>assistant: {output}</message>
</messages>
# Task
Your job is to classify the conversation into one of the following topics.
Use both user and assistant messages in your decision.
Carefully consider each topic and choose the most appropriate one.
If you do not think the conversation is about any of the named topics, classify it as "other".
# List of topics
- plan romantic and active day outings in various cities
- recommend nearby locations for a streetcar, bus, or bike trip
- plan an itinerary for a bar foodie crawl
- plan a customized day trip for leisure activities
- plan bike routes with stops and transportation
- plan a museum-hopping itinerary in multiple cities
- plan a family outing to visit various locations using public transportation and walking
- assist in planning a day trip or romantic evening out in a city, providing recommendations for various attractions, transportation options, and reservation links
- provide nearby dining options
- provide dining or nightlife recommendations based on location and user preferences
- other
llm_answer_groundedness
Description: Given a list of Contexts and Answer, groundedness refers to the Answer being consistent with the Contexts.
Type: Classifier
Classes:
grounded,not_grounded
Evaluation Prompt:
You are an expert evaluator of text properties and characteristics.
Your task is to grade or label the input text or texts based on the provided definition, a detailed set of steps, and a grading rubric.
You must use the grading rubric to assign a score or label.
# Definition
Given a list of Contexts and Answer, groundedness refers to the Answer being consistent with the Contexts.
The Answer either contains information that is supported by the Contexts or assumes information that is available in the Context.
Use a step-by-step thinking process to ensure high-quality consideration of the grading criteria before reaching the conclusion.
# Steps
1. Analyze the content of the Answer and the Contexts.
2. Determine if the Answer contains false information or makes assumptions not supported by the Contexts.
3. Categorize the alignment of the Answer with the Contexts as one of the following grades: grounded if the Answer is consistent with the Contexts, ungrounded otherwise.
# Grading Criteria
- grounded: The Answer is grounded in the given contexts.
- ungrounded: The Answer is not grounded in the given contexts.
# Output Format
Only output the final evaluation score or label. Do not reveal the reasoning steps or any intermediate thoughts.
The response should be a valid JSON object with at least the following fields: "output".
The output format for the value should be a string that is one of the following classes: grounded, ungrounded.
# Examples
**Input**
Context: Paris is the capital and the largest city in France.
Answer: The capital of France is Paris.
**Internal Reasoning**
The Answer is consistent with the Context. Paris is the capital of France.
**Output**
{
"output": "grounded"
}
**Input**
Context: The Denver Nuggets defeated the Miami Heat in five games, winning the NBA championship in 2023.
Answer: Joel Embiid was voted MVP of the NBA in 2023.
**Internal Reasoning**
The Answer is not consistent with the Context. The Context does not state any information about Joel Embiid being MVP of the NBA in 2023.
**Output**
{
"output": "ungrounded"
}
# Notes
- Always aim to provide a fair and balanced assessment.
- Consider both explicit statements and implicit tone.
- Consistency in labeling similar messages is crucial.
- Ensure the reasoning clearly justifies the assigned label based on the steps taken.
Context: {context}
Answer: {output}
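Applying this template in code is mostly string substitution plus JSON parsing of the judge's reply. A minimal Python sketch — the LLM call itself is elided, and `build_prompt`/`parse_label` are hypothetical helper names for illustration, not part of any API:

```python
import json

# Truncated copy of the groundedness prompt above; the full text would go
# here verbatim. {context} and {output} are the two columns the Metric reads.
GROUNDEDNESS_TEMPLATE = (
    "You are an expert evaluator of text properties and characteristics.\n"
    "# Output Format\n"
    'The response should be a valid JSON object with at least the following fields: "output".\n'
    "Context: {context}\n"
    "Answer: {output}\n"
)

def build_prompt(context: str, answer: str) -> str:
    """Fill the template for a single Context/Answer pair."""
    return GROUNDEDNESS_TEMPLATE.format(context=context, output=answer)

def parse_label(raw_response: str) -> str:
    """Extract and check the label the prompt asks the judge to emit."""
    label = json.loads(raw_response)["output"]
    if label not in ("grounded", "ungrounded"):
        raise ValueError(f"unexpected label: {label!r}")
    return label

prompt = build_prompt(
    "Paris is the capital and the largest city in France.",
    "The capital of France is Paris.",
)
# A chat-completion call with `prompt` would go here; per the prompt's own
# first example, the judge's reply would be:
raw = '{"output": "grounded"}'
print(parse_label(raw))  # grounded
```

Constraining the parser to the declared class set catches the occasional malformed judge reply instead of silently recording a bogus label.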
llm_answer_refusal
Description: Classify whether the response from a QA system refused to answer the question.
Type: Classifier
Classes:
refused,not_refused
Evaluation Prompt:
llm_answer_relevancy
Description: Given a Question and an Answer, determine if the Answer is relevant to the Question.
Type: Classifier
Classes:
relevant,irrelevant
Evaluation Prompt:
llm_context_relevancy
Description: Context relevancy is evaluated based on the relevance of the provided list of Contexts to the user's Query.
Type: Classifier
Classes:
relevant,irrelevant
Evaluation Prompt:
llm_question_clarity
Description: Assess the clarity of the user's question on a scale of 1 to 5.
Type: Scorer
Range:
[1, 2, 3, 4, 5]
Evaluation Prompt:
llm_text_frustration
Description: Assess the level of frustration in the input on a scale of 1 to 5.
Type: Scorer
Range:
[1, 2, 3, 4, 5]
Evaluation Prompt:
llm_text_sentiment
Description: Determine whether the tone of the message is negative, neutral, or positive based on the content and context of the message provided.
Type: Classifier
Classes:
negative,neutral,positive
Evaluation Prompt:
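All of the default Metrics above share one of two output shapes: Classifier Metrics return a single label from a fixed class set, and Scorer Metrics return an integer in [1, 5]. A small sketch of validating judge outputs against those shapes — the `Metric` dataclass and registry layout are assumptions for illustration; the class names and range are taken from the definitions above:

```python
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    kind: str                          # "classifier" or "scorer"
    classes: tuple = ()                # label set, for classifiers
    score_range: range = range(1, 6)   # [1, 5], for scorers

# Two entries from the default Metrics above, as examples.
METRICS = {
    "llm_text_sentiment": Metric(
        "llm_text_sentiment", "classifier",
        classes=("negative", "neutral", "positive"),
    ),
    "llm_text_frustration": Metric("llm_text_frustration", "scorer"),
}

def validate(metric_name: str, value) -> bool:
    """Check a judge's output against the Metric's declared shape."""
    m = METRICS[metric_name]
    if m.kind == "classifier":
        return value in m.classes
    return isinstance(value, int) and value in m.score_range

print(validate("llm_text_sentiment", "neutral"))   # True
print(validate("llm_text_frustration", 7))         # False
```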