
© 2025 Distributional, Inc. All Rights Reserved.


LLM-as-judge and Embedding Metrics


A common strategy for evaluating unstructured text applications is to use other LLMs and text embedding models to drive metrics of interest.

Supported LLM and model services

The LLM-as-judge metrics in dbnl.eval support OpenAI, Azure OpenAI, and any other third-party LLM / embedding model provider that is compatible with the OpenAI Python client. Specifically, third-party services should (mostly) adhere to the schema of:

  • the v1/chat/completions endpoint for LLMs

  • the v1/embeddings endpoint for embedding models

The following examples show how to initialize an llm_eval_client and an eval_embedding_client under different providers.

OpenAI

import os

from openai import OpenAI
from dbnl.eval.llm import OpenAILLMClient
from dbnl.eval.embedding_clients import OpenAIEmbeddingClient

# create client for LLM-as-judge metrics
base_oai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
eval_llm_client = OpenAILLMClient.from_existing_client(
    base_oai_client, llm_model="gpt-3.5-turbo-0125"
)

# create client for embedding-based metrics
embd_client = OpenAIEmbeddingClient.from_existing_client(
    base_oai_client, embedding_model="text-embedding-ada-002"
)

Azure OpenAI

import os

from openai import AzureOpenAI
from dbnl.eval.llm import AzureOpenAILLMClient
from dbnl.eval.embedding_clients import AzureOpenAIEmbeddingClient

base_azure_oai_client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["OPENAI_API_VERSION"],  # eg 2023-12-01-preview
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # eg https://resource-name.openai.azure.com
)

# create client for LLM-as-judge metrics
eval_llm_client = AzureOpenAILLMClient.from_existing_client(
    base_azure_oai_client, llm_model="gpt-35-turbo-16k"
)

# create client for embedding-based metrics
embd_client = AzureOpenAIEmbeddingClient.from_existing_client(
    base_azure_oai_client, embedding_model="text-embedding-ada-002"
)

TogetherAI (or other OpenAI compatible service / endpoints)

import os

from openai import OpenAI
from dbnl.eval.llm import OpenAILLMClient

# point the OpenAI client at the OpenAI-compatible TogetherAI endpoint
base_oai_client = OpenAI(
    api_key=os.environ["TOGETHERAI_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

eval_llm_client = OpenAILLMClient.from_existing_client(
    base_oai_client, llm_model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
)

Missing Metric Values

Some LLM-as-judge metrics may occasionally return values that cannot be parsed. These metric values will surface as None.

Distributional accepts dataframes that include None values; the platform intelligently filters them when applicable.
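For illustration, a results dataframe with a couple of failed judge calls might look like the following. This is a minimal pandas sketch with made-up column names, not a dbnl-specific API; it only shows how None values appear and how the non-missing values remain usable:

```python
import pandas as pd

# hypothetical results where two judge calls returned unparsable output
df = pd.DataFrame({
    "prediction": ["a", "b", "c", "d"],
    "judge_score": [0.9, None, 0.75, None],  # None = unparsable judge output
})

# the dataframe can be reported as-is; locally you can still inspect
# the non-missing values for a quick sanity check
valid = df["judge_score"].dropna()
print(len(valid))             # 2
print(round(valid.mean(), 3)) # 0.825
```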

Throughput and Rate Limits

LLM service providers often impose request rate limits and token throughput caps. Some example errors you might encounter are shown below:

{'code': '429', 'message': 'Requests to the Embeddings_Create Operation under 
  Azure OpenAI API version XXXX have exceeded call rate limit of your current 
  OpenAI pricing tier. Please retry after 86400 seconds. 
  Please go here: https://aka.ms/oai/quotaincrease if you would 
  like to further increase the default rate limit.'}
{'message': 'You have been rate limited. Your rate limit is YYY queries per
minute. Please navigate to https://www.together.ai/forms/rate-limit-increase 
to request a rate limit increase.', 'type': 'credit_limit', 
'param': None, 'code': None}
{'message': 'Rate limit reached for gpt-4 in organization XXXX on 
tokens per min (TPM): Limit WWWWW, Used YYYY, Requested ZZZZ. 
Please try again in 1.866s. Visit https://platform.openai.com/account/rate-limits 
to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}

In the event you experience these errors, please work with your LLM service provider to adjust your limits. Additionally, feel free to reach out to Distributional support with the issue you are seeing.
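Until your limits are raised, a common client-side mitigation is to retry with jittered exponential backoff. The sketch below is a generic, stdlib-only wrapper and is not part of dbnl.eval; in practice you would narrow the caught exception to your provider client's rate-limit error type (for example, the OpenAI Python client raises its own error classes and also exposes retry settings of its own):

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` on rate-limit errors with jittered exponential backoff.

    `call` is any zero-argument function. Here we detect rate limiting by
    matching the error message; with a real client library, catch its
    specific rate-limit exception type instead.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            # give up on non-rate-limit errors or on the final attempt
            if "rate limit" not in str(exc).lower() or attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)

# usage: result = with_backoff(lambda: eval_llm_client.some_call(...))
```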
