Distributional Concepts

Understanding key concepts and how they relate to your app


Adaptive testing for AI applications requires more information than standard deterministic testing. This is because:

  • AI applications are multi-component systems where changes in one part can affect others in unexpected ways. For instance, a change in your vector database could affect your LLM's responses, or updates to a feature pipeline could impact your machine learning model's predictions.

  • AI applications are non-stationary, meaning their behavior changes over time even if you don't change the code. This happens because the world they interact with changes - new data comes in, language patterns evolve, and third-party models get updated. A test that passes today might fail tomorrow, not because of a bug, but because the underlying conditions have shifted.

  • AI applications are non-deterministic. Even with the exact same input, they might produce different outputs each time. Ask an LLM the same question twice and you might get two different, but equally valid, responses. This makes it impossible to write traditional tests that expect exact matches; instead, assertions must be made over distributions of outputs, as the sketch below illustrates.
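For intuition, here is a toy sketch of the difference, using plain Python and SciPy rather than anything from the Distributional product itself; the metric values are invented. An exact-match assertion is flaky for a non-deterministic app, while a statistical assertion compares the distribution of a metric across many results:

```python
from scipy import stats

# Exact-match testing breaks down for non-deterministic apps:
# assert app("What is our refund policy?") == expected_answer  # flaky!

# Distributional testing instead compares a metric's distribution
# across two groups of results (hypothetical relevance scores).
baseline_scores = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92]
candidate_scores = [0.89, 0.92, 0.90, 0.86, 0.93, 0.91]

# A two-sample Kolmogorov-Smirnov test asks whether both samples
# plausibly come from the same underlying distribution.
result = stats.ks_2samp(baseline_scores, candidate_scores)
assert result.pvalue > 0.05, "candidate scores drifted from the baseline"
```

Either group may contain different individual answers, but as long as the distributions of their metrics agree, the app's behavior can be considered unchanged.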

To account for this, each time you want to measure the behavior of the AI application, you will need to:

  1. Record outcomes at all of the app’s components, and

  2. Push a distribution of inputs through the app to study behavior across the full spectrum of possible app usage.

The inputs, outputs, and outcomes associated with a single app usage are grouped into a Result, and each named value within a result is a Column. The group of results used to measure app behavior is called a Run. To determine whether an app is behaving as expected, you create a Test, which performs statistical analysis on one or more runs. Applying your tests to the runs you want to study creates a Test Session, a permanent record of the behavior of an app at a given time.
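Here is a minimal sketch of how these concepts map onto data, assuming the SDK calls follow the pattern shown in the Quickstart (dbnl.login, dbnl.get_or_create_project, dbnl.report_run_with_results); the project name, column names, and values are hypothetical, and exact signatures may differ across SDK versions. Each DataFrame row is a Result, each DataFrame column is a Column, and the reported table as a whole is a Run:

```python
import dbnl
import pandas as pd

dbnl.login(api_token="YOUR_API_TOKEN")  # see "Getting Access to Distributional"
project = dbnl.get_or_create_project(name="support_bot")

# One row per app usage (a Result); one DataFrame column per recorded
# input, output, or outcome (a Column).
run_data = pd.DataFrame(
    {
        "question": ["What is our refund policy?", "How do I reset my password?"],
        "answer": ["Refunds are issued within 30 days...", "Use the reset link..."],
        "answer_relevance": [0.91, 0.88],  # an outcome recorded per result
        "latency_ms": [412, 380],
    }
)

# Reporting the table creates a Run that tests can later be applied to.
run = dbnl.report_run_with_results(
    project=project,
    display_name="nightly_eval",
    column_data=run_data,
)
```

Tests are then applied to one or more such runs, and each application of those tests is recorded as a Test Session.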

[Diagram of the flow of data through DBNL: the expected flow of app information to Distributional for run creation within a project]