Motivation

Why AI Testing Matters

Problem

Unlike traditional software applications, AI applications exhibit continuously evolving behavior due to factors such as updates to language models, shifts in context, data drift, and changes in prompt engineering or system integrations. This changing behavior is a constant concern for almost any organization building such applications, and one they must address continuously.

Left unaddressed, this creates two problems for organizations:

Defining Desired Behavior

Organizations struggle to define what "good" looks like for their AI applications. Current metrics fail to capture the full scope of desired behavior because:

  • Context Matters – AI behavior differs across applications, use cases, and environments.

  • Metrics Have Limits – No single set of measurements can fully capture performance across all dimensions.

  • Evolving Requirements – Business needs, regulations, and user expectations shift over time, requiring continuous refinement.

Understanding and Addressing Changes

When an application deviates from its expected behavior, organizations struggle with:

  • Pinpointing Root Causes – Identifying why behavior changed can be complex.

  • Adapting vs. Fixing – Deciding whether to refine behavior definitions or modify the application itself.

  • Prioritizing Solutions – Knowing where to start when addressing issues.

  • Ensuring Effectiveness – Validating that changes lead to meaningful improvements.

This uncertainty often prevents AI applications from ever seeing the light of day. For those that do reach production, teams lack the tools to effectively monitor and maintain their performance.


Solution

Distributional helps organizations understand and track how their AI applications behave in two main ways:

First, it watches your application's inputs and outputs over time, building a picture of “expected” behavior. Think of it as your AI application's fingerprint: when that fingerprint changes, Distributional notices and alerts you. If something goes wrong, it shows you exactly what changed and helps you figure out why, making it easier to fix problems quickly.

Second, Distributional keeps track of everything it learns about your application's behavior, saving these insights for future use. Organizations can apply what they learn from one AI application to similar ones, helping them test and improve new applications more quickly. This creates a consistent way to test AI applications across an organization and helps maximize application uptime.
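
To make this loop concrete, the sketch below shows roughly how it maps onto functions from the Python SDK documented later in these docs (login, get_or_create_project, report_run_with_results, create_test_session). The import name, keyword arguments, and sample data are illustrative assumptions rather than the exact API; see the SDK Quick Start for authoritative usage.

  # Illustrative sketch only: function names come from the Python SDK reference,
  # but the keyword arguments and data shown here are assumptions.
  import pandas as pd
  import dbnl  # assumed import name for the Distributional Python SDK

  # Authenticate; the token value here is a placeholder.
  dbnl.login(api_token="YOUR_API_TOKEN")

  # A project groups the runs that make up an application's behavioral "fingerprint".
  project = dbnl.get_or_create_project(name="llm-summarizer")

  # Report one window of production traffic (inputs, outputs, and any computed
  # metrics) as a run of column-level results.
  results = pd.DataFrame({
      "prompt": ["Summarize this article...", "Summarize this email..."],
      "response": ["The article argues...", "The email asks..."],
      "response_length": [87, 42],
  })
  run = dbnl.report_run_with_results(project=project, column_data=results)

  # Start a test session comparing this run's distributions against the baseline,
  # so a change in the fingerprint surfaces as failing tests and an alert.
  dbnl.create_test_session(experiment_run=run)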
