What Is a Similarity Index?
The Similarity Index is a single number between 0 and 100 that quantifies how much your application's behavior has changed between two runs – a Baseline run and an Experiment run. It is Distributional's core signal for measuring application drift, automatically calculated and available in every Test Session.
A lower score indicates a greater behavioral change in your AI application. Each Similarity Index is accompanied by Key Insights that describe the behavioral drift Distributional has detected and help you understand and act on it.
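The exact computation behind the Similarity Index is not described on this page, but the intuition is distributional overlap between the two runs. The sketch below is purely illustrative (not Distributional's actual algorithm): it maps histogram overlap between a Baseline and an Experiment sample of a single metric onto a 0 to 100 scale, so more overlap yields a score closer to 100 and more drift yields a score closer to 0.

```python
import numpy as np

def illustrative_similarity_index(baseline: np.ndarray, experiment: np.ndarray, bins: int = 20) -> float:
    """Toy 0-100 score based on histogram overlap between two samples of one metric.

    This is NOT Distributional's algorithm; it only illustrates the idea that
    more distributional overlap -> closer to 100, more drift -> closer to 0.
    """
    lo = min(baseline.min(), experiment.min())
    hi = max(baseline.max(), experiment.max())
    edges = np.linspace(lo, hi, bins + 1)
    b_hist, _ = np.histogram(baseline, bins=edges)
    e_hist, _ = np.histogram(experiment, bins=edges)
    b_frac = b_hist / b_hist.sum()
    e_frac = e_hist / e_hist.sum()
    overlap = np.minimum(b_frac, e_frac).sum()  # 1.0 = identical histograms, 0.0 = disjoint
    return round(100 * overlap, 1)

# Example: experiment answers got noticeably longer than baseline answers.
rng = np.random.default_rng(0)
baseline_lengths = rng.normal(120, 15, size=1_000)
experiment_lengths = rng.normal(160, 20, size=1_000)
print(illustrative_similarity_index(baseline_lengths, experiment_lengths))  # low score => drift
```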
Where You’ll See It in the UI
Test Session Summary Page — App-level Similarity Index, results of failed tests, and Key Insights
Similarity Report Tab — Breakdown of Similarity Indexes by column and metric
Column Details View — Histograms and statistical comparison for specific metrics
Tests View — History of Similarity Index-based test pass/fail over time
Why It Matters
When model behavior changes, you need:
A clear signal that drift occurred
An explanation of what changed
A workflow to debug, test, and act
Together, the Similarity Index and Key Insights provide all three.
Example:
An app’s Similarity Index drops from 93 → 46
Key Insight: “Answer similarity has decreased sharply”
Metric: levenshtein__generated_answer__expected_answer
Result: Investigate histograms, set test thresholds, adjust model
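The metric name suggests an edit-distance comparison between the generated answer and the expected answer. As a rough sketch of what such a metric measures (the exact definition and normalization Distributional uses is not specified here), a Levenshtein-based answer similarity can be computed like this:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance via dynamic programming (rolling rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def answer_similarity(generated: str, expected: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(generated), len(expected)) or 1
    return 1.0 - levenshtein(generated, expected) / longest

print(answer_similarity("The Eiffel Tower is in Paris.",
                        "The Eiffel Tower is located in Paris, France."))
```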
Hierarchical Structure
Similarity Index operates at three levels:
Application Level — Aggregates all lower-level scores
Column Level — Individual column-level drift
Metric Level — Fine-grained metric change (e.g., readability, latency, BLEU score)
Each level rolls up into the one above it. You can sort by Similarity Index to find the most impacted parts of your app.
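How scores roll up from one level to the next is not specified on this page; the sketch below uses a simple mean purely to illustrate the hierarchy and the sort-by-impact workflow, with hypothetical column names, metric names, and scores.

```python
# Hypothetical metric-level scores, keyed as column -> metric -> Similarity Index.
metric_scores = {
    "generated_answer": {
        "levenshtein__generated_answer__expected_answer": 46,
        "flesch_kincaid__generated_answer": 88,
    },
    "latency_ms": {
        "value__latency_ms": 91,
    },
}

# Illustrative roll-up (simple means); the aggregation Distributional actually
# uses is not documented here.
column_scores = {col: sum(m.values()) / len(m) for col, m in metric_scores.items()}
app_score = sum(column_scores.values()) / len(column_scores)

# Sort columns by score ascending to surface the most impacted parts of the app.
for col, score in sorted(column_scores.items(), key=lambda kv: kv[1]):
    print(f"{col}: {score:.0f}")
print(f"application: {app_score:.0f}")
```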
Test Sessions and Thresholds
By default, a new DBNL project comes with an Application-level Similarity Index test:
Threshold: ≥ 80
Failure: Indicates a meaningful change in application behavior
In the UI:
Passed tests are shown in green
Failed tests are shown in red with diagnostic details
All past test runs can be reviewed in the test history.
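In code terms, the default test is a simple threshold check on the application-level score. The snippet below is an illustrative restatement of that rule, not a call into the DBNL SDK.

```python
SIMILARITY_THRESHOLD = 80  # default application-level test: pass when index >= 80

def similarity_test(app_similarity_index: float, threshold: float = SIMILARITY_THRESHOLD) -> str:
    """Return PASS/FAIL for the default application-level Similarity Index test."""
    return "PASS" if app_similarity_index >= threshold else "FAIL"

print(similarity_test(93))  # PASS
print(similarity_test(46))  # FAIL -> meaningful behavior change; review Key Insights
```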
Key Insights
Key Insights are human-readable interpretations of Similarity Index changes. They answer:
“What changed, and does it matter?”
Each Key Insight includes:
A plain-language summary: “Distribution substantially drifted to the right”
The associated column/metric
The Similarity Index for that metric
An option to add a test on the spot
Example:
Distribution substantially drifted to the right.
→ Metric: levenshtein__generated_answer__expected_answer
→ Similarity Index: 46
→ Add Test
Insights are prioritized and ordered by impact, helping you triage quickly.
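How Distributional phrases and prioritizes Key Insights is not detailed here; the toy function below only illustrates the kind of plain-language summary an insight carries, using a hypothetical mean-shift heuristic to decide whether a distribution "drifted to the right" or "to the left".

```python
import numpy as np

def describe_drift(baseline: np.ndarray, experiment: np.ndarray, metric: str, index: float) -> str:
    """Toy plain-language summary in the spirit of a Key Insight (not Distributional's logic)."""
    shift = experiment.mean() - baseline.mean()
    spread = baseline.std() or 1.0
    if abs(shift) < 0.25 * spread:
        summary = "Distribution is largely unchanged"
    else:
        direction = "right" if shift > 0 else "left"
        summary = f"Distribution substantially drifted to the {direction}"
    return f"{summary}.\n-> Metric: {metric}\n-> Similarity Index: {index}"

rng = np.random.default_rng(1)
print(describe_drift(rng.normal(10, 2, 500), rng.normal(14, 2, 500),
                     "levenshtein__generated_answer__expected_answer", 46))
```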
Deep Dive: Column Similarity Details
Clicking into a Key Insight opens a detailed view:
Histogram overlays for experiment vs. baseline
Summary statistics (mean, median, percentiles, standard deviation)
Absolute differences in those statistics between the two runs
Links to add similarity or statistical tests on specific metrics
This helps pinpoint whether drift was due to longer answers, slower responses, or changes in generation fidelity.
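For example, the summary-statistics comparison shown in this view can be reproduced offline with pandas if you have the per-row metric values from both runs; the data below is synthetic and stands in for a single column's values.

```python
import numpy as np
import pandas as pd

# Hypothetical per-row metric values pulled from a baseline and an experiment run.
rng = np.random.default_rng(2)
baseline = pd.Series(rng.normal(120, 15, 1_000), name="baseline")
experiment = pd.Series(rng.normal(160, 20, 1_000), name="experiment")

# Side-by-side summary statistics plus their absolute differences.
stats = pd.DataFrame({
    "baseline": baseline.describe(percentiles=[0.5, 0.95]),
    "experiment": experiment.describe(percentiles=[0.5, 0.95]),
})
stats["abs_diff"] = (stats["experiment"] - stats["baseline"]).abs()
print(stats.loc[["mean", "50%", "95%", "std"]])
```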
Example Workflow
Run a test session
Similarity Index < 80 → test fails
Review top-level Key Insights
Click into a metric (e.g., levenshtein__generated_answer__expected_answer)
View distribution shift and statistical breakdown
Add targeted test thresholds to monitor ongoing behavior
Adjust model, prompt, or infrastructure as needed
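Step 6 can take the form of similarity or statistical tests on individual metrics. The snippet below sketches one such targeted check on a hypothetical latency column, using a two-sample Kolmogorov-Smirnov test plus a simple median guardrail; it illustrates the kind of test you might configure, not a specific Distributional feature.

```python
import numpy as np
from scipy import stats

# Hypothetical per-row latencies (seconds) from the baseline and experiment runs.
rng = np.random.default_rng(3)
baseline_latency = rng.normal(0.9, 0.1, 1_000)
experiment_latency = rng.normal(1.3, 0.2, 1_000)

# Two-sample KS test: has the latency distribution shifted?
ks = stats.ks_2samp(baseline_latency, experiment_latency)
# Guardrail: fail if the median latency regressed by more than 10%.
median_ok = np.median(experiment_latency) <= 1.1 * np.median(baseline_latency)

print(f"KS statistic={ks.statistic:.3f}, p-value={ks.pvalue:.3g}")
print("latency test:", "PASS" if ks.pvalue > 0.05 and median_ok else "FAIL")
```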