Data Pipeline

How log data becomes behavioral signals

The Data Pipeline is how DBNL converts raw production AI log data into actionable Insights and Dashboards for each Project, and stores that data for future analysis within the Data Model.

The Data Pipeline is invoked as production log data is ingested into your DBNL Deployment. It kicks off at ingestion time when using SDK Log Ingestion or SQL Integration Ingestion, and daily at UTC midnight when using OTEL Trace Ingestion.

You can inspect the status of Data Pipeline Runs and restart them from the Status page of a Project.

A DBNL Data Pipeline Run performs the following tasks (a simplified sketch of these stages follows the list):

  1. Ingest: Raw production log data is flattened into Columns. By following the DBNL Semantic Convention, certain Columns carry rich semantic meaning, allowing deeper Insights to be generated.

  2. Enrich: Metrics are computed on the ingested log data, appending a new Column to each log line for each computed Metric.

  3. Analyze: Unsupervised learning techniques are applied to the enriched log data to discover behavioral signals, such as shifts, segments, or outliers in behavior, which are surfaced as Insights.

  4. Publish: Insights and updated charts are published to Dashboards for consumption by the user.
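To make these stages concrete, here is a minimal, hypothetical sketch of a Data Pipeline Run operating on a small batch of logs. The function names, the `metric.output_length` Column, and the simple outlier rule are illustrative assumptions, not the DBNL implementation or its SDK.

```python
# A minimal, hypothetical sketch of the four Data Pipeline stages.
# Function names, the Metric Column, and the outlier rule are illustrative,
# not the DBNL implementation or SDK.
import pandas as pd


def ingest(raw_logs: list[dict]) -> pd.DataFrame:
    # Flatten raw log records into a table of Columns;
    # nested fields become dotted column names (e.g. "metadata.model").
    return pd.json_normalize(raw_logs)


def enrich(logs: pd.DataFrame) -> pd.DataFrame:
    # Compute Metrics and append each one as a new Column on every log line.
    logs = logs.copy()
    logs["metric.output_length"] = logs["output"].str.len()
    return logs


def analyze(logs: pd.DataFrame) -> list[dict]:
    # Stand-in for unsupervised analysis: flag unusually long outputs
    # as a candidate behavioral signal (Insight).
    col = logs["metric.output_length"]
    outliers = logs[col > col.mean() + 3 * col.std()]
    return [{"type": "outlier", "column": "metric.output_length", "count": len(outliers)}]


def publish(insights: list[dict]) -> None:
    # In the real pipeline, Insights and updated charts land on Dashboards;
    # here we simply print them.
    for insight in insights:
        print(insight)


raw_logs = [
    {
        "input": "What is DBNL?",
        "output": "DBNL analyzes the behavior of AI products.",
        "timestamp": "2024-06-01T00:00:00Z",
        "metadata": {"model": "gpt-4o"},
    },
]
publish(analyze(enrich(ingest(raw_logs))))
```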

Regardless of Ingestion method, the DBNL Data Pipeline ensures that all data is mapped to identical results tables and is treated the same for the purposes of the Analytics Workflow.

Data Model

Columns

A single log represents the captured behavior from a production AI product. Data from each log is flattened into multiple Columns, using the DBNL Semantic Convention whenever possible. The required Columns for a given log are:

  • input: The text input to the LLM.

  • output: The text response from the LLM.

  • timestamp: The UTC timestamp associated with the LLM call.

Other semantically known Columns can be found in the DBNL Semantic Convention. By using these known column names, DBNL can provide better Insights.
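
As an illustration, a single flattened log might look like the record below. The three required Columns come from the list above; the extra field names are placeholders rather than names drawn from the DBNL Semantic Convention.

```python
# One flattened log line: the required Columns plus illustrative extras.
log = {
    # Required Columns
    "input": "How do I reset my password?",
    "output": "You can reset your password from the account settings page.",
    "timestamp": "2024-06-01T14:32:05Z",  # UTC
    # Additional fields (placeholder names, not the DBNL Semantic Convention)
    "model": "gpt-4o",
    "latency_ms": 912,
}
```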

Metrics

Metrics are computed from Columns and appended as new Columns for each log.
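
For example, a Metric such as output length could be computed once per log and appended as a new Column. The sketch below assumes the logs are held in a pandas table; the Metric names are illustrative, not DBNL built-ins.

```python
import pandas as pd

# A small table of logs, one row per log line.
logs = pd.DataFrame({
    "input": ["What is DBNL?", "Summarize this document."],
    "output": ["DBNL analyzes AI behavior.", "The document describes the Data Pipeline."],
    "timestamp": ["2024-06-01T00:00:01Z", "2024-06-01T00:00:02Z"],
})

# Each computed Metric becomes a new Column on every log line.
# The Metric names below are illustrative, not DBNL built-ins.
logs["metric.output_char_count"] = logs["output"].str.len()
logs["metric.input_word_count"] = logs["input"].str.split().str.len()
```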

Segments

Segments represent filters on the Columns of Logs, selecting the subset of logs that match a given condition.
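
For instance, a Segment might capture only the logs where certain Columns meet a condition, such as long responses from a particular model. The sketch below expresses that as a plain pandas filter; the Column names are illustrative.

```python
import pandas as pd

# Logs table with illustrative Columns, including a computed Metric Column.
logs = pd.DataFrame({
    "model": ["gpt-4o-mini", "gpt-4o"],
    "output": ["Short answer.", "A much longer answer..."],
    "metric.output_char_count": [13, 840],
})

# A Segment is a filter on Columns: here, long responses from one model.
segment = logs[(logs["model"] == "gpt-4o") & (logs["metric.output_char_count"] > 500)]
```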
