# Data Pipeline

The Data Pipeline is how DBNL converts raw production AI log data into actionable [Insights](https://docs.dbnl.com/v0.29.x/workflow/insights) and [Dashboards](https://docs.dbnl.com/v0.29.x/workflow/dashboards) for each [Project](https://docs.dbnl.com/v0.29.x/workflow/projects), and stores that data for future analysis in the [Data Model](#data-model).

The Data Pipeline is invoked as production log data is ingested into your DBNL [Deployment](https://docs.dbnl.com/v0.29.x/platform/deployment). The process kicks off at ingestion time when using [SDK Log Ingestion](https://docs.dbnl.com/v0.29.x/configuration/data-connections/sdk-log-ingestion) or [SQL Integration Ingestion](https://docs.dbnl.com/v0.29.x/configuration/data-connections/sql-integration-ingestion), and daily at UTC midnight for [OTEL Trace Ingestion](https://docs.dbnl.com/v0.29.x/configuration/data-connections/otel-trace-ingestion).

<figure><img src="https://content.gitbook.com/content/lUoirJaFEHofsQHmOtdL/blobs/DgmKOET6TwFQMM79tN6A/DBNL-data-pipeline.png" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
You can inspect the status of Data Pipeline Runs and restart them from the [Status](https://docs.dbnl.com/v0.29.x/workflow/status) page of a [Project](https://docs.dbnl.com/v0.29.x/workflow/projects).
{% endhint %}

A DBNL Data Pipeline Run performs the following tasks:

1. **Ingest**: Raw production log data is flattened into [Columns](#columns). By following the [DBNL Semantic Convention](https://docs.dbnl.com/v0.29.x/configuration/dbnl-semantic-convention), certain [Columns](#columns) carry rich semantic meaning, allowing deeper [Insights](https://docs.dbnl.com/v0.29.x/workflow/insights) to be generated.
2. **Enrich**: [Metrics](https://docs.dbnl.com/v0.29.x/workflow/metrics) are computed on the ingested log data, creating [Columns](#columns) in each log line corresponding to each computed Metric.
3. **Analyze**: Various unsupervised learning techniques are applied to the enriched log data to discover behavioral signals corresponding to shifts, segments, or outliers in behavior as [Insights](https://docs.dbnl.com/v0.29.x/workflow/insights).
4. **Publish**: [Insights](https://docs.dbnl.com/v0.29.x/workflow/insights) and updated charts are published to [Dashboards](https://docs.dbnl.com/v0.29.x/workflow/dashboards) for consumption by the user.
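The four stages above can be sketched as a minimal pipeline run. This is an illustrative sketch only: `run_pipeline`, `flatten`, the `output_length` metric, and the outlier heuristic are all hypothetical and not part of the DBNL SDK.

```python
# Hypothetical sketch of a Data Pipeline Run. None of these helpers
# belong to the DBNL SDK; they only illustrate the four stages.

def flatten(log: dict) -> dict:
    # Ingest: keep only the required columns from a raw log.
    return {k: log[k] for k in ("input", "output", "timestamp")}

def run_pipeline(raw_logs: list[dict]) -> tuple[list[dict], list[dict]]:
    # 1. Ingest: flatten raw logs into column-oriented rows.
    rows = [flatten(log) for log in raw_logs]
    # 2. Enrich: compute a metric and append it as a new column.
    for row in rows:
        row["output_length"] = len(row["output"])
    # 3. Analyze: flag rows whose output is unusually long
    #    (a stand-in for the platform's unsupervised techniques).
    mean = sum(r["output_length"] for r in rows) / len(rows)
    insights = [r for r in rows if r["output_length"] > 1.5 * mean]
    # 4. Publish: return enriched rows and insights for dashboards.
    return rows, insights
```

In the real platform each stage is far richer, but the data flow is the same: raw logs in, enriched columnar rows and insights out.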

<figure><img src="https://content.gitbook.com/content/lUoirJaFEHofsQHmOtdL/blobs/SVmy1Op0j6TlnngEFnQB/image.png" alt=""><figcaption><p>Regardless of Ingestion method, the DBNL Data Pipeline ensures that all data is mapped to identical results tables and is treated the same for the purposes of the <a href="../workflow/adaptive-analytics-workflow">Analytics Workflow</a>.</p></figcaption></figure>

## Data Model

### Columns

A single log represents the captured behavior from a production AI product. Data from each log is flattened into multiple Columns, using the [DBNL Semantic Convention](https://docs.dbnl.com/v0.29.x/configuration/dbnl-semantic-convention) whenever possible. The required Columns for a given log are:

* `input`: The text input to the LLM.
* `output`: The text response from the LLM.
* `timestamp`: The UTC timestamp associated with the LLM call.

Other semantically known columns can be found in the [DBNL Semantic Convention](https://docs.dbnl.com/v0.29.x/configuration/dbnl-semantic-convention). By using these known column names, DBNL can provide better [Insights](https://docs.dbnl.com/v0.29.x/workflow/insights).
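A flattened log row with the required columns might look like the following. This is a hypothetical sketch: the `validate_log` helper is not a DBNL API, it just checks the three required column names listed above.

```python
from datetime import datetime, timezone

# The three columns every flattened log must carry (per the docs above).
REQUIRED_COLUMNS = ("input", "output", "timestamp")

def validate_log(row: dict) -> dict:
    """Illustrative helper (not a DBNL API): check required columns."""
    missing = [c for c in REQUIRED_COLUMNS if c not in row]
    if missing:
        raise ValueError(f"log row is missing required columns: {missing}")
    return row

row = validate_log({
    "input": "What is the capital of France?",
    "output": "Paris.",
    "timestamp": datetime(2024, 5, 1, tzinfo=timezone.utc).isoformat(),
})
```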

### Metrics

[Metrics](https://docs.dbnl.com/v0.29.x/workflow/metrics) are computed from Columns and appended as new Columns for each log.
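For example, a simple word-count Metric derived from the `output` Column becomes a new Column on every log row. The function and column name below are illustrative, not part of the DBNL SDK:

```python
# Illustrative only: a word-count Metric computed from the `output`
# Column and appended as a new Column on each log row.

def add_word_count_metric(rows: list[dict]) -> list[dict]:
    for row in rows:
        row["output__word_count"] = len(row["output"].split())
    return rows

rows = add_word_count_metric([
    {"input": "hi", "output": "hello there friend"},
])
```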

### Segments

[Segments](https://docs.dbnl.com/v0.29.x/workflow/segments) represent filters over the Columns of logs.
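Conceptually, a Segment is a predicate over one or more Columns that selects a subset of log rows. A minimal sketch (the `make_segment` helper and the `output__word_count` column are hypothetical, not DBNL APIs):

```python
# Illustrative sketch: a Segment as a reusable filter over a Column.

def make_segment(column: str, predicate) -> callable:
    """Return a filter keeping rows where predicate(row[column]) holds."""
    def segment(rows: list[dict]) -> list[dict]:
        return [r for r in rows if predicate(r[column])]
    return segment

# A Segment selecting logs with unusually long outputs.
long_outputs = make_segment("output__word_count", lambda n: n >= 50)
```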
