SQL Integration Ingestion

Pull data from existing SQL tables

Formatting

Currently, all columns from the supplied table will be ingested and flattened into Columns as part of the Data Pipeline. Any columns corresponding to the DBNL Semantic Convention will be mapped accordingly.

Custom SQL queries are coming soon and are currently available only to select co-build partners. Contact us at [email protected] if you want to become an early access co-build partner.

The following fields are required regardless of which ingestion method you are using (see the example row below):

  • input: The text input to the LLM as a string.

  • output: The text response from the LLM as a string.

  • timestamp: The UTC timestamp associated with the LLM call as a timestamptz.

See the DBNL Semantic Convention for other semantically recognized fields. Using these naming conventions will allow DBNL to map semantic meaning to those columns and provide better Insights.
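
For example, a single conforming row might look like the following sketch, expressed here as a Python dict; only the three required fields are shown, and the values are illustrative:

    # A minimal sketch of one conforming row. Only `input`, `output`,
    # and `timestamp` are required; the values are illustrative.
    from datetime import datetime, timezone

    row = {
        "input": "What is the capital of France?",    # string
        "output": "The capital of France is Paris.",  # string
        "timestamp": datetime(                        # timestamptz
            2024, 5, 1, 13, 37, 0, tzinfo=timezone.utc
        ),
    }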

Creating a new SQL Data Connection

From the Namespace landing page, click "Data Connections" in the left panel. On the Data Connections landing page, click "+ Add Data Connection" in the upper right. Provide a required name for the Data Connection and an optional description. All Data Connections are available to any User creating a Project in the Namespace.

Debugging

All Data Pipeline Runs for a Project can be inspected and restarted from the Project Status page.

Supported Integrations

Google BigQuery

Required configuration at Namespace level

  • Google Application Credentials JSON: The JSON string content of the service account credentials used for BigQuery authentication.

  • Google Cloud Project ID: The Google Cloud project ID where BigQuery is enabled. This is used in the SQLAlchemy connection URL, as shown in the sketch below.
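
As a rough sketch of how these two values combine, the open-source sqlalchemy-bigquery dialect builds a connection like the one below; the variable names are placeholders for what you supply in the Data Connection form, not DBNL internals:

    # Sketch: mapping the Namespace-level BigQuery settings onto a
    # SQLAlchemy connection via the sqlalchemy-bigquery dialect.
    import json

    from sqlalchemy import create_engine

    credentials_json = '{"type": "service_account", "...": "..."}'  # placeholder
    project_id = "my-gcp-project"                                   # placeholder

    engine = create_engine(
        f"bigquery://{project_id}",  # project ID forms the URL
        credentials_info=json.loads(credentials_json),
    )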

Required configuration at Project level

  • Table Name: The name of the table to ingest from.

  • Ingestion Delay: How long to wait after UTC midnight before beginning to ingest data. Waiting at least 10 minutes is recommended so that all data has been loaded into the table.

Optional configuration at Project level

  • Backfill To: The earliest point in the table from which to load historical data (see the timing sketch below).
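
One way to picture how Ingestion Delay and Backfill To interact is the timing sketch below; the table and column names, and the exact query shape, are assumptions for illustration, not DBNL's actual pipeline logic:

    # Sketch: the daily ingestion window implied by the settings above.
    from datetime import datetime, time, timedelta, timezone

    ingestion_delay = timedelta(minutes=10)  # wait this long after UTC midnight
    backfill_to = datetime(2024, 1, 1, tzinfo=timezone.utc)  # hypothetical

    utc_midnight = datetime.combine(
        datetime.now(timezone.utc).date(), time(), tzinfo=timezone.utc
    )
    run_at = utc_midnight + ingestion_delay  # earliest moment a run may start
    window_start = max(backfill_to, utc_midnight - timedelta(days=1))

    # Hypothetical query over the previous UTC day, bounded by Backfill To.
    query = f"""
        SELECT * FROM my_table
        WHERE timestamp >= '{window_start.isoformat()}'
          AND timestamp <  '{utc_midnight.isoformat()}'
    """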

Databricks

Required configuration at Namespace level

  • Databricks Host: The Databricks host URL (e.g., 'https://adb-1234567890123456.7.azuredatabricks.net').

  • HTTP Path: The HTTP path to the Databricks SQL endpoint (e.g., 'sql/protocolv1/o/1234567890123456/1234-567890-abcdefg').

  • Access Token: The Databricks access token used for authentication.

  • Catalog: The catalog to use in the SQLAlchemy connection URL.

  • Schema: The schema to use in the SQLAlchemy connection URL, as shown in the sketch below.
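
As a rough sketch, the open-source databricks-sqlalchemy dialect combines these five values into a connection URL like the one below; the values are placeholders, and note that the URL takes the bare hostname without the 'https://' prefix:

    # Sketch: mapping the Namespace-level Databricks settings onto a
    # SQLAlchemy connection URL via the databricks-sqlalchemy dialect.
    from sqlalchemy import create_engine

    host = "adb-1234567890123456.7.azuredatabricks.net"  # bare hostname
    http_path = "sql/protocolv1/o/1234567890123456/1234-567890-abcdefg"
    access_token = "dapi..."                             # placeholder token
    catalog, schema = "main", "default"                  # placeholders

    engine = create_engine(
        f"databricks://token:{access_token}@{host}"
        f"?http_path={http_path}&catalog={catalog}&schema={schema}"
    )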

Required configuration at Project level

  • Table Name: The name of the table to ingest from.

  • Ingestion Delay: How long to wait after UTC midnight before beginning to ingest data. Waiting at least 10 minutes is recommended so that all data has been loaded into the table.

Optional configuration at Project level

  • Backfill To: The earliest point in the table from which to load historical data.

Snowflake

AWS Redshift
