Uploading data to Distributional

Distributional runs on data, but our goal is to enable you to operate on data you already have available. If you are using a golden dataset to guide your development, we want you to use that to power your Development and Deployment testing. If you have actual Question-Answer pairs from production sitting in your data warehouse, we recommend that you execute Production testing on that data to continually assert that your app is not misbehaving.

Generally, data is organized on Distributional in the form of a parquet file full of app usages, e.g., the prompts and summaries observed in the last 24 hours. This data can include any contextual information that helps determine whether the app is behaving as desired, such as the day of the week. The data is then packaged up and shipped to Distributional’s API, primarily through our SDK.
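As a rough illustration of the shape described above, the following sketch builds a small table of app usages with pandas and derives a day-of-week context column. The column names (`timestamp`, `prompt`, `summary`, `day_of_week`), the helper names, and the file name are assumptions made for this example, not a schema required by Distributional, and the upload step through the SDK is intentionally not shown.

```python
import pandas as pd


def build_usage_frame(records):
    """Turn a list of app-usage dicts into a DataFrame, one row per usage.

    Hypothetical helper for illustration; the column names are assumptions.
    """
    df = pd.DataFrame.from_records(records)
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    # Contextual information: the day of the week each usage was observed.
    df["day_of_week"] = df["timestamp"].dt.day_name()
    return df


def write_usage_parquet(df, path):
    """Write the usages to a parquet file (needs pyarrow or fastparquet)."""
    df.to_parquet(path, index=False)


# Two made-up usages standing in for "the last 24 hours" of traffic.
records = [
    {
        "timestamp": "2024-01-01T09:15:00",
        "prompt": "Summarize the quarterly report.",
        "summary": "Revenue grew while costs stayed flat.",
    },
    {
        "timestamp": "2024-01-06T18:40:00",
        "prompt": "Summarize the meeting notes.",
        "summary": "The team agreed on a release date.",
    },
]

usage_df = build_usage_frame(records)
```

Calling `write_usage_parquet(usage_df, "app_usages.parquet")` would then produce a file of the kind described above, ready to be shipped with whatever upload mechanism you use.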

Prior to shipping the data to dbnl, the dbnl Eval library can be used to augment your data (especially text data) with additional columns for a more complete testing experience.
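To make the idea of augmenting text data with additional columns concrete, here is a minimal sketch using plain pandas. It does not use the dbnl Eval library, whose functions are not shown on this page; the helper `add_text_columns` and the derived column names are hypothetical and only illustrate the kind of columns that can be added before shipping the data.

```python
import pandas as pd


def add_text_columns(df, text_column):
    """Return a copy of df with simple derived columns for one text column.

    Illustrative only: a stand-in for the richer columns an evaluation
    library could compute, not the dbnl Eval API.
    """
    out = df.copy()
    # Treat missing text as empty so the counts below are well defined.
    text = out[text_column].fillna("")
    out[f"{text_column}_char_count"] = text.str.len()
    out[f"{text_column}_word_count"] = text.str.split().str.len()
    return out


sample = pd.DataFrame({"summary": ["The cat sat.", "Short", None]})
augmented = add_text_columns(sample, "summary")
```

The original column is left in place, so the derived columns sit alongside the raw text and can be tested together with it.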
