Trading Strategy
This tutorial covers the basics of continuously testing a multi-component system that includes components owned by third parties.
The data files required for this tutorial are available in the following file.
Trading Strategy Introduction
In this tutorial, we use a basic asset trading strategy to illustrate a small multi-component system. This trading strategy makes trade decisions based on tweets.
How the Trading Strategy Works
Two separate components analyze each tweet: one for sentiment and one for the entities it mentions. A third component then recommends trades based on the entities mentioned in each tweet and that tweet's sentiment. If the positive sentiment of a tweet is at least 0.7, a set of buy trades is recommended; otherwise, a set of sell trades is recommended.
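The decision rule described above can be sketched as follows. This is a minimal illustration, not the actual component's code: the function name `recommend_trades`, its signature, and the list-of-pairs output shape are all assumptions; only the 0.7 threshold and the buy/sell split come from the tutorial.

```python
def recommend_trades(entities, positive_sentiment, threshold=0.7):
    """Recommend trades for the entities mentioned in a single tweet.

    If the tweet's positive sentiment is at least `threshold`, recommend
    buying each mentioned entity; otherwise recommend selling each one.
    (Illustrative sketch; names and shapes are assumptions.)
    """
    action = "buy" if positive_sentiment >= threshold else "sell"
    return [(action, entity) for entity in entities]


print(recommend_trades(["AAPL", "TSLA"], 0.82))
# [('buy', 'AAPL'), ('buy', 'TSLA')]
print(recommend_trades(["AAPL"], 0.40))
# [('sell', 'AAPL')]
```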
How to Test the Trading Strategy
The test strategy involves matching current behavior against an expected baseline of Trading System behavior. The process for testing the Trading System involves passing a fixed, representative dataset of tweets to the whole trading system, then comparing its performance, at a granular, distributional level, with the performance recorded in a baseline collected the same way.
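One common way to compare two runs "at a distributional level" is a two-sample Kolmogorov-Smirnov test, which is also the kind of test referenced later in this tutorial. The sketch below uses `scipy.stats.ks_2samp` on synthetic sentiment scores; the data and the choice of statistic are illustrative assumptions (in practice, dbnl runs these comparisons for you).

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic stand-ins for a column of results from two runs,
# e.g. the sentiment scores produced on the fixed tweet dataset.
baseline_scores = rng.normal(loc=0.6, scale=0.1, size=500)
current_scores = rng.normal(loc=0.6, scale=0.1, size=500)

# The KS statistic measures the largest gap between the two empirical
# distributions; a small p-value would indicate drift from the baseline.
stat, p_value = ks_2samp(baseline_scores, current_scores)
print(f"KS statistic={stat:.3f}, p-value={p_value:.3f}")
```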
In this tutorial, we will use Distributional's testing platform to run the tests. To do so, we will define a run config and the tests to run, and write code that compiles the results to send to dbnl.
Trading Strategy Composition
The trading strategy is composed of four components: TweetSource, SentimentClassifier, EntityExtractor, and TradeRecommender, with some dependency structure; that is, some of these components depend on the outputs of others.
Directed Acyclic Graph
The components of the Trading Strategy form a directed acyclic graph (DAG) of dependencies. Note that the Global component is a placeholder representing information that is shared (or derived) by more than one component.
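The dependency structure can be written down as a simple adjacency map. The edges below are inferred from the analysis later in this tutorial (SentimentClassifier and EntityExtractor consume TweetSource output, and TradeRecommender consumes both), so treat them as an illustrative assumption rather than the authoritative DAG.

```python
# Each component maps to the components it depends on (inferred, illustrative).
DEPENDENCIES = {
    "TweetSource": [],
    "SentimentClassifier": ["TweetSource"],
    "EntityExtractor": ["TweetSource"],
    "TradeRecommender": ["SentimentClassifier", "EntityExtractor"],
}


def upstream_of(component, deps=DEPENDENCIES):
    """Return every component upstream of `component` in the DAG."""
    seen = set()
    stack = list(deps[component])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(deps[c])
    return seen


print(sorted(upstream_of("TradeRecommender")))
# ['EntityExtractor', 'SentimentClassifier', 'TweetSource']
```

Walking the upstream set like this mirrors the debugging logic used later: when a downstream component's tests fail, its failing upstream components are the first suspects.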
Creating data and sending it to dbnl
Designing and executing continuous testing
Once you have data in Distributional, you can conduct continuous integration testing by creating tests that define consistent behavior between the current state of the app and the baseline defined earlier.
Exploration of Test Results
This tutorial provides 10 total runs reflecting the continuous testing of the asset trading strategy described above. You can use the code provided to submit them in chronological order, or your dedicated DBNL customer engineer can load them into your account automatically. Once that has taken place, and provided the baseline has been set appropriately, your test sessions will be populated automatically.
def submit_next_run():
    # Pull the next parquet file of results and submit it as a run.
    filename = next(filenames)
    run_name = get_run_name(filename)
    print(f"Submitting run {run_name}")
    test_data = pd.read_parquet(filename)
    run = dbnl.create_run(
        project=proj,
        display_name=run_name,
        run_config=run_config,
        metadata=test_data.attrs,
    )
    dbnl.report_results(run=run, data=test_data.reset_index())
    dbnl.close_run(run=run)
    return run

for i in range(1, 10):
    run = submit_next_run()
    if i == 5:
        # Run 05 becomes the new baseline (see the Run 04 discussion below).
        dbnl.experimental.set_run_as_baseline(run=run)

When a test session states Passed, no individual test assertion failed. When a test session states Failed, at least one test assertion was violated; those sessions warrant further investigation, as described below.
Run 04 - Loss of consistency in sentiment classifier
Clicking into the earlier failed test session, the one associated with Run 04, lets us conduct some analysis and determine the root cause of the failures.

As we can see from the tags present on the failures, both the TradeRecommender and SentimentClassifier components seem to encounter failures. Referring back to the run detail page, where the Components DAG structure is present, we can see that the SentimentClassifier is upstream of the TradeRecommender. This leads us to believe that the SentimentClassifier is the more likely source of issues and worth investigating first.
Within this test session, we can expand one of the failed assertions. In the image below, we can see severely different behavior in sentiment, well beyond the threshold for this KS test. Given that the TweetSource tests passed, we know that the data has not changed. To learn what behavior is changing in the SentimentClassifier, we can visit the linked Compare page.

From the Compare page, we can explore how the columns of the run may yield insights into why we are seeing such aberrant app behavior. In particular, we can use the raw text in the tweet_content column to analyze properties that DBNL can derive, including the length of the tweet. The graph below shows a perhaps bizarre change in behavior, where long tweets are scored very positively and short tweets very negatively.
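The derived-property analysis described above can be reproduced locally with pandas. In this sketch, the toy data and the exact column names (`tweet_content`, `positive_sentiment`) are assumptions for illustration; the length-versus-sentiment relationship is the property being inspected.

```python
import pandas as pd

# Toy stand-in for a run's results (column names and values are assumptions).
df = pd.DataFrame({
    "tweet_content": [
        "bad",
        "meh day",
        "pretty good earnings call",
        "an extremely long and glowing review of the quarter's results",
    ],
    "positive_sentiment": [0.10, 0.30, 0.80, 0.95],
})

# Derive tweet length, one of the properties DBNL can compute from raw text.
df["tweet_length"] = df["tweet_content"].str.len()

# A strong positive correlation would mirror the aberrant pattern seen on
# the Compare page: long tweets scored positively, short ones negatively.
print(df["tweet_length"].corr(df["positive_sentiment"]))
```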

Using this information, our trading team was able to alert the sentiment classifier team to the large change in behavior, which led them to find and fix a bug in their system. After the correction, Run 05 was created to show that the behavior had returned to normal, and Run 05 was then set as the new baseline.
Run 07 - Loss of consistency in entity extractor
Again, Distributional's continuous testing catches failing behavior, this time emerging in Run 07. Below, we can see the test session (similar to above), but the failed test assertions lead to a different conclusion than before.

Here, the TradeRecommender tests are again failing, but this time the EntityExtractor test is failing while the SentimentClassifier tests are all passing (as are the TweetSource tests). This breadth of testing across all components leads us to believe that the EntityExtractor is the most likely source of problems, because it is upstream of the TradeRecommender.
Expanding the test assertion associated with the EntityExtractor shows us that the number of entities being extracted has suddenly changed dramatically. If this is not expected, then this is a massive shift in behavior and we need to investigate further.
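A quick way to see the kind of shift described above is to compare the per-tweet entity counts between the baseline and the current run. The counts below are toy values chosen to exaggerate the effect; real values would come from the two runs' results.

```python
import pandas as pd

# Entities extracted per tweet in each run (toy values, illustrative only).
baseline_counts = pd.Series([1, 2, 1, 0, 2, 1, 1, 3, 2, 1])
current_counts = pd.Series([4, 6, 5, 7, 5, 4, 6, 5, 8, 6])

# A large jump in the mean, with non-overlapping ranges, is the sort of
# sudden behavioral shift the failed EntityExtractor assertion flags.
shift = current_counts.mean() - baseline_counts.mean()
print(f"mean entities per tweet: baseline={baseline_counts.mean():.1f}, "
      f"current={current_counts.mean():.1f} (shift of {shift:+.1f})")
```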
