Futures Sniffing: The Problem With Quantitative Alphas [Code Included]
Having a working quant strategy is nice, but why should we stop there?
Here at The Quant’s Playbook, we believe in continuous growth. So after developing and deploying a fully functional quantitative strategy, we kept going.
In pursuit of even greater returns and diversification, we explored an entirely new approach, Futures Sniffing, and the results were rather interesting.
But first, here’s a quick update on how the strategy has performed since our last post:
Futures Sniffing: Background
To build out this idea, we borrowed a concept from our friends in the ethical hacking department. Packet sniffing is the practice of capturing and logging every data packet that passes through a computer network, regardless of where each packet is addressed.
The goal of packet sniffing is to turn this jumbled stream of web requests and responses into structured data from which we can derive relationships, patterns, and, most importantly, information.
So naturally, we wanted to see how this idea applies to real markets: by drawing on information from every available futures market, can our models generate useful, tradeable signals?
Model Recap
Before diving into the experiment, let’s first do a brief review of our chosen model class.
We use the ARIMAX time-series model (ARIMA with exogenous variables) to draw on information from the 11 S&P 500 sectors and better predict the S&P 500’s immediate next return. This model class is effective because its autoregressive, differencing, and moving-average components can capture autocorrelation, mean reversion, and trends, which covers much of what we need for strong estimates. Paired with the exogenous (X) variable component, this lets us flexibly add layers of robustness:
The rationale behind the model’s success is simple: the S&P 500 is directly driven by its constituent sectors, so it makes sense that the model can capture that relationship.
But what if we reversed the order? Instead of going from economic rationale to data and modeling, can we apply data and modeling first and then infer an economic rationale?
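For reference, one common textbook way to write an ARIMAX(p, d, q) model is shown below. This is a sketch of the general form rather than a restatement of our exact specification (the orders p, d, and q are not restated in this post). Here y_t is the S&P 500 return, w_t is y_t after d rounds of differencing, and x_{k,t} is the k-th sector input, with K = 11:

```latex
w_t = \Delta^{d} y_t, \qquad
w_t = c + \sum_{i=1}^{p} \phi_i \, w_{t-i}
        + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
        + \sum_{k=1}^{K} \beta_k \, x_{k,t}
        + \varepsilon_t
```

The \phi terms carry persistence (positive \phi_1) or mean reversion (negative \phi_1), the \theta terms absorb short-lived shocks, and the \beta terms are where the exogenous sector information enters. Some libraries instead fit a regression with ARMA errors, which is a closely related but not identical parameterization.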
To figure this out, let’s dive into some futures data.
Experiment Construction
In the last experiment, we fed the model 30 days of sector data, so we will keep the same window length. But because we are working in reverse, we first have to sniff out which assets have relationships worth modeling.
To do this, we construct a correlation matrix between each futures contract’s return and the next day’s overnight return of every other futures contract. For example, if soybean futures rise 1% on August 1st, and then on August 2nd corn futures rise 0.5% and wheat futures rise 0.75%, we record each of those next-day moves as a data point paired with the soybean move. We then repeat this for every available market over multiple days:
Each cell answers the question, “How correlated is X’s return today with Y’s return tomorrow?” Right off the bat, we already see some strange, possibly spurious, correlations.
Over the last 30 days, crude oil appears to be the most positively correlated with the next day’s copper return (i.e., when crude oil opened higher, copper opened higher the next day). Meanwhile, gasoline is the most negatively correlated with the next day’s gold return (i.e., when gasoline opened higher, gold opened lower the next day).
This clearly doesn’t make much sense: what do crude oil and copper have in common? And what do gasoline and gold have in common?
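The shifted matrix described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under our own assumptions, not the exact code behind the chart: returns arrive as a days-by-assets array ordered oldest first, the function name `lagged_corr_matrix` is hypothetical, and a window of 31 daily returns is used so that it yields 30 (today, tomorrow) pairs.

```python
import numpy as np

def lagged_corr_matrix(returns: np.ndarray) -> np.ndarray:
    """Cell [i, j] = correlation between asset i's return on day t and
    asset j's return on day t + 1.

    `returns` has shape (n_days, n_assets), oldest row first.
    """
    today = returns[:-1]     # days 0 .. n-2
    tomorrow = returns[1:]   # days 1 .. n-1, aligned row by row with `today`
    n_assets = returns.shape[1]
    # corrcoef treats the columns of both inputs as 2 * n_assets variables
    # (rowvar=False); the top-right block pairs "today" with "tomorrow".
    full = np.corrcoef(today, tomorrow, rowvar=False)
    return full[:n_assets, n_assets:]

# Toy check on synthetic data (not market data): asset 1 copies asset 0
# with a one-day delay, asset 2 is unrelated noise.
rng = np.random.default_rng(0)
lead = rng.normal(size=31)             # 31 rows -> 30 (today, tomorrow) pairs
follow = np.r_[0.0, lead[:-1]]         # follower[t + 1] == lead[t]
noise = rng.normal(size=31)
matrix = lagged_corr_matrix(np.column_stack([lead, follow, noise]))
print(round(float(matrix[0, 1]), 3))   # → 1.0
```

Reading each row as the leader and each column as the follower keeps the matrix asymmetric: cell [i, j] and cell [j, i] answer different questions.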
Well, pretty much nothing. This is essentially just a pure quantitative relationship.
But even without a clear economic rationale, can this type of relationship sniffing be used as the basis for a profitable strategy?
Well, let’s find out.
The Strategy
This data is structured far more simply than options data, so it makes sense to build a backtest first rather than diving straight into forward testing.
First, we have to define a ruleset.
Each day, we create the shifted correlation matrix using data from the last 30 days.
From this matrix we isolate the most positively correlated pair, then the most negatively correlated pair.
We then train an ARIMAX model on each pair. The exogenous variable is the leading asset, the one whose returns today correlate with the other future’s returns tomorrow. For example, if our pair is crude oil and copper, we use the crude oil returns as the exogenous variable and the next-day copper returns as the Y we ultimately predict.
We make a trade based on both of the predictions:
If the prediction for copper is to open positive, we buy the contract before the close
If the prediction for gold (the asset led by gasoline) is to open negative, we short the contract before the close
Rinse and repeat.
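Putting the daily ruleset together, a minimal sketch might look like the following. Several pieces are our assumptions rather than details stated above: the ARIMAX step is swapped for an ARX(1) fit by ordinary least squares (the ARIMAX(1, 0, 0) special case, with no differencing or moving-average terms), the leader’s return enters with a one-day lag so it is known at forecast time, the matrix diagonal is skipped so a pair is always two different contracts, the trade rule is applied symmetrically (long on any positive forecast, short on any negative one), and P&L is equal-weight with no costs or contract multipliers. The names `pick_pairs`, `arx1_forecast`, `daily_signals`, and `score_day` are hypothetical.

```python
import numpy as np

def pick_pairs(corr):
    """Return [(leader, follower), (leader, follower)] index pairs for the most
    positive and most negative off-diagonal cells. Rows lead columns."""
    masked = np.array(corr, dtype=float)
    np.fill_diagonal(masked, np.nan)  # assumption: ignore an asset vs. itself
    pairs = []
    for flat_idx in (np.nanargmax(masked), np.nanargmin(masked)):
        row, col = np.unravel_index(flat_idx, masked.shape)
        pairs.append((int(row), int(col)))
    return pairs

def arx1_forecast(y, x):
    """Fit y[t] = c + phi * y[t-1] + beta * x[t-1] by OLS, then forecast y's
    next value from today's y and x (ARX(1), i.e. ARIMAX(1, 0, 0))."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1], x[:-1]])
    (c, phi, beta), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return float(c + phi * y[-1] + beta * x[-1])

def daily_signals(returns, corr):
    """{follower index: +1 (long) / -1 (short) / 0 (flat)} to place before the
    close. If both pairs share a follower, the negative pair's signal wins."""
    signals = {}
    for leader, follower in pick_pairs(corr):
        pred = arx1_forecast(returns[:, follower], returns[:, leader])
        signals[follower] = int(np.sign(pred))
    return signals

def score_day(signals, next_returns):
    """Equal-weight P&L from the close to the next open, ignoring costs."""
    return sum(sig * next_returns[idx] for idx, sig in signals.items())

# Synthetic demo (not market data): asset 1 follows asset 0 positively,
# asset 3 follows asset 2 negatively, each with a one-day lag.
rng = np.random.default_rng(2)
a = rng.normal(0, 0.01, 32)
b = rng.normal(0, 0.01, 32)
returns = np.column_stack([a, np.r_[0.0, 0.8 * a[:-1]], b, np.r_[0.0, -0.8 * b[:-1]]])
n = returns.shape[1]
# Lagged correlation matrix: rows = today's returns, columns = tomorrow's.
corr = np.corrcoef(returns[:-1], returns[1:], rowvar=False)[:n, n:]
signals = daily_signals(returns, corr)
print(pick_pairs(corr), signals)
```

A fuller version would more likely use a complete ARIMAX implementation, for example statsmodels’ `ARIMA` or `SARIMAX` classes, which accept an `exog` argument, and would re-run this step each day on a rolling 30-day window.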
So, let’s see what those returns look like in action (it may surprise you):