Turbo-Charging The ARIMAX Option System [Code Included]
Better features, better strategies, and better performance. There's nothing better than that new model smell.
A few weeks ago, we set out to deploy an ARIMAX time series model with the goal of predicting the overnight return direction of the S&P 500. Based on this prediction, we would buy a 1-day-to-expiration call or put near the close and sell it at the next market open. We built the model with the features we believed would have the most explanatory power for how the S&P 500 performs.
We chose ARIMAX since it would be able to capture important trends, autocorrelations, and other factors that make for a robust prediction model.
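For readers who want the model in symbols, here is one common textbook way to write ARIMAX. It is a generic form, not necessarily the exact specification used in this system: y_t is the target series (differenced d times if needed to remove trend), x_{k,t} are the m exogenous features, and the orders p and q are left unspecified. Libraries differ in the exact parameterization (some place the ARMA terms on the regression errors instead), so treat it as a sketch.

```latex
y_t = c
    + \sum_{k=1}^{m} \beta_k \, x_{k,t}
    + \sum_{i=1}^{p} \phi_i \, y_{t-i}
    + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
    + \varepsilon_t
```

The \phi terms pick up autocorrelation, the differencing handles trend, and the \beta terms are where the features enter. Under this reading, the predicted direction would simply be the sign of the one-step-ahead forecast.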
Now that a good bit of time has passed since the system launched, we’ve been able to make major improvements to the model, the strategy, and the overall infrastructure, including a backtest engine with real option prices, a more robust strategy, and new features.
There’s a lot of ground to cover, so without further ado, let’s get right into it.
Finally, a Good Ol’ Backtest
While the strategy itself is simple, the hard part of building a backtest was the difficulty and cost of obtaining the historical and real-time option prices needed to get a record of exactly how our PnL would look. Fortunately, right as I planned to bite the bullet and pay for the data, Databento officially launched access to their options API, which pulls directly from OPRA (it launched on August 30th, so literally right at the last minute).
Without sounding like a shill (I’m completely unaffiliated), this is arguably the best data source out there right now. It has the simplicity of yfinance, with coverage of essentially every option on any asset at nanosecond granularity.
So, with access to affordable high-quality data, we were able to start building the backtest.
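To make the bookkeeping concrete, here is a minimal sketch of how the per-trade PnL of this kind of overnight strategy could be tallied. It is not the actual backtest engine: the `OvernightTrade` fields, the function names, and the prices are all made up for illustration, and it assumes the standard 100x contract multiplier with no commissions or slippage.

```python
from dataclasses import dataclass

# Assumption: standard US listed-option multiplier (100 per contract).
CONTRACT_MULTIPLIER = 100


@dataclass
class OvernightTrade:
    """One night's trade. All fields are hypothetical names for illustration."""
    label: str
    predicted_direction: int  # +1 -> buy a call, -1 -> buy a put
    call_entry: float         # call price near the close
    call_exit: float          # same call's price at the next open
    put_entry: float          # put price near the close
    put_exit: float           # same put's price at the next open


def trade_pnl(trade: OvernightTrade, contracts: int = 1) -> float:
    """Dollar PnL of buying the predicted side at the close, selling at the open."""
    if trade.predicted_direction > 0:
        entry, exit_ = trade.call_entry, trade.call_exit
    else:
        entry, exit_ = trade.put_entry, trade.put_exit
    # Assumption: fills at the quoted prices, no fees or slippage.
    return (exit_ - entry) * CONTRACT_MULTIPLIER * contracts


def run_backtest(trades: list[OvernightTrade]) -> list[float]:
    """Return the cumulative PnL after each trade (a simple equity curve)."""
    equity, total = [], 0.0
    for trade in trades:
        total += trade_pnl(trade)
        equity.append(total)
    return equity


# Made-up prices, purely to show the mechanics.
trades = [
    OvernightTrade("night 1", +1, call_entry=2.5, call_exit=3.0,
                   put_entry=2.0, put_exit=1.5),
    OvernightTrade("night 2", -1, call_entry=2.25, call_exit=1.75,
                   put_entry=1.75, put_exit=1.25),
]
print(run_backtest(trades))  # [50.0, 0.0]
```

In a real run, the entry and exit prices would come from the historical option data rather than hand-typed numbers, and costs would be subtracted from each trade.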
From April 1st, 2023 (data is only available from 3/28/23) to August 30th, 2023, with a strategy of buying 1 call or put based on the model prediction, the performance was as follows: