Recently, we set out to create a working quantitative trading strategy powered by machine learning models. By using data from the 11 S&P 500 sectors, we were able to predict the **direction** of the next day’s S&P 500 return with a precision of 54-56%:

This provided a strong foundation for developing the strategy, but unbeknownst to me, I was using **a teaspoon to empty the ocean**. As usual, however, we found a solution that produced even better results and encouraged us to dive headfirst into production:

Before diving into the meat-and-bones of the strategy, it is important to understand the new models that power it.

#### Goodbye, Regression

While we saw the power of regression models when applied to sports betting data, their limitations became glaringly apparent when applied to trading data.

Regression worked well for our baseball model because certain sports outcomes can be linearly deduced. For example, if a pitcher historically throws a large number of strikeouts (X), then a batter with a historically low number of hits (Y) will generally be more likely to **not** record a hit. A higher X paired with a lower Y will almost always result in a predictably lower output.

**However, this is not the case for financial asset returns.** The two major factors in return prediction that regression does not capture are **trends** and **autocorrelation**. To better understand this, let’s look at a real-world example:

Traditional regression models work by essentially adding optimal weights to each of the features, so in our case, it may apply a 15% weight to the technology sector’s returns and a 2% weight to the returns of the utility sector. In the end, the model is a fixed formula that takes in the features and produces the output as a prediction.
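To make the "fixed formula" idea concrete, here is a minimal sketch. Only the 15% and 2% weights come from the example above; the intercept, the function name, and the input values are hypothetical placeholders, not fitted numbers from our model:

```python
# Hypothetical weights: 0.15 and 0.02 mirror the example in the text;
# the intercept and the inputs below are made up for illustration.
WEIGHTS = {"technology": 0.15, "utilities": 0.02}
INTERCEPT = 0.0


def predict_next_return(sector_returns):
    """Apply the same fixed weights to a day's sector returns."""
    return INTERCEPT + sum(
        WEIGHTS[sector] * sector_returns[sector] for sector in WEIGHTS
    )


# Identical inputs always produce an identical prediction, no matter
# what happened in the days leading up to them.
calm_day = predict_next_return({"technology": 0.01, "utilities": 0.005})
same_inputs_after_rally = predict_next_return({"technology": 0.01, "utilities": 0.005})
```

Because nothing in the formula looks at what happened yesterday or last week, two days with the same sector returns get the same forecast even if one of them sits in the middle of a streak.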

As demonstrated above, this trend-naïve approach fails to capture short-term trends in the data. This may be fine and still average out pretty well in the long run, but our goal is to make money **now**, so we need to find a better model class that captures trends in both the long term **and** the short term.

#### Putting the X in ARIMA(X)

Luckily for us, there is a model class built for exactly this need.

ARIMA models are **significantly** more complex than what we’re used to, so let’s take a bird’s-eye view of how they work.

**AR**: The AR in ARIMA represents the component that forecasts future values by using past values. This is known as the **autoregressive** component.

**I**: The I represents the “noise-free” component. In this stage, the model removes seasonality and trends from the data to get baseline figures like the average return or the average volatility. Doing this creates a baseline figure that represents “normal” conditions. In practice, this is typically done by differencing the series, i.e., working with day-to-day changes rather than raw levels. This is known as the **integrated** component; the process is what is known as making the data **stationary**.

**MA**: This component considers the past prediction errors to improve the accuracy of future predictions. For example, if the conditions of the last 3 days resulted in higher prediction errors, it will factor that into its predictions for the next day. This is known as the **moving average** component.

**X**: Finally, this component refers to the features we use to make our predictions. In this case, the X would be the prior days’ returns of each of the 11 sectors. This is known as the **exogenous variable** component.
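For readers who like to see the pieces side by side, one common textbook way to write an ARIMAX model is shown here. The notation is an assumption on our part (the lag orders p, d, q and the coefficient names are generic placeholders), and software packages differ in exactly how they attach the exogenous terms, so treat it as a sketch rather than the exact specification we fit:

$$
\Delta^{d} y_t = c + \sum_{i=1}^{p} \phi_i \, \Delta^{d} y_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \sum_{k=1}^{11} \beta_k \, x_{k,t-1} + \varepsilon_t
$$

Here $y_t$ is the S&P 500 return on day $t$, $\Delta^{d}$ is the differencing applied $d$ times (the **I**), the $\phi_i$ terms weight past values (the **AR**), the $\theta_j$ terms weight past prediction errors $\varepsilon$ (the **MA**), and the $\beta_k$ terms weight the prior day’s return $x_{k,t-1}$ of each of the 11 sectors (the **X**).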

By combining all of these components, ARIMA(X) models can capture a wide range of temporal patterns including trends, seasonality, and cyclic behaviors. This model class is the go-to for forecasting revenue, interest rates, and things like electricity demand.

Heck, I even trained one to forecast my subscriber growth:

But on to more serious matters: let’s see how much it improved prediction accuracy.

#### The Showdown

Before analyzing the results, we first made a few changes to the data structure and overall methodology.

First, instead of using as much data as possible, we restrict our dataset to a rolling 180-day window. We do this because adding too much data makes short-term trends almost *impossible* to detect: we are bound to have multiple periods with the same clusters of “trends” spread out across uncorrelated years (e.g., a 5-day positive return streak in 1999 is **completely** different from a 5-day positive streak in 2013).
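A rolling window like this can be sketched in a few lines. This is a minimal illustration, assuming "180 days" means the 180 most recent observations; the function name and the placeholder data are hypothetical, and in practice each observation would be a day's row of sector returns:

```python
from collections import deque

WINDOW_DAYS = 180  # window length from the text (assumed to count observations)


def rolling_windows(observations, window=WINDOW_DAYS):
    """Yield (training_window, next_observation) pairs.

    Each training window holds only the most recent `window` observations,
    so the oldest day drops out every time a new day arrives.
    """
    recent = deque(maxlen=window)
    for obs in observations:
        if len(recent) == window:
            yield list(recent), obs
        recent.append(obs)


# Placeholder data: integers stand in for daily rows of sector returns.
days = list(range(200))
pairs = list(rolling_windows(days))
first_train, first_target = pairs[0]
last_train, last_target = pairs[-1]
```

Each pair holds the data a model would be refit on and the following day it would be asked to predict, so nothing older than the window ever reaches the model.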

Here’s what it looks like when we use too much data: