In this blog, I will show you how to implement a machine learning based trading strategy using the regime predictions made in the previous blog. Do read it; there is a special discount for you at the end of this post.
There is one thing that you should keep in mind before you read this blog though: The algorithm is just for demonstration and should not be used for real trading without proper optimization.
Let me begin by explaining the agenda of the blog:
- Create an unsupervised ML (machine learning) algorithm to predict the regimes.
- Plot these regimes to visualize them.
- Train a Support Vector Classifier algorithm with the regime as one of the features.
- Use this Support Vector Classifier algorithm to predict the current day’s trend at the open of the market.
- Visualize the performance of this strategy on the test data.
- Provide downloadable code for your benefit.
Import the Libraries and the Data:
First, I imported the necessary libraries. Please note that I imported the fix_yahoo_finance package (since renamed yfinance) so that I can pull data from Yahoo. If you do not have this package, I suggest you install it first or change your data source to Google.
Next, I pulled the data for the same ticker, ‘SPY’, that we used in the previous blog and saved it as a dataframe df. I chose the time period for this data to start from the year 2000.
After this, I created indicators that can be used as features for training the algorithm.
But, before doing that I decided on the look back time period for these indicators. I chose a look back period of 10 days. You may try any other number that suits you. I chose 10 to check for the past 2 weeks of trading data and to avoid noise inherent in smaller look back periods.
Apart from the look back period, let us also decide the train-test split of the data. I prefer to give 80% of the data to training and the remaining 20% to testing. You can change this as per your need.
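As a rough sketch of these two choices, assuming the price data sits in a dataframe named df (the synthetic frame below is a hypothetical stand-in for the downloaded data), the split can be computed as a row index so that the training rows always come before the test rows in time:

```python
import numpy as np
import pandas as pd

n = 10    # look back period in trading days (roughly two weeks)
t = 0.8   # fraction of the rows used for training

# Hypothetical stand-in for the downloaded price data
dates = pd.bdate_range("2000-01-03", periods=500)
df = pd.DataFrame({"Open": np.linspace(100, 150, len(dates))}, index=dates)

# Chronological split: no shuffling, so the test set lies strictly in the future
split = int(t * len(df))
train, test = df.iloc[:split], df.iloc[split:]
print(len(train), len(test))
```

Shuffling before splitting would leak future information into training, which is why the split is done by position here.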
Next, I shifted the High, Low and Close columns by 1, so that each row has access to past data only. After this, I created various technical indicators such as RSI, SMA, ADX, Correlation, Parabolic SAR, and the return of the past 1 day on an open-to-open basis.
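The shifting and a few of the indicators can be sketched with plain pandas. This is only an illustration under assumptions: the OHLC data is synthetic, the column names SMA, Corr, RSI and Return are hypothetical, the RSI uses simple rolling averages rather than Wilder smoothing, and ADX and Parabolic SAR are left out because they are usually taken from a technical analysis library.

```python
import numpy as np
import pandas as pd

n = 10  # look back period

# Hypothetical OHLC data standing in for the downloaded frame
rng = np.random.default_rng(0)
dates = pd.bdate_range("2000-01-03", periods=60)
close = 100 + rng.normal(0, 1, len(dates)).cumsum()
df = pd.DataFrame({
    "Open": close + rng.normal(0, 0.2, len(dates)),
    "High": close + 1.0,
    "Low": close - 1.0,
    "Close": close,
}, index=dates)

# Shift by one row so today's features only use yesterday's High/Low/Close
df[["High", "Low", "Close"]] = df[["High", "Low", "Close"]].shift(1)

# Simple moving average of the (already shifted) close
df["SMA"] = df["Close"].rolling(n).mean()

# Rolling correlation between the moving average and the shifted close
df["Corr"] = df["SMA"].rolling(n).corr(df["Close"])

# RSI from average gains and losses (simple-average variant)
delta = df["Close"].diff()
gain = delta.clip(lower=0).rolling(n).mean()
loss = (-delta.clip(upper=0)).rolling(n).mean()
df["RSI"] = 100 - 100 / (1 + gain / loss)

# Return of the past 1 day on an open-to-open basis (log return)
df["Return"] = np.log(df["Open"] / df["Open"].shift(1))
```

The Open column is not shifted because today's open is already known at the moment the prediction is made.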
Next, I printed the data frame.
And it looked like this:
As you can see, there are many NaN values. We need to either impute them or drop them. If you are new to machine learning and want to learn about the imputer function, read this. I dropped the NaN values in this algorithm.
In the next part of the code, I instantiated a StandardScaler and created an unsupervised learning algorithm to make the regime prediction. I have discussed this in my previous blog, so I will not go into these details again.
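For readers who have not seen the previous blog, here is a minimal sketch of this step. It assumes a Gaussian mixture model as the unsupervised algorithm, which fits the mean and covariance output mentioned in this post but is not confirmed by it; the feature matrix, the number of components and the other settings are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical feature matrix standing in for the indicator columns
features = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 3)),
    rng.normal(5.0, 1.0, size=(200, 3)),
])
split = int(0.8 * len(features))

# Fit the scaler on the training rows only, then reuse it on the test rows
scaler = StandardScaler()
train_scaled = scaler.fit_transform(features[:split])
test_scaled = scaler.transform(features[split:])

# Assumed model: a Gaussian mixture with one component per regime
gmm = GaussianMixture(n_components=2, covariance_type="spherical",
                      n_init=10, random_state=42)
gmm.fit(train_scaled)

# Regime label for every row of the test data
regimes = gmm.predict(test_scaled)

print(gmm.means_)        # one mean vector per regime
print(gmm.covariances_)  # one variance per regime for the spherical type
```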
Towards the end of the last blog, I printed the Mean and Covariance values for all the regimes and plotted the regimes. The new output with indicators as feature set would look like this:
Next, I scaled the Regimes data frame created in the earlier piece of code, excluding the Date and Regimes columns, and saved the result back in the same columns. By doing so, I do not lose any features, but the data is scaled and ready for training the support vector classifier algorithm. Next, I created a signal column to act as the prediction target. The algorithm trains on the feature set to predict this signal.
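A small sketch of the scaling and labeling step follows. The frame contents and the column names Regime, RSI and Signal are hypothetical, and the labeling rule (the sign of the next row's return, computed before scaling) is one plausible choice assumed here rather than the rule from the original code.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical Regimes frame with two feature columns
Regimes = pd.DataFrame({
    "Date": pd.bdate_range("2015-01-01", periods=6),
    "Regime": [0, 1, 1, 0, 2, 2],
    "RSI": [55.0, 60.0, 48.0, 35.0, 42.0, 51.0],
    "Return": [0.004, -0.002, 0.0, 0.007, -0.005, 0.001],
})

# Assumed label: +1 / -1 / 0 from the sign of the NEXT row's unscaled return.
# The last row has no next return, so it gets 0.
next_ret = Regimes["Return"].shift(-1)
Regimes["Signal"] = np.sign(next_ret).fillna(0).astype(int)

# Scale every column except the date, the regime label and the target,
# and write the scaled values back into the same columns
feature_cols = Regimes.columns.drop(["Date", "Regime", "Signal"])
Regimes[feature_cols] = StandardScaler().fit_transform(Regimes[feature_cols])

print(Regimes)
```

Computing the label before scaling matters: once the return column is standardized, a positive value only means "above the average return", not "above zero".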
Next, I instantiated a support vector classifier. For this, I used the same SVC model used in the example by sklearn. I have not optimized this support vector classifier for the best hyperparameters. In the machine learning course on Quantra™, we have extensively discussed how to use hyperparameters and optimize the algorithm to predict the daily Highs and Lows, and in turn the volatility of the day.
Coming back to the blog, the code for support vector classifier is as below:
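A minimal sketch of such a classifier, assuming scikit-learn's SVC with an RBF kernel and untuned, default-style settings. The specific values are illustrative assumptions, not the exact arguments of the original code.

```python
from sklearn.svm import SVC

# Untuned, default-style settings; every value here is an assumption
cls = SVC(
    C=1.0,            # regularization strength
    kernel="rbf",     # radial basis function kernel
    gamma="auto",     # kernel coefficient set to 1 / n_features
    tol=0.001,        # stopping tolerance
    shrinking=True,   # use the shrinking heuristic
    probability=False,
    random_state=None,
)
print(cls)
```

C, kernel and gamma are the settings a hyperparameter search would typically vary first.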
Next, I split the test data of the unsupervised regime algorithm into train and test data. We use this new train data to train our support vector classifier algorithm. To create the train data I dropped the columns that are not a part of the feature set:
Then I fit the algorithm on the X and y data sets to train it.
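The split, the column drop and the fit can be sketched as follows. The Regimes frame is synthetic, the dropped columns (Signal and Date) are an assumption about what lies outside the feature set, and the toy label is tied to one feature only so that the classifier has something to learn.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

rng = np.random.default_rng(7)
n_rows = 200

# Hypothetical Regimes frame: regime label plus two scaled indicators
Regimes = pd.DataFrame({
    "Date": pd.bdate_range("2015-01-01", periods=n_rows),
    "Regime": rng.integers(0, 4, n_rows),
    "RSI": rng.normal(size=n_rows),
    "SMA": rng.normal(size=n_rows),
})
# Toy target for illustration only
Regimes["Signal"] = np.where(Regimes["RSI"] > 0, 1, -1)

# Chronological split of the regime data into train and test parts
split2 = int(0.8 * len(Regimes))

# Drop the columns that are not features; the regime stays in as a feature
X = Regimes.drop(columns=["Signal", "Date"])
y = Regimes["Signal"]

cls = SVC(kernel="rbf", gamma="auto")
cls.fit(X.iloc[:split2], y.iloc[:split2])

# Predictions for the held-out rows
predictions = cls.predict(X.iloc[split2:])
print(predictions[:10])
```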
Next, I calculated the test set size and aligned the predictions with the corresponding rows of the data frame df.
The reason for doing this is that the original return values of ‘SPY’ are stored in df, while those in Regimes are scaled and hence won’t be useful for taking a cumulative sum to check the performance.
Next, I saved the predictions made by the SVC in a column named Pred_Signal.
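One way to line the predictions up with df is to write them into the last rows of the frame, since the test set is the most recent slice of the data. In this sketch the returns and the prediction array are hypothetical placeholders for the real values.

```python
import numpy as np
import pandas as pd

# Hypothetical unscaled returns, as they would be stored in df
df = pd.DataFrame(
    {"Return": [0.010, -0.020, 0.005, 0.003, -0.010, 0.020]},
    index=pd.bdate_range("2020-01-01", periods=6),
)

# Hypothetical classifier output for the test rows
predictions = np.array([1, -1, 1])

# The test set covers the last len(predictions) rows of df
test_len = len(predictions)
df["Pred_Signal"] = 0
df.iloc[-test_len:, df.columns.get_loc("Pred_Signal")] = predictions

print(df)
```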
Then, based on these signals, I calculated the returns of the strategy by multiplying the signal at the beginning of the day by the return at the next day’s open (because our returns are measured from open to open).
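In pandas this comes down to shifting the return column back by one row before multiplying. The numbers below are made up, and Str_ret is a hypothetical column name.

```python
import pandas as pd

df = pd.DataFrame({
    # Open-to-open return that ENDS at each day's open
    "Return": [0.010, -0.020, 0.005, 0.003],
    # Signal decided at each day's open
    "Pred_Signal": [1, -1, 1, -1],
})

# Today's signal earns the return realized at tomorrow's open
df["Str_ret"] = df["Pred_Signal"] * df["Return"].shift(-1)

print(df)
```

The last row is NaN because the return for the final signal has not been realized yet.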
Finally, I calculated the cumulative strategy returns and the cumulative market returns and saved them in df. Then, I calculated the Sharpe ratio to measure the performance. To complement this metric, I also plotted the cumulative returns.
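A sketch of the cumulative returns and a Sharpe ratio, under stated assumptions: the return series are random placeholders, the column names are hypothetical, and the Sharpe ratio uses the conventional annualized definition with a zero risk-free rate, which may differ from the calculation behind the figures quoted in this post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical daily log returns for the strategy and the market
df = pd.DataFrame({
    "Str_ret": rng.normal(0.0005, 0.01, 250),
    "market_ret": rng.normal(0.0003, 0.01, 250),
})

# Log returns are additive, so a running sum gives the cumulative log return
df["strategy_cu_return"] = df["Str_ret"].cumsum()
df["market_cu_return"] = df["market_ret"].cumsum()


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    returns = returns.dropna()
    return np.sqrt(periods_per_year) * returns.mean() / returns.std()


sharpe = sharpe_ratio(df["Str_ret"])
print(round(sharpe, 2))
```

Plotting the two cumulative columns against each other shows at a glance whether the strategy beats simply holding the stock.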
The final result looks like this.
After so much code and effort, if the end result looks like this, then someone with no machine learning background would say that it is not worth it. I would agree for now. But look at this line of code:
I just changed the data from SPY to IBM. Then the result looks like this:
I know what you are thinking: I am just fitting the data to get the results. That is not entirely wrong. I will show you another stock, and then you can decide.
I changed the stock to Freeport-McMoRan Inc and the result looks like this:
You can further change it to GE or something else and check for yourself. This strategy works on some stocks but doesn’t work on others, which is the case with most quant strategies. There are a few reasons why the algorithm did not work consistently, and I will list some of them here.
- No autocorrelation of returns
- No Support Vector hyperparameter optimization
- No error propagation
- No feature selection
We have not checked for autocorrelation of the returns, which could have increased the predictability of the algorithm. Try that on your own by shifting the returns column by 1 and passing it as part of the feature set. The result would look like this:
Although the improvement from 3.4 to 3.49 is not much, it is still a good feature to have.
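The lagged-return feature can be added with a single shift. In this sketch the returns are made up and Return_lag1 is a hypothetical column name; the autocorr call is an optional check of how much linear dependence there is to exploit.

```python
import pandas as pd

df = pd.DataFrame({"Return": [0.010, -0.020, 0.005, 0.003]})

# Lag-1 autocorrelation of the returns, as a quick diagnostic
lag1_autocorr = df["Return"].autocorr(lag=1)

# Yesterday's return as an extra feature for today's prediction
df["Return_lag1"] = df["Return"].shift(1)

# The first row has no previous return, so drop it
df = df.dropna()

print(lag1_autocorr)
print(df)
```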
Please note that the code runs best with Python 2.7.
Learn the application of Machine Learning in Forex markets. Click here to know how to start with historical data (stock price/forex data) and add indicators to build a model in R/Python/Java. Then select the right machine learning algorithm to make the predictions.
We have noticed that some users are facing challenges while downloading the market data from Yahoo and Google Finance platforms. In case you are looking for an alternative source for market data, you can use Quandl for the same.
Download Data Files