## Strategy using Trend-following Indicators: MACD, ST and ADX

This article is the final project submitted by the author as a part of his coursework in Executive Programme in Algorithmic Trading (EPAT™) at QuantInsti™. Do check our Projects page and have a look at what our students are building.

Gopal, a management professional, has over two decades of experience in the IT industry with a strong global delivery background and a passion for quantitative finance and gadgets. He is highly process driven, with a focus on achieving quality through automation. He heads delivery at PreludeSys India Ltd. Gopal holds an MBA from the Symbiosis Centre for Distance Learning. He successfully completed the coursework for the Executive Programme in Algorithmic Trading (EPAT™) in November 2016.

## Implementing Pairs Trading Using Kalman Filter


### Introduction

Some stocks move in tandem because the same market events affect their prices. However, idiosyncratic noise might make them temporarily deviate from the usual pattern and a trader could take advantage of this apparent deviation with the expectation that the stocks will eventually return to their long term relationship. Two stocks with such a relationship form a “pair”. We have talked about the statistics behind pairs trading in a previous article.

This article describes a trading strategy based on such stock pairs. The rest of the article is organized as follows: we discuss the basics of trading an individual pair, describe the overall strategy that chooses which pairs to trade, and present some preliminary results. Finally, we outline possible ways of improving the results.

Let us consider two stocks, x and y, such that

y = \alpha + \beta x + e

\alpha and \beta are constants and e is white noise. The parameters {\alpha, \beta} could be obtained from a linear regression of prices of the two stocks with the resulting spread

e_{t} = y_{t} - (\alpha + \beta x_{t})

Let the standard deviation of this spread be \sigma_{t}. The z-score of this spread is

z_{t} = e_{t}/\sigma_{t}

The trading strategy is as follows: when the z-score is above a threshold, say 2, the spread can be shorted, i.e. sell 1 unit of y and buy \beta units of x. We expect that the relationship between x and y will hold in the future, so the z-score will eventually come down to zero and even go negative, at which point the position can be closed. By selling the spread when it is high and closing out the position when it is low, the strategy hopes to be statistically profitable. Conversely, if the z-score is below a lower threshold, say -2, the strategy goes long the spread, i.e. buy 1 unit of y and sell \beta units of x, and when the z-score rises to zero or above, the position can be closed, realizing a profit.
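As a rough sketch, this entry/exit rule can be translated into code as follows; the function name, threshold values, and position encoding are illustrative assumptions, not part of the original strategy code.

```python
def zscore_signal(z, entry=2.0, exit_level=0.0):
    """Map a z-score series to positions in the spread:
    -1 = short the spread (sell 1 unit of y, buy beta units of x),
    +1 = long the spread, 0 = flat."""
    position = 0
    positions = []
    for zt in z:
        if position == 0:
            if zt > entry:
                position = -1      # spread is rich: short it
            elif zt < -entry:
                position = 1       # spread is cheap: go long
        elif position == -1 and zt <= exit_level:
            position = 0           # z-score came back down: close short
        elif position == 1 and zt >= exit_level:
            position = 0           # z-score came back up: close long
        positions.append(position)
    return positions

z = [0.5, 2.3, 1.0, -0.1, -2.4, -1.0, 0.2]
print(zscore_signal(z))  # [0, -1, -1, 0, 1, 1, 0]
```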

There are a couple of issues which make this simple strategy difficult to implement in practice:

1. The constants \alpha and \beta are not really constant in practice and vary over time. They are not market observables and hence have to be estimated, with some estimates being more profitable than others.
2. The long-term relationship can break down: the spread can move from one equilibrium to another, such that the changing {\alpha, \beta} gives an “open short” signal while the spread keeps rising to a new equilibrium, and by the time the “close” signal comes the spread is above the entry value, resulting in a loss.

Both of these facts are unavoidable and the strategy has to account for them.

### Determining Parameters

The parameters {\alpha, \beta} can be estimated from the intercept and slope of a linear regression of the prices of y against the prices of x. Note that linear regression is not reversible, i.e. the parameters are not the inverse of regressing x against y, so the pair (x, y) is not the same as (y, x). While most authors use ordinary least squares regression, some use total least squares, since they assume that the prices have some intraday noise as well. However, the main issue with this approach is that we have to pick an arbitrary lookback window.

In this paper, we used a Kalman filter, which is related to an exponential moving average. This is an adaptive filter that updates itself iteratively and produces \alpha, \beta, e and \sigma simultaneously. We use the Python package pykalman, whose EM method calibrates the covariance matrices over the training period.
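The recursion behind such an adaptive estimate can be sketched in plain numpy. This is a minimal illustration of a Kalman-filter regression with a random-walk state, not the author's pykalman implementation; the parameters `delta` and `Ve` are assumptions that would normally be calibrated (e.g. by pykalman's EM step).

```python
import numpy as np

def kalman_regression(x, y, delta=1e-4, Ve=1e-3):
    """Recursively estimate (alpha, beta) in y = alpha + beta*x + e.
    The state [alpha, beta] follows a random walk; delta sets how fast
    the coefficients may drift and Ve is the observation noise variance.
    Both are illustrative here and would normally be calibrated."""
    theta = np.zeros(2)                      # state estimate [alpha, beta]
    P = np.eye(2)                            # state covariance
    Vw = delta / (1.0 - delta) * np.eye(2)   # state transition noise
    alphas, betas, spreads, spread_vars = [], [], [], []
    for t in range(len(x)):
        H = np.array([1.0, x[t]])            # observation vector
        P = P + Vw                           # predict (random-walk state)
        e = y[t] - H @ theta                 # innovation = spread estimate
        Q = H @ P @ H + Ve                   # innovation variance
        K = P @ H / Q                        # Kalman gain
        theta = theta + K * e                # measurement update
        P = P - np.outer(K, H) @ P           # covariance update
        alphas.append(theta[0]); betas.append(theta[1])
        spreads.append(e); spread_vars.append(Q)
    return (np.array(alphas), np.array(betas),
            np.array(spreads), np.array(spread_vars))

# noiseless toy data: the filter should converge to alpha=1, beta=2
x = np.linspace(1.0, 10.0, 200)
y = 1.0 + 2.0 * x
alphas, betas, spreads, spread_vars = kalman_regression(x, y)
print(f"final alpha {alphas[-1]:.3f}, final beta {betas[-1]:.3f}")
```

The z-score on day t is then `spreads[t] / np.sqrt(spread_vars[t])`, which is how a single pass of the filter delivers \alpha, \beta, e and \sigma simultaneously.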

Another question that comes up is whether to regress prices or returns. The latter requires holding an equal dollar amount in both long and short positions, i.e. the portfolio would have to be rebalanced every day, increasing transaction costs, slippage, and bid/ask spread. Hence we have chosen to use prices, which is justified in the next subsection.

### Stability of the Long Term Relationship

The stability of the long-term relationship is assessed by testing whether the pair is co-integrated. Note that even if a pair is not co-integrated outright, it might be for the proper choice of the leverage ratio. Once the parameters have been estimated as above, the spread time series e_{t} is tested for stationarity by the augmented Dickey-Fuller (ADF) test. In Python, we obtain this from the adfuller function in the statsmodels module. The result gives the t-statistics for different significance levels. We found that not many pairs were being chosen at the 1% significance level, so we chose 10% as our threshold.

One drawback is that to perform the ADF test we have to choose a lookback period which reintroduces the parameter we avoided using the Kalman filter.

### Choosing Sectors and Stocks

The trading strategy deploys an initial amount of capital. To diversify the investment five sectors will be chosen: financials, biotechnology, automotive etc. A training period will be chosen and the capital allocated to each sector is decided based on a minimum variance portfolio approach. Apart from the initial investment, each sector is traded independently and hence the discussion below is limited to a single sector, namely financials.

Within the financial sector, we choose about n = 47 names based on large market capitalization. We are looking for stocks with high liquidity, a small bid/ask spread, the ability to short the stock, etc. Once the stock universe is defined we can form n(n-1) pairs, since, as mentioned above, (x, y) is not the same as (y, x). In our financial portfolio, we would like to maintain up to five pairs at any given time. On any day that we want to enter into a position (for example the starting date) we run a screen on all n(n-1) pairs and select the top pair(s) according to criteria, some of which are discussed next.

### Choosing Pairs

For each pair, the signal is obtained from the Kalman filter and we check whether |e| > nz \sigma, where nz is the z-score threshold to be optimized. This ensures that the pair has an entry point. We perform this test first since it is inexpensive. If the pair has an entry point, we then choose a lookback period and perform the ADF test.

The main goal of this procedure is not only to determine the list of pairs which meet the standards, but to rank them according to metrics related to the expected profitability of the pairs.

Once the ranking is done we enter into the positions corresponding to the top pairs until we have a total of five pairs in our portfolio.

### Results

In the following tests, we calibrated the Kalman filter over calendar year 2011 (Cal11) and then used the calibrated parameters to trade in calendar year 2012 (Cal12), keeping only one stock pair in the portfolio.

In the tests shown, we kept the maximum allowed drawdown per trade at 9%, but allowed a maximum loss of 6% per trade in one strategy and only 1% in the other. Performance improves with the tightening of the maximum allowed loss per trade: the Sharpe ratio (assuming a zero benchmark) was 0.64 and 0.81 respectively, while the total P&L was 9.14% and 14%.

The thresholds were chosen based on the simulation in the training period.

### Future Work

1. Develop better screening criteria to identify the pairs with the best potential. I already have several ideas, and this will be ongoing research.
2. Optimize the lookback window and the buy/sell z-score thresholds.
3. Gather more detailed statistics in the training period. At present, I am gathering statistics of only the top 5 pairs (based on my selection criteria). In the future, I should record statistics of all pairs that pass, which will indicate which trades are most profitable.
4. In the training period, I am measuring profitability by the total P&L of the trade, from entry until the exit signal is reached. However, I should also record the maximum profit, so that I could determine an earlier exit threshold.
5. Run the simulation for several years, i.e. calibrate over one year and then test over the next. This will generate several years' worth of out-of-sample tests. Another window to optimize is the length of the training period and how frequently the Kalman filter has to be recalibrated.
6. Expand the methodology to other sectors beyond financials.
7. Explore filters other than the Kalman filter.

### Next Steps

If you are a coder or a tech professional looking to start your own automated trading desk, learn automated trading from live interactive lectures by day-to-day practitioners. The Executive Programme in Algorithmic Trading (EPAT™) covers training modules like Statistics & Econometrics, Financial Computing & Technology, and Algorithmic & Quantitative Trading. Enroll now!

Note:

The work presented in this article has been developed by the author, Mr. Dyutiman Das. The underlying code which forms the basis for this article is not being shared with the readers. Readers interested in further reading on implementing pairs trading using a Kalman filter may refer to the article below.

Link: Statistical Arbitrage Using the Kalman Filter by Jonathan Kinlay

## Pairs Trading on ETF – EPAT Project Work

Edmund Ho did his Bachelor's in Commerce at the University of British Columbia and completed his Master's in Investment Management at the Hong Kong University of Science and Technology. Edmund was enrolled in the 27th Batch of EPAT™, and this report is part of his final project work.

### Project Summary

ETFs are very popular for pairs trading simply because they eliminate firm-specific factors. On top of that, most ETFs are shortable, so we don't have to worry about short-sale constraints. In this project, we try to build a portfolio using three ETF pairs in the oil (USO vs XLE), technology (XLK vs IYW), and financial (XLF vs PSCF) sectors.

Over the long run, the overall performance of the miners is highly correlated with the commodities. In the short term, they may diverge due to an individual company's performance or the overall equity market, and hence short-term arbitrage opportunities may exist. In the technology sector, we attempt to seek mispricing between two large-cap technology ETFs. Last, we attempt to see if an arbitrage opportunity exists between the large- and mid-cap financial ETFs.

### Pair 1 – Oil Sector USO vs XLE

#### Cointegration Test

The above charts were generated in RStudio. The in-sample data covers the period between Jan 1st, 2011 and Dec 31st, 2014.

First, we plot the prices of the pair, which gives the impression that both price series are quite similar. Then we perform the regression analysis of USO vs XLE (Return USO = Beta * Return XLE + Residual) and find the beta, or hedge ratio, to be 0.7493. Next, we apply the hedge ratio to generate the spread returns. We can see the spread returns deviate closely around 0, which shows a characteristic co-integrating pattern. Finally, we apply the augmented Dickey-Fuller test with a confidence level of 0.2 and check whether the pair passes the ADF test. The results are as follows:

###### Augmented Dickey-Fuller Test

```
data: (spread)  Dickey-Fuller = -3.0375, Lag order = 0, p-value = 0.1391
alternative hypothesis: stationary

[1] "The spread is likely Cointegrated with a pvalue of 0.139136738842547"
```

With a p-value of 0.1391, below our 0.2 threshold, the pair satisfies the cointegration test, and we will go ahead and back-test it in the next section.

#### Strategy Back-Testing

The above back-testing results were generated in RStudio. The back-testing period used the same in-sample data as the cointegration test. Our trading strategy is relatively simple:

• If the spread is greater than +/- 1.5 standard deviations, computed over a rolling lookback period of 120 days, we go short/long accordingly.
• At all times, only one position is open.
• Close the long/short position when the spread reverts to its mean/moving average.
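A toy Python translation of these rules, run on a synthetic mean-reverting spread, might look as follows (the author's implementation is in R; the spread series, seed, and P&L accounting here are illustrative assumptions):

```python
import numpy as np
import pandas as pd

def backtest_spread(spread, lookback=120, entry=1.5):
    """Short the spread above +1.5 rolling standard deviations, go long
    below -1.5, exit when it reverts to the rolling mean; at most one
    open position at a time. Returns the daily P&L series."""
    s = pd.Series(spread, dtype=float)
    mean = s.rolling(lookback).mean()
    std = s.rolling(lookback).std()
    z = (s - mean) / std
    position = 0
    positions = np.zeros(len(s))
    for t in range(lookback, len(s)):
        if position == 0:
            if z[t] > entry:
                position = -1
            elif z[t] < -entry:
                position = 1
        elif (position == -1 and z[t] <= 0) or (position == 1 and z[t] >= 0):
            position = 0
        positions[t] = position
    # hold yesterday's position over today's change in the spread
    return pd.Series(positions).shift(1).fillna(0) * s.diff().fillna(0)

# synthetic mean-reverting spread for illustration
rng = np.random.default_rng(1)
x = np.zeros(1000)
for t in range(1, 1000):
    x[t] = 0.95 * x[t - 1] + rng.normal()
pnl = backtest_spread(x)
print(f"total P&L on the toy spread: {pnl.sum():.2f}")
```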

The above back-testing results were generated in RStudio using the PerformanceAnalytics package. During the in-sample back-testing period, the strategy achieved a cumulative return of 121.03%, whereas the SPY (S&P 500) had a cumulative return of 61.78%. This translates into annualized returns of 22% and 12.82%, respectively. In terms of risk, the strategy had a much lower annualized standard deviation of 11.63% vs. 15.35% for SPY. The worst drawdown for the strategy was 6.39% vs. 19.42% for SPY. The annualized Sharpe ratio was superior in our strategy at 1.89 vs. 0.835 for SPY. Please note that none of the above calculations factor in transaction costs.

#### Out-of-Sample Test

For the out-of-sample period between Jan 1st, 2015 and Dec 31st, 2015, the pair did not pass the ADF test, as suggested by a high p-value of 0.3845. The phenomenon could be explained by the sharp decline in crude oil prices while the equity market persisted in an uptrend. If we look at the spread returns, they at first seem to co-integrate around 0, but with a much larger deviation, as the chart below suggests.

The actual spread obviously does not suggest a co-integrating pattern, as indicated by the high p-value. Next, we go ahead and back-test the same strategy using the out-of-sample data, even though the pair fails the cointegration test. The hedge ratio was found to be 1.1841. The key back-testing results, generated in RStudio, are as follows:

|                                 | USO and XLE Stat Arb | SPY          |
|---------------------------------|----------------------|--------------|
| Annualized Return               | 0.09862893           | -0.007623957 |
| Cumulative Return               | 0.09821892           | -0.007593818 |
| Annualized Sharpe Ratio (Rf=0%) | 0.5632756            | -0.04884633  |
| Annualized Standard Deviation   | 0.1750989            | 0.1560804    |
| Worst Drawdown                  | 0.1706643            | 0.1228571    |

At first glance, the strategy seems to outperform the SPY in all aspects, but because the lookback period was kept the same as in the in-sample back-test (120 days, for consistency), the strategy had only one trade during the out-of-sample period, which may not reflect the situation going forward. However, this shows that a perfectly co-integrating pair is not necessary to extract profit opportunities. In reality, only a few perfect pairs would pass the test.

### Pair 2 – Large Cap Technology XLK vs. IYW

#### Cointegration Test

It should not be a surprise that XLK and IYW have a strong linear relationship, as demonstrated in the regression analysis with a hedge ratio of 0.903. The two large-cap technology ETFs are very similar in nature except for their size, volume, expense ratio, etc. However, if we take a closer look at the actual return spreads, they do not seem to satisfy the cointegration requirement. If we run the ADF test, the result shows they are not likely to be co-integrated, with a p-value of 0.5043.

Augmented Dickey-Fuller Test

```
data: (spread)  Dickey-Fuller = -2.1748, Lag order = 0, p-value = 0.5043
alternative hypothesis: stationary

[1] "The spread is likely NOT Cointegrated with a pvalue of 0.504319921216107"
```

The purpose of running the strategy on this pair is to see whether there is any mispricing (short-term deviation) to profit from. In the USO and XLE example, we observed that profit opportunities may still exist despite the pair failing the cointegration test. Here, we go ahead and test the pair to see if any profit opportunity exists.

#### Back-testing Result

|                                 | XLK and IYW Stat Arb | SPY       |
|---------------------------------|----------------------|-----------|
| Annualized Return               | -0.003581305         | 0.1282006 |
| Cumulative Return               | -0.01420635          | 0.6177882 |
| Annualized Sharpe Ratio (Rf=0%) | -0.2839318           | 0.8347157 |
| Annualized Standard Deviation   | 0.01261326           | 0.153586  |
| Worst Drawdown                  | 0.02235892           | 0.1942388 |

The back-testing results illustrate that the strategy performed very poorly during the period between Jan 1st, 2011 and Dec 31st, 2014. The two ETFs are so highly correlated with each other that it is very hard to extract profit opportunities from them. For a statistical arbitrage strategy to work, we need a pair with some volatility in its spread, but one that eventually shows a mean-reverting pattern. In the next section, we perform the same analysis on the financial ETFs.

### Pair 3 – Financial Sectors XLF vs. PSCF

#### Cointegration Test

In this pair, we attempt to find a trading opportunity between the large-cap financial ETF XLF and the small-cap financial ETF PSCF. The price series show a very similar pattern. In terms of regression analysis, they obviously show a strong correlation, with a hedge ratio of 0.9682. The spread return also illustrates some co-integrating pattern, with the spread deviating around 0. The ADF test, with the test value set at 80% confidence, shows the pair is likely to be co-integrated, with a p-value of 0.1026.

Augmented Dickey-Fuller Test

```
Dickey-Fuller = -3.1238, Lag order = 0, p-value = 0.1026
alternative hypothesis: stationary

[1] "The spread is likely Cointegrated with a pvalue of 0.102608136882834"
```

#### Back-testing Result

|                                 | XLF and PSCF Stat Arb | SPY       |
|---------------------------------|-----------------------|-----------|
| Annualized Return               | 0.01212355            | 0.1282006 |
| Cumulative Return               | 0.04923268            | 0.6177882 |
| Annualized Sharpe Ratio (Rf=0%) | 0.1942203             | 0.8347157 |
| Annualized Standard Deviation   | 0.06242163            | 0.153586  |
| Worst Drawdown                  | 0.07651392            | 0.1942388 |

Although the pair satisfies the cointegration test with a low p-value, the back-testing results show below-average performance compared to the index return.

### Conclusion

In this project, we chose three different pairs of ETFs to back-test our simple mean-reverting strategy. The back-test results show superior performance on USO/XLE, but not on the other two pairs. We can conclude that for a pairs trading strategy to work, we do not need a pair with a strong linear relationship, but a long-term mean-reverting pattern is essential to obtain a decent result. In the pair XLK/IYW, we attempted to find mispricing between the two issuers; however, in the efficient US ETF market, mispricing on such big ETFs is very rare, hence the strategy performed very poorly on this pair. On the other hand, the correlation and cointegration tests on the pair XLF/PSCF suggest the pair is an ideal candidate for the statistical arbitrage strategy, yet the back-testing results show otherwise. In any statistical arbitrage strategy we are essentially trading volatility, and if there is not enough volatility around the spread to begin with, as with XLK/IYW, the profit opportunity is trivial. In the pair USO/XLE, the volatility around the spread is ideal and the cointegration test shows a mean-reverting pattern, so it is no surprise that this pair prevails in the back-testing results.

### Next Step

• Project_Cointegration_Test.R
• Project_Backtest_Test.R

## Implementing Pairs Trading/Statistical Arbitrage Strategy In FX Markets : EPAT Project Work

This article is the final project submitted by the author as a part of his coursework in Executive Programme in Algorithmic Trading (EPAT™) at QuantInsti. Do check our Projects page and have a look at what our students are building.

Harish Maranani did his Bachelor of Technology in Electronics and Communications Engineering at Acharya Nagarjuna University, an MBA in Finance at Staffordshire University (UK), the Certificate in Quantitative Finance (CQF), and a Master of Science in Mathematical and Computational Finance at the New Jersey Institute of Technology, Newark, USA. Harish was enrolled in the 27th Batch of EPAT™, and this report is part of his final project work.

Aim: To implement pairs trading/statistical arbitrage strategy in currencies.

Pairs Chosen: EURINR, USDINR, GBPINR, AUDINR, CADINR, JPYINR

Frequency: Daily

Time Period: 2011/4/21 to 2013/5/22

Implemented using: Python.

Pair Selection Criteria for FX Markets:

• The time series data for the above-chosen currency pairs is imported from Quandl.
• The co-integration test is carried out on all possible pair combinations, viz. EURINR-USDINR, EURINR-GBPINR, etc.
• Co-integrated pairs whose t-statistic is less than the 5% critical value of -2.86 are selected.
• The pairs which meet the co-integration condition are sliced out for further analysis.
• To further confirm co-integration, the CADF test is carried out on the sliced pairs from the pool.
• The z-score is calculated for each selected pair combination and the strategy is applied.
• Profit/loss, the equity curve, and the maximum drawdown are calculated, tabulated, and plotted.
• Consider two currency pairs, EUR/INR and USD/INR. Here the base currencies are EUR and USD respectively, and the counter currency is INR.

Preliminary Test:

• In order to find the pairs of currencies that are co-integrated, a preliminary test using coint(x, y) from statsmodels.tsa.stattools is carried out, and the respective p-values and t-statistics are plotted below.
• The t-statistic values displayed below are the ones that passed the co-integration test, i.e. the t-statistic values smaller than the 5% critical value of -2.86.

Below is the list of pairs whose t-statistic values are less than the 5% critical value of -2.86:

• EURINR/USDINR: -3.89372142826
• EURINR/GBPINR: -3.04457063111
• USDINR/AUDINR: -3.14784526027

Below is the plot of p-values of the co-integrated pairs:

Before rejecting the null hypothesis and concluding that the prices are mean-reverting, we conduct the Co-integrated Augmented Dickey-Fuller (CADF) test on the pairs sliced out from the whole set of currencies. Below are the results and plots.

We shall consider the 4 co-integrated pairs, based on their t-statistic values, for CADF testing.

The following are the 4 Co-integrated pairs:

EURINR/USDINR:  -3.89372142826

#### EURINR/USDINR

Time series plots of EURINR/USDINR

From the above graph, it is visibly evident that the prices move together; however, to confirm this statistically, the below set of tests/procedures is implemented.

We create a scatter plot of the prices to see whether the relationship is broadly linear.

The above residual plot looks relatively stationary.

##### Co-integrated Augmented Dickey-Fuller Test Results

The Co-integrated Augmented Dickey-Fuller (CADF) test determines the optimal hedge ratio by performing a linear regression of one time series against the other, and then tests the residual of that linear combination for stationarity.

Implementing this in Python gives the following result:

```
(-3.0420602182962395,
 0.03114885626164075,
 1L,
 652L,
 {'1%': -3.440419374623044,
  '10%': -2.5691361169972526,
  '5%': -2.8659830798370352},
 852.99818965061797)
```

Given the above results, with the t-statistic of -3.04 below the 5% critical value of -2.86, we can reject the null hypothesis and confirm that the prices are mean-reverting.

#### GBPINR/CADINR

Below are the time series, scatter, and residual plots of GBPINR/CADINR:

```
(-3.3637522231183872,
 0.012258395060108089,
 2L,
 651L,
 {'1%': -3.440434903803665,
  '10%': -2.569139761751388,
  '5%': -2.865989920612213},
 -179.04749802146216)
```

Given the above results, with the t-statistic of -3.36 below the 5% critical value of -2.86, we can reject the null hypothesis and confirm that the prices are mean-reverting.

For the next pair, the t-statistic of -2.93 is likewise below the 5% critical value of -2.86, so we can reject the null hypothesis and confirm that the prices are mean-reverting. The CADF results are:

```
(-2.9344605252608607,
 0.041484961304201866,
 1L,
 652L,
 {'1%': -3.440419374623044,
  '10%': -2.5691361169972526,
  '5%': -2.8659830798370352},
 -99.577663481220952)
```

#### USDINR/AUDINR

Below are the results from the CADF test:

```
(-3.2595055880757768,
 0.016788501512565262,
 4L,
 649L,
 {'1%': -3.440466106307706,
  '10%': -2.5691470850496558,
  '5%': -2.8660036655537744},
 381.77145926378489)
```

With the t-statistic value of -3.26 below the 5% critical value of -2.86, we can reject the null hypothesis and confirm that the pair is co-integrated.

We have now confirmed the following co-integrated pairs, with their CADF t-statistic values:

• EURINR/USDINR: -3.04
• USDINR/AUDINR: -3.259

The next step is to calculate the z-score of the price ratio using a 30-day moving average and a 30-day standard deviation:

• Calculating price ratios and creating a new column ratio in the data frames (df, df1, df2, df4) of the above currency pairs respectively.

Below is the snapshot of the data frames:

df:

df1:

Calculation of the z-score of the price ratio over the 30-day window of moving average and standard deviation:

• Below are the plots of z-scores for the above co-integrated pairs with their respective price ratios:

The above z-score plots of the selected pairs show the z-score exhibiting mean-reverting behavior within 2 standard deviations.

• When the z-score touches +2, short the pair and close the position when it reverts to +1.
• When the z-score touches -2, go long the pair and close the position when it reverts to -1.
• Only one position is held at any instance of time.
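These rules can be sketched in a few lines of pandas; the 30-day window and the ±2/±1 bands follow the text, while the function itself and the toy ratio series are illustrative assumptions.

```python
import pandas as pd

def ratio_zscore_positions(ratio, window=30, entry=2.0, exit_level=1.0):
    """Apply the three rules above to a price-ratio series:
    short the pair at z >= +2 and cover at +1, go long at z <= -2 and
    cover at -1, holding at most one position at a time."""
    r = pd.Series(ratio, dtype=float)
    z = (r - r.rolling(window).mean()) / r.rolling(window).std()
    position, out = 0, []
    for zt in z.fillna(0):
        if position == 0:
            if zt >= entry:
                position = -1          # ratio rich: short the pair
            elif zt <= -entry:
                position = 1           # ratio cheap: long the pair
        elif position == -1 and zt <= exit_level:
            position = 0               # reverted to +1: close the short
        elif position == 1 and zt >= -exit_level:
            position = 0               # reverted to -1: close the long
        out.append(position)
    return pd.Series(out, index=r.index)

# toy ratio: small oscillation, a brief spike, then back to normal
ratio = [1.0 + 0.01 * ((-1) ** i) for i in range(35)] + [1.5] * 3 + [1.0] * 5
positions = ratio_zscore_positions(ratio)
print(positions.tolist())
```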

Equity Curve:

Plotting the equity curve with a starting capital of 100 INR, equally divided among the 4 pairs:

With 100 INR initial capital, the equity ended at 114.05.

The cumulative profit is 14% without any leverage. With 10 times leverage (common in FX trading), the profit becomes 140%. Below are the key performance metrics of the strategy.

| Metric                               | Value                |
|--------------------------------------|----------------------|
| Profit percentage without leverage   | 14.0514144897%       |
| Profit percentage with 10x leverage  | 140.514144897%       |
| Number of positive trades            | 59                   |
| Number of negative trades            | 23                   |
| Hit ratio                            | 71.9512195122%       |
| Average positive trade               | 0.46886657456220338  |
| Average negative trade               | -0.59181362649660851 |
| Average profit / average loss        | 0.792253766338       |
| Maximum drawdown                     | -5.1832506579%       |

The above graph shows the maximum drawdown points marked with red dots and the value is added in the above table.

### Instructions for Implementation

• Please run the IPython notebook named harish_stat_arb.ipynb for the confirmation of results and plots.
• Another option is to run the python script harish_quantinsti_final_project_code.py on any python IDE to confirm the results and graph.
• Use the code below to export the final data frame to an Excel file:

```python
writer = pd.ExcelWriter('pairs_final.xlsx', engine='xlsxwriter')
pairs.to_excel(writer, 'Sheet5')
writer.save()
```

### Conclusion

Though the strategy generated 140% returns over the two-year backtest period, the following factors should be considered in order to evaluate the strategy's performance more accurately:

• The model ignores slippage and commissions.

### Bibliography

• Statistical Arbitrage lecture, QuantInsti, Nitesh Khandelwal.
• Pairs Trading: Quantitative Methods and Analysis, Ganapathy Vidyamurthy, Wiley Finance.
• Successful Algorithmic Trading, Michael Halls-Moore.

## Shorting at High: Algo Trading Strategy in R

Milind began his career in Gridstone Research, building earnings models and writing earnings notes for NYSE listed companies, covering Technology and REITs sectors. Milind has also worked at CRISIL and Deutsche Bank, where he was involved in modeling of Structured Finance deals covering Asset Backed Securities (ABS), and Collateralized Debt Obligations (CDOs) for the US and EMEA region.

Milind holds an MBA in Finance from the University of Mumbai, and a Bachelor's degree in Physics from St. Xavier's College, Mumbai.

### Ideation

The Executive Programme in Algorithmic Trading (EPAT) exposed me to all the requisite subjects needed to learn algorithmic trading. As part of the EPAT project work, I tried coding many strategies. Since I am new to algorithmic trading, I wanted to code the simplest, most basic strategies. Although simple and basic, one should not underestimate the power of such strategies, as they can generate good returns.

“Shorting at High” is one of the strategies that I formulated for my project work. This post explains the strategy in brief, along with the code. I welcome readers to give suggestions, improve upon the strategy, or use it.

### Strategy in brief

The strategy is to short the stocks which cross a set percentage threshold on the upside (say 8%-9%) during intraday trading. The expectation is for the shorted stocks to fall by the amount predicted in the metrics sheet that is generated upon executing the code.

## Pair Trading Strategy and Backtesting using Quantstrat

### A Recent Webinar Presentation by Marco Nicolas Dibo

This insightful webinar on pairs trading and sourcing data covers the basics of pair trading strategy followed by two examples. In the first example, Marco covers the pairs trading strategy for different stocks traded on the same exchange, and in the second example, Marco has illustrated the pairs strategy for different commodity futures traded on different exchanges. Marco also details the different data sources including Quandl which can be used for creating trading strategies.

This article is the final project submitted by the author as a part of his coursework in Executive Programme in Algorithmic Trading (EPAT) at QuantInsti. Do check our Projects page and have a look at what our students are building.

### Author

Marco has spent his career as a trader and portfolio manager, with a particular focus in equity and derivatives markets. He specializes in quantitative finance and algorithmic trading and currently serves as head of the Quantitative Trading Desk and Vice-president of Argentina Valores S.A. Marco is also Co-Founder and CEO of Quanticko Trading SA, a firm devoted to the development of high frequency trading strategies and trading software. Marco holds a BS in Economics and an MSc in Finance from the University of San Andrés.

### Introduction

One of my favorite classes during EPAT was the one on statistical arbitrage, so a pairs trading strategy seemed a natural choice for me. My strategy triggers new orders when the ratio of the prices of the two stocks diverges from its mean. But for this to work, we first have to test that the pair is co-integrated. If the price ratio is co-integrated, it is mean-reverting, and the greater the dispersion from its mean, the higher the probability of a reversal, which makes the trade more attractive. I chose the following pair of stocks:

• Bank of America (BAC)
• Citigroup (C)

The idea is the following: if we find two stocks that are correlated (say, because they belong to the same sector), and the pair ratio diverges beyond a certain threshold, we short the stock that is expensive and buy the one that is cheap. Once they converge to the mean, we close the positions and profit from the reversal.

## Authors

Maxime Fages
Maxime's career has spanned the strategic aspects of value and risk, with a particular focus on trading behaviors and market microstructure over the past few years. He has embraced a quantitative angle in M&A, fund management, and currently corporate strategy, and has always been an avid open-source software user. Maxime holds an MBA from INSEAD and an MSc in Engineering from Ecole Nationale Superieure D'Arts et Metiers; he is currently Strategy Director APAC at the CME Group.

Derek began his career on the floor of the CBOT, then moved upstairs to focus on proprietary trading and strategy development. He manages global multi-strategy portfolios, focusing on the futures and options space. He is currently the Deputy Director of Systematic Trading at Foretrade Investment Co Ltd.

### Ideation

By the end of the Executive Programme in Algorithmic Trading (EPAT) lectures, Derek and I were spending a significant amount of time exchanging views over a variety of media. We discussed ideas for a project, and the same themes were getting us excited. First, we were interested in dealing with Futures rather than cash instruments. Second, we both had a solid experience using R for quantitative research and were interested in getting our hands dirty on the execution side of things, especially on the implementation of event-driven strategies in Python (which neither of us knew before the EPAT program). Third, we had spent hours discussing and assessing the performance of Machine Learning for trading applications and were pretty eager to try our ideas out. Finally, we were very interested in practical architecture design, particularly in what was the best way to manage the variable resource needs of any Machine Learning framework (training vs. evaluating).

The scope of our project, therefore, came about naturally: developing a fully cloud-based automated trading system that would leverage simple, fast mean-reverting or trend-following execution algorithms and call on machine learning to switch between them.

## Dispersion Strategy Based on Correlation of Stocks and Volatility of Index

### Introduction

This article examines profits from trading the dispersion strategy, based on the correlation of stocks and the volatility of the index. Dispersion lets the trader take a view on volatility only (assuming that correlation mean-reverts), so delta risk is hedged by buying or selling futures. In this strategy, both long and short positions are built on volatility, and with more strategies available nowadays it is better to use strategies that take advantage of relative values rather than absolutes; this limits the amount of money at risk in one direction. (more…)
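As a rough illustration of the mechanics (my own sketch, not the article's implementation), dispersion trades are typically framed around the average implied correlation backed out from index and component implied volatilities:

```r
#Illustrative sketch: average implied correlation from index and component vols
#(w = index weights, sigma = component implied vols, sigma_index = index implied vol)
implied_corr <- function(sigma_index, w, sigma) {
  ws  <- w * sigma
  num <- sigma_index^2 - sum(ws^2)
  den <- sum(outer(ws, ws)) - sum(ws^2)  #equals 2 * sum_{i<j} w_i w_j sigma_i sigma_j
  num / den
}

#A rich implied correlation (relative to its history, expected to mean-revert down)
#suggests selling index volatility and buying component volatility, delta-hedged
implied_corr(0.25, c(0.5, 0.3, 0.2), c(0.30, 0.28, 0.35))
```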

## EPAT Final Project by Jacques Joubert – Statistical Arbitrage Strategy in R

Statistical Arbitrage Strategy in R – EPAT Project Work

This article is the final project submitted by the author as a part of his coursework in Executive Programme in Algorithmic Trading (EPAT) at QuantInsti. Do check our Projects page and have a look at what our students are building.

#### Background

Those of you who have been following my blog posts for the last 6 months will know that I have taken part in the Executive Programme in Algorithmic Trading offered by QuantInsti.

I uploaded everything to GitHub in order to welcome readers to contribute, improve, use, or work on this project. It will also form part of my Open Source Hedge Fund project on my blog, QuantsPortal.

I would like to say a special thank you to the team at QuantInsti. Thank you for all the revisions of my final project, for going out of your way to help me learn, and the very high level of client services.

#### History of Statistical Arbitrage

Statistical arbitrage was first developed and used in the mid-1980s by Nunzio Tartaglia’s quantitative group at Morgan Stanley.

• Pair Trading is a “contrarian strategy” designed to harness mean-reverting behavior of the pair ratio
• David Shaw, founder of D. E. Shaw & Co, left Morgan Stanley and started his own “quant” trading firm in the late 1980s, dealing mainly in pair trading

Statistical arbitrage trading, or pairs trading as it is commonly known, is defined as trading one financial instrument or basket of financial instruments against another – in most cases to create a value-neutral basket.

It rests on the idea that a co-integrated pair is mean-reverting in nature: there is a spread between the instruments, and the further it deviates from its mean, the greater the probability of a reversal.

Note, however, that statistical arbitrage is not a risk-free strategy. Say, for example, that you have entered positions for a pair and the spread then picks up a trend rather than mean-reverting.

#### The Concept

Step 1: Find 2 related securities

Find two securities that are in the same sector / industry; they should have similar market capitalization and average volume traded.

An example of this is Anglo Gold and Harmony Gold.

Step 2: Calculate the spread

In the code to follow I used the pair ratio to indicate the spread. It is simply the price of asset A / the price of asset B.

Step 3: Calculate the mean, standard deviation, and z-score of the pair ratio / spread.

Step 4: Test for co-integration

In the code to follow I use the Augmented Dickey-Fuller Test (ADF Test) to test for co-integration. I set up three tests, each with a different number of observations (120, 90, 60); all three tests have to reject the null hypothesis that the pair is not co-integrated.

Step 5: Generate trading signals

Trading signals are based on the z-score, given that the pair passes the test for co-integration. In my project I used a z-score of 1, as I noticed that the other algorithms I was competing with were using very low parameters. (I would have preferred a z-score of 2, as it better matches the literature; however, it is less profitable.)

Step 6: Process transactions based on signals

Step 7: Reporting
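Before the full implementation below, steps 2 to 5 can be sketched with a rolling window (a simplified illustration of my own; `priceA` and `priceB` are hypothetical price vectors, and only a single ADF test is shown rather than three):

```r
#Simplified illustration of steps 2-5 (priceA, priceB are hypothetical vectors)
library(zoo)     #rollapply for rolling statistics
library(tseries) #adf.test

n      <- 35
spread <- priceA / priceB                                        #step 2: pair ratio
mu     <- rollapply(spread, n, mean, fill = NA, align = "right") #step 3: rolling mean
sdv    <- rollapply(spread, n, sd,   fill = NA, align = "right") #        rolling sd
z      <- (spread - mu) / sdv                                    #        z-score

coint <- adf.test(spread)$p.value < 0.05                         #step 4: one ADF test

#step 5: short the spread above +1, long below -1, flat otherwise
signal <- ifelse(!coint, 0, ifelse(z > 1, -1, ifelse(z < -1, 1, 0)))
```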

### R markdown for my project

#### Import packages and set directory

The first step is always to import the packages needed.

#Imports
require(tseries)
require(urca) #Used for the ADF Test
require(PerformanceAnalytics)

This strategy will be run on shares listed on the Johannesburg Stock Exchange (JSE). Because of this, I won’t be using the quantmod package to pull data from Yahoo Finance; instead, I have already gotten and cleaned the data, stored it in a SQL database, and moved it to CSV files on the desktop.

I added all the pairs used in the strategy to a folder which I now set to be the working directory.

##Change this to match where you stored the csv files (folder name: FullList)
setwd("~/R/QuantInsti-Final-Project-Statistical-Arbitrage/database/FullList")

#### Functions that will be called from within other functions (No user interaction)

Next: Create all the functions that will be needed. The functions below will be called from within other functions so you don’t need to worry about the arguments.

The AddColumns function is used to add columns to the data frame that will be needed to store variables.

#Add columns to the csv dataframe
AddColumns <- function(csvData){
csvData$spread <- 0
csvData$adfTest <- 0
csvData$mean <- 0
csvData$stdev <- 0
csvData$zScore <- 0
csvData$signal <- 0
csvData$BuyPrice <- 0
csvData$SellPrice <- 0
csvData$LongReturn <- 0
csvData$ShortReturn <- 0
csvData$Slippage <- 0
csvData$TotalReturn <- 0
csvData$TransactionRatio <- 0
csvData$TradeClose <- 0
return(csvData)
}
##### PrepareData

The PrepareData function calculates the pair ratio and the log10 prices of the pair. It also calls the AddColumns function within it.

PrepareData <- function(csvData){
#Calculate the Pair Ratio
csvData$pairRatio <- csvData[,2] / csvData[,3]

#Calculate the log prices of the two time series
csvData$LogA <- log10(csvData[,2])
csvData$LogB <- log10(csvData[,3])

#Add columns to the DF
csvData <- AddColumns(csvData)

#Make sure that the date column is not read in as a vector of characters
csvData$Date <- as.Date(csvData$Date)
return(csvData)
}

##### GenerateRowValue

The GenerateRowValue function calculates the mean, standard deviation, and the z-score for a given row in the data frame.

#Calculate mean, stdDev, and z-score for the given row [end]
GenerateRowValue <- function(begin, end, csvData){
average <- mean(csvData$spread[begin:end])
stdev <- sd(csvData$spread[begin:end])
csvData$mean[end] <- average
csvData$stdev[end] <- stdev
csvData$zScore[end] <- (csvData$spread[end]-average)/stdev
return(csvData)
}

##### GenerateSignal

The GenerateSignal function creates a long, short, or close signal based on the z-score. You can manually change the z-score thresholds. I have set them to 1 and -1 for entry signals, and any z-score between 0.5 and -0.5 will create a close/exit signal.

GenerateSignal <- function(counter, csvData){
#Trigger and close represent the entry and exit zones (value refers to the z-score value)
trigger <- 1
close <- 0.5

currentSignal <- csvData$signal[counter]
prevSignal <- csvData$signal[counter-1]

#Set trading signal for the given [end] row
if(csvData$adfTest[counter] == 1)
{
#If there is a change in signal from long to short then you must allow for the
#current trade to first be closed
if(currentSignal == -1 && prevSignal == 1)
csvData$signal[counter] <- 0
else if(currentSignal == 1 && prevSignal == -1)
csvData$signal[counter] <- 0

#Create a long / short signal if the current z-score is larger / smaller than the trigger value
#(respectively)
else if(csvData$zScore[counter] > trigger)
csvData$signal[counter] <- -1
else if (csvData$zScore[counter] < -trigger)
csvData$signal[counter] <- 1

#Close the position if the z-score is between the two "close" values
else if (csvData$zScore[counter] < close && csvData$zScore[counter] > -close)
csvData$signal[counter] <- 0
else
csvData$signal[counter] <- prevSignal
}
else
csvData$signal[counter] <- 0

return(csvData)
}

##### GenerateTransactions

The GenerateTransactions function is responsible for setting the entry and exit prices for the respective long and short positions needed to create a pair.

Note: QuantInsti taught us a very specific way of backtesting a trading strategy. They used Excel to teach strategies, and when I coded this strategy I reused a large part of the Excel methodology. Going forward, however, I would explore other ways of storing variables. One of the great things about this method is that you can pull the entire data frame and analyse why a trade was made, with all the details pertaining to it.

#Transactions based on trade signal
#Following the framework set out initially by QuantInsti (Note: this can be coded better)
GenerateTransactions <- function(currentSignal, prevSignal, end, csvData){
#In a pair trading strategy you need to go long one share and short the other
#and then reverse the transaction when you close

##First Leg of the trade (Set Long position)
#If there is no change in signal
if(currentSignal == 0 && prevSignal == 0){
csvData$BuyPrice[end] <- 0
csvData$TransactionRatio[end] <- 0
}
else if(currentSignal == prevSignal){
csvData$BuyPrice[end] <- csvData$BuyPrice[end-1]
csvData$TransactionRatio[end] <- csvData$TransactionRatio[end-1]
}
#If the signals point to a new trade
#Short B and Long A
else if(currentSignal == 1 && currentSignal != prevSignal)
csvData$BuyPrice[end] <- csvData[end, 2]
#Short A and Long B
else if(currentSignal == -1 && currentSignal != prevSignal){
csvData$BuyPrice[end] <- csvData[end, 3] * csvData$pairRatio[end]
transactionPairRatio <<- csvData$pairRatio[end]
csvData$TransactionRatio[end] <- transactionPairRatio
}

else if(currentSignal == 0 && prevSignal == 1)
csvData$BuyPrice[end] <- csvData[end, 2]
else if(currentSignal == 0 && prevSignal == -1){
csvData$TransactionRatio[end] <- csvData$TransactionRatio[end-1]
csvData$BuyPrice[end] <- csvData[end, 3] * csvData$TransactionRatio[end]
}

##Second Leg of the trade (Set Short position)
##Set short prices if there is no change in signal
if(currentSignal == 0 && prevSignal == 0)
csvData$SellPrice[end] <- 0
else if(currentSignal == prevSignal)
csvData$SellPrice[end] <- csvData$SellPrice[end-1]

#If the signals point to a new trade
else if(currentSignal == 1 && currentSignal != prevSignal){
csvData$SellPrice[end] <- csvData[end, 3] * csvData$pairRatio[end]
transactionPairRatio <<- csvData$pairRatio[end]
csvData$TransactionRatio[end] <- transactionPairRatio
}
else if(currentSignal == -1 && currentSignal != prevSignal)
csvData$SellPrice[end] <- csvData[end, 2]
#Close trades
else if(currentSignal == 0 && prevSignal == 1){
csvData$TransactionRatio[end] <- csvData$TransactionRatio[end-1]
csvData$SellPrice[end] <- csvData[end, 3] * csvData$TransactionRatio[end]
}
else if(currentSignal == 0 && prevSignal == -1)
csvData$SellPrice[end] <- csvData[end, 2]

return(csvData)
}
##### GetReturnsDaily

GetReturnsDaily calculates the daily returns on each position and then calculates the total returns and adds slippage.

#Calculate daily returns generated
GetReturnsDaily <- function(end, csvData, slippage){
#Calculate the returns generated on each leg of the deal (the long and the short position)
if(csvData$signal[end-1]>0){csvData$LongReturn[end] <- log(csvData[end,2] / csvData[end-1,2])}
else
if(csvData$signal[end-1]<0){csvData$LongReturn[end] <- log(csvData[end,3] / csvData[end-1,3])*csvData$TransactionRatio[end]}

#Short Leg of the trade
if(csvData$signal[end-1]>0){csvData$ShortReturn[end] <- -log(csvData[end,3] / csvData[end-1,3])*csvData$TransactionRatio[end]}
else
if(csvData$signal[end-1]<0){csvData$ShortReturn[end] <- -log(csvData[end,2] / csvData[end-1,2])}

if(csvData$signal[end] == 0 && csvData$signal[end-1] != 0)
{
csvData$Slippage[end] <- slippage
csvData$TradeClose[end] <- 1
}
#If a trade was closed then calculate the total return
csvData$TotalReturn[end] <- ((csvData$ShortReturn[end] + csvData$LongReturn[end]) / 2) + csvData$Slippage[end]

return(csvData)
}
##### GenerateReports

The next two functions are used to generate reports. A report includes the following:

Charting:
1. An equity curve
2. Drawdown curve
3. Daily returns bar chart

Statistics:
1. Annual returns
2. Annualized Sharpe ratio
3. Maximum drawdown

Table:
1. Top 5 drawdowns and their duration

Note: If you have some extra time, you can break this function down further into smaller functions in order to reduce the lines of code and improve usability. Less code = fewer bugs.

#Returns an equity curve, annualized return, annualized sharpe ratio, and max drawdown
GenerateReport <- function(pairData, startDate, endDate){
#Subset the dates
returns <- xts(pairData$TotalReturn, as.Date(pairData$Date))
returns <- returns[paste(startDate,endDate,sep="::")]
#Plot
charts.PerformanceSummary(returns)

#Metrics
print(paste("Annual Returns: ",Return.annualized(returns)))
print(paste("Annualized Sharpe: " ,SharpeRatio.annualized(returns)))
print(paste("Max Drawdown: ",maxDrawdown(returns)))

pairDataSub <- pairData[pairData$TradeClose==1,]
returns_sub <- xts(pairDataSub$TotalReturn, as.Date(pairDataSub$Date))
returns_sub <- returns_sub[paste(startDate,endDate,sep="::")]

#var returns = xts object
totalTrades <- 0
positiveTrades <- 0
profitsVector <- c()
lossesVector <- c()

#Loop through the data to find the + & - trades and total trades
for(i in returns_sub){
if(i != 0){
totalTrades <- totalTrades + 1
if(i > 0){
positiveTrades <- positiveTrades + 1
profitsVector <- c(profitsVector, i)
}
else if (i < 0){
lossesVector <- c(lossesVector, i)
}
}
}

#Print the results to the console
print(paste("Total Trades: ", totalTrades))
print(paste("Success Rate: ", positiveTrades/totalTrades))
print(paste("PnL Ratio: ", mean(profitsVector)/mean(lossesVector*-1)))
print(table.Drawdowns(returns))
}

GenerateReport.xts <- function(returns, startDate = '2005-01-01', endDate = '2015-11-23'){
#Metrics
returns <- returns[paste(startDate,endDate,sep="::")]
charts.PerformanceSummary(returns)
print(paste("Annual Returns: ",Return.annualized(returns)))
print(paste("Annualized Sharpe: " ,SharpeRatio.annualized(returns)))
print(paste("Max Drawdown: ",maxDrawdown(returns)))
print(table.Drawdowns(returns))
}

#### Functions that the user will pass parameters to

The next two functions are the only functions that the user should fiddle with.

##### BacktestPair

BacktestPair is used when you want to run a backtest on a trading pair (the pair is passed in via the CSV file).

Function arguments:

• pairData = the CSV file data
• mean = the number of observations used to calculate the mean of the spread
• slippage = the amount of basis points that act as brokerage as well as slippage
• adfTest = a boolean value - whether the backtest should test for co-integration
• criticalValue = critical value used in the ADF Test to test for co-integration
• generateReport = a boolean value - whether a report must be generated

#The function that will be called by the user to backtest a pair
BacktestPair <- function(pairData, mean = 35, slippage = -0.0025, adfTest = TRUE, criticalValue = -2.58, startDate = '2005-01-01', endDate = '2014-11-23', generateReport = TRUE){
# At 150 data points
# Critical value at 1% : -3.46
# Critical value at 5% : -2.88
# Critical value at 10% : -2.57

#Prepare the initial dataframe by adding columns and pre-calculations
pairData <- PrepareData(pairData)

#Iterate through each day in the time series
for(i in 1:length(pairData[,2])){
#For each day after the number of days needed to run the ADF test
if(i > 130){
begin <- i - mean + 1
end <- i

#Calculate the spread
spread <- pairData$pairRatio[end]
pairData$spread[end] <- spread

#ADF Test on 120 - 90 - 60 observations
if(adfTest == FALSE){
pairData$adfTest[end] <- 1
}
else {
if(adf.test(pairData$spread[(i-120):end], k = 1)[1] <= criticalValue){
if(adf.test(pairData$spread[(i-90):end], k = 1)[1] <= criticalValue){
if(adf.test(pairData$spread[(i-60):end], k = 1)[1] <= criticalValue){
#If co-integrated then set the ADFTest value to true / 1
pairData$adfTest[end] <- 1
}
}
}
}
#Calculate the remainder variables needed
if(i >= mean){
#Generate Row values
pairData <- GenerateRowValue(begin, end, pairData)
#Generate the Signals
pairData <- GenerateSignal(i, pairData)

currentSignal <- pairData$signal[i]
prevSignal <- pairData$signal[i-1]

#Generate Transactions
pairData <- GenerateTransactions(currentSignal, prevSignal, i, pairData)

#Get the returns with added slippage
pairData <- GetReturnsDaily(i, pairData, slippage)

}
}
}

if(generateReport == TRUE)
GenerateReport(pairData, startDate, endDate)

return(pairData)
}
##### BacktestPortfolio

BacktestPortfolio accepts a vector of CSV files and then generates an equally weighted portfolio.

Functions arguments:

• names = an atomic vector of CSV file names, example: c('DsyLib.csv', 'OldSanlam.csv')
• mean = the number of observations used to calculate the mean of the spread.
• leverage = how much leverage you want to apply to the portfolio
#An equally weighted portfolio of shares
BacktestPortfolio <- function(names, mean = 35,leverage = 1, startDate = '2005-01-01', endDate = '2015-11-23'){
##Iterates through all the pairs and backtests each one
##stores the data in a list of numerical vectors
returns.list <- list()
counter <- F
ticker <- 1
for (name in names){
#A notification to let you know how far it is
print(paste(ticker, " of ", length(names)))
ticker <- ticker + 1

#Run the backtest on the pair
data <- read.csv(name)
BackTest.df <- BacktestPair(data, mean, generateReport = FALSE)

#Store the dates in a separate vector
if (counter == F){
dates <<- as.Date(BackTest.df$Date)
counter <- T
}

#Append to list
returns.list <- c(returns.list, list(BackTest.df[,18]))
}

##Aggregate the returns for each day and then calculate the average for each day
total.returns <- c()
for (i in 1:length(returns.list)){
if(i == 1)
total.returns = returns.list[[i]]
else
total.returns = total.returns + returns.list[[i]]
}
total.returns <- total.returns / length(returns.list)

##Generate a report for the portfolio
returns <- xts(total.returns * leverage, dates)
GenerateReport.xts(returns, startDate, endDate)

return(returns)
}

### Running Backtests

Now we can start testing strategies using our code.

#### Pure arbitrage on the JSE

When starting this project, the main focus was on using statistical arbitrage to find pairs that were co-integrated and then to trade those. However, I very quickly realized that the same code could be used to trade shares that have both a primary listing and a secondary listing on the same exchange. If both listings are found on the same exchange, it opens the door for a pure arbitrage strategy, because both listings refer to the same asset; therefore you don't need to test for co-integration.

There are two very obvious examples on the JSE.

##### First Example Investec:

Primary = Investec Ltd : Secondary = Investec PLC

###### Investec In-Sample Test (2005-01-01 - 2012-11-23)

Test the following parameters:

• The Investec Ltd / PLC pair
• mean = 35
• Set adfTest = F (don't test for co-integration)
• Leverage of x3

#Investec
leverage <- 3
data <- read.csv('Investec.csv')
investec <- BacktestPair(data, 35, generateReport = F, adfTest = F)

#Format to an xts object and pass to GenerateReport.xts()
investec.returns <- xts(investec[,18] * leverage, investec$Date)
GenerateReport.xts(investec.returns, startDate = '2005-01-01', endDate = '2012-11-23')

## [1] "Annual Returns: 0.619853087807437"
## [1] "Annualized Sharpe: 3.29778431709924"
## [1] "Max Drawdown: 0.105016628973292"
## From Trough To Depth Length To Trough Recovery
## 1 2009-03-19 2009-03-25 2009-05-04 -0.1050 28 5 23
## 2 2006-06-08 2006-07-13 2006-08-14 -0.0955 46 25 21
## 3 2008-10-03 2008-10-17 2008-10-24 -0.0887 16 11 5
## 4 2009-03-02 2009-03-02 2009-03-06 -0.0733 5 1 4
## 5 2008-10-27 2008-10-27 2008-11-05 -0.0697 8 1 7
###### Investec Out-of-Sample Test (2012-11-23 - 2015-11-23)

Note: if you increase the slippage, you will very quickly kiss profits goodbye.

GenerateReport.xts(investec.returns, startDate = '2012-11-23', endDate = '2015-11-23')

## [1] "Annual Returns: 0.1754103210963"
## [1] "Annualized Sharpe: 2.20385429706265"
## [1] "Max Drawdown: 0.0335642102186873"
## From Trough To Depth Length To Trough Recovery
## 1 2015-07-10 2015-11-13  -0.0336 96 89 NA
## 2 2013-06-18 2013-06-21 2013-07-01 -0.0267 10 4 6
## 3 2014-04-16 2014-08-13 2014-09-19 -0.0262 107 80 27
## 4 2015-01-20 2015-05-25 2015-06-01 -0.0258 91 86 5
## 5 2013-01-18 2013-01-24 2013-01-25 -0.0249 6 5 1
##### Second Example Mondi:

Primary = Mondi Ltd : Secondary = Mondi PLC

###### Mondi In-Sample Test (2008-01-01 - 2012-11-23)

Test the following parameters

• The Mondi ltd / plc pair
• mean = 35
• Set adfTest = F (don't test for co-integration)
• Leverage of x3

data <- read.csv('mondi.csv')
mondi <- BacktestPair(data, 35, generateReport = F, adfTest = F)

mondi.returns <- xts(mondi[,18] * leverage, mondi$Date)
GenerateReport.xts(mondi.returns, startDate = '2008-01-01', endDate = '2012-11-23')

## [1] "Annual Returns: 0.973552250431717"
## [1] "Annualized Sharpe: 2.88672185296756"
## [1] "Max Drawdown: 0.254688711989788"
## From Trough To Depth Length To Trough Recovery
## 1 2008-07-01 2008-08-01 2008-09-01 -0.2547 45 24 21
## 2 2009-03-11 2009-03-18 2009-04-08 -0.1906 21 6 15
## 3 2008-04-16 2008-06-03 2008-06-23 -0.1040 45 32 13
## 4 2008-09-02 2008-09-17 2008-09-18 -0.0926 13 12 1
## 5 2009-03-09 2009-03-09 2009-03-10 -0.0864 2 1 1
###### Mondi Out-of-Sample Test (2012-11-23 - 2015-11-23)

Note: In all of my testing I found that the further down the timeline my data was, the harder it was to make profits on the end of day data. I tested this same strategy on intraday data and it has a higher return profile.

GenerateReport.xts(mondi.returns, startDate = '2012-11-23', endDate = '2015-11-23')

## [1] "Annual Returns: 0.0809094579019469"
## [1] "Annualized Sharpe: 1.25785312960412"
## [1] "Max Drawdown: 0.0385234269750542"
## From Trough To Depth Length To Trough Recovery
## 1 2013-12-19 2014-10-13 2015-01-26 -0.0385 273 202 71
## 2 2015-06-05 2015-08-14  -0.0313 120 49 NA
## 3 2015-01-27 2015-04-22 2015-04-28 -0.0245 63 60 3
## 4 2013-05-29 2013-05-30 2013-06-14 -0.0179 13 2 11
## 5 2013-11-08 2013-11-18 2013-12-18 -0.0175 28 7 21

### Statistical Arbitrage on the JSE

Next, we will look at a pair trading strategy.

Typically a pair consists of 2 shares that:

• Share a market sector
• Have a similar market cap
• Similar business model and clients
• Are co-integrated

In all of the portfolios below I use 3x leverage

#### Construction Portfolio

##### In-sample test (2005-01-01 - 2012-11-01)
names <- c('GroupMR.csv', 'GroupPPC.csv', 'GroupAVENGE.csv', 'GroupWHBO.csv',
'mrppc.csv', 'mravenge.csv')

ReturnSeries <- BacktestPortfolio(names, startDate = '2005-01-01', endDate = '2012-11-01', leverage = 3)

## [1] "1 of 6"
## [1] "2 of 6"
## [1] "3 of 6"
## [1] "4 of 6"
## [1] "5 of 6"
## [1] "6 of 6"

## [1] "Annual Returns: 0.0848959306632411"
## [1] "Annualized Sharpe: 0.733688101181479"
## [1] "Max Drawdown: 0.193914686702112"
## From Trough To Depth Length To Trough Recovery
## 1 2008-05-19 2008-07-08 2008-11-03 -0.1939 119 36 83
## 2 2008-11-04 2008-12-03 2009-06-29 -0.1345 160 22 138
## 3 2006-08-25 2007-12-19 2008-02-19 -0.1272 372 331 41
## 4 2009-08-04 2009-10-01 2009-11-10 -0.0701 69 41 28
## 5 2009-11-25 2010-03-10 2010-09-29 -0.0486 211 73 138
##### Out-of-sample test (2012-11-23 - 2015-11-23)
GenerateReport.xts(ReturnSeries, startDate = '2012-11-23', endDate = '2015-11-23')

## [1] "Annual Returns: 0.0159094762396512"
## [1] "Annualized Sharpe: 0.268766025866724"
## [1] "Max Drawdown: 0.0741426720423424"
## From Trough To Depth Length To Trough Recovery
## 1 2013-08-05 2013-09-06 2014-11-17 -0.0741 322 24 298
## 2 2014-11-20 2015-01-29  -0.0737 253 47 NA
## 3 2012-11-30 2013-04-23 2013-05-02 -0.0129 102 96 6
## 4 2013-06-10 2013-06-13 2013-06-24 -0.0100 10 4 6
## 5 2013-05-03 2013-05-03 2013-06-04 -0.0050 23 1 22

#### Insurance Portfolio

##### In-sample test (2005-01-01 - 2012-11-01)
names <- c('DiscLib.csv', 'DiscMMI.csv', 'DiscSanlam.csv', 'LibMMI.csv', 'MMIOld.csv',
'MMISanlam.csv', 'OldSanlam.csv')

ReturnSeries <- BacktestPortfolio(names, startDate = '2005-01-01', endDate = '2012-11-01', leverage = 3)

## [1] "1 of 7"
## [1] "2 of 7"
## [1] "3 of 7"
## [1] "4 of 7"
## [1] "5 of 7"
## [1] "6 of 7"
## [1] "7 of 7"

## [1] "Annual Returns: 0.110600985165525"
## [1] "Annualized Sharpe: 0.791920916349154"
## [1] "Max Drawdown: 0.233251846760865"
## From Trough To Depth Length To Trough Recovery
## 1 2005-05-26 2005-10-14 2006-08-31 -0.2333 318 100 218
## 2 2008-10-15 2008-12-05 2009-04-30 -0.1513 134 38 96
## 3 2009-06-10 2009-12-10 2010-01-29 -0.1223 162 129 33
## 4 2011-10-04 2012-10-09  -0.0991 267 249 NA
## 5 2006-11-08 2007-12-11 2007-12-14 -0.0894 277 274 3
##### Out-of-sample test (2012-11-23 - 2015-11-23)
GenerateReport.xts(ReturnSeries, startDate = '2012-11-23', endDate = '2015-11-23')

## [1] "Annual Returns: -0.0265926093350092"
## [1] "Annualized Sharpe: -0.319582293135835"
## [1] "Max Drawdown: 0.128061204573991"
## From Trough To Depth Length To Trough Recovery
## 1 2014-08-08 2015-11-20  -0.1281 326 324 NA
## 2 2012-11-28 2013-05-13 2013-07-31 -0.0393 167 111 56
## 3 2014-06-10 2014-06-26 2014-07-23 -0.0284 31 12 19
## 4 2013-08-01 2013-08-30 2013-09-03 -0.0255 23 21 2
## 5 2013-09-11 2013-10-22 2013-12-04 -0.0209 60 29 31

#### General Retail Portfolio

##### In-sample test (2005-01-01 - 2012-11-01)
names <- c('Wooltru.csv', 'WoolMr.csv', 'WoolTFG.csv', 'TRUMR.csv', 'TruTFG.csv', 'MRTFG.csv')

ReturnSeries <- BacktestPortfolio(names, startDate = '2005-01-01', endDate = '2012-11-01', leverage = 3)

## [1] "1 of 6"
## [1] "2 of 6"
## [1] "3 of 6"
## [1] "4 of 6"
## [1] "5 of 6"
## [1] "6 of 6"

## [1] "Annual Returns: 0.120956981644048"
## [1] "Annualized Sharpe: 1.4694780839876"
## [1] "Max Drawdown: 0.125406256082082"
## From Trough To Depth Length To Trough Recovery
## 1 2010-01-05 2012-01-17  -0.1254 705 504 NA
## 2 2008-09-29 2008-10-29 2009-02-20 -0.0690 101 23 78
## 3 2006-03-06 2006-05-15 2006-05-23 -0.0568 52 46 6
## 4 2005-07-18 2005-11-01 2005-12-06 -0.0538 101 76 25
## 5 2008-04-11 2008-04-29 2008-06-26 -0.0512 51 12 39
##### Out-of-sample test (2012-11-23 - 2015-11-23)
GenerateReport.xts(ReturnSeries, startDate = '2012-11-23', endDate = '2015-11-23')

## [1] "Annual Returns: -0.0171898953593881"
## [1] "Annualized Sharpe: -0.336265418351652"
## [1] "Max Drawdown: 0.0884145115767888"
## From Trough To Depth Length To Trough Recovery
## 1 2013-10-15 2015-11-11  -0.0884 528 519 NA
## 2 2013-03-18 2013-06-24 2013-08-12 -0.0279 100 66 34
## 3 2013-09-05 2013-09-06 2013-09-20 -0.0088 12 2 10
## 4 2013-09-23 2013-10-02 2013-10-08 -0.0049 11 7 4
## 5 2013-02-20 2013-02-20 2013-03-15 -0.0037 18 1 17

#### Conclusion:

After all my testing (and trust me, there was a lot more of it than what appears in this report), I came to the conclusion that the Pure Arbitrage Strategy holds real promise for trading with real money, but the Pair Trading Strategy on portfolios of stocks in a given sector is strained and unlikely to be used in production in its current form.

There are many things that I think could be added to improve the performance. Going forward I will investigate using Kalman filters.

##### More on the Pure Arbitrage Trading Strategy:

I have only found two shares that have dual listings on the same exchange; this means we can't allocate large sums of money to the strategy, as it would have a high market impact. However, we could use multiple exchanges and increase the number of shares traded.

##### More on the Pair Trading Strategy:
1. The number of observations used in the ADF Tests is largely to blame. The problem is that a test for co-integration has to be done in order to make a claim of statistical arbitrage; however, using 120, 90, and 60 observations as parameters for the three tests makes it very difficult to find pairs that meet the criteria and will continue to do so in the near future. (Kalman filtering may be useful here)
2. I haven’t spent a lot of time changing the different parameters like the number of observations in the mean calculation. (This requires further exploration)
3. From the above sector portfolios, we can see that the early years are very profitable, but the further down the timeline we go, the lower the returns get. I have spoken to a few people in the industry as well as friends doing stat arb projects at the University of Cape Town; local lore has it that in 2009 Goldman switched on their stat arb package for JSE-listed securities.
4. The same is noticed in other portfolios that I didn't include in this report but that are in the R code file.
5. I believe that this is due to large institutions using the same bread and butter strategy. You will note (if you spend enough time testing all the strategies) that in 2009 there seems to be a sudden shift in the data to lower returns.
6. I feel that the end of day data I am using is limiting me and if I were to test the strategy on intraday data then profits would be higher. (I ran one test on intraday data on Mondi and the results were much higher, but I am still to test it on sector portfolios)
7. This is one of the simpler statistical arbitrage strategies and I believe that if we were to improve the way we calculate the spread and change some of the entry and exit rules, the strategy would become more profitable.

If you made it to the end of this article, thank you; I hope it added some value. This is the first time that I am using GitHub, and I look forward to seeing whether the project attracts any new contributors.