Pair Trading – Statistical Arbitrage On Cash Stocks

This article is the final project submitted by the author as a part of his coursework in Executive Programme in Algorithmic Trading (EPAT™) at QuantInsti™. Do check our Projects page and have a look at what our students are building.

About the Author

Jonathan has a strong knowledge of mathematical programming and has worked as a process optimization engineer for 3 years. He started to get involved in trading as a hobby, especially algorithmic trading due to his passion for math, but it eventually became his full-time job. Jonathan enrolled in the Executive Programme in Algorithmic Trading (EPAT™) in November 2016 and found his place in the world of quantitative analysis in finance. He is currently taking several online courses on Artificial Intelligence and its applications in finance, and is about to launch an online portal on Financial Engineering to share his experience as a Quant Trader.

 

Project Objective

The objective of this project is to model a statistical arbitrage trading strategy and quantitatively analyze the modeling results. The motivation is to diversify the investment across five sectors, namely Technology, Financial, Services, Consumer Goods and Industrial Goods. Furthermore, some stocks, generally in the same sector, move in tandem because their prices are affected by the same market events. However, noise can make them temporarily deviate from their usual pattern, and a trader can take advantage of this apparent deviation with the expectation that the stocks will eventually return to their long-term relationship.

Within each sector, stocks were selected based on high liquidity, a small bid/ask spread and the ability to short the stock. Other stocks could also be considered for further analysis. Once the stock universe is defined, pairs can be formed. On every day that we want to enter a position, all the pairs in the universe are evaluated and the top pairs are selected according to predefined criteria.
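The project's own code is in Python and its selection criteria are described in the Ebook. Purely as an illustration, the pair-forming and daily ranking step can be sketched in R as follows. The tickers, the simulated prices and the choice of return correlation as the ranking criterion are all assumptions.

```r
# Hypothetical universe: simulated closing prices for four tickers (illustration only)
set.seed(1)
n_days = 250
common = cumsum(rnorm(n_days))          # shared driver, e.g. a sector factor
prices = data.frame(
  AAA = 200 + common + rnorm(n_days, sd = 0.2),
  BBB = 100 + 0.5 * common + rnorm(n_days, sd = 0.2),
  CCC = 200 + cumsum(rnorm(n_days)),
  DDD = 200 + cumsum(rnorm(n_days))
)

# Every unordered pair of tickers: choose(4, 2) = 6 rows
pairs = t(combn(names(prices), 2))

# Score each pair by the correlation of the two stocks' daily log returns
returns = as.data.frame(lapply(prices, function(p) diff(log(p))))
corr = apply(pairs, 1, function(p) cor(returns[[p[1]]], returns[[p[2]]]))

# Rank the pairs and keep the top ones for the day
ranked = data.frame(stock1 = pairs[, 1], stock2 = pairs[, 2], corr = corr)
ranked = ranked[order(-ranked$corr), ]
print(head(ranked, 3))
```

Because AAA and BBB share the same simulated driver, they should come out on top. In practice the ranking criterion would be whatever the strategy uses, for example a cointegration statistic.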

Trading Strategy Idea

As the universe of pairs is already defined, correlation analysis should be performed on all possible pairs to identify those with suitable properties for statistical arbitrage. With this correlation test, we are looking for a measure of the relationship between the two stock prices. The logic of the strategy is: for any correlated pair in the established universe, if the pair ratio diverges beyond a certain threshold, we short the stock that is expensive and buy the stock that is cheap. Once the ratio converges back to its mean, we close the position and profit from the reversal.
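The entry and exit logic above can be sketched in R with a rolling z-score of the price ratio. This is a hypothetical illustration, not the project's Python code. The 20-day window, the entry threshold of 2 standard deviations and the exit at a z-score of 0 are assumed parameters.

```r
# Rolling statistic over a trailing window (NA until the window is full)
roll_stat = function(x, window, fun) {
  out = rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i >= window) out[i] = fun(x[(i - window + 1):i])
  }
  out
}

# Position on the ratio A/B: 1 = long A / short B, -1 = short A / long B, 0 = flat
pair_signals = function(price_a, price_b, window = 20, entry = 2, exit = 0) {
  ratio = price_a / price_b
  z = (ratio - roll_stat(ratio, window, mean)) / roll_stat(ratio, window, sd)
  pos = numeric(length(z))
  for (t in seq_along(z)) {
    prev = if (t == 1) 0 else pos[t - 1]
    if (is.na(z[t])) {
      pos[t] = 0
    } else if (prev == 0) {
      if (z[t] > entry) pos[t] = -1          # A expensive relative to B: short A, buy B
      else if (z[t] < -entry) pos[t] = 1     # A cheap relative to B: buy A, short B
    } else if ((prev == -1 && z[t] <= exit) || (prev == 1 && z[t] >= -exit)) {
      pos[t] = 0                             # ratio back at its mean: close the trade
    } else {
      pos[t] = prev                          # otherwise keep holding
    }
  }
  data.frame(ratio = ratio, z = z, position = pos)
}

# Toy usage: a ratio oscillating around 1.00, then a temporary spike in A
price_b = rep(100, 30)
price_a = c(rep(c(101, 99), 12), 110, 110, 100, 100, 100, 100)
sig = pair_signals(price_a, price_b)
print(sig[24:28, ])
```

In the toy data, the spike to 110 opens a short-A / long-B position, which is held for one more day and closed once the ratio drops back below its rolling mean.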

The strategy triggers new orders whenever the price ratio of a pair in the filtered universe diverges from its mean. For such a trade to make sense, the pair must be cointegrated. If the two stocks are cointegrated, the ratio is mean reverting, and the further it moves from its mean, the more likely a move back towards the mean becomes, which makes the trade more attractive. This analysis helps determine the stability of the long-term relationship. The spread time series is tested for stationarity using the Augmented Dickey-Fuller (ADF) test. In other words, if the pair is cointegrated, the mean and variance of the spread remain constant over time. There is, however, a major issue that makes this simple strategy difficult to implement in practice: the long-term relationship can break down, and the spread can move from one equilibrium to another.
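The cointegration check can be sketched in base R as a two-step Engle-Granger procedure. First, an OLS regression gives the hedge ratio and a residual spread. Second, an ADF-style regression is run on that spread. This is an illustration only, not the project's code. It uses the regression residual as the spread, whereas the project works with the price ratio; the same ADF step applies to either. The single lag and the simulated series are assumptions. Packaged tests such as `tseries::adf.test` would normally be used instead.

```r
# Dickey-Fuller t-statistic for the level term, with `lags` lagged differences (ADF form)
adf_stat = function(x, lags = 1) {
  dx = diff(x)
  n = length(dx)
  idx = (lags + 1):n
  X = data.frame(y = dx[idx], x_lag = x[idx])          # x[i] is the level before dx[i]
  for (j in seq_len(lags)) X[[paste0("dlag", j)]] = dx[idx - j]
  fit = lm(y ~ ., data = X)
  coef(summary(fit))["x_lag", "t value"]
}

# Engle-Granger style check: OLS hedge ratio, then ADF regression on the residual spread
engle_granger = function(a, b, lags = 1) {
  fit = lm(a ~ b)
  spread = residuals(fit)
  list(hedge_ratio = unname(coef(fit)[2]), adf_t = adf_stat(spread, lags))
}

set.seed(42)
b = 100 + cumsum(rnorm(500))          # random walk
a = 10 + 1.5 * b + rnorm(500)         # cointegrated with b by construction
other = 100 + cumsum(rnorm(500))      # unrelated random walk

eg_pair = engle_granger(a, b)
eg_other = engle_granger(a, other)
# Roughly -3.34 is the 5% critical value for a two-variable Engle-Granger test;
# more negative statistics favour cointegration (a stationary spread)
c(pair = eg_pair$adf_t, unrelated = eg_other$adf_t)
```

The residual-based test needs Engle-Granger (MacKinnon) critical values rather than the ordinary ADF table, because the hedge ratio is estimated from the same data.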

A training period of at least one year of data is used before the out-of-sample test, and the capital allocated to each sector is decided using a minimum variance portfolio approach. Each sector is traded independently. Data from Yahoo Finance has been used to test this strategy. The backtest for each pair uses data for the period 1-Jan-2009 to 31-Dec-2014.
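A minimum variance allocation across sectors has the closed form w = Σ⁻¹1 / (1ᵀΣ⁻¹1), where Σ is the covariance matrix of sector returns. The R sketch below assumes simulated daily sector returns and the unconstrained version, which allows negative weights. The project may well have added constraints, so treat this as an illustration only.

```r
# Unconstrained minimum-variance weights: w = solve(Sigma, 1) / sum(solve(Sigma, 1))
min_variance_weights = function(returns) {
  sigma = cov(returns)
  w = solve(sigma, rep(1, ncol(sigma)))   # solves Sigma w = 1 without forming the inverse
  w / sum(w)                              # rescale so the weights sum to 1
}

# Hypothetical daily sector returns with different volatilities (simulated)
set.seed(3)
vols = c(Technology = 0.010, Financial = 0.015, Services = 0.020,
         ConsumerGoods = 0.012, IndustrialGoods = 0.018)
returns = sapply(vols, function(s) rnorm(1000, mean = 0, sd = s))

w = min_variance_weights(returns)
print(round(w, 3))
# Capital per sector, e.g. for 1,000,000 of trading capital
print(round(1e6 * w))
```

With uncorrelated sectors, the weights come out roughly proportional to 1/variance, so the least volatile sector receives the most capital.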

Strategy Details

You can read the complete project work of the author including the Python codes for Pairs Trading by downloading the Ebook provided below.

Highlights from the project include:

  • Pair Trading – Statistical Arbitrage on Cash Stocks
  • Strategy
  • Code Details and In-Sample Backtesting
  • Analyzing Model Output
  • Monte Carlo Analysis and much more…

Next Step

If you want to learn various aspects of Algorithmic trading then check out the Executive Programme in Algorithmic Trading (EPAT™). The course covers training modules like Statistics & Econometrics, Financial Computing & Technology, and Algorithmic & Quantitative Trading. EPAT™ equips you with the required skill sets to build a promising career in algorithmic trading. Enroll now!


R Weekly Bulletin Vol – XII

This week’s R bulletin will cover topics on how to resolve some common errors in R.

We will also cover functions like do.call, rename, and lapply. Hope you like this R weekly bulletin. Enjoy reading!

Shortcut Keys

1. Find and Replace – Ctrl+F
2. Find Next – F3
3. Find Previous – Shift+F3

Problem Solving Ideas

Resolving the ‘cannot open the connection’ Error

This error can show up for two reasons when we run an R script: 1) R cannot open a file or connection because it cannot find it (usually due to an error in the path); 2) a package's .onLoad() fails because the package cannot find a system dependency.

Example:

symbol = "AXISBANK"
noDays = 1
dirPath = paste(getwd(), "/", noDays, " Year Historical Data", sep = "")
fileName = paste(dirPath, symbol, ".csv", sep = "")
data = as.data.frame(read.csv(fileName))

Warning in file(file, "rt"): cannot open file 'C:/Users/Madhukar/Documents/1 Year Historical DataAXISBANK.csv': No such file or directory
Error in file(file, "rt"): cannot open the connection

We are getting this error because we have specified the wrong path to the “dirPath” object in the code. The right path is shown below. We missed adding a forward slash after “Year Historical Data” in the paste function. This led to the wrong path, and hence the error.

dirPath = paste(getwd(), "/", noDays, " Year Historical Data/", sep = "")

After adding the forward slash, we re-ran the code. The file is now found, and the first three rows of the data are printed in the R console.

Example:

symbol = "AXISBANK"
noDays = 1
dirPath = paste(getwd(), "/", noDays, " Year Historical Data/", sep = "")
fileName = paste(dirPath, symbol, ".csv", sep = "")
data = as.data.frame(read.csv(fileName))
print(head(data, 3))
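A more robust way to avoid missing separators is to let `file.path()` insert the "/" itself and to check `file.exists()` before reading. This is a minimal sketch that reuses the folder layout assumed in the example above.

```r
symbol = "AXISBANK"
noDays = 1

# file.path() inserts the "/" separators itself, so a missing slash cannot creep in
dirPath = file.path(getwd(), paste0(noDays, " Year Historical Data"))
fileName = file.path(dirPath, paste0(symbol, ".csv"))
print(fileName)

# Fail early with a readable message instead of "cannot open the connection"
if (file.exists(fileName)) {
  data = read.csv(fileName)            # read.csv already returns a data frame
  print(head(data, 3))
} else {
  message("File not found: ", fileName)
}
```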

Resolving the ‘could not find function’ Error

This error arises when an R package is not loaded properly or due to the misspelling of the function names.

When we run the code shown below, we get a “could not find function "ymed"” error in the console. This is because we have misspelled the “ymd” function as “ymed”. Even with the correct spelling, calling ymd without first loading the lubridate package, which provides it, throws a “could not find function "ymd"” error.

Example:

# Read NIFTY price data from the csv file
df = read.csv("NIFTY.csv")

# Format date
dates = ymed(df$DATE)

Error in eval(expr, envir, enclos): could not find function "ymed"
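The fix is to spell the function correctly and load lubridate before calling it. The sketch below wraps that in a small hypothetical helper, `parse_yyyymmdd()`, which falls back to base R's `as.Date()` when lubridate is not installed. It also uses `exists()` to check whether a function name is visible before calling it.

```r
# Corrected call: ymd() spelled correctly and lubridate loaded first.
# Falls back to base R's as.Date() if lubridate is not installed.
parse_yyyymmdd = function(x) {
  if (requireNamespace("lubridate", quietly = TRUE)) {
    lubridate::ymd(x)
  } else {
    as.Date(as.character(x), format = "%Y%m%d")
  }
}

# exists() reports whether a function name can be found before you call it
exists("ymed", mode = "function")    # FALSE: misspelled, no such function

dates = parse_yyyymmdd(c(20160722, 20160725))   # e.g. values from df$DATE
print(dates)
```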

Resolving the “replacement has” Error

This error occurs when one tries to assign a vector of values to an existing object and the lengths do not match up.

In the example below, the stock price data of Axis Bank has 245 rows. In the code, we create a sequence “s” of numbers from 1 to 150. When we try to add this sequence to the Axis Bank data set as a new column, R throws a “replacement has” error because the lengths of the two do not match. To resolve such errors, ensure that the length of the vector matches the number of rows in the data frame.

Example:

symbol = "AXISBANK" ; noDays = 1 ;
dirPath = paste(getwd(),"/",noDays," Year Historical Data/",sep="")
fileName = paste(dirPath,symbol,".csv",sep="")
df = as.data.frame(read.csv(fileName))

# Number of rows in the dataframe "df"
n = nrow(df); print(n);

# create a sequence of numbers from 1 to 150
s = seq(1,150,1)

# Add a new column "X" to the existing data frame "df"
df$X = s
print(head(df,3))

Error in `$<-.data.frame`(`*tmp*`, "X", value = c(1, 2, 3, 4, 5, 6, 7, : replacement has 150 rows, data has 245
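Two common ways to make the lengths match are sketched below. A stand-in data frame with 245 rows replaces the Axis Bank file. The first approach derives the new column from `nrow(df)`. The second pads a shorter vector with `NA` explicitly.

```r
# Stand-in for the Axis Bank data: a data frame with 245 rows (hypothetical values)
df = data.frame(Close = seq(500, by = 0.5, length.out = 245))

# Build the new column from the data's own length instead of a hard-coded 150
df$X = seq_len(nrow(df))

# If only the first 150 rows should get values, pad the rest with NA explicitly
s = seq(1, 150, 1)
df$Y = c(s, rep(NA, nrow(df) - length(s)))
print(tail(df, 3))
```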

Functions Demystified

do.call function

The do.call function is used for calling other functions. The function which is to be called is provided as the first argument to the do.call function, while the second argument of the do.call function is a list of arguments of the function to be called. The syntax for the function is given as:

do.call (function_name, arguments)

Example: Let us first define a simple function that we will call later in the do.call function.

numbers = function(x, y) {
  sqrt(x^3 + y^3)
}

# Now let us call this 'numbers' function using the do.call function. We provide the function name as
# the first argument to the do.call function, and a list of the arguments as the second argument.

do.call(numbers, list(x = 3, y = 2))
[1] 5.91608
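A common practical use of do.call is to combine a list of data frames, for example one per stock, into a single data frame. This works no matter how long the list is. The tickers and returns below are hypothetical.

```r
# Hypothetical per-stock results, one small data frame per ticker
results = list(
  data.frame(Ticker = "IOC", Return = 0.012),
  data.frame(Ticker = "BPCL", Return = -0.004),
  data.frame(Ticker = "ABAN", Return = 0.021)
)

# Same as rbind(results[[1]], results[[2]], results[[3]]), for any list length
combined = do.call(rbind, results)
print(combined)
```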

rename function

The rename function is part of the dplyr package, and is used to rename the columns of a data frame. The syntax for the rename function is to have the new name on the left-hand side of the = sign, and the old name on the right-hand side. Consider the data frame “df” given in the example below.

Example:

library(dplyr)
Tic = c("IOC", "BPCL", "HINDPETRO", "ABAN")
OP = c(555, 570, 1242, 210)
CP = c(558, 579, 1248, 213)
df = data.frame(Tic, OP, CP)
print(df)

# Renaming the columns as 'Ticker', 'OpenPrice', and 'ClosePrice'. This can be done in the following 
# manner:

renamed_df = rename(df, Ticker = Tic, OpenPrice = OP, ClosePrice = CP)
print(renamed_df)
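If dplyr is not available, the same renaming can be done in base R by overwriting the matching entries of `names(df)`. The sketch below looks up the old names with `match()`, so the column order in the mapping does not matter.

```r
Tic = c("IOC", "BPCL", "HINDPETRO", "ABAN")
OP = c(555, 570, 1242, 210)
CP = c(558, 579, 1248, 213)
df = data.frame(Tic, OP, CP)

# Base R: find each old name's position and assign the new name there
old = c("Tic", "OP", "CP")
new = c("Ticker", "OpenPrice", "ClosePrice")
names(df)[match(old, names(df))] = new
print(df)
```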

lapply function

The lapply function is part of base R. It takes a list (or vector) “X” as input and returns a list of the same length as “X”, each element of which is the result of applying a function to the corresponding element of “X”. The syntax of the function is given as:

lapply(X, FUN)
where,
X is a vector (atomic or list)
FUN is the function to be applied

Example 1:

Let us create a list with 2 elements, OpenPrice and the ClosePrice. We will compute the mean of the values in each element using the lapply function.

x = list(OpenPrice = c(520, 521.35, 521.45), ClosePrice = c(521, 521.1, 522))
lapply(x, mean)

$OpenPrice
[1] 520.9333

$ClosePrice
[1] 521.3667

Example 2:

x = list(a = 1:10, b = 11:15, c = 1:50)
lapply(x, FUN = length)

$a
[1] 10

$b
[1] 5

$c
[1] 50
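lapply always returns a list. When each result is a single value, its relatives `sapply` (which simplifies the result to a vector where possible) and `vapply` (which also checks the type of each result) are often more convenient. The sketch below reuses the price list from Example 1.

```r
x = list(OpenPrice = c(520, 521.35, 521.45), ClosePrice = c(521, 521.1, 522))

means = sapply(x, mean)     # named numeric vector instead of a list
print(means)

# vapply: same idea, but each result must be a single number
ranges = vapply(x, function(v) max(v) - min(v), numeric(1))
print(ranges)
```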

Next Step

We hope you liked this bulletin. In the next weekly bulletin, we will cover more useful techniques and R functions for our readers.


Trading Using Machine Learning In Python Part-2


By Varun Divakar

Continued:

At the end of my last blog, I had asked a few questions. Now, I will answer them all at the same time. I will also discuss a way to detect the regime/trend in the market without training the algorithm for trends. But before we go ahead, please use a fix to fetch the data from Google to run the code below.


Answers:

Is the equation over-fitting?

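This was the first question I had asked. The best way to tell whether your model is overfitting is to compare the prediction error the algorithm makes on the training data with the error it makes on the test data. If the test error is much larger than the training error, the model is likely fitting noise in the training data.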
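A minimal R sketch of this check follows. It uses simulated data and polynomial regression as stand-ins for the post's own data and model, which are in Python. A simple model and a very flexible one are each fitted on the first 150 rows. Their root-mean-square errors are then compared on those training rows and on the 50 held-out rows.

```r
# Simulated data whose true relationship is linear (illustration only)
set.seed(7)
d = data.frame(x = runif(200, 0, 10))
d$y = 2 + 0.5 * d$x + rnorm(200)
train = 1:150
test = 151:200

rmse = function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Fit a simple (degree 1) and a very flexible (degree 10) model on the training rows only
errors = t(sapply(c(1, 10), function(deg) {
  fit = lm(y ~ poly(x, deg), data = d[train, ])
  c(degree = deg,
    train = rmse(d$y[train], fitted(fit)),
    test = rmse(d$y[test], predict(fit, newdata = d[test, ])))
}))
print(errors)
```

The flexible model can never do worse on the training rows. A test error that grows while the training error shrinks is the overfitting warning sign.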


Machine Learning For Trading – How To Predict Stock Prices Using Regression?


By Sushant Ratnaparkhi

The other day I was reading an article on how AI has progressed so far and where it is going. I was awestruck and had a hard time digesting the picture the author drew on possibilities in the future.

Here is how I reacted. (No, I am not as good looking as Joey but you get the idea)

And here is one possible application of AI in the medical field, in a paragraph from the article:

A surgeon could control a machine scalpel with her motor cortex instead of holding one in her hand, and she could receive sensory input from that scalpel so that it would feel like an 11th finger to her. So it would be as if one of her fingers was a scalpel and she could do the surgery without holding any tools, giving her much finer control over her incisions. An inexperienced surgeon performing a tough operation could bring a couple of her mentors into the scene as she operates to watch her work through her eyes and think instructions or advice to her. And if something goes really wrong, one of them could “take the wheel” and connect their motor cortex to her outputs to take control of her hands.

You can read the article here.

At this moment, AI and Machine Learning have already progressed far enough that they can predict stock prices with a great level of accuracy. Let me show you how.


What is Machine Learning?

The definition is this, “Machine Learning is where computer algorithms are used to autonomously learn from data and information and improve the existing algorithms”


R Weekly Bulletin Vol – XI

This week’s R bulletin will cover how to round to the nearest desired number, how to convert and compare dates, and how to remove the last x characters from an element.

We will also cover functions like rank, mutate, transmute, and set.seed. Hope you like this R weekly bulletin. Enjoy reading!

Shortcut Keys

1. Comment/uncomment current line/selection – Ctrl+Shift+C
2. Move Lines Up/Down – Alt+Up/Down
3. Delete Line – Ctrl+D

Problem Solving Ideas

Rounding to the nearest desired number

Consider a case where you want to round a given number to the nearest 25. This can be done in the following manner:

round(145/25) * 25
[1] 150

floor(145/25) * 25
[1] 125

ceiling(145/25) * 25
[1] 150

Usage:
Suppose you are calculating a stop loss or take profit for an NSE stock whose minimum tick is 5 paisa. In such a case, we divide by 0.05, round to a whole number of ticks, and multiply by 0.05 again to get the desired outcome.

Example:

Price = 566
Stop_loss = 1/100

# without rounding
SL = Price * Stop_loss
print(SL)
[1] 5.66

# rounding down (floor) to the nearest 0.05
SL1 = floor((Price * Stop_loss)/0.05) * 0.05
print(SL1)
[1] 5.65
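The three variants can be wrapped in a small helper. `round_to_tick()` is a hypothetical name used here for illustration. Its inner `round(..., 10)` guards against floating-point results such as 112.99999999, which `floor()` would otherwise push down a whole tick.

```r
# Round prices to a tick grid; "round" = nearest tick, "floor" = down, "ceiling" = up
round_to_tick = function(x, tick = 0.05, method = c("round", "floor", "ceiling")) {
  f = match.fun(match.arg(method))
  # The inner round(..., 10) removes floating-point noise such as 112.99999999
  # before floor()/ceiling(); the outer one tidies the final price
  round(f(round(x / tick, 10)) * tick, 10)
}

Price = 566
round_to_tick(Price * 1/100, method = "floor")     # stop-loss distance, rounded down
round_to_tick(Price * 1/100, method = "ceiling")   # rounded up instead
round_to_tick(c(101.02, 101.03, 101.07))          # nearest 5-paisa tick
```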

How to remove last n characters from every element

To remove the last n characters, we use the substr function along with the nchar function. The example below illustrates how to do it.

Example:

# In this case, we just want to retain the ticker name which is "TECHM"
symbol = "TECHM.EQ-NSE"
s = substr(symbol,1,nchar(symbol)-7)
print(s)
[1] "TECHM"
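Since `substr()` and `nchar()` are vectorised, the same call strips the suffix from every element of a vector at once. A regular expression that removes exactly n trailing characters gives the same result. The extra symbols below are illustrative.

```r
symbols = c("TECHM.EQ-NSE", "INFY.EQ-NSE", "TCS.EQ-NSE")
n = 7   # length of the ".EQ-NSE" suffix

# substr() and nchar() are vectorised, so one call handles every element
tickers = substr(symbols, 1, nchar(symbols) - n)
print(tickers)

# Equivalent regular expression: drop exactly n characters at the end of each string
tickers_re = sub(paste0(".{", n, "}$"), "", symbols)
print(tickers_re)
```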

Converting and Comparing dates in different formats

When we pull stock data from Google Finance, the date appears as “YYYYMMDD”, which is not recognized as a date-time object. To convert it into a date-time object, we can use the “ymd” function from the lubridate package.

Example:

library(lubridate)
x = ymd(20160724)
print(x)
[1] "2016-07-24"

Another data provider supplies stock data with dates in the American format (mm/dd/yyyy). When we read the file, the date column is read as a character string, so we need to convert it into a date-time object. We can do this using the as.Date function and specifying the format.

dt = "07/24/2016"
y = as.Date(dt, format = "%m/%d/%Y")
print(y)
[1] "2016-07-24"

# Comparing the two date-time objects (from Google Finance and the data provider) after conversion
identical(x, y)
[1] TRUE

Functions Demystified

rank function

The rank function returns the sample ranks of the values in a vector. Ties (i.e., equal values) and missing values can be handled in several ways.

rank(x, na.last = TRUE, ties.method = c("average", "first", "last", "random", "max", "min"))

where,
x: numeric, complex, character or logical vector
na.last: for controlling the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed; if “keep” they are kept with rank NA
ties.method: a character string specifying how ties are treated

Examples:

x <- c(3, 5, 1, -4, NA, Inf, 90, 43)
rank(x)
[1] 3 4 2 1 8 7 6 5

rank(x, na.last = FALSE)
[1] 4 5 3 2 1 8 7 6
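The examples above contain no ties. The short sketch below shows how `ties.method` changes the ranks when two values are equal, using a small made-up price vector.

```r
prices = c(10, 20, 20, 30)

rank(prices)                          # "average": tied values share the mean rank (2.5)
rank(prices, ties.method = "min")     # tied values all get the lowest rank (2, 2)
rank(prices, ties.method = "first")   # ties broken by order of appearance (2, 3)
```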

mutate and transmute functions

The mutate and transmute functions are part of the dplyr package. The mutate function computes new variables from the existing variables of a given data frame and adds them to that data frame. The transmute function, on the other hand, returns a data frame that contains only the new variables.

Consider the data frame “df” given in the example below. Suppose we have 5 observations of 1-minute price data for a stock, and we want to create a new variable by subtracting the mean from the 1-minute closing prices. It can be done in the following manner using the mutate function.

Example:

library(dplyr)
OpenPrice = c(520, 521.35, 521.45, 522.1, 522)
ClosePrice = c(521, 521.1, 522, 522.25, 522.4)
Volume = c(2000, 3500, 1750, 2050, 1300)
df = data.frame(OpenPrice, ClosePrice, Volume)
print(df)

df_new = mutate(df, cpmean_diff = ClosePrice - mean(ClosePrice, na.rm = TRUE))
print(df_new)

# If we want the new variable as a separate data frame, we can use the transmute function instead.
df_new = transmute(df, cpmean_diff = ClosePrice - mean(ClosePrice, na.rm = TRUE))
print(df_new)
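For comparison, here is a sketch of base R equivalents. `transform()` keeps every column and appends the new one, like mutate. Building a fresh data frame from the computed column mirrors transmute.

```r
OpenPrice = c(520, 521.35, 521.45, 522.1, 522)
ClosePrice = c(521, 521.1, 522, 522.25, 522.4)
Volume = c(2000, 3500, 1750, 2050, 1300)
df = data.frame(OpenPrice, ClosePrice, Volume)

# mutate() equivalent: keep every column and append the new one
df_mut = transform(df, cpmean_diff = ClosePrice - mean(ClosePrice, na.rm = TRUE))
# transmute() equivalent: a data frame holding only the new column
df_trans = data.frame(cpmean_diff = df$ClosePrice - mean(df$ClosePrice, na.rm = TRUE))
print(df_mut)
print(df_trans)
```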

set.seed function

The set.seed function makes the program generate the same sequence of random numbers every time it runs, by setting the random number generator to a known state. Its main argument is a single integer, the seed; using the same seed again restores the same initial state.

Example:

# Initialize the random number generator to a known state and generate five random numbers
set.seed(100)
runif(5)
[1] 0.30776611 0.25767250 0.55232243 0.05638315 0.46854928

# Reinitialize to the same known state and generate the same five 'random' numbers
set.seed(100)
runif(5)
[1] 0.30776611 0.25767250 0.55232243 0.05638315 0.46854928
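The same applies to any generator that draws from the RNG, for example `rnorm()` when simulating returns. The sketch below uses arbitrary seeds and parameters to show that the same seed reproduces the draws exactly and a different seed does not.

```r
set.seed(2017)
first_run = rnorm(3, mean = 0, sd = 0.01)    # e.g. simulated daily returns
set.seed(2017)
second_run = rnorm(3, mean = 0, sd = 0.01)
identical(first_run, second_run)             # TRUE: same seed, same draws

set.seed(2018)
third_run = rnorm(3, mean = 0, sd = 0.01)
identical(first_run, third_run)              # FALSE: a different seed gives different draws
```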


R Weekly Bulletin Vol – X

This week’s R bulletin will cover topics on grouping data using ntile function, how to open files automatically, and formatting an Excel sheet using R.

We will also cover the choose, sample, runif, and rnorm functions. Hope you like this R weekly bulletin. Enjoy reading!

Shortcut Keys

1. Fold selected chunk – Alt+L
2. Unfold selected chunk – Shift+Alt+L
3. Fold all – Alt+O

Problem Solving Ideas

Grouping data using ntile function

The ntile function is part of the dplyr package, and is used for grouping data. The syntax for the function is given by:

ntile(x, n)

Where,
“x” is the vector of values and
“n” is the number of buckets/groups to divide the data into.

Example:

In this example, we first create a data frame from two vectors, one containing stock symbols and the other their respective prices. We then divide the values in the Price column into 2 groups, and the bucket numbers are stored in a new column called “Ntile”. In the last line, we use the subset function to select only the rows that fall in the 2nd bucket.

library(dplyr)
Ticker = c("PAGEIND", "MRF", "BOSCHLTD", "EICHERMOT", "TIDEWATER")
Price = c(14742, 33922, 24450, 21800, 5519)

data = data.frame(Ticker, Price)

data$Ntile = ntile(data$Price, 2)
print(data)

ranked_data = subset(data, subset = (Ntile == 2))
print(ranked_data)
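Without dplyr, equal-count buckets can be approximated in base R from ranks. The sketch below is an assumption about how to mimic ntile, not dplyr's source. It breaks ties by order of appearance and does not handle NA values. For the five prices above, it assigns the same buckets.

```r
# Base-R sketch of equal-count bucketing in the spirit of ntile()
ntile_base = function(x, n) {
  r = rank(x, ties.method = "first")     # 1..length(x), ties broken by position
  floor(n * (r - 1) / length(x)) + 1     # map ranks onto buckets 1..n
}

Price = c(14742, 33922, 24450, 21800, 5519)
ntile_base(Price, 2)
```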

Automatically open the saved files

If you are saving the output of an R script and also want to open the file after running the code, you can use the shell.exec function. This Windows-only function opens the specified file using the application specified in the Windows file associations.

A file association associates a file with an application capable of opening that file. More commonly, a file association associates a class of files (usually determined by their filename extension, such as .txt) with a corresponding application (such as a text editor).

The syntax of the function is:
shell.exec(filename)

Example:

df = data.frame(Symbols=c("ABAN","BPCL","IOC"),Price=c(212,579,538))
write.csv(df,"Stocks List.csv")
shell.exec("Stocks List.csv")
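Because `shell.exec` exists only on Windows, a script shared across machines may need a fallback. The hypothetical helper below assumes the macOS `open` command and the Linux `xdg-open` command are available. Its `dry_run` flag only reports which opener would be used.

```r
# Hypothetical helper: pick a platform-appropriate way to open a file
open_file = function(path, dry_run = FALSE) {
  cmd = if (.Platform$OS.type == "windows") {
    "shell.exec"
  } else if (Sys.info()[["sysname"]] == "Darwin") {
    "open"                       # macOS
  } else {
    "xdg-open"                   # most Linux desktops (assumed to be installed)
  }
  if (dry_run) return(cmd)       # only report which opener would be used
  if (cmd == "shell.exec") shell.exec(path) else system2(cmd, shQuote(path), wait = FALSE)
  invisible(cmd)
}

df = data.frame(Symbols = c("ABAN", "BPCL", "IOC"), Price = c(212, 579, 538))
out = file.path(tempdir(), "Stocks List.csv")
write.csv(df, out, row.names = FALSE)
open_file(out, dry_run = TRUE)
```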

Quick format of the excel sheet for column width

We can adjust the column widths of Excel sheets using the lines given below, which rely on the xlsx package. In the example, loadWorkbook loads the Excel workbook specified by the file name, and getSheets returns its worksheets. The autoSizeColumn function then adjusts the width of the columns specified in “colIndex” for each of the worksheets. Finally, saveWorkbook saves the workbook again after the formatting changes.

Example:

library(xlsx)
# file_name holds the path to an existing .xlsx workbook
wb = loadWorkbook(file_name)
sheets = getSheets(wb)
autoSizeColumn(sheets[[1]], colIndex=1:7)
autoSizeColumn(sheets[[2]], colIndex=1:5)
saveWorkbook(wb,file_name)

Functions Demystified

choose function

The choose function computes the combination nCr. The syntax for the function is given as:

choose(n,r)

where,
n is the number of elements
r is the number of subset elements

nCr = n!/(r! * (n-r)!)

Examples:

choose(5, 2)
[1] 10

choose(2, 1)
[1] 2
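The formula above can be checked directly with `factorial()`. If you need the combinations themselves rather than their count, `combn()` lists them, for example all pairs from a set of tickers, as in pair trading. The tickers below are illustrative.

```r
n = 5; r = 2
choose(n, r)                                        # 10
factorial(n) / (factorial(r) * factorial(n - r))    # same value from the formula

tickers = c("INFY", "TCS", "HCL", "TECHM", "MRF")
combn(tickers, 2)                                   # the 10 pairs, one per column
```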

sample function

The sample function randomly selects n items from a given vector. By default, the samples are selected without replacement, which means that the function will not select the same item twice. The syntax for the function is given as:

sample(vector, n)

Example: Consider a vector of yearly revenue growth data for a stock. We select 5 years of revenue growth at random using the sample function.

Revenue = c(12, 10.5, 11, 9, 10.75, 11.25, 12.1, 10.5, 9.5, 11.45)
sample(Revenue, 5)
[1] 11.45 12.00 9.50 12.10 10.50

Some statistical procedures require sampling with replacement; in such cases, you can pass replace = TRUE to the sample function.

Example:

x = c(1, 3, 5, 7)
sample(x, 7, replace = TRUE)
[1] 7 1 5 3 7 3 5
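The bootstrap is one such procedure. The sketch below resamples the ten revenue growth figures with replacement many times to gauge how uncertain their mean is. The seed and the 5,000 repetitions are arbitrary choices.

```r
Revenue = c(12, 10.5, 11, 9, 10.75, 11.25, 12.1, 10.5, 9.5, 11.45)

set.seed(1)
# Each repetition draws 10 values with replacement and records their mean
boot_means = replicate(5000, mean(sample(Revenue, length(Revenue), replace = TRUE)))

mean(boot_means)                       # close to mean(Revenue)
quantile(boot_means, c(0.025, 0.975))  # rough 95% interval for the mean growth
```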

runif and rnorm functions

The runif function generates uniformly distributed random numbers, by default between 0 and 1. Its first argument is the number of random values to generate, and the min and max arguments change the range.

Example:

# This will generate 7 uniform random numbers between 0 and 1.
runif(7)
[1] 0.6989614 0.5750565 0.6918520 0.3442109 0.5469400 0.7955652 0.5258890

# This will generate 5 uniform random numbers between 2 and 4.
runif(5, min = 2, max = 4)
[1] 2.899836 2.418774 2.906082 3.728974 2.720633

The rnorm function (the normal distribution's random number generator) generates random numbers from a normal distribution with a given mean and standard deviation. The syntax for the function is given as:

rnorm(n, mean, sd)

Example:

# generates 6 numbers from a normal distribution with a mean of 3 and standard deviation of 0.25
rnorm(6, 3, 0.25)
[1] 3.588193 3.095924 3.240684 3.061176 2.905392 2.891183
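A quick way to confirm what these generators produce is to draw a large sample and compare its summary statistics with the requested parameters. The sketch below uses an arbitrary seed and sample size.

```r
set.seed(123)
u = runif(100000, min = 2, max = 4)
z = rnorm(100000, mean = 3, sd = 0.25)

range(u)                       # stays inside [2, 4]
c(mean(u), mean(z), sd(z))     # near 3, 3 and 0.25 for large samples
mean(abs(z - 3) < 2 * 0.25)    # about 95% of normal draws lie within 2 sd of the mean
```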


Python Trading Strategy in Quantiacs Platform

By Milind Paradkar

Algorithmic trading has gained great traction in recent years, and the number of students, engineering graduates, and finance professionals looking to explore this lucrative domain has been growing rapidly with each passing year.

Are you among those looking to learn quant skills and also make money with your trading ideas? Let us explore the Quantiacs platform, which allows you to create, run, and implement your Python trading strategy. Quantiacs offers great earning opportunities for successful quants.

Quantiacs Toolbox

The Quantiacs toolbox is free and open-source. Quantiacs provides up to 25 years of free data for 49 futures and S&P 500 stocks. The toolkit allows the user to create a trading strategy and backtest it with data all the way back to 1990.  In addition to futures data, Quantiacs has recently added macro-economic data which can be used in conjunction with the price time series data to improve the trading algorithms. Quantiacs supports both Python and Matlab. In this post, we will explore the Python toolbox and illustrate a toy strategy using it.

Quantiacs Python Toolbox

Quantiacs has created a simple yet powerful Python framework which can be used to create different types of algorithmic strategies. It lets you define trading system settings such as the market data to load, trading costs, custom fields, and capital. Other features of the Python toolbox include evaluating the trading system, optimization, and visualization of results. Let us explore some features of the Python framework here.

Loading the market data:

Quantiacs supports both stock and futures markets. Here is what the data fields look like for a stock:

[Figure: data fields for a stock. Source: Quantiacs.com]

We can load the stock data in Python using the quantiacsToolbox.loadData function.

The data is returned in the form of a Python dictionary. Let us check the data type of the key-value pairs.

To create a Python trading strategy, we will have to manipulate NumPy arrays, so you need a good understanding of NumPy arrays and the many functions they support. Here’s a list of some useful functions – https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html

Candle High-Low Python Strategy

Now let us take a very simple candle high-low strategy and try to code it using the Quantiacs toolbox. The step-by-step process is illustrated below.

Step 1: Define the Settings

We test our sample strategy on Apple Inc. (AAPL) and Amazon.com, Inc. (AMZN) stocks. The backtest period is defined in the settings['beginInSample'] and settings['endInSample'] variables. We also define the lookback days, capital, and slippage.

Step 2: Python Trading Strategy

We have kept our strategy simple. In the first step, we define the number of candles, which represents the number of previous prices that will be considered for generating a buy/sell signal. We then compute the price differences over the last ‘n’ candles. If all the price differences are positive, we go short, expecting mean-reverting behavior. If all the price differences are negative, we go long.

The long position is indicated by the value 1, while the short position takes value of -1.
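The signal rule itself is language-agnostic. Here it is sketched in R, the language used for the other examples here. This is not the Quantiacs API or the post's Python code. It assumes that "n candles" means n price differences computed from the last n + 1 closes, and that the position is flat (0) when the differences are mixed.

```r
# Returns 1 (long), -1 (short) or 0 (flat) from the last n_candles price differences
candle_signal = function(close, n_candles = 3) {
  recent = tail(close, n_candles + 1)      # n differences need n + 1 prices
  d = diff(recent)
  if (all(d > 0)) return(-1)               # n consecutive up moves: short, expecting reversion
  if (all(d < 0)) return(1)                # n consecutive down moves: long
  0                                        # mixed moves: no position
}

candle_signal(c(100, 101, 102, 103))   # -1
candle_signal(c(103, 102, 101, 100))   # 1
candle_signal(c(100, 102, 101, 103))   # 0
```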

Step 3: Run the Strategy

To execute our strategy, we use the quantiacsToolbox.runts command and specify the respective Python file.

Step 4: Visualize the results

Upon execution, the Python framework displays a very informative chart which includes the markets, an option to select the exposure type, various performance metrics etc.

As can be seen the Quantiacs Python framework is easy to use and can be used to develop varied trading strategies.

Conclusion

QuantInsti™ hosted a webinar, “Introduction to Machine Learning for Quantitative Finance”, on 15th June 2017, conducted by Eric Hamer, Chief Technology Officer at Quantiacs. The recorded session of the webinar is available to watch.


R Weekly Bulletin Vol – IX

This week’s R bulletin will cover topics on how to list files, extracting file names, and creating a folder using R.

We will also cover the select, filter, and arrange functions. Hope you like this R weekly bulletin. Enjoy reading!

Shortcut Keys

1. Run the current chunk – Ctrl+Alt+C
2. Run the next chunk – Ctrl+Alt+N
3. Run the current function definition – Ctrl+Alt+F

Problem Solving Ideas

How to list files with a particular extension

To list files with a particular extension, one can use the pattern argument in the list.files function. For example to list csv files use the following syntax.

Example:

files = list.files(pattern = "\\.csv$")

This will list all the csv files present in the current working directory. To list files in any other folder, you need to provide the folder path.

 list.files(path = "C:/Users/MyFolder", pattern = "\\.csv$")

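The $ at the end anchors the match to the end of the file name, and \\. matches a literal dot (in a regular expression, an unescaped . matches any character, and the backslash itself must be doubled inside an R string). Together they ensure that you match only files with the extension .csv.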
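The effect of the anchor and the escaped dot can be checked with `grepl()` on a vector of made-up file names, without touching the file system. `list.files()` also accepts `ignore.case = TRUE` if upper-case extensions should count.

```r
files = c("NIFTY.csv", "MRF.csv", "notes.csv.bak", "report_csv.txt", "BPCL.CSV")

grepl("csv", files)                           # unanchored: also matches the wrong files
grepl("\\.csv$", files)                       # literal dot + end anchor: true .csv files only
grepl("\\.csv$", files, ignore.case = TRUE)   # also accept upper-case ".CSV"
```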

Extracting file name using gsub function

When we download stock data from Google Finance, the file’s name corresponds to the stock symbol. If we want to extract the stock symbol from the file name, we can do it using the gsub function. The function searches for matches to the pattern argument and replaces all of them with the value given in the replacement argument. The syntax for the function is given as:

 gsub(pattern, replacement, x)

where,

pattern – is a character string containing a regular expression to be matched in the given character vector.
replacement – a replacement for matched pattern.
x – is a character vector where matches are sought.

In the example given below, we extract the file names of the files stored in the “Reading MFs” folder. We have downloaded the stock price data for two companies, MRF and PAGEIND Ltd., into this folder inside the R working directory.

Example:

folderpath = paste(getwd(), "/Reading MFs", sep = "")
temp = list.files(folderpath, pattern = "\\.csv$")
print(temp)
[1] "MRF.csv"  "PAGEIND.csv"

gsub("\\.csv$", "", temp)
[1] "MRF"   "PAGEIND"
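As an alternative to a hand-written pattern, `tools::file_path_sans_ext()` from the tools package that ships with R drops the final extension, whatever it is. When the names come with folder paths, for example from `list.files(..., full.names = TRUE)`, `basename()` removes the folder part first.

```r
temp = c("MRF.csv", "PAGEIND.csv")
tools::file_path_sans_ext(temp)

# With full paths, strip the folder first
full = file.path("Reading MFs", temp)
tools::file_path_sans_ext(basename(full))
```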

Create a folder using R

One can create a folder via R with the help of the “dir.create” function. The function creates a folder with the name as specified in the last element of the path. Trailing path separators are discarded.

The syntax is given as:

dir.create(path, showWarnings = TRUE, recursive = FALSE, mode = "0777")

Example:

dir.create("D:/RCodes", showWarnings = FALSE, recursive = FALSE)

This will create a folder called “RCodes” in the D drive.
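A slightly more defensive pattern builds the path with `file.path()`, creates nested folders in one call with `recursive = TRUE`, and checks `dir.exists()` first. The sketch below works under `tempdir()` so that it runs anywhere; the folder names are illustrative.

```r
# Build the path portably and create nested folders in one call
target = file.path(tempdir(), "RCodes", "Output")
if (!dir.exists(target)) {
  dir.create(target, recursive = TRUE)   # recursive = TRUE also creates "RCodes"
}
dir.exists(target)
```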

Functions Demystified

select function

The select function comes from the dplyr package and can be used to select certain columns of a data frame which you need. Consider the data frame “df” given in the example.

Example:

library(dplyr)
Ticker = c("INFY", "TCS", "HCL", "TECHM")
OpenPrice = c(2012, 2300, 900, 520)
ClosePrice = c(2021, 2294, 910, 524)
df = data.frame(Ticker, OpenPrice, ClosePrice)
print(df)

# Suppose we wanted to select the first 2 columns only. We can use the names of the columns in the 
# second argument to select them from the main data frame.

subset_df = select(df, Ticker:OpenPrice)
print(subset_df)

# Suppose we want to omit the OpenPrice column using the select function. We can do so by using
# the negative sign along with the column name as the second argument to the function.

subset_df = select(df, -OpenPrice)
print(subset_df)

# We can also use helper functions such as 'starts_with' and 'ends_with' to select columns from the
# data frame. The example below will select all the columns whose names end with the word 'Price'.

subset_df = select(df, ends_with("Price"))
print(subset_df)

filter function

The filter function comes from the dplyr package and is used to extract subsets of rows from a data frame. This function is similar to the subset function in R.

Example:

library(dplyr)
Ticker = c("INFY", "TCS", "HCL", "TECHM")
OpenPrice = c(2012, 2300, 900, 520)
ClosePrice = c(2021, 2294, 910, 524)
df = data.frame(Ticker, OpenPrice, ClosePrice)
print(df)

# Suppose we want to select stocks with closing prices above 750, we can do so using the filter 
# function in the following manner:

subset_df = filter(df, ClosePrice > 750)
print(subset_df)

# One can also use a combination of conditions as the second argument in filtering a data set.

subset_df = filter(df, ClosePrice > 750 & OpenPrice < 2000)
print(subset_df)
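Since filter is similar to base R's subset function, the sketch below shows the base R counterparts of the two filter calls above, together with plain bracket indexing, on the same data.

```r
Ticker = c("INFY", "TCS", "HCL", "TECHM")
OpenPrice = c(2012, 2300, 900, 520)
ClosePrice = c(2021, 2294, 910, 524)
df = data.frame(Ticker, OpenPrice, ClosePrice)

subset(df, ClosePrice > 750)                     # like filter(df, ClosePrice > 750)
subset(df, ClosePrice > 750 & OpenPrice < 2000)  # combined conditions
df[df$ClosePrice > 750 & df$OpenPrice < 2000, ]  # bracket indexing, same rows
```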

arrange function

The arrange function is part of the dplyr package and is used to reorder the rows of a data frame according to one or more of its columns. Rows are arranged in ascending order by default; wrapping a column in the desc() helper arranges them in descending order.

Example:

library(dplyr)
Ticker = c("INFY", "TCS", "HCL", "TECHM")
OpenPrice = c(2012, 2300, 900, 520)
ClosePrice = c(2021, 2294, 910, 524)
df = data.frame(Ticker, OpenPrice, ClosePrice)
print(df)

# Arrange in descending order

subset_df = arrange(df, desc(OpenPrice))
print(subset_df)

# Arrange in ascending order (the default, so no helper is needed)

subset_df = arrange(df, OpenPrice)
print(subset_df)
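For comparison, the same two orderings can be sketched in base R with order(), which returns the row positions that sort a column. This is an equivalent technique, not part of dplyr.

```r
Ticker = c("INFY", "TCS", "HCL", "TECHM")
OpenPrice = c(2012, 2300, 900, 520)
ClosePrice = c(2021, 2294, 910, 524)
df = data.frame(Ticker, OpenPrice, ClosePrice)

# Descending by OpenPrice, like arrange(df, desc(OpenPrice))
desc_df = df[order(df$OpenPrice, decreasing = TRUE), ]
print(desc_df)   # TCS, INFY, HCL, TECHM

# Ascending by OpenPrice, like arrange(df, OpenPrice)
asc_df = df[order(df$OpenPrice), ]
print(asc_df)    # TECHM, HCL, INFY, TCS
```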

Next Step

We hope you liked this bulletin. In the next weekly bulletin, we will list more interesting ways and methods plus R functions for our readers.


R Weekly Bulletin Vol – VIII

This week’s R bulletin will cover topics on plotting charts like saving the plot, adding a grid, and plotting multiple data sets in a single plot.

We will also cover functions like download.file, file.copy, file.rename, file.exists, and file.remove. We hope you like this R weekly bulletin. Enjoy reading!

Shortcut Keys

1. Run current document – Ctrl+Alt+R
2. Run from document beginning to current line – Ctrl+Alt+B
3. Run from current line to document end – Ctrl+Alt+E

Problem Solving Ideas

Saving a plot to a file

R allows you to save a plot in different file formats, such as PNG, JPEG, or PDF. The example below outlines the process for saving plots in R.

Example: We would like to generate a 2-year closing price series plot for BPCL and then save the plot in a PNG file. We first call the png function and provide the desired file name, width, and height of the plot. We then plot the price series with the desired arguments to the plot function. Finally, we close the graphics file using the dev.off function. The file is saved in the current working directory unless you specify a different path. The example also saves the same plot as a PDF using the pdf function.

library(quantmod)
bpcl = getSymbols("BPCL.NS", src = "yahoo", from = "2014-01-01", to = "2016-01-01",
                  auto.assign = FALSE)

bpcl_cl = Cl(bpcl)

plot.ts(bpcl_cl, main = "BPCL Price Series", xlab = "Days", ylab = "Close Price",
        type = "l", col = "red")

# Call the png function, Plot the vectors, and finally close the graphics file
png("Saving plot.png", width = 680, height = 480)
plot.ts(bpcl_cl, main = "BPCL Price Series", xlab = "Days", ylab = "Close Price",
        type = "l", col = "red")
dev.off()

# Console output of dev.off() (the new active device):
# pdf
#   2

# Saving the plot in pdf

pdf("Saving plot.pdf")
plot.ts(bpcl_cl, main = "BPCL Price Series", xlab = "Days", ylab = "Close Price",
        type = "l", col = "red")
dev.off()

# Console output of dev.off() (the new active device):
# pdf
#   2
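Because getSymbols needs a network connection, here is a minimal offline sketch of the same "open a file device, plot, dev.off()" pattern. It assumes a synthetic random-walk series in place of the BPCL data and writes the file to the session's temporary directory via file.path(tempdir(), ...), showing how a full path can be given instead of relying on the working directory. pdf() is used here because it is available in every R build, whereas png() can depend on the platform's graphics support.

```r
# Synthetic random walk standing in for a downloaded close price series (not market data)
set.seed(1)
prices = 100 + cumsum(rnorm(250))

# Explicit output path instead of the current working directory
out_file = file.path(tempdir(), "synthetic_price_series.pdf")

pdf(out_file, width = 7, height = 5)    # open the file device (size in inches)
plot.ts(prices, main = "Synthetic Price Series", xlab = "Days",
        ylab = "Close Price", col = "red")
invisible(dev.off())                    # close the device so the file is written

print(file.exists(out_file))            # TRUE
```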

Adding a grid to the plot

A grid can be added to a plot using the grid function. To plot a grid, we first call the plot function with type = “n” to initialize the graphics frame without displaying the data. Next, we call the grid function to draw the grid. Finally, we call the lines function to draw the data on top of the grid. If we want points instead of a line, we can call the points function in place of lines.

Example:

library(quantmod)
techm = getSymbols("TECHM.NS", src = "yahoo", from = "2015-01-01", to = "2016-01-01",
                   auto.assign = FALSE)
techm_cl = coredata(Cl(techm))
days = index(Cl(techm))
# Adding grid to a line chart

plot(days, techm_cl, main = "TECHM Price Series", xlab = "Days", ylab = "Close Price", type = "n")
grid(col = "red", lwd = 1.5)
lines(days, techm_cl)

# Adding grid to a point chart

plot(days, techm_cl, main = "TECHM Price Series", xlab = "Days", ylab = "Close Price", type = "n")
grid(col = "blue", lwd = 1.5)
points(days, techm_cl)

Plotting Multiple Datasets

To plot multiple datasets we use a high-level function like plot, followed by a low-level function like the “lines” function. Since multiple datasets can have different X-axis and Y-axis ranges, it is important to set the range for the two axes in such a way that the plot incorporates all the data points from the multiple datasets.

In the example below, we plot the 1-year daily closing price series for two stocks, namely PNB and CANBK. We use the range function to determine the xlim and ylim parameters. Calling range on the combined close price series of the two datasets ensures that every data point falls inside the plotting area. We then call the high-level plot function on the closing price series of PNB and define the other necessary parameters. Finally, we use the low-level lines function to add the closing price series for CANBK. Note that the code uses the PNB dates as the x-axis for both series, so it assumes the two stocks traded on the same days.

Example:

library(quantmod)
pnb = getSymbols("PNB.NS", src = "yahoo", from = "2015-01-01", to = "2015-12-31",
                  auto.assign = FALSE)

canbk = getSymbols("CANBK.NS", src = "yahoo", from = "2015-01-01", to = "2015-12-31",
                    auto.assign = FALSE)

pnb_close = coredata(Cl(pnb))
canbk_close = coredata(Cl(canbk))
date = index(pnb)

main = "PNB-CANBK Daily Close Price Chart for 2015"
xlim = range(as.Date(date))
ylim = range(c(pnb_close, canbk_close))

plot(date, pnb_close, type = "l", lty = 1, pch = 19, col = "red", xlab = "Months",
     ylab = "Price", main = main, xlim = xlim, ylim = ylim)

# Add a line
lines(date, canbk_close, type = "l", lty = 1, pch = 18, col = "blue")
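A minimal offline sketch that combines this section's range() approach with the grid technique from the previous section. It assumes two synthetic random-walk series (the names "Stock A" and "Stock B" are hypothetical) in place of the PNB and CANBK downloads, sharing a single date vector, and writes the chart to a temporary PDF.

```r
set.seed(42)
dates = seq(as.Date("2015-01-01"), by = "day", length.out = 120)
series_a = 200 + cumsum(rnorm(120))   # hypothetical "Stock A" closes
series_b = 180 + cumsum(rnorm(120))   # hypothetical "Stock B" closes

xlim = range(dates)
ylim = range(c(series_a, series_b))   # spans the min and max of BOTH series

out_file = file.path(tempdir(), "two_series.pdf")
pdf(out_file)
plot(dates, series_a, type = "n", xlim = xlim, ylim = ylim,
     xlab = "Date", ylab = "Price", main = "Two Synthetic Price Series")
grid(col = "grey", lwd = 1)           # draw the grid first
lines(dates, series_a, col = "red")   # then overlay both series
lines(dates, series_b, col = "blue")
legend("topleft", legend = c("Stock A", "Stock B"),
       col = c("red", "blue"), lty = 1)
invisible(dev.off())
```

If ylim were taken from series_a alone, any part of series_b outside that range would be clipped, which is why the ranges are computed on the combined values.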

Functions Demystified

download.file

The download.file function helps download a file from a website. This could be a webpage, a csv file, an R file, etc. The syntax for the function is given as:

download.file(url, destfile)

where,
url – The Uniform Resource Locator (URL) of the file to be downloaded
destfile – the location to save the downloaded file, i.e. path with a file name

Example: In this example, the function downloads the file from the path given in the “url” argument and saves it in the D drive, within the “Skills” folder, with the name “wacc.xls”.

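url = "http://www.exinfm.com/excel%20files/betawacc.xls"
destfile = "D:/Skills/wacc.xls"
# mode = "wb" writes in binary mode; without it, binary files such as .xls can be corrupted on Windows
download.file(url, destfile, mode = "wb")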
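download.file also accepts file:// URLs, which makes it possible to try the function without network access. A minimal sketch, assuming a small CSV with hypothetical contents created in the temporary directory acts as the "remote" file; the extra slash for Windows drive paths follows the file:///C:/... URL form.

```r
# Create a small local file that stands in for the remote resource (hypothetical contents)
src = file.path(tempdir(), "source_prices.csv")
write.csv(data.frame(Ticker = c("INFY", "TCS"), Close = c(2021, 2294)),
          src, row.names = FALSE)

# Build a file:// URL; Windows drive paths need an extra slash (file:///C:/...)
path = normalizePath(src, winslash = "/")
url = if (startsWith(path, "/")) paste0("file://", path) else paste0("file:///", path)

destfile = file.path(tempdir(), "downloaded_prices.csv")

# mode = "wb" requests a binary transfer; quiet = TRUE suppresses the progress output
status = download.file(url, destfile, mode = "wb", quiet = TRUE)
print(status)                                    # 0 means success
print(identical(readLines(src), readLines(destfile)))  # TRUE when the lines match
```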

file.copy and file.rename

The file.copy function copies a file from one folder to another folder, while the file.rename function renames an existing file in a given folder.

Example: The path of the file to be copied is mentioned as the first argument, and the location to copy is mentioned as the second argument in the function.

file.copy("C:/Users/MyFolder/TATACHEM.csv", "D:/Documents/TATACHEM.csv")

Example: The path of the file to be renamed is mentioned as the first argument, and the path of the file after renaming is mentioned as the second argument. The syntax is given as:

file.rename(from, to)

where,
from – the current path of the file, including its file name
to – the new path of the file, including its new file name

file.exists and file.remove

The file.exists function checks whether a particular file is present in the working directory, or in another folder if a full path is passed to the function. The file.remove function deletes a file if it exists at the given path.

Example: This will check whether the file “TATACHEM.csv” exists in the “MyFolder” folder.

file.exists("C:/Users/MyFolder/TATACHEM.csv")
[1] TRUE

This will delete the “TATACHEM.csv” file from the “MyFolder” folder.

file.remove("C:/Users/MyFolder/TATACHEM.csv")
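The four file functions can be exercised together in a scratch folder. A minimal sketch, assuming folders under the temporary directory stand in for C:/Users/MyFolder and D:/Documents; the CSV contents and the TATACHEM_backup.csv name are hypothetical.

```r
# Scratch folders standing in for "MyFolder" and "Documents"
work_dir = file.path(tempdir(), "file_ops_demo")
dir.create(work_dir, showWarnings = FALSE)
src_dir = file.path(work_dir, "MyFolder")
dst_dir = file.path(work_dir, "Documents")
dir.create(src_dir, showWarnings = FALSE)
dir.create(dst_dir, showWarnings = FALSE)

# Hypothetical file contents
original = file.path(src_dir, "TATACHEM.csv")
writeLines(c("DATE,CLOSE", "2016-01-01,400"), original)

copied = file.path(dst_dir, "TATACHEM.csv")
file.copy(original, copied)                 # TRUE: copy into the second folder

renamed = file.path(dst_dir, "TATACHEM_backup.csv")
file.rename(copied, renamed)                # TRUE: 'copied' no longer exists

file.exists(c(original, copied, renamed))   # TRUE FALSE TRUE

file.remove(original)                       # TRUE: delete the original
file.exists(original)                       # FALSE
```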

Next Step

We hope you liked this bulletin. In the next weekly bulletin, we will list more interesting ways and methods plus R functions for our readers.


Trading Strategy: 52-Weeks High Effect in Stocks

By Milind Paradkar

In today’s algorithmic trading, having a trading edge is one of the most critical elements. It’s that simple: if you don’t have an edge, don’t trade! Hence, as a quant, one is always on the lookout for good trading ideas. One resource for trading strategies that has been gaining wide popularity is the Quantpedia site. Quantpedia has thousands of financial research papers that can be utilized to create profitable trading strategies.

The “Screener” page on Quantpedia categorizes hundreds of trading strategies based on parameters such as Period, Instruments, Markets, Complexity, Performance, Drawdown, Volatility, and Sharpe ratio.

Quantpedia has made some of these trading strategies available for free to their users. In this article, we will explore one such trading strategy listed on their site called the “52-Weeks High Effect in Stocks”.

52-Weeks High Effect in Stocks

(http://quantpedia.com/Screener/Details/18)

The Quantpedia page for this trading strategy provides a detailed description which includes the 52-weeks high effect explanation, source research paper, other related papers, a visualization of the strategy performance and also other related trading strategies.

What is 52-Weeks High Effect? 

Here is the lucid explanation provided on Quantpedia:

The “52-week high effect” states that stocks with prices close to their 52-week highs have better subsequent returns than stocks with prices far from their 52-week highs. Investors use the 52-week high as an “anchor” against which they value stocks. When stock prices are near the 52-week high, investors are unwilling to bid the price all the way to the fundamental value. As a result, investors under-react when stock prices approach the 52-week high, and this creates the 52-week high effect.

Source Paper

(http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1787378)

The source paper, “Industry Information and the 52-Week High Effect”, was authored by Xin Hong, Bradford D. Jordan, and Mark H. Liu.

The paper argues that traders use the 52-week high as a reference point against which they evaluate the potential impact of news. When good news has pushed a stock’s price near or to a new 52-week high, traders are reluctant to bid the price of the stock higher even if the information warrants it. The information eventually prevails and the price moves up, resulting in a continuation. The effect works similarly for 52-week lows.

The trading strategy developed by the authors buys stocks in industries in which stock prices are close to 52-week highs and shorts stocks in industries in which stock prices are far from 52-week highs. They found that the industry 52-week high trading strategy is more profitable than the individual 52-week high trading strategy proposed by George and Hwang (2004).

Framing our 52-Weeks High Effect Strategy using R programming

Having understood the 52-weeks High Effect, we will try to backtest a simple trading strategy using R programming. Please note that we are not trying to replicate the exact trading strategy developed by the authors in their research paper.

We test our trading strategy for a 3-year backtest period using daily data on around 140 stocks listed on the National Stock Exchange of India Ltd. (NSE).

A brief outline of the strategy: The trading strategy reads the daily historical data for each stock in the list and checks whether the price of the stock is near its 52-week high at the start of each month. We show how to check this condition in Step 4 of the trading strategy formulation process illustrated below. For all the stocks that pass this condition, we form an equally weighted portfolio for that month. We take a long position in these stocks at the start of the month and square off the position at the start of the next month. We follow this process for every month of the backtest period. Finally, we compute and chart the performance metrics of our trading strategy.

Now, let us understand the process of trading strategy formulation in a step-by-step manner. For reference, we have posted the R code snippets of relevant sections of the trading strategy under its respective step.

Step 1: First, we set the backtest period and the upper and lower threshold values used to determine whether a stock is near its 52-week high.

# Setting the lower and upper threshold limits
lower_threshold_limit = 0.90 # (eg.0.90 = 90%)
upper_threshold_limit = 0.95 # (eg.0.95 = 95%)

# Backtesting period in years (e.g. 1 = 1 year); the minimum period selected should be 2 years.
noDays = 4

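Step 2: In this step, we read the historical stock data using R's read.csv function. The data comes from Google Finance and consists of the Open/High/Low/Close (OHLC) and Volume values. Each stock's data is then merged with the NIFTY index data on the DATE column, and only the close prices are kept.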

# Run the program for each stock in the list
for(s in 1:length(symbol)){

print(s)

dirPath = paste(getwd(),"/4 Year Historical Data/",sep="") 
fileName = paste(dirPath,symbol[s],".csv",sep="")
data = as.data.frame(read.csv(fileName))
data$TICKER = symbol[s]

# Merge NIFTY prices with Stock data and select the Close Price
data = merge(data,data_nifty, by = "DATE") 
data = data[, c("DATE", "TICKER","CLOSE.x","CLOSE.y")] 
colnames(data) = c("DATE","TICKER","CLOSE","NIFTY")
N = nrow(data)
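The CLOSE.x and CLOSE.y names in the snippet above appear because merge() adds the default suffixes .x and .y to non-key columns that exist in both data frames. A small sketch with hypothetical prices shows this, and also that merge() keeps only the dates present in both inputs (an inner join by default).

```r
# Hypothetical daily closes for one stock and for the NIFTY index
stock = data.frame(DATE = c("2016-01-01", "2016-01-04", "2016-01-05"),
                   CLOSE = c(400, 405, 398))
nifty = data.frame(DATE = c("2016-01-04", "2016-01-05", "2016-01-06"),
                   CLOSE = c(7800, 7750, 7700))

merged = merge(stock, nifty, by = "DATE")
print(names(merged))   # "DATE" "CLOSE.x" "CLOSE.y"
print(nrow(merged))    # 2: only 2016-01-04 and 2016-01-05 are in both

# Rename as in Step 2: .x came from the stock, .y from the index
colnames(merged) = c("DATE", "CLOSE", "NIFTY")
print(merged)
```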

Step 3: Since we are using daily data, we need to determine the start date of each month. The start date need not be the 1st of every month, because the 1st can fall on a weekend or an exchange holiday. Hence, we write R code that determines the first trading date of each month.

# Determine the date on which each month of the backtest period starts

data$First_Day = ""

day = format(ymd(data$DATE),format="%d")
monthYr = format(ymd(data$DATE),format="%Y-%m")
yt = tapply(day,monthYr, min)

first_day = as.Date(paste(row.names(yt),yt,sep="-"))
frows = match(first_day, ymd(data$DATE))
for (f in frows) {data$First_Day[f] = "First Day of the Month"}

data = data[, c("TICKER","DATE", "CLOSE","NIFTY","First_Day")]
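An alternative way to find the first trading date of each month, swapped in here for illustration (it is not the tapply approach used above): sort the dates, build a year-month key, and keep the first date of each key with duplicated(). A minimal sketch, assuming a hypothetical trading calendar in which 1 January is not a trading day.

```r
# Alternative to the tapply() approach: first trading date of each month
first_trading_days = function(dates) {
  dates = sort(as.Date(dates))
  month_key = format(dates, "%Y-%m")      # e.g. "2016-01"
  dates[!duplicated(month_key)]           # first date seen in each month
}

# Hypothetical trading calendar where 1 January is not a trading day
trading_dates = as.Date(c("2016-01-04", "2016-01-05", "2016-02-01",
                          "2016-02-02", "2016-03-01", "2016-03-02"))
first_days = first_trading_days(trading_dates)
print(first_days)   # "2016-01-04" "2016-02-01" "2016-03-01"

# Row positions of those dates, playing the role of frows in Step 3
frows_demo = match(first_days, trading_dates)
print(frows_demo)   # 1 3 5
```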

Step 4: Check if the stock is near the 52-week high mark. In this part, we first compute the 52-week high price for each stock. We then compute the upper and the lower thresholds using the 52-week high price.

Example:

If the lower threshold = 0.90, upper threshold = 0.95 and the 52-week high = 1200, then the threshold range is given by:

Threshold range = (0.90 * 1200) to (0.95 * 1200)

Threshold range = 1080 to 1140

If the stock price at the start of the month falls in this range, we consider the stock to be near its 52-week high mark. We have also included one additional condition in this step: it checks whether the stock price reached the current 52-week high at any point in the past 30 days. Such a stock is not included in our portfolio even if its price is within the threshold range now, as we assume that the stock price is declining after having just set its 52-week high.
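The Step 4 rules for a single row can be sketched as a small standalone function. The function name near_52_week_high and its defaults (252-session lookback, 30-session recent-max window, 0.90/0.95 thresholds) are hypothetical choices taken from the text; the price vectors are synthetic and reuse the 1200 example above.

```r
# Hypothetical helper mirroring the Step 4 rules for a single row idx
near_52_week_high = function(prices, idx, lower = 0.90, upper = 0.95,
                             lookback = 252, no_max = 30) {
  stopifnot(idx > lookback, lookback >= no_max)
  max_52 = max(prices[(idx - lookback):(idx - 1)])     # 52-week high (252 sessions)
  recent_max = max(prices[(idx - no_max):(idx - 1)])   # high of the last no_max sessions
  in_band = prices[idx] >= lower * max_52 && prices[idx] <= upper * max_52
  in_band && recent_max != max_52   # exclude highs set within the last no_max sessions
}

prices = rep(1000, 300)
prices[100] = 1200    # 52-week high of 1200, set 200 sessions before day 300
prices[300] = 1100    # inside the 1080 to 1140 band
print(near_52_week_high(prices, 300))          # TRUE

prices_recent = prices
prices_recent[290] = 1200                      # the high is touched again 10 sessions ago
print(near_52_week_high(prices_recent, 300))   # FALSE

prices_high = prices
prices_high[300] = 1150                        # above 0.95 * 1200 = 1140
print(near_52_week_high(prices_high, 300))     # FALSE
```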

# Check if the stock is near its 52-week high at the start of each month

data$Near_52_Week_High = ""
data$Max_52 = numeric(nrow(data))
data$Max_Not = numeric(nrow(data))

# Only month starts with at least one year of prior history are evaluated
frows_tp = frows[frows >= 260]
for (fr in frows_tp) {

  # Max close over the last 1 year (252 trading days)
  data$Max_52[fr] = max(data$CLOSE[(fr-252):(fr-1)])

  # Max close over the last no_max days (no_max is set elsewhere in the full code;
  # the text describes a 30-day window)
  data$Max_Not[fr] = max(data$CLOSE[(fr-no_max):(fr-1)])

  # Near the high: close is inside the threshold band AND the 52-week high
  # was not set within the last no_max days
  if (data$CLOSE[fr] >= lower_threshold_limit * data$Max_52[fr] &&
      data$CLOSE[fr] <= upper_threshold_limit * data$Max_52[fr] &&
      data$Max_Not[fr] != data$Max_52[fr]) {
    data$Near_52_Week_High[fr] = "Near 52-Week High"
  } else {
    data$Near_52_Week_High[fr] = "Not Near 52-Week High"
  }
}

Step 5: For all the stocks that fulfill the criteria mentioned in the step above, we create a long-only portfolio. The entry price equals the price at the start of the month. We square off our long position at the start of the next month. We consider the close price of the stock for our entry and exit trades.

# Enter into a long position for stocks at each start of month

data = subset(data,select=c(TICKER,DATE,CLOSE,NIFTY,First_Day,Max_52,Near_52_Week_High)
             ,subset=(First_Day=="First Day of the Month"))
data$NEXT_CLOSE = lagpad(data$CLOSE, 1)
colnames(data) = c("TICKER","DATE","CLOSE","NIFTY","First_Day","Max_52","Near_52_Week_High",
                   "NEXT_CLOSE")


data$Profit_Loss = numeric(nrow(data))
data$Nifty_change = numeric(nrow(data))

for (i in seq_len(nrow(data))) {
  flag = data$Near_52_Week_High[i]
  if (flag == "Near 52-Week High") {
    # Long from this month's first close to next month's first close
    data$Profit_Loss[i] = round(data$CLOSE[i+1] - data$CLOSE[i], 2)
    data$Nifty_change[i] = round(Delt(data$NIFTY[i], data$NIFTY[i+1]) * 100, 2)
  } else if (flag == "Not Near 52-Week High") {
    # No position this month, but still record the NIFTY benchmark change
    data$Profit_Loss[i] = 0
    data$Nifty_change[i] = round(Delt(data$NIFTY[i], data$NIFTY[i+1]) * 100, 2)
  }
}
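The per-row loop can also be sketched in vectorized form. The function monthly_pnl and all prices below are hypothetical; as in the loop, the last month has no following close, so its P&L comes out as NA. The second part shows one reading of the "equally weighted portfolio" from the strategy outline: the month's portfolio return is the simple average of the selected stocks' percentage returns.

```r
# Hypothetical vectorized version of the Step 5 loop for one stock
monthly_pnl = function(close, near_high) {
  next_close = c(close[-1], NA)   # close at the start of the following month
  ifelse(near_high, round(next_close - close, 2), 0)
}

month_start_close = c(100, 104, 101, 110)   # hypothetical month-start closes
near_high = c(TRUE, FALSE, TRUE, TRUE)       # result of the Step 4 check
pnl = monthly_pnl(month_start_close, near_high)
print(pnl)   # 4 0 9 NA

# Equally weighted return for one month across three selected stocks (hypothetical prices)
entry = c(100, 250, 80)
exit = c(104, 245, 84)
portfolio_return = mean((exit - entry) / entry)   # average of 4%, -2%, 5%
print(portfolio_return)
```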

Step 6: In this step, we write R code that creates a summary sheet of all the trades for each month in the backtest period, including the Profit/Loss from every trade undertaken during the month. A sample summary sheet is shown below.

# Create a Summary worksheet for all the trades during a particular month

final_data = final_data[-1,]
final_data = subset(final_data,select=c(TICKER,DATE,CLOSE,NEXT_CLOSE,Max_52,
                                        Near_52_Week_High,Profit_Loss,Nifty_change),
                                        subset=(Near_52_Week_High == "Near 52-Week High"))

colnames(final_data) = c("Ticker","Date","Close_Price","Next_Close_Price",
                         "Max. 52-Week price","Is Stock near 52-Week high",
                         "Profit_Loss","Nifty_Change")

merged_file = paste(date_values[a],"- Summary.csv")
write.csv(final_data,merged_file)

Monthly Trades Table

Step 7: In the final step, we compute the portfolio performance over the entire backtest period and also plot the equity curve using the PerformanceAnalytics package in R. The portfolio performance is saved in a CSV file.

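library(PerformanceAnalytics)

# eq_ts holds the portfolio's monthly returns (scale = 12 below assumes monthly data)
cum_returns = Return.cumulative(eq_ts, geometric = TRUE)
print(cum_returns)

charts.PerformanceSummary(eq_ts, geometric = TRUE, wealth.index = FALSE)
print(SharpeRatio.annualized(eq_ts, Rf = 0, scale = 12, geometric = TRUE))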
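To make the two headline metrics concrete, here is a base R sketch of the arithmetic. As I understand PerformanceAnalytics, Return.cumulative(geometric = TRUE) compounds the returns, and SharpeRatio.annualized with Rf = 0 divides the geometrically annualized return by the annualized standard deviation; treat this as an approximation of the package's logic rather than its source. The monthly returns are hypothetical and stand in for eq_ts.

```r
# Hypothetical monthly portfolio returns (decimals), standing in for eq_ts
monthly_returns = c(0.02, -0.01, 0.03, 0.00, -0.02, 0.04)

# Geometric cumulative return: compound the monthly returns
cumulative_return = prod(1 + monthly_returns) - 1

# Annualized Sharpe ratio with Rf = 0 for monthly data (12 periods per year)
periods_per_year = 12
n = length(monthly_returns)
annualized_return = prod(1 + monthly_returns)^(periods_per_year / n) - 1
annualized_sd = sd(monthly_returns) * sqrt(periods_per_year)
sharpe = annualized_return / annualized_sd

print(round(cumulative_return, 6))   # 0.060064
print(sharpe)
```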

A sample summary of the portfolio performance has been shown below. In this case, the input parameters to our trading strategy were as follows:

Plotting the Equity Curve

As can be observed from the equity curve, our trading strategy performed well during the initial period and then suffered drawdowns in the middle of the backtest period. The Sharpe ratio for the trading strategy comes to 0.4098.

Cumulative Return: 1.172446

Annualized Sharpe Ratio (Rf = 0%): 0.4098261

This was a simple trading strategy that we developed using the 52-week high effect explanation. One can tweak this trading strategy further to improve its performance and make it more robust or try it out on different markets.

Next Step

You can explore other trading strategies listed on the Quantpedia site under their screener page and if interested you can sign up to get access to hundreds of exciting trading strategies.

If you want to learn various aspects of Algorithmic trading then check out our Executive Programme in Algorithmic Trading (EPAT™). The course covers training modules like Statistics & Econometrics, Financial Computing & Technology, and Algorithmic & Quantitative Trading. EPAT™ is designed to equip you with the right skill sets to be a successful trader. Enroll now!
