Sentiment Analysis on News Articles using Python

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+

Know how to perform sentiment analysis on news articles using Python Programming Language

by Milind Paradkar

In our previous post on sentiment analysis we briefly explained sentiment analysis within the context of trading, and also provided a model code in R. The R model was applied on an earnings call conference transcript of an NSE listed company, and the output of the model was compared with the quarterly earnings numbers, and by charting the one-month stock price movement post the earnings call date. QuantInsti also conducted a webinar on “Quantitative Trading Using Sentiment Analysis” where Rajib Ranjan Borah, Director & Co-founder, iRageCapital and QuantInsti, covered important aspects of the topic in detail, and is a must watch for all enthusiast wanting to learn & apply quantitative trading strategies using sentiment analysis.

Taking these initiatives on sentiment analysis forward, in this blog post we attempt to build a Python model to perform sentiment analysis on news articles that are published on a financial markets portal. We will build a basic model to extract the polarity (positive or negative) of the news articles.

In Rajib’s Webinar, one of the slides details the sensitivity of different sectors to company and sectorial news. In the slide, the Pharma sector ranks at the top as the most sensitive sector, and in this blog we will apply our sentiment analysis model on specific news articles pertaining to select Indian Pharma companies. We will determine the polarity, and then check how the market reacted to these news. For our sample model, we have taken ten Indian Pharma companies that make the NIFTY Pharma index.

Building the Model

Now, let us dive straight in and build our model. We use the following Python libraries to build the model:

  • Requests
  • Beautiful Soup
  • Pattern

Step 1: Create a list of the news section URL of the component companies

We identify the component companies of the NIFTY Pharma index, and create a dictionary in python which contains the company names as the keys, while the dictionary values comprise the respective company abbreviation used by the financial portal site to form the news section URL. Using this dictionary we create a python list of the news section URLs for the all components companies.

Step 2: Extract the relevant news articles web-links from the company’s news section page

Using the Python list of the news section URLs, we run a Python For loop which pings the portal with every URL in our Python list. We use the requests.get function from the Python requests library (which is a simple HTTP library). The requests module allows you to send HTTP/1.1 requests. One can add headers, form data, multipart files, and parameters with simple Python dictionaries, and also access the response data in the same way.

The text of the response object is then applied to create a Beautiful Soup object. Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with a given parser to provide for ways of navigating, searching, and modifying the parse tree.

HTML parsing basically means taking in the HTML code and extracting relevant information like the title of the page, paragraphs in the page, headings, links, bold text etc.

The news section webpage on the financial portal site contains 20 news articles per page. We target only the first page of the news section, and our objective is to extract the links for all the news articles that appear on the first page using the parsed HTML. We inspect the HTML, and use the find_all method in the code to search for a tag that has the CSS class name as “arial11_summ”. This enables us to extract all the 20 web-links.

Fortunes of the R&D intensive Indian Pharma sector are driven by sales in the US market and by approvals/rejections of new drugs by US Food and Drug Administration (USFDA). Hence, we will select only those news articles pertaining to the US Food and Drug Administration (USFDA) and the US market. Using keywords like “US”, “USA”, and “USFDA” in a If statement which is nested within the Python For Loop, we get us our final list of all the relevant news articles.

Step 3: Remove the duplicate news articles based on news title

It may happen that the financial portal publishes important news articles pertaining to the overall pharma sector on every pharma company’s news section webpage. Hence, it becomes necessary to weed out the duplicate news articles that appear in our Python list before we run our sentiment analysis model. We call the set function on our Python list which we generated in Step 2 to give us a list with no duplicate news articles.

Step 4: Extract the main text from the selected news articles

In this step we run a Python For Loop and for every news article URL, we call the requests.get() on the URL, and then convert the text of response object into a Beautiful Soup object. Finally, we extract the main text using the find and get_text methods from the  Beautiful Soup module.

Step 5: Pre-processing the extracted text

We will use the n-grams function from the Pattern module to pre-process our extracted text. The ngrams() function returns a list of n-grams (i.e., tuples of n successive words) from the given string. Since we are building a simple model, we use a value of one for the n argument in the n-grams function. The Pattern module contains other useful functions for pre-processing like parse, tokenize, tag etc. which can be explored to conduct an in-depth analysis.

Step 6: Compute the Sentiment analysis score using a simple dictionary approach

To compute the overall polarity of a news article we use the dictionary method. In this approach a list of positive/negative words help determine the polarity of a given text. This dictionary is created using the words that are specific to the Pharma sector. The code checks for positive/negative matching words from the dictionary with the processed text from the news article.

Step 7: Create a Python list of model output

 The final output from the model is populated in a Python list. The list contains the URL, positive score and the negative score for each of the selected news articles on which we conducted sentiment analysis.

Final Output

sentiment trading using python

Step 8: Plot NIFTY vs NIFTY Pharma returns

Shown below is a plot of NIFTY vs NIFTY Pharma for the months of October-November 2016. In our NIFTY Pharma plot we have drawn arrows highlighting some of the press releases on which we ran our sentiment analysis model. The impact of the uncertainty regarding the US Presidential election results, and the negative news for the Indian Pharma sector emanating from the US is clearly visible on NIFTY Pharma as it fell substantially from the highs made in late October’2016. Thus, our attempt to gauge the direction of the Pharma Index using the Sentiment analysis model in Python programming language is giving us accurate results (more or less).

sentiment trading using python

 

Next Step:

One can build more robust sentiment models using other approaches and trade profitably. As a next step we would recommend watching QuantInsti’s webinar on “Quantitative Trading Using Sentiment Analysis” by Rajib Ranjan Borah. Watch it by clicking on the video below:

 

Also, catch our other exciting Python trading blogs and if you are interested in knowing more about our EPAT course feel free to contact our QuantInsti team by clicking here.

Algorithmic trading course

  • Download.rar
    • Sentiment Analysis of News Article – Python Code
    • dict(1).csv
    • Nifty and Nifty Pharma(1).csv
    • Pharma vs Nifty.py

(more…)

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+
Read more

How to Check Data Quality using R

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+

 

How to check data quality

By Milind Paradkar

Do You Use Clean Data?

Always go for clean data! Why is it that experienced traders/authors stress this point in their trading articles/books so often? As a novice trader, you might be using the freely available data from sources like Google or Yahoo finance. Do such sources provide accurate, quality data?

We decided to do a quick check and took a sample of 143 stocks listed on the National Stock Exchange of India Ltd (NSE). For these stocks, we downloaded the 1-minute intraday data for the period 1/08/2016 – 19/08/2016. The aim was to check whether Google finance captured every 1-minute bar during this period for each of the 143 stocks.

NSE’s trading session starts at 9:15 am and ends at 15:30 pm IST, thus comprising of 375 minutes. For 14 trading sessions, we should have 5250 data points for each of these stocks. We wrote a simple code in R to perform the check.

Here is our finding. Out of the 143 stocks scanned, 89 stocks had data points less than 5250, that’s more than 60% of our sample set!! The table shown below lists downs 10 such stocks from those 89 stocks.

Symbols

Let’s take the case of PAGEIND. Google finance has captured only 4348 1-minute data points for the stock, thus missing 902 points!!

Example – Missing the 1306 minute bar on 20160801:

Missing the 1306 minute bar on 20160801

Example – Missing the 1032 minute bar on 20160802:

Missing the 1032 minute bar on 20160802

If a trader is running an intraday strategy which generates buy/sell signals based on 1-minute bars, the strategy is bound to give some false signals.

As can be seen from the quick check above, data quality from free sources or from cheap data vendors is not always guaranteed. Many of the cheap data vendors source the data from Yahoo finance and provide it to their clients. Poor data feed is a big issue faced by many traders and you will find many traders complaining about the same on various trading forums.

Backtesting a trading strategy using such data will give false results. If are using the data in live trading and in case there is a server problem with Google or Yahoo finance, it will lead to a delay in the data feed. As a trader, you don’t want to be in a position where you have an open trade, and the data feed stops or is delayed. When trading with real money, one is always advised to use quality data from reliable data vendors. After all, Data is Everything!

Next Step

If you’re a retail trader interested in learning various aspects of Algorithmic trading, check out the Executive Programme in Algorithmic Trading (EPAT). The course covers training modules like Statistics & Econometrics, Financial Computing & Technology, and Algorithmic & Quantitative Trading. The course equips you with the required skillsets to be a successful trader.

Download Data Files

  • Do You Use Clean Data.rar
    • 15 Day Intraday Historical Data.zip
    • F&O Stock List.csv
    • R code – Google_Data_Quality_Check.txt
    • R code – Stock price data.txt

(more…)

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+
Read more

Sentiment Analysis in Trading Using R [WORKING MODEL]

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+

Sentiment Analysis in Trading

In this post we discuss sentiment analysis in brief and then present a basic sentiment analysis model in R. Sentiment analysis is the analysis of the feelings (i.e. attitudes, emotions and opinions) which are expressed in the news reports/blog posts/twitter messages etc., using natural language processing tools.

Natural language processing (NLP) in simple terms refers to the use of computers to process sentences/text in a natural language such as English. The objective here is to extract information from unstructured or semi-structured data found in these tweets/blogs/articles. To enable this NLP makes use of artificial intelligence, computational linguistics, and computer science.

Using NLP models hundreds of text documents can be processed to ascertain the sentiment in seconds. These days sentiment analysis is a hot topic, and has found wide application in areas like Business intelligence, Politics, Finance, Policy making etc.

Sentiment analysis in Trading – Sentiments can often drive the direction of the markets. Hence, traders and other participants in the financial markets seek to gauge the sentiment expressed in news reports/tweets/blog posts. Traders build automatic trading systems which extract the sentiment from natural language. These trading systems take long/short positions in the markets based on the trading signals generated. The trading systems can also be combined with other trading systems. The objective at the end of the day is to generate superior returns from the extracted information.

There are various methods and models for sentimental analysis. Let us take a look at a very basic model in R for sentimental analysis.

Sentiment analysis model in R

In this model we implement the “Bag-of-words” approach to sentiment analysis. The process identifies positive and negative words (or a string of words) within an article. For this it makes use of a large dictionary which contains words that carry sentiment.  Each word in this dictionary can be assigned a weight. The sum of the positive and negative words is the final sentiment score generated by the model.

We will test our model on the management commentary text taken from the latest earnings call transcript of Eicher Motors Ltd. Eicher Motors is a leading Indian automaker company which owns the Royal Enfield Motors. The objective of our model will be to gauge the opinion expressed in their fourth quarter 2015 earnings call.          Basic sentiment analysis model

To build this model we are using the “tm” and the “Rweka” package in R. We load the libraries and then read the two documents which contain the positive and the negative terms. To prepare these documents we have gone through four previous conference call transcripts prior to the fourth quarter 2015. We picked the positive/negative words from these transcripts to populate our dictionary. In addition to these words we have also added some general positive/negative words that relate to the Motorcycle industry.

load the document

We will be considering only the management’s commentary in our sentiment analysis model. We load the text document (fourth quarter 2015) containing the CEO’s prepared text commentary in R using the Corpus function. For this we have stored the commentary document in the TextMining folder in the R’s working directory.

Clean the text

Next step is to clean the text. We convert all words to lowercase, remove punctuations, remove numbers, and strip the whitespace. The writeLines function enables us to see the text post the cleansing.

In the code below, we tokenize the text which was cleaned above. Tokenization is the process of breaking a stream of text into words or a string of words. We are using the NGramTokenizer function here. This creates N-grams of text.

N-grams are basically a set of co-occuring words within a given text. For example, consider this sentence “The food is delicious”. If n= 2, then the ngrams would be:

  • the food
  • food is
  • is delicious

Thereafter we create a term document matrix (called “terms” in the code) which a matrix that lists all occurrences of words in the corpus.

write the tokenizer

Below we check if the positive/negative words in the dictionary are present in the text document.

check for positive negative words

Now we extract all the positive/negative words from the text document which matched with the words in our dictionary.

Extract positive negative words

The code lines below compute the positive/negative score, and finally the sentiment score.

Calculate score

Final result – Sentiment score

Sentiment score

The model found 14 positive words and 4 negative words, and the final sentiment score was 10. This tells us that the quarterly result for Q4 2015 was good from the management’s perspective. The word cloud below shows some of the positive/negative words that were picked from the text document on which we ran the model.

Sentiment analysis

Validate our sentiment analysis model – let us check the quarterly performance numbers to confirm the positive sentiment score generated by our model. As can be seen, Eicher Motors posted a strong quarter. EBIT growth was around 72% y/y on a strong sales volume of 125,690 motorcycles. The strong results were despite the production shutdown for few days which was caused by the floods experienced during the quarter at its production facility.

The chart on the right shows the stock market’s reaction to Eicher Motors strong results on the day of earnings result announcement. The stock opened at around 17100 levels, made a big move touching an intraday high of around Rs.18500, and finally closed at Rs. 18,175.

Eicher Motors Consolidated Results

Eicher Motors Chart

Conclusion

This was a basic introduction to sentiment analysis. The model above can be made more robust and fine-tuned further. In future posts we will try to cover other sentiment analysis approaches and attempt to build a model around them.

QuantInsti has been actively participating in conferences on sentiment analysis, and was one of the lead marketing and education partner at the recently held “Sentiment analysis in Finance” conference in Singapore, 2016. Rajib Ranjan Borah, Co-founder & Director of iRageCapital Advisory Pvt. Ltd, & QuantInsti was one of the esteemed panelists for the session, “New Paradigms for Sentiment Analysis Applied to Finance” at the conference.

To know more about QuantInsti and the Executive Programme in Algorithmic Trading (EPAT) course offered by QuantInsti, check our website and the EPAT course page. Feel free to contact our team at contact@quantinsti.com for queries on EPAT.

Download Data File

  • Sentiment analysis in Trading – Files.rar
    • Eicher Motors Sentiment Analysis – R Code.txt
    • Negative terms.csv
    • Positive Terms.csv
    • Q4.txt

(more…)

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+
Read more

Machine Learning and Its Application in Forex Markets – Part 2 [WORKING MODEL]

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+

Machine Learning and Its Application in Forex Markets - Part 2

In our previous post on Machine learning we derived rules for a forex strategy using the SVM algorithm in R. In this post we take a step further, and demonstrate how to backtest our findings.

To recap the last post, we used Parabolic SAR and MACD histogram as our indicators for machine learning.  Parabolic SAR indicator trails price as the trend extends over time. SAR is below prices when prices are rising and above prices when prices are falling. SAR stops and reverses when the price trend reverses and breaks above or below it.

The MACD oscillator comprises of the MACD line, Signal line and the MACD histogram. The MACD Line is the 12-day Exponential Moving Average (EMA) less the 26-day EMA. MACD Signal line is a 9-day EMA of the MACD line. The MACD Histogram represents the difference between MACD line and the MACD Signal line. The histogram is positive when the MACD Line is above its Signal line and negative when the MACD Line is below its Signal line.

The EUR/USD price series chart below shows Parabolic SAR plotted in blue, and the MACD line, MACD signal line, and the MACD histogram below the EURUSD price series.

EURUSD

Our intention is to take positions around the MACD line crossovers and Parabolic SAR reversal points. When the Parabolic SAR gives buy signal and MACD lines crosses upwards, we buy. When Parabolic SAR gives sell signal and MACD lines crosses downwards, we sell.

SVM Predictions

After selecting the indicators we ran the SVM algorithm on EUR/USD data, which gave us the plot as shown above. Looking at the SVM predictions, we now frame the rules, and backtest them to see the performance of our strategy.

Short rule = (Price – SAR) < 0.0010 & MACD histogram < 0.0010
Long rule = (Price – SAR) > -0.0050 & MACD histogram > -0.0010

We have used Michael Kapler’s Systematic Investor Toolbox to backtest our model in R. We start by loading the toolbox and the necessary libraries.

install libraries

Next we create a new environment and load the historical EUR/USD data using the getSymbols function.

create a new environment

We will check the performance of our rule-based model against a simple ‘buy and hold’ model.  To do that, we first create a ‘buy and hold’ model.

specify the prices and store our models

Our next step is to compute the indicators for our rule-based model.

calculate selected indicators

We run two models here, ‘long short’ model, and another ‘long short’ model using stop loss and take profit. First we create a long short model without stop loss and take profit.

set conditions

Next we set the takeprofit and stop loss levels, and create a long short model using these levels. We call this model as ‘stop.loss.take.profit’ model.

set take profit

Let us now run all the three models, and check their relative performance.

view equity curve

Cumulative Performance Performance table

As you can be seen, the rule-based strategy has a smooth equity curve, and is giving a better CAGR of 5.97 than the simple ‘buy hold’ model CAGR of 1.18. The Maximum drawdown of our strategy is at 13.92 compared to the ‘buy hold’ strategy drawdown of 30.11. You can play with the indicator settings or change the short-long rules or the stop loss-take profit levels to refine the model further.

Once you understand Machine learning algorithms, these can be a great tool for formulating profit-making strategies. To learn more on Machine Learning you can watch our latest webinar, “Machine Learning in Trading”, which was hosted by QuantInsti, and conducted by our guest speaker Tad Slaff, CEO/Co-founder Inovance.

Machine learning is covered in the Executive Programme in Algorithmic Trading (EPAT) course conducted by QuantInsti. To know more about EPAT check the EPAT course page or feel free to contact our team at contact@quantinsti.com for queries on EPAT.

Download R Code

(more…)

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+
Read more

Machine Learning and Its Application in Forex Markets [WORKING MODEL]

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+

Machine Learning and Its Application in Forex Markets

In the last post we covered Machine learning (ML) concept in brief. In this post we explain some more ML terms, and then frame rules for a forex strategy using the SVM algorithm in R.

To use ML in trading, we start with historical data (stock price/forex data) and add indicators to build a model in R/Python/Java. We then select the right Machine learning algorithm to make the predictions.

First, let’s look at some of the terms related to ML.

Machine Learning algorithms – There are many ML algorithms (list of algorithms) designed to learn and make predictions on the data. ML algorithms can be either used to predict a category (tackle classification problem) or to predict the direction and magnitude (tackle regression problem).

Examples:

  • Predict the price of a stock in 3 months from now, on the basis of company’s past quarterly results.
  • Predict whether Fed will hike its benchmark interest rate.

Indicators/Features – Indicators can include Technical indicators (EMA, BBANDS, MACD, etc.), Fundamental indicators, or/and Macroeconomic indicators.

Example 1 – RSI(14), Price – SMA(50) , and CCI(30). We can use these three indicators, to build our model, and then use an appropriate ML algorithm to predict future values.

Example 2 – RSI(14), RSI(5), RSI(10), Price – SMA(50), Price – SMA(10), CCI(30), CCI(15), CCI(5)

In this example we have selected 8 indicators. Some of these indicators may be irrelevant for our model. In order to select the right subset of indicators we make use of feature selection techniques.

Feature selection – It is the process of selecting a subset of relevant features for use in the model. Feature selection techniques are put into 3 broad categories: Filter methods, Wrapper based methods and embedded methods. To select the right subset we basically make use of a ML algorithm in some combination. The selected features are known as predictors in machine learning.

 

Feature Selection

Support Vector Machine (SVM) – SVM is a well-known algorithm for supervised Machine Learning, and is used to solve both for classification and regression problem.

A SVM algorithm works on the given labeled data points, and separates them via a boundary or a Hyperplane. SVM tries to maximize the margin around the separating hyperplane. Support vectors are the data points that lie closest to the decision surface.

Support vectors

Framing rules for a forex strategy using SVM in R – Given our understanding of features and SVM, let us start with the code in R. We have selected the EUR/USD currency pair with a 1 hour time frame dating back to 2010. Indicators used here are MACD (12, 26, 9), and Parabolic SAR with default settings of (0.02, 0.2).

First, we load the necessary libraries in R, and then read the EUR/USD data. We then compute MACD and Parabolic SAR using their respective functions available in the “TTR” package. To compute the trend, we subtract the closing EUR/USD price from the SAR value for each data point. We lag the indicator values to avoid look-ahead bias. We also create an Up/down class based on the price change.

Frame rules

Thereafter we merge the indicators and the class into one data frame called model data. The model data is then divided into training, and test data.

Merging the indicators amd creating training

We then use the SVM function from the “e1071” package and train the data. We make predictions using the predict function and also plot the pattern. We are getting an accuracy of 53% here.

Use SVM to find pattern

From the plot we see two distinct areas, an upper larger area in red where the algorithm made short predictions, and the lower smaller area in blue where it went long.

SVM Predictions

SAR indicator trails price as the trend extends over time. SAR is below prices when prices are rising and above prices when prices are falling. SAR stops and reverses when the price trend reverses and breaks above or below it. We are interested in the crossover of Price and SAR, and hence are taking trend measure as the difference between price and SAR in the code. Similarly, we are using the MACD Histogram values, which is the difference between the MACD Line and Signal Line values.

Looking at the plot we frame our two rules and test these over the test data.
Short rule = (Price–SAR) > -0.0025 & (Price – SAR) < 0.0100 & MACD > -0.0010 & MACD < 0.0010
Long rule = (Price–SAR) > -0.0150 & (Price – SAR) < -0.0050 & MACD > -0.0005

Define rules for trades

We are getting 54% accuracy for our short trades and an accuracy of 50% for our long trades. The SVM algorithm seems to be doing a good job here. We stop at this point, and in our next post on Machine learning we will see how framed rules like the ones devised above can be coded and backtested to check the viability of a trading strategy.

To learn more on Machine Learning you can watch the latest webinar, “Machine Learning in Trading”, which was hosted by QuantInsti, and conducted by our guest speaker Tad Slaff, CEO/Co-founder Inovance.

Machine learning is covered in the Executive Programme in Algorithmic Trading (EPAT) course conducted by QuantInsti. To know more about EPAT check the EPAT course page or feel free to contact our team at contact@quantinsti.com for queries on EPAT.

Download R Code

(more…)

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+
Read more

Trading with ETF as a Lead Indicator [EXCEL MODEL]

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+

Trading with ETF

Index tracking trading model

Index tracking trading is a strategy where you observe price on the previous ‘n’ candlesticks and make your bets accordingly. The intuition is that MSCI FUTURES follows the ETF. Hence if ETF is performing well we assume MSCI FUTURES perform well too thus making buying and selling decision accordingly.

Who can use it?

People interested in algorithmic trading and those who want to learn about ETF as a lead indicator.

How it helps?

This excel model will help you to:

  • Build a strategy with ETF as a lead indicator
  • Understand the trading logic of strategy implementation
  • Optimize trading parameters

Why should you download the trading model?

As the trading logic is coded in the cells of the sheet, you can better the understanding by downloading and analyzing the files at your own convenience. Not just that, you can play around the numbers to obtain better results. You might find suitable parameters that provide higher profits than specified in the article.

Explanation of the model

In this example we consider the MSCI FUTURES data. We track an ETF and assume that the MSCI has a strong positive beta with the ETF. We observe the 5 minute intervals prices of ETF and MSCI and buy/sell MSCI based on the ETF returns. If the ETF returns are positive we buy one lot of MSCI FUTURES. If the ETF returns are negative we sell one lot of MSCI FUTURES. The ETF we track is Indian SP Equity. In essence we go long (buy) on MSCI futures if the ETF is bullish and go short (sell) on MSCI FUTURES if the ETF is bearish.

The data used for MSCI FUTURES contract is the data separated by 5 minute interval from 2nd Feb 2015 to 4th of March 2015.

 

Assumptions

  1. For simplification purpose, we ignore bid ask spreads.
  2. Prices are available at 5 minutes interval and we trade at the 5 minute closing price only.
  3. Since this is discrete data, squaring off of the position happens at the end of the candle i.e. at the price available at the end of 5 minutes.
  4. Only the regular session (T) is traded
  5. Transaction cost is $1.10 for MSCI FUTURE.
  6. Margin for each trade is $1500.
  7. Trading quantity is 1 lot (MSCI order size 50) and trading hours are 11:30 a.m. to 5:55 p.m. SGT.

Input parameters

Please note that all the values for the input parameters mentioned below are configurable.

  1. Price at the end of 5 minute interval is considered.
  2. We use ETF as a lead indicator

The market data and trading model are included in the spread sheet from the 12th row onwards. So when the reference is made to column D, it should be obvious that the reference commences from D12 onwards.

Explanation of the columns in the Excel Model

Column C represents the price for ETF.

Column D represents the price for MSCI FUTURES.

Column E represents log returns of ETF data.

Column F represents log returns of MSCI data.

Column G represents average returns of ETF data.

Column H represents average returns of MSCI data.

Column I calculates the trading signal. The formula =IF(G13=””, “”, IF(G13>H13, “Buy”, IF(G13<H13, “Sell”, “I12”))) means if the entry in cell G13 is blank then keep I13 blank otherwise if G13 (MSCI FUTURES data) is greater than H13 then buy signal for the MSCI FUTURES contract is generated else if G13 is lower than H13 then sell signal for the MSCI FUTURES contract is generated.

Column J represents trade price. This is the price at which the trading signal is generated. The formula =IF(I13=””, “”, IF(I13=I12, J12, D13)) means if the entry in cell I13 is blank then the entry is blank. Otherwise if I13=I12, then the trade price is given by the entry in J12. If I13 is neither blank nor equal to I12 then the trade price is given by D13.

Column K represents the Mark To Market. The formula =IF(OR(I13=””, I12=””), 0, IF(I13=I12, K12, IF(I13=”Buy”, K12+J12-J13, IF(I13=”Sell”, K12+J13-J12, 0)))) means if the cells I12 or I13 are blank then the MTM is zero. Otherwise if I13=I12 then MTM is K12 else if I13 is not equal to I12 and if I13 = “Buy” the MTM is given by K12+J12-J13 if I13 is not equal to I12 and if I13=”Sell” then MTM is given by K12+J13-J12.

Column L represents the profit/loss status of the trade. The formula =IF(OR(I13=””, I12=””), “”, IF(K13<K12, “Loss”, IF(K13>K12, “Profit”, IF(I13<>I12, “NPNL”, “”)))) means if either I13 or I12 is blank then the profit entry is also blank. Otherwise if K13 is less than K12 the entry is loss, if K13 is greater than K12 the entry is profit. The next part I13<>I12 means if I13 is not equal to I12 then it’s No Profit No Loss (NPNL). This makes sense since we have already calculated profit from previous squared off position.

Column M calculates the trade profit/loss. The formula =IF(L22=””, 0,K22-K21 ) means if L23 is blank meaning no profit no loss then the trade profit/loss is zero. If L23 is not blank then the profit is calculated as the difference in MTM values.

Outputs

The output table has some performance metrics tabulated.

Number of profitable trades is 118 and the number of loss making trades is 77.

Total trades are 277 and the total profit is $1970. Average profit per trade is $7.24. Net profit per trade is $5.04. This is calculated as average profit per trade minus twice the transaction costs. Number of trading intervals is 23. Monthly returns calculated as the product of total trades and net profit per trade divided by the product of margin for each trade and the number of days interpolation

Now it is your turn!

  • First, download the model
  • Modify the parameters and study the backtesting results
  • Run the model for other historical prices
  • Modify the formula and strategy to add new parameters and indicators! Play with logic! Explore and study!
  • Comment below with your results and suggestions

Click to download the excel (If you have not logged in yet, first login/sign up!)

(more…)

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+
Read more

Momentum Based Strategies for Low and High Frequency Trading – [EXCEL MODEL]

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+

Momentum-Trading-Strategies-Webinar-Download

On 3rd December 2015, QuantInsti held a comprehensive webinar session on Momentum Trading Strategies, where Mr. Nitesh Khandelwal, Co-founder, iRage Capital, discussed regarding momentum trading in Low and High frequency trading.

This webinar focused on the various aspects of Momentum Trading Strategies for both Conventional/Low Frequency as well as High Frequency (HFT). Some popular strategies in momentum based trading were also dug deeper into to select niche momentum trading strategies. The webinar aimed to evaluate how HFT momentum strategies differ from conventional momentum strategies both from logic and deployment perspective.

Following points were discussed in detail

  • Popular Momentum Trading Strategies
  • Momentum Trading in HFT
  • Risks in Momentum Trading
  • Sample Model

Now it is your turn!

  • First, download the model
  • Modify the parameters and study the backtesting results
  • Run the model for other historical prices
  • Modify the formula and strategy to add new parameters and indicators! Play with logic! Explore and study!
  • Comment below with your results and suggestions

Download Sample Model of a Momentum Trading Strategy

(more…)

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+
Read more

Candlestick Trading – A Momentum Strategy with Example [EXCEL MODEL]

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+

Excel Model

Candle stick trading is a momentum strategy where you observe price on the previous ‘n’ candlesticks and make your bets accordingly. The intuition is if the price is increasing continuously for, say, 3 candle sticks then it is highly probable that it will rise further.

Who can use it?

People interested in algorithmic trading, those who want to learn about momentum trading.

How it helps?

This excel model will help you in,

  • Learn how momentum strategy is implemented
  • Understand the trading logic of strategy implementation
  • Optimize trading parameters
  • Understand intraday returns of momentum trading

Candlestick Trading Example

In this example we consider the INR FUTURES data on SGX (Singapore exchange). We implement momentum strategy on this contract. This momentum strategy is simply based on the fact that a rising market will be followed by a rising market and a falling market will be followed by a falling market. We hope to ride on the tide and make some profit before the momentum vanishes. The data used for INR FUTURES contract is the data separated by 5 minute interval for 3rd Feb 2015 to 19th March 2015. The trading strategy is intuitional and unlike mean reversion for pairs trading there is no hypothesis as such explaining why this strategy would work or not. We would like to benefit from the market wave and optimize our bet by specifying stop loss and take profit limits. This model is flexible and can be varied to achieve different limits to exit the trade depending upon the trader’s risk appetite.

INR Graph

Assumptions

  1. For simplification purpose, we ignore bid ask spreads.
  2. Prices are available at 5 minutes interval and we trade at the 5 minute closing price only.
  3. Since this is discrete data, squaring off of the position happens at the end of the candle i.e. at the price available at the end of 5 minutes.
  4. Only the regular session (T) is traded
  5. Transaction cost is $0.35 for INR FUTURE.
  6. Margin for each trade is $800.
  7. Trading quantity is 1 lot and trading hours are 7:40 a.m. to 2:00 a.m. SGT.

Input parameters

Please note that all the values for the input parameters mentioned below are configurable.

  • High/Low of 3 candles (one candle=every 5 minute price) is considered.
  • A stop loss of 0.08 and profit limit of 0.16 is set.
  • The order size for trading INR FUTURE is INR 2000000.

The market data and trading model are included in the spread sheet from the 12th row onwards. So when the reference is made to column D, it should be obvious that the reference commences from D12 onwards.

Column C represents the price for INR FUTURE.

Column D represents 3 candle high meaning the highest price of the previous 3 candles.

Column E represents 3 candle low meaning the lowest price of the previous 3 candles.

Column F calculates the trading signal. The formula =IF(D13=””, “”, IF(C13>D13, “Buy”, IF(C13<E13, “Sell”, “”))) means if the entry in cell D13 is blank then keep F13 blank otherwise if C13 (INR FUTURES data) is greater than D13 (3 candle high) then buy signal for the INR FUTURES contract is generated else if C13 is lower than E13 (3 candle low) then sell signal for the INR FUTURES contract is generated.

Column G represents entry price. This is the price at which the trading signal is generated. The formula =IF(H13=H12, G12, IF(OR(H13=”Buy”, H13=”Sell”), C13, “”)) means if the entry in cell H13 is same as H12 then the value in G13 should be the value in G12 otherwise if H13 is either “Buy” or “Sell” then the entry in G13 is the value in C13 (INR FUTURES price) else if H13 is neither “Buy” nor “Sell” leave it blank.

Column H represents the status of the trade. Given our assumptions and input parameters there are four status that can occur, “Buy”, “Sell”, “TP (Take Profit)” and “SL (Stop Loss)”.

The formula =IF(OR(H17=””, H17=”TP”, H17=”SL”), F18, IF(H17=”Buy”, IF(C18<G17+$C$4, “SL”, IF(C18>G17+$C$5, “TP”, H17)), IF(H17=”Sell”, IF(C18>G17-$C$4, “SL”, IF(C18<G17-$C$5, “TP”, H17)), “”))) can be simplified as follows:-

If the entry in H17 is either blank or TP or SL then choose the value in F18 (F column has either Buy or Sell or blank values). Otherwise look into the next If condition.

If the entry in H17 is “Buy”, meaning we have a buy position, and if the price of the contract goes below the stop loss limit then we exit the position at stop loss and if the price of the contract goes above the take profit limit then we exit the position at take profit. Similarly, if the position is “Sell” and the contract price rises above the selling price beyond the stop loss limit then exit the position at stop loss and if the contract price falls below the selling price beyond the take profit limit then exit the position by taking the profit.

Column I represents the profit/loss status of the trade. P/L is calculated only when we have squared off our position. The formula =IF(OR(H13=”SL”, H13=”TP”), IF(H12=”Buy”, C13-G12, IF(H12=”Sell”, G12-C13, 0)), 0) can be summarized as follows:-

The first if condition states that proceed to the next if condition only if the corresponding status in column H is either “SL” or “TP” else the entry in the cell is zero.

The next set of if conditions calculate profit assuming either stop loss or take profit has been achieved. If the status in column H is “Buy”, then the profit/loss is calculated as C13-G12. Remember that the column G has the price at which you traded (in this case “Buy”) and the column C has the market data for INR FUTURES contract. Hence the profit/loss is simply the difference between the price at which you sold minus the price at which you bought. If the status in column H is “Sell”, then the profit/loss is calculated as G12-C13 simply meaning the difference between the price at which you sold ( shorted) and the price at which you bought later thus squaring off the position.

Column J calculates the cumulative profit.

Outputs

The output table has some performance metrics tabulated. Loss from all loss making trades is $1588 and profit from trades that hit TP is $1988. So the total P/L is $1988-$1588=$400. Loss trades are the trades that resulted in losing money on the trading positions. Profitable trades are the successful trades ending in gaining cause. Average profit is the ratio of total profit to the total number of trades. Net average profit is calculated after subtracting the transaction costs which amounts to $3.03.

Download Excel Model

(more…)

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+
Read more

How to Use Black Scholes Option Pricing Model [EXCEL MODEL]

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+

Excel Model

In this post, we will discuss on modelling option pricing using Black Scholes Option Pricing model and plotting the same for a combination of various options. You can put any number of call and/or put options in the model and use built in macro (named ‘BS’) for calculating the BS model based option pricing for each option. The macro (named ‘PayOff’) is used for plotting the Profit/Loss for the overall combination of the option positions against the spot price.

Sheet1 named Payoff has a table where we specify all option parameters. Column B specifies Expiry data for the options. Column C specifies the option type. Column D has the strike price of the underlying asset. Column E shows the premium amount in INR at which the option is bought. Column F tells us about the number of option contracts we have bought. Coumn G specifies the volatility, column H specifies Black Shcoles price of the option (calculated by the macro “BS”. Column I is the current spot price of the underlying asset, column J shows the time to expiry of the option (calculated using the formula). Column K specifies the Expected PnL of the option (calculated using the formula). It is calculated as the difference between the Black Scholes price and the premium paid multiplied by the number of option contracts. Column L shows the actual premium in the market currently, meaning the current premium should you wish to buy the option.

The 13th row calculates the total investment. Since we have bought two call and put options at a premium of 120 and 152 our total investment is 120*2+152*2=544. The 14th row shows the Expected present value. Since the market has moved after the options are bought, the current expected price of the option multiplied by the number of option contracs gives the expected value. Hence the expected payoff is 170.18*2+124.59*2=589.5475.

The present value in row 15 is calculated similarly by taking the product of actual premium in the market currently and number of options contract. Hence the present value is 150*2+120*2=540.

The graph below shows the plot of expected payoff for the option portfolio. This is done by taking the expected payoff values from sheet4. More on this later.

Options Trading Screenshot

BS Price sheet shows the pricing of an option using Black Scholes model. From Black-Scholes option pricing model, we know the price of call option on a non-dividend stock can be written as:

$$C_t = S_t N(d_1) – Xe^{-r\tau} N(d_2)$$

and the price of put option on a non-dividend stock can be written as:

$$P_t = Xe^{-r\tau} N (-d_2) – S_tN (-d_1)$$

where

$$d_1 = \frac {{ln ( \frac {S_t} {X}) + (r + \frac {\sigma_s^2} {2}) \tau}} {{\sigma_s} {\sqrt{\tau}}}$$

$$d_1 = \frac {{ln ( \frac {S_t} {X}) + (r + \frac {\sigma_s^2} {2}) \tau}} {{\sigma_s} {\sqrt{\tau}}} = d_1 – \sigma_s \sqrt{\tau}$$

$$\tau = T – t$$

is the cumulative density function of normal distribution.

$$S$$

Current price of the underlying

$$X$$

Strike price

$$r$$

Risk free interest rate

$$\tau$$

Time to expiry

$$ln$$

Natural log

The call and put value using Black Scholes framework is calculated in the 13th and 14th row for the parameters specified in row 1 to 5.

Options Trading Screenshot

“Back-end BS” sheet has the same set of values of Payoff sheet from columns A to G. Column H onwards shows the spot price ranges in the 2nd row. You can change the starting point for the price range of Spot Price in Cell H2. The increment (presently of 10 points) can be changed from Cell I2 and then drag it across the range horizontally. The 3rd row shows the Black Scholes call option for the specified parameters and varying spot price. The 4th row shows the Black Scholes put option for the specified parameters and varying spot price. Please note that though the post shows the calculation for three options, you can go upto for a combination of 10 options by just filling appropriate values in the table in Sheet1. For more than 10 options, you can edit the sheet and the macro.

The 13th row calculates the total payoff from the option position. This is calculated as the difference between the profits from options and the total investment.

Options Trading Screenshot

In this case the profit from overall option position is the sum of H3 and H4. The total investment (calculated in Payoff Sheet 13th row) of 544 has to be subtracted from the sum of H3 and H4 to obtain the final payoff. Similar calculations are done to all other columns henceforth.

Options Trading Screenshot

The Expected Payoff graph in Sheet1 is the plot of total payoff calculated in Sheet3 against the underlying spot price.

Options Trading Screenshot

There are two macros. One in BS Price sheet that calculates Black Scholes option price depending upon the values entered in the Payoff sheet. The other one is in the Payoff sheet that plots the Expected Payoff graph. Please make a note that the Expiry Date in Payoff sheet is set beyond the current date, else the Black Scholes price will not return a numerical value for a negative time period.

Download Excel Model

(more…)

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+
Read more

Pair Trading Strategy [EXCEL MODEL]

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+

Pair Trading Strategy

Pair Trading

Pair trading is a trading strategy that matches a long position in one stock/asset with an offsetting position in another stock/asset that is statistically related. Pair trading is a mean reversion strategy where we bet that the prices will revert to their historical trends.

Who can use this Excel Model?

People interested in algorithmic and Quant trading, those who want to learn about statistical arbitrage.

How it helps?

This excel model will help you to:

  • Learn the application of mean reversion
  • Understand of pair trading
  • Optimize trading parameters
  • Understand significant returns of statistical arbitrage

Why should you download the trading model?

As the trading logic is coded in the cells of the sheet, you can better the understanding by downloading and analyzing the files at your own convenience. Not just that, you can play around the numbers to obtain better results. You might find suitable parameters that provide higher profits than specified in the article.

Explanation of the model

In this example we consider the MSCI and Nifty pair as both of them are stock market indexes. We implement mean reversion strategy on this pair. Mean reversion is a property of stationary time series. Since we claim that the pair we have chosen is mean reverting we should test whether it follows stationarity. The following diagram shows the plotting of logarithmic ratio of Nifty to MSCI. At the outset this appears to be mean reverting with a mean value of 2.088 but we use Dicky Fuller Test to test whether it is stationary with a statistical significance. The results under Cointegration output table shows that the price series is stationary and hence mean reverting. Dicky Fuller Test statistic and a significantly low p-value (<0.05) confirms our assumption. Having determined that the mean reversion holds true for the chosen pair we proceed with specifying assumptions and input parameters.

Assumptions

  1. For simplification purpose, we ignore bid ask spreads.
  2. Prices are available at 5 minutes interval and we trade at the 5 minute closing price only.
  3. Since this is discrete data, squaring off of the position happens at the end of the candle i.e. at the price available at the end of 5 minutes.
  4. Only the regular session (T) is traded
  5. Transaction costs are $0.375 for Nifty and $1.10 for MSCI.
  6. Margin for each trade is $990 (approximated to $1000).

Input parameters

Please note that all the values for the input parameters mentioned below are configurable.

  • Average of 10 candles (one candle=every 5 minute price) is considered.
  • A “z” score of +2 is considered for buy and -2 for selling.
  • A stop loss of $100 and profit limit of $200 is set.
  • The order size for trading MSCI is 50 (1 lot) and for Nifty is 6 (3 lots).

The market data and trading parameters are included in the spread sheet from the 12th row onwards. So when the reference is made to column D, it should be obvious that the reference commences from D12 onwards.

Explanation of the columns in the Excel Model

Column C represents the price for MSCI.

Column D represents Nifty price.

Column E is the logarithmic ratio of Nifty to MSCI.

Column F calculates 10 candle average. Since 10 values are needed for average calculations, there are no values from F12 to F22. The formula =IF(A23>$C$3, AVERAGE(INDEX($E$13:$E$1358, A23-$C$3):E22), “”) means that the average should be calculated only if the data sample available is more than 10 (i.e. the value specified in cell C3), otherwise the cell should be blank. Consider cell F22. Its corresponding cell A22 has a value of 10. Since A22>$C$3 fails, the entry in that cell is blank. The next cell F23 has a value since A23>$C$3 is true. The next bit of the formula

AVERAGE(INDEX($E$13:$E$1358, A23-$C$3):E22) calculates the average value of last 10 (as mentioned in cell C3) candles of column E data. Similar logic holds for column G where standard deviation is calculated. The “z” score is calculated in the column H. Formula for calculating “z” score is z= (x-)/(σ). Here x is the sample (Column E), is the mean value (Column F) and σ is the standard deviation (Column G).

Column I represents trading signal. As mentioned in the input parameters, if “z” score goes below -2 we buy and if it goes above +2 we sell. When we say buy, we have a long position in 3 lots of Nifty and have short position in 1 lot of MSCI. Similarly when we say sell, we have long position in 1 lot of MSCI and have short position in 3 lots of Nifty thus squaring off the position. We have one open position all the time. To understand what this means, consider two trading signals “buy” and “sell”. For the “buy” signal, as explained before, we buy 3 lots of Nifty future and short 1 lot of MSCI future. Once the position is taken, we track the position using the Status column, i.e. column M. In each new row while the position is continuing, we check whether the stop loss (as mentioned in cell C6) or take profit (as mentioned in cell C7) is hit. The stop loss is given the value of USD -100, i.e. loss of USD 100 and take profit is given the value of USD 200 in the cells C6 and C7 respectively. While the position does not hit either stop loss or take profit, we continue with that trade and ignore all signals that are appearing in column I. Once the trade hits either the stop loss or take profit, we again start looking at the signals in column I and open a new trading position as soon as we’ve Buy or Sell signal in column I.

Column M represents the trading signals based on the input parameters specified. Column I already has trading signals and M tells us about the status of our trading position i.e. are we long or short or booked the profits or exited at the stop loss. If the trade is not exited, we carry forward the position to the next candle by repeating the value of the status column in the previous candle. If the price movement occurs in such a way that it breaches the given TP or SL then we square off our position thus denoting it by “TP” and “SL” respectively.

Column L represents Mark to Market. It specifies the portfolio position at the end of time period. As specified in the input parameters we trade 1 lot of MSCI and 3 lots of Nifty. So when we trade our position is the appropriate price difference (depending on whether we are bought or sold) multiplied by the number of lots.

Column N represents the profit/loss status of the trade. P/L is calculated only when we have squared off our position. Column O calculates the cumulative profit.

Outputs

The output table has some performance metrics tabulated. Loss from all loss making trades is $3699 and profit from trades that hit TP is $9280. So the total P/L is $9280-$3699=$5581. Loss trades are the trades that resulted in losing money on the trading positions. Profitable trades are the successful trades ending in gaining cause. Average profit is the ratio of total profit to the total number of trades. Net average profit is calculated after subtracting the transaction costs which amounts to $91.77.

Now it is your turn!

  • First, download the model
  • Modify the parameters and study the backtesting results
  • Run the model for other historical prices
  • Modify the formula and strategy to add new parameters and indicators! Play with logic! Explore and study!
  • Comment below with your results and suggestions

Click to download the excel (If you have not logged in yet, first login/sign up!)

(more…)

Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+
Read more