Sentiment Analysis on News Articles using Python


Learn how to perform sentiment analysis on news articles using the Python programming language

by Milind Paradkar

In our previous post on sentiment analysis we briefly explained sentiment analysis within the context of trading, and also provided a model code in R. The R model was applied to an earnings call conference transcript of an NSE-listed company, and the output of the model was compared with the quarterly earnings numbers and with the one-month stock price movement following the earnings call date. QuantInsti also conducted a webinar on “Quantitative Trading Using Sentiment Analysis”, in which Rajib Ranjan Borah, Director & Co-founder, iRageCapital and QuantInsti, covered important aspects of the topic in detail; it is a must-watch for all enthusiasts wanting to learn and apply quantitative trading strategies using sentiment analysis.

Taking these initiatives on sentiment analysis forward, in this blog post we attempt to build a Python model to perform sentiment analysis on news articles that are published on a financial markets portal. We will build a basic model to extract the polarity (positive or negative) of the news articles.

In Rajib’s webinar, one of the slides details the sensitivity of different sectors to company and sectoral news. In that slide, the Pharma sector ranks at the top as the most sensitive sector, so in this blog we will apply our sentiment analysis model to news articles pertaining to select Indian Pharma companies. We will determine the polarity, and then check how the market reacted to this news. For our sample model, we have taken ten Indian Pharma companies that make up the NIFTY Pharma index.

Building the Model

Now, let us dive straight in and build our model. We use the following Python libraries to build the model:

  • Requests
  • Beautiful Soup
  • Pattern

Step 1: Create a list of the news section URLs of the component companies

We identify the component companies of the NIFTY Pharma index and create a dictionary in Python which contains the company names as the keys, while the dictionary values comprise the respective company abbreviations used by the financial portal site to form the news section URLs. Using this dictionary we create a Python list of the news section URLs for all the component companies.
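The step above can be sketched as follows; the company abbreviations and the portal's URL pattern below are hypothetical placeholders, not the portal's actual scheme:

```python
# Hypothetical abbreviations; the real ones come from the financial portal's URLs
pharma_dict = {
    "Sun Pharmaceutical": "SPI",
    "Lupin": "L",
    "Cipla": "C",
    # ... remaining NIFTY Pharma constituents
}

base_url = "http://www.example-portal.com/company/{}/news"  # placeholder pattern
news_urls = [base_url.format(abbr) for abbr in pharma_dict.values()]
```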

Step 2: Extract the relevant news article web-links from the company’s news section page

Using the Python list of the news section URLs, we run a Python for loop which pings the portal with every URL in our list. We use the requests.get function from the Python requests library (a simple HTTP library). The requests module allows you to send HTTP/1.1 requests. One can add headers, form data, multipart files, and parameters with simple Python dictionaries, and also access the response data in the same way.

The text of the response object is then applied to create a Beautiful Soup object. Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with a given parser to provide for ways of navigating, searching, and modifying the parse tree.

HTML parsing basically means taking in the HTML code and extracting relevant information like the title of the page, paragraphs in the page, headings, links, bold text etc.

The news section webpage on the financial portal site contains 20 news articles per page. We target only the first page of the news section, and our objective is to extract the links for all the news articles that appear on the first page using the parsed HTML. We inspect the HTML, and use the find_all method in the code to search for tags that have the CSS class name “arial11_summ”. This enables us to extract all 20 web-links.
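As an offline-runnable illustration of this idea, the sketch below uses the standard library's html.parser as a stand-in for Beautiful Soup's find_all; the “arial11_summ” class name comes from the portal's markup as described above, while the sample HTML snippet is invented:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect hrefs of anchors inside elements with class 'arial11_summ'."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._in_summ = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("class") == "arial11_summ":
            self._in_summ = True
        elif tag == "a" and self._in_summ and "href" in attrs:
            self.links.append(attrs["href"])
            self._in_summ = False

# Invented sample markup mimicking one news-summary cell on the portal
page = '<td class="arial11_summ"><a href="/news/123.html">Headline</a></td>'
parser = LinkExtractor()
parser.feed(page)
```

With Beautiful Soup installed, the equivalent is a one-liner over `soup.find_all(class_="arial11_summ")`.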

The fortunes of the R&D-intensive Indian Pharma sector are driven by sales in the US market and by approvals/rejections of new drugs by the US Food and Drug Administration (USFDA). Hence, we will select only those news articles pertaining to the USFDA and the US market. Using keywords like “US”, “USA”, and “USFDA” in an if statement nested within the Python for loop, we get our final list of all the relevant news articles.
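A minimal sketch of the keyword filter (the sample headlines are invented for illustration):

```python
KEYWORDS = {"US", "USA", "USFDA"}

def is_relevant(title):
    # Keep only articles whose headline mentions the US market or the USFDA
    words = title.replace(",", " ").replace(".", " ").split()
    return any(word in KEYWORDS for word in words)

headlines = [
    "Lupin gets USFDA nod for diabetes drug",
    "Cipla expands capacity at Goa plant",
]
relevant = [h for h in headlines if is_relevant(h)]
```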

Step 3: Remove the duplicate news articles based on news title

It may happen that the financial portal publishes important news articles pertaining to the overall pharma sector on every pharma company’s news section webpage. Hence, it becomes necessary to weed out the duplicate news articles that appear in our Python list before we run our sentiment analysis model. We call the set function on the Python list generated in Step 2 to remove the duplicate news articles.
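Note that set() returns an unordered collection, so the result is usually converted back to a list; dict.fromkeys achieves the same de-duplication while preserving the original order:

```python
article_links = ["/news/101.html", "/news/102.html", "/news/101.html"]

# set() drops duplicates but not in a predictable order;
# dict.fromkeys keeps the first-seen order
unique_links = list(dict.fromkeys(article_links))
```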

Step 4: Extract the main text from the selected news articles

In this step we run a Python for loop, and for every news article URL we call requests.get() on the URL and then convert the text of the response object into a Beautiful Soup object. Finally, we extract the main text using the find and get_text methods from the Beautiful Soup module.

Step 5: Pre-processing the extracted text

We will use the ngrams() function from the Pattern module to pre-process our extracted text. The ngrams() function returns a list of n-grams (i.e., tuples of n successive words) from the given string. Since we are building a simple model, we use a value of one for the n argument. The Pattern module contains other useful functions for pre-processing, like parse, tokenize, and tag, which can be explored to conduct a more in-depth analysis.
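For readers who cannot install the (Python 2 era) Pattern module, a pure-Python stand-in for its ngrams() behaviour might look like this:

```python
def ngrams(text, n=1):
    # Return tuples of n successive lowercase words,
    # mimicking the shape of Pattern's ngrams() output
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

tokens = ngrams("USFDA approves new drug", n=1)
```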

Step 6: Compute the Sentiment analysis score using a simple dictionary approach

To compute the overall polarity of a news article we use the dictionary method. In this approach, a list of positive/negative words helps determine the polarity of a given text. This dictionary is created using words that are specific to the Pharma sector. The code checks the processed text from each news article for matches against the positive/negative words in the dictionary.
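A sketch of the dictionary scoring step; the word lists below are tiny invented samples, not the Pharma-specific dictionary used in the post:

```python
POSITIVE = {"approval", "approves", "launch", "gains"}
NEGATIVE = {"warning", "recall", "rejects", "ban"}

def polarity_scores(tokens):
    # tokens is a list of 1-grams, i.e. one-word tuples
    pos = sum(1 for (word,) in tokens if word in POSITIVE)
    neg = sum(1 for (word,) in tokens if word in NEGATIVE)
    return pos, neg

scores = polarity_scores([("usfda",), ("approves",), ("product",), ("recall",)])
```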

Step 7: Create a Python list of model output

The final output from the model is populated in a Python list. The list contains the URL, the positive score, and the negative score for each of the selected news articles on which we conducted sentiment analysis.

Final Output

sentiment trading using python

Step 8: Plot NIFTY vs NIFTY Pharma returns

Shown below is a plot of NIFTY vs NIFTY Pharma for the months of October–November 2016. In our NIFTY Pharma plot we have drawn arrows highlighting some of the press releases on which we ran our sentiment analysis model. The impact of the uncertainty regarding the US Presidential election results, and of the negative news for the Indian Pharma sector emanating from the US, is clearly visible on NIFTY Pharma as it fell substantially from the highs made in late October 2016. Thus, our attempt to gauge the direction of the Pharma index using the sentiment analysis model in Python gives us reasonably accurate results.

sentiment trading using python


Next Step:

One can build more robust sentiment models using other approaches and trade profitably. As a next step we would recommend watching QuantInsti’s webinar on “Quantitative Trading Using Sentiment Analysis” by Rajib Ranjan Borah. Watch it by clicking on the video below:


Also, catch our other exciting Python trading blogs, and if you are interested in knowing more about our EPAT course, feel free to contact our QuantInsti team.


  • Download.rar
    • Sentiment Analysis of News Article – Python Code
    • dict(1).csv
    • Nifty and Nifty Pharma(1).csv
    • Pharma vs



Popular Python trading platform for Algorithmic Trading

python trading platform

by Apoorva Singh

In one of our recent articles, we talked about the most popular backtesting platforms for quantitative trading. Here we share the most widely used Python trading platforms and libraries for quantitative trading.

Python is a free, open-source, cross-platform language with rich libraries for almost every task imaginable, as well as specialized research environments. Python is an excellent choice for automated trading when the trading frequency is low/medium, i.e. for trades that last longer than a few seconds. It has multiple APIs/libraries that can be linked to make it optimal and cheaper, and to allow greater exploratory development of multiple trade ideas.


RExcel Tutorial – Leveraging the Power of R in Excel


RExcel Tutorial

RExcel Tutorial – Leveraging the Power of R in Excel By Milind Paradkar

How many times has MS Excel given you a hard time while building complex models or importing that extra-large data set into a spreadsheet? As a trader, I would love to see crisp formulas in my worksheets and, more importantly, I would want my models to be less prone to errors when I am trading in a live market.

What if I tell you that there is a tool that can tear through these shortcomings and leverage the power of R in Excel in a hassle-free, non-tedious manner?


Friends, let me introduce you to RExcel, an add-in which allows you to use R functionality in MS Excel. In a nutshell, we can do the following with this R Excel plugin:

  • Use R functions via cell formula/macros
  • Run R scripts through Excel
  • Transfer data between R and Excel.

Developed by Erich Neuwirth, RExcel works on Microsoft Windows with Excel 2003, 2007, 2010 and 2013. It uses the statconnDCOM server and the rcom package to access R from Excel.

Before you start using RExcel, you will need the following:

  • A suitable version of R
  • A matching version of rscproxy
  • statconnDCOM or rcom with statconnDCOM

You can find the link to install these in your system at the end of this article along with the download link.

Let’s come back to our tutorial now. There are three ways of using RExcel –

  • Worksheet functions
  • Macro mode
  • Scratchpad mode.

I will illustrate each of these modes with examples.

Using RExcel worksheet functions

As the name suggests, these functions call R functions in Excel worksheet cells. The list of functions includes:

  • RApply
  • RCall
  • REval
  • RExec
  • Other argument modifier functions.

You can refer to the help and documentation link in RExcel help tab to see the complete list of worksheet functions.

How do we use this function?

Let me demonstrate with some examples.

1) Calculating the mean:

Let’s say that I wish to calculate the mean of the OHLC prices for the two stocks using the RApply function.

How do I do that?

I will use RApply, which allows us to call any R function as an Excel worksheet function. We call the mean function and apply it over the OHLC prices.
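For instance, with OHLC prices sitting in cells A2:D2 (the cell references here are hypothetical), the worksheet formula would take the form:

```
=RApply("mean", A2:D2)
```

The first argument is the R function name as a string; the remaining arguments are passed to it.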

Pretty simple, isn’t it?

RExcel tutorial

2) Defining functions:

Now, if I want to define my own custom function and apply it using a given set of arguments, then I can do that as well. In this example, I round off the mean price to the nearest Rs. 0.05 using the function shown below. The minimum tick for NSE stocks is Rs. 0.05.
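In Python terms (for readers following the rest of this post; the original example is an R function), the tick-rounding logic amounts to:

```python
def round_to_tick(price, tick=0.05):
    # Round a price to the nearest exchange tick (Rs. 0.05 on the NSE)
    return round(round(price / tick) * tick, 2)

rounded = round_to_tick(101.13)  # nearest tick is 101.15
```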

Rexcel macros

3) Applying functions over a range for quick execution:

If we use multiple RApply calls, the computation slows down considerably. To overcome this we can use Excel array formulas instead of multiple RApply calls, and speed up the computation. The example below illustrates the rounding using the array function. Check the link to learn how array formulas are written in Excel.

Rexcel macros

Method II: Connecting R via Macros

RExcel provides VBA procedures and functions for us to connect to R via macros. However, before starting with macros you need to set the reference to RExcelVBAlib from the References tab (in the VBA window, see Tools -> References).

Let me take a couple of examples to illustrate the macro method.

Example: Running an R script and generating the output in excel.

I am going to run an R script called “Top Gainers of the day.R” from the “RunRScript” macro (code shown below). When I execute this, the R script generates a list of top 5 NSE stock gainers of that day.

How does it do this?

It does this by sorting the percentage price change for all the given stocks in descending order, and storing the top 5 in the “TopGainers_df” dataframe. We will run the macro and print the dataframe in our Excel worksheet.

 Rexcel macros

The commands mentioned in the above macro have the following meaning:

  1. The RInterface.StartRServer starts the R server.
  2. The RInterface.RRun executes the command string that follows.
  3. The RInterface.RunRFile executes the R script mentioned in the quotes.
  4. The RInterface.GetDataframe command is used to retrieve the output in Excel. This command takes two arguments, the name of the dataframe variable, and the location in Excel where we want to print the output.
  5. Finally, the RInterface.StopRServer command stops the R server.

The output printed in Excel upon running the macro is shown below.

RExcel Output from the R script printed in the Excel sheet


How do I call R functions in Macro?

There is another way of using R functions in macros. I have given the macro code below, where I have stored a list of stocks with their percentage price change on Sheet3 of the workbook.

I have used the RInterface.PutDataframe command to assign this range as a dataframe to R. Then I called the arrange function from the dplyr package, and got the top 5 NSE stock gainers of the day.

Finally, I use the RInterface.GetDataframe to print this dataframe onto sheet2 of the workbook.

Thus, upon running this “Arrange” macro I was able to produce the same result as obtained in the first example.

RExcel Output from the R script printed in the Excel sheet

These macros can be attached to menu items or toolbar items for easy execution. Once again, I will advise you to refer to the help and documentation link in RExcel help tab to see the complete list of procedures and functions available.

Method III: Using the Scratchpad method

In this method I will write R expressions on an Excel sheet and execute them using the buttons in the RExcel menu. One needs to initiate the R connection by selecting the “Start R” link from the RExcel menu.
RExcel Output from the R script printed in the Excel sheet


  1. We select the range (I3:I5) shown below
  2. Then click “Run R” from the RExcel menu
  3. Next, we select an empty cell (M3)
  4. Select “Get R Value” and when prompted, indicate the cell (I5 in this case) containing the final expression.
  5. The output from the expression gets printed in this empty cell.

I just used R’s cbind function and generated the output in Excel.

RExcel Output from the R script printed in the Excel sheet

The scratchpad method can be applied to scalars, vectors, data frames or to a matrix. There are other additional operations that can be done using this method, and you know very well where you can learn these.

To Conclude

Combining the power of R with Excel can surely simplify things for traders using R in Excel, and as a result provide them with more firepower to backtest their strategies and execute them on MS Excel.

Next Step

If you’re a trader interested in learning various aspects of Algorithmic trading, check out the Executive Programme in Algorithmic Trading (EPAT). The course covers training modules like Statistics & Econometrics, Financial Computing & Technology, and Algorithmic & Quantitative Trading. Most of all, the course will surely equip you with the required skillsets to be a successful algo trader.




Sources & References:

For Installation of R, R(D)COM server and RExcel go to:


IBPy Tutorial to implement Python in Interactive Brokers API


How to implement Python in Interactive Brokers API using IBPy

I hope you had a great time attending our webinar on Trading with Interactive Brokers using Python. I thought it would be good to give you a brief insight into the Interactive Brokers API and how to use IBPy to implement Python in IB’s TWS. As we proceed, we will use an Interactive Brokers demo account and IBPy. Towards the end of this article, you will be running a simple order-routing program using the Interactive Brokers API.

For those of you who are already aware of Interactive Brokers (IB) and its interface, you can very well understand why I prefer IB over other available online brokerages. However, for those who have not used IB, this would be the first question that comes to mind:

Why Interactive Brokers?

When I have online brokerages like Fidelity, Capital One Investing, & Firstrade, then why should one use Interactive Brokers?


Interactive Brokers is my first choice because of 5 simple reasons:

  1. International Investing in more than 100 markets
  2. Commission rates that are highly competitive
  3. Low margin rates
  4. Very friendly user interface
  5. Vast selection of order types

Among the five points mentioned above, the most important and impressive ones for any beginner are points 2 and 4, aren’t they? The Interactive Brokers API can be used in a professional context even by those who are completely new to it. The Interactive Brokers API’s connectivity with Java, C++ and Python is very impressive as well.

Enough said, it is time to move to the next step. I can understand that most of you must already be eager to try your hand at the Interactive Brokers API panel. After all, nobody can say no to something very friendly that is lucrative as well. You can easily set up your account on Interactive Brokers by going to their website, where there is an option to opt for a free trial package.

Algorithmic traders prefer Interactive Brokers due to its relatively straightforward API. In this article I will be telling you how to automate trades by implementing Python in the Interactive Brokers API using a bridge called IBPy.

Interactive Brokers IBPy

As Interactive Brokers offers a platform to an incredibly wide spectrum of traders, its GUI consists of a myriad of features. This standalone application is called Trader Workstation, or TWS. Apart from the Trader Workstation, Interactive Brokers also has an IB Gateway. This particular application allows access to IB servers through a command-line interface. Algo traders usually prefer using it over the GUI.

What is IbPy?

IbPy is a third-party implementation of the API used for accessing the Interactive Brokers online trading system. IbPy implements functionality that the Python programmer can use to connect to IB, request stock ticker data, submit orders for stocks and futures, and more.

The purpose of IbPy is to wrap the native API, which is written in Java, in such a way that it can be called from Python. Two of the most significant libraries in IBPy are ib.ext and ib.opt; ib.opt builds on the functionality of ib.ext. Through IBPy, the API executes orders and fetches real-time market data feeds. The architecture essentially utilizes a client-server model.

Implementation of IB in Python

First of all, you must have an Interactive Brokers account and a Python workspace to install IBPy; thereafter, you can use it for your coding purposes.

Installing IBPy

As I mentioned earlier, IBPy is a Python emulator written for the Java-based Interactive Brokers API. IBPy helps make the development of algo trading systems in Python a less cumbersome process. For this reason, I will be using it as a base for all kinds of interaction with the Interactive Brokers TWS. Here I am presuming that you have Python 2.7 installed on your system; if not, you may download it from here:

Installing On Ubuntu


IBPy can be acquired from its GitHub repository.

The following code will be needed on an Ubuntu system:

Creation of subdirectory

Download IBPy

Great! You have installed IBPy on your Ubuntu system.

Installing IBPy on Windows

Go to the GitHub repository and download the file from:

Unzip the downloaded file. Move this folder to the directory where you have installed Python, so that it can recognize this package:


Now, open the setup with the Windows command prompt and type the following command:

After this you will have to get your Trader Workstation (TWS) in operation.

Installing Trader Workstation

Interactive Brokers Trader Workstation, or TWS, is the GUI that lets all registered users of Interactive Brokers trade on their systems. Don’t worry: even if you do not have prior knowledge of programming or coding, TWS will let you do the trading work. You can download the TWS installer from Interactive Brokers’ website and run it on your system.

You can download the TWS Demo from here:

Important Note

In the older versions of TWS, the user would get to choose between two different programs. The first was the TWS of Interactive Brokers, and the second was the IB Gateway, about which I have already talked earlier. Although they are different applications, they can only be installed together.

The IB Gateway runs on lower processing power since it does not have an evolved graphical user interface like the Trader Workstation. However, the results and other data are displayed in the form of primitive codes on the IB Gateway, making it less friendly for users who do not possess enough coding knowledge.

You may use either of the two interfaces for your work on Interactive Brokers. The functionality of both remains the same, i.e. to relay information between your system and the Interactive Brokers server. Needless to say, the Python app will get the exact same messages from the server end of Interactive Brokers.

Installation Walk-through

Once you download the application, you will find the executable file at the bottom of your browser. Click on Run when prompted with a security warning.

  • Now, click on Next.
  • Click on finish to complete your installation.
  • Click on the desktop icon and start the TWS application.

IBPy implementation in TWS

Since I am going to use a demo account, I click on “No User name?”

  • Enter your email address and click on Login:

IBPy implementation in TWS

Configuration of Interactive Brokers Panel

The journey so far has been pretty easy, hasn’t it? Great, if you agreed with me on that one. After installing TWS and/or IB Gateway, we have to make some changes to the configuration before implementing our strategies on Interactive Brokers’ servers. The software will connect to the server properly only once these settings are changed.


  • Go to API settings in TWS

Setting preferences for IBPY on Interactive Brokers TWS

  • Check the Enable ActiveX and Socket Clients option
  • Set the socket port to an unused port.
  • Set the Master API client ID to 100
  • Create a Trusted IP Address and set to

Global preferences for Interactive Brokers API using IBPy

Running the first program

So, all done with the configuration?

Great! We are now ready to run our first program.

Before you start typing in the code, make sure that you have started TWS (or IB Gateway). Many a time, I get questions as to why an error message appears when the code is run. As I mentioned in the previous section, your system connects to the Interactive Brokers server through TWS or IB Gateway. So, if you haven’t turned it on, you are bound to get an exception message, no matter how smartly you have developed your code.

Let’s start working on the coding step-by-step.

Open Spyder (Start – All Programs – Anaconda2 – Spyder)

On the Spyder console, I will be entering my code.

Spyder interface for IBPy

1) We start by importing necessary modules for our code:
Connection is a module that connects the API with IB, while message performs the task of a courier between the server and the system; it retrieves messages from the Interactive Brokers server.
Just like every transaction in the real world involves some kind of a contract or agreement, we have Contract here as well. All orders on Interactive Brokers are made using a Contract.
2) Making the contract function
The contract function has the following parameters:

  • Symbol
  • Security Type
  • Exchange
  • Primary Exchange
  • Currency

The values to these parameters must be set accordingly.
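The contract helper can be sketched as below; a bare placeholder class stands in for IBPy's ib.ext.Contract so the snippet runs standalone (the m_-prefixed attribute names follow IBPy's Java heritage):

```python
class Contract(object):
    """Placeholder mirroring ib.ext.Contract's attributes."""

def create_contract(symbol, sec_type, exch, prim_exch, curr):
    contract = Contract()
    contract.m_symbol = symbol          # e.g. 'AAPL'
    contract.m_secType = sec_type       # e.g. 'STK' for stocks
    contract.m_exchange = exch          # routing exchange, e.g. 'SMART'
    contract.m_primaryExch = prim_exch
    contract.m_currency = curr          # e.g. 'USD'
    return contract

aapl = create_contract("AAPL", "STK", "SMART", "SMART", "USD")
```

In actual IBPy code, `from ib.ext.Contract import Contract` replaces the placeholder class.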

3) Setting the Order Function

The order function allows us to make orders of different types. It has the following parameters:

  • Order Type
  • Total Quantity
  • Market Action (Buy or sell)

Considering that our order does have a set price (a limit order), we code it in the following way:

The conditional statement will otherwise set up the order as a simpler market order without any set price.
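Both cases can be sketched together, again with a placeholder class in place of IBPy's ib.ext.Order so the snippet runs standalone (attribute names follow IBPy's conventions):

```python
class Order(object):
    """Placeholder mirroring ib.ext.Order's attributes."""

def create_order(order_type, quantity, action, price=None):
    order = Order()
    order.m_orderType = order_type        # 'LMT' for limit, 'MKT' for market
    order.m_totalQuantity = quantity
    order.m_action = action               # 'BUY' or 'SELL'
    if order_type == "LMT" and price is not None:
        order.m_lmtPrice = price          # only limit orders carry a set price
    return order

offer = create_order("LMT", 100, "SELL", price=100)
```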

The client ID and port should be the same as those you set in the Global preferences.

4) Initiating Connection to API

Establish connection to TWS.


Assign error handling function.


Assign server messages handling function.


Create AAPL contract and send order

In the above line, AAPL is Apple and STK is the security type for stocks. The exchange and primary exchange have been set to SMART. When we set these two parameters to SMART, we are actually using Interactive Brokers’ smart routing system, which enables the algo to find the best route to carry out the trade. And of course, the currency has been set to USD.

We wish to sell 100 shares of AAPL.

Our order is to sell 100 shares at a limit price of $100.

We have placed order on IB TWS with the following parameters:

  • Order id
  • Contract
  • offer

“Always remember that the order id should be unique.”

5) Disconnecting

And finally, you need to disconnect:

Yes, you are done with your first order on Interactive Brokers’ API using basic Python coding. Keep in mind that the demo account you are using might not give you as many privileges as a paid account.

Running the Code

Click on the green ‘Play’ button or simply press F5 in Spyder. On your TWS demo system, you will get a popup regarding your order. Click on OK.

IBPy confirmation


You can see the final output on the bottom right side of Interactive Brokers TWS panel.

Output of algo using IBPy

Just in case you want to have a look at the complete code at one go, here it is:

Next Step

I am sure that you have all run your code and made your first transaction using the Interactive Brokers API and IBPy. We can see the output on the TWS, where you will be selling 100 shares of Apple. This is a very generic and simple type of automated execution using the Interactive Brokers API.

You can watch QuantInsti’s webinar on Trading with Interactive Brokers using Python, where Dr. Hui Liu explains how to use another wrapper called IBridgePy. Dr. Hui Liu is one of the pioneers in the field. So, if you wish to know how you can implement algo strategies in the live market using Python with Interactive Brokers’ API, you should definitely check out the videos from our recently concluded webinar. To know more about algo trading, enrol for EPAT.



Sources & References:


How to Check Data Quality using R



How to check data quality

By Milind Paradkar

Do You Use Clean Data?

Always go for clean data! Why is it that experienced traders/authors stress this point so often in their trading articles/books? As a novice trader, you might be using the freely available data from sources like Google or Yahoo Finance. Do such sources provide accurate, quality data?

We decided to do a quick check and took a sample of 143 stocks listed on the National Stock Exchange of India Ltd (NSE). For these stocks, we downloaded 1-minute intraday data for the period 1/08/2016 – 19/08/2016. The aim was to check whether Google Finance captured every 1-minute bar during this period for each of the 143 stocks.

NSE’s trading session starts at 9:15 am and ends at 3:30 pm IST, thus comprising 375 minutes. Over 14 trading sessions, we should have 5250 data points for each of these stocks. We wrote a simple code in R to perform the check.
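The arithmetic behind the expected bar count, written out in Python for clarity (the check itself was done in R):

```python
# NSE session: 9:15 to 15:30 IST
minutes_per_session = (15 * 60 + 30) - (9 * 60 + 15)    # 375 one-minute bars
trading_sessions = 14
expected_bars = minutes_per_session * trading_sessions  # 5250 expected points
```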

Here are our findings. Out of the 143 stocks scanned, 89 had fewer than 5250 data points: that’s more than 60% of our sample set! The table shown below lists 10 such stocks from those 89.


Let’s take the case of PAGEIND. Google Finance captured only 4348 one-minute data points for the stock, thus missing 902 points!

Example – Missing the 13:06 bar on 20160801:

Missing the 1306 minute bar on 20160801

Example – Missing the 10:32 bar on 20160802:

Missing the 1032 minute bar on 20160802

If a trader is running an intraday strategy which generates buy/sell signals based on 1-minute bars, the strategy is bound to give some false signals.

As can be seen from the quick check above, data quality from free sources or from cheap data vendors is not always guaranteed. Many of the cheap data vendors source their data from Yahoo Finance and provide it to their clients. A poor data feed is a big issue faced by many traders, and you will find many traders complaining about it on various trading forums.

Backtesting a trading strategy using such data will give false results. If you are using the data in live trading and there is a server problem with Google or Yahoo Finance, it will lead to a delay in the data feed. As a trader, you don’t want to be in a position where you have an open trade and the data feed stops or is delayed. When trading with real money, one is always advised to use quality data from reliable data vendors. After all, data is everything!

Next Step

If you’re a retail trader interested in learning various aspects of Algorithmic trading, check out the Executive Programme in Algorithmic Trading (EPAT). The course covers training modules like Statistics & Econometrics, Financial Computing & Technology, and Algorithmic & Quantitative Trading. The course equips you with the required skillsets to be a successful trader.

Download Data Files

  • Do You Use Clean Data.rar
    • 15 Day Intraday Historical
    • F&O Stock List.csv
    • R code – Google_Data_Quality_Check.txt
    • R code – Stock price data.txt



Vectorised Backtesting in Excel


Backtesting in Excel

By Jacques Joubert

Now, those of you who know me as a blogger might find this post a little unorthodox compared to my traditional style of writing; however, in the spirit of evolution, and inspired by a friend of mine, Stuart Reid, I will be following some of the tips suggested in the following blog post.

Being a student of the EPAT program, I was excited to learn the methodology that others use when it comes to backtesting. As usual, we start off in Excel and then migrate to R.

Having previously written a blog series on backtesting in Excel and then moving to R, I was very interested to see a slightly different method used by the QuantInsti team.


Importing CSV Data in Zipline for Backtesting

Importing CSV Data in Zipline for Backtesting

Importing CSV Data in Zipline for Backtesting

By Priyanka Sah

In our previous article on the Introduction to the Zipline package in Python, we created an algorithm for the moving average crossover strategy.

Recall that Zipline is a Python library for trading applications, built around an event-driven system that supports both backtesting and live trading.

In the previous article, we learnt how to implement Moving Average Crossover strategy on Zipline. The strategy code in Zipline reads data from Yahoo directly, performs the backtest and plots the results.

We recommend that you brush up a few essential concepts, covered in the previous post, before going further:

  1. Installation (how to install Zipline on local)
  2. Structure (format to write a code in Zipline)

In this article, we will take a step further and learn to backtest on Zipline using data from different sources. We will learn to:

  • Import and backtest on OHLC data in CSV format
  • Import and use data from Google Finance for research/analysis
  • Calculate and print backtesting results such as PnL, number of trades, etc


The post serves as a guide for serious quants and DIY Algo traders who want to make use of Python or Zipline packages independently for backtesting and hypothesis testing of their trading ideas. In this post, we will assume that the data is from the US markets. It is possible to use other markets’ data sets for analysis with some edits and additions in the code. We will share the same in a later post.

The Parts of the code on Zipline – what we have learnt already

Part 1 - Code screenshot

The problem with the existing method

Zipline provides an inbuilt function, “load_bars_from_yahoo()”, that fetches data from Yahoo for a given date range and uses that data for all the calculations. Though very easy to use, this function only works with Yahoo data. Using this function, we cannot backtest on different data sets, such as:

  1. Commodities data, which Yahoo does not provide
  2. Simulated data sets created and saved in csv format

We have been using this in-built function so far to load stock data into the Python IDE and work with it. To be able to read CSV or any other data type in Zipline, we need to understand how Zipline works and why the usual methods of importing data do not work here!

Zipline accepts the data in panel form. To understand how Zipline treats and understands data, we must learn a little bit about data structures in Python.

Data Structures in Pandas

Pandas essentially structures data in three forms: Series (1D), DataFrame (2D), and Panel (3D)

  1. Series:

It is a one-dimensional labelled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

The basic method to create a Series is to call:

s = pd.Series(data, index=index)

A Series accepts different kinds of objects, such as a Python dict, an ndarray, or a scalar value (like 5).
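As a quick illustration, here is a minimal sketch of the two common ways to build a Series (the values are made up):

```python
import pandas as pd

# A Series built from a Python dict: the dict keys become the index labels
s = pd.Series({"a": 1.0, "b": 2.0, "c": 3.0})

# A Series built from a scalar: the value is repeated for every index label
s2 = pd.Series(5, index=["x", "y", "z"])

print(s["b"])         # label-based access -> 2.0
print(s2.tolist())    # -> [5, 5, 5]
```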

  2. DataFrame:

It is a two-dimensional labelled data structure with rows and columns. Columns can be of the same or different types.

It is one of the most commonly used pandas objects and accepts different types of inputs such as Dict of 1D ndarrays, lists, dicts, or Series; 2-D numpy.ndarray; Structured or record ndarray; a Series.
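A small sketch of the dict-of-Series construction (with made-up OHLC-style values):

```python
import pandas as pd

# A DataFrame built from a dict of Series: the dict keys become the columns
# and the union of the Series indices becomes the row index
df = pd.DataFrame({
    "Open":  pd.Series([100.0, 101.5], index=["2017-01-03", "2017-01-04"]),
    "Close": pd.Series([101.0, 100.8], index=["2017-01-03", "2017-01-04"]),
})

print(df.shape)                          # (2, 2): two rows, two columns
print(df.loc["2017-01-04", "Close"])     # 100.8
```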

  3. Panel:

Panel is a lesser used data structure but can be efficiently used for three-dimensional data.

The three axes are named as follows:

  1. items: axis 0, each item corresponds to a DataFrame contained inside
  2. major_axis: axis 1, it is the index (rows) of each of the DataFrames
  3. minor_axis: axis 2, it is the columns of each of the DataFrames
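To make the three axes concrete, the sketch below mimics them with a plain dict of DataFrames, which is also what `pd.Panel` used to accept as input. Note that `pd.Panel` was removed in pandas 1.0, so on a modern pandas only the dict version runs; the ticker names and values here are illustrative:

```python
import pandas as pd

# Mimic a Panel's three axes with a dict of DataFrames:
#   items      -> the dict keys (one DataFrame per stock)
#   major_axis -> each DataFrame's row index (the dates)
#   minor_axis -> each DataFrame's columns (the OHLC fields)
dates = pd.to_datetime(["2017-01-03", "2017-01-04"])
fields = ["open", "close"]
data = {
    "AAPL": pd.DataFrame([[115.8, 116.2], [115.9, 116.0]], index=dates, columns=fields),
    "SPY":  pd.DataFrame([[225.0, 225.2], [225.6, 226.6]], index=dates, columns=fields),
}
# On an older pandas (< 0.25), pd.Panel(data) would build the actual 3-D object

print(sorted(data))                          # the "items" axis
print(data["AAPL"].loc[dates[0], "close"])   # 116.2
```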

Zipline only understands data structure in Panel format.

While it is easy to import .csv data into pandas as a DataFrame, it is not possible to do the same in Zipline directly. However, we have found a workaround to this problem: import the data as a DataFrame, convert the DataFrame into a Panel, and run the strategy on the Panel.


This is a powerful technique that will help you import data from different sources, such as:

  • Import OHLC data in a CSV format in zipline (we will show how)
  • Read data from online sources other than Yahoo that connect with pandas (we will show how)
  • Read data from Quandl in Zipline (this is left as an exercise for you!)

Let us get started with the three steps!

  1. Import the data in Python

We can use any method to import the data as a DataFrame, or just import the data and convert it into a DataFrame. Here, we will use two methods to fetch data: the DataReader and read_csv functions.

Use DataReader to read data from Google

Pandas provides a function, DataReader, which allows the user to specify the date range and the source. You can use Yahoo, Google, or any other supported data source.

This is what a DataFrame looks like when you print the first six rows:


Use read_csv function to import a CSV file

Pandas provides another function, read_csv, that reads a CSV file from a specified location. Please note that the CSV should be in a proper format so that it works correctly when called by a strategy algorithm in Zipline.

Format of CSV file:

The first column is “Date”, the second is “Open”, the third is “High”, the fourth is “Low”, the fifth is “Close”, the sixth is “Volume”, and the seventh is “Adj Close”. None of the columns should be blank or have missing values.
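A minimal sketch of reading a file in this format with pandas (the values below are made up; in the article the file is SPY.csv in the working directory):

```python
import io
import pandas as pd

# A made-up CSV fragment in the required column order
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2017-01-03,225.04,225.83,223.88,225.24,91366500,225.24
2017-01-04,225.62,226.75,225.61,226.58,78744400,226.58
"""

# parse_dates + index_col give us a DatetimeIndex, which Zipline needs
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(list(df.columns))       # ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
print(df["Close"].iloc[-1])   # 226.58
```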

Reading CSV file:

Note in the code above:

  • The name of the stock is “SPY”.
  • We are already in the directory where the CSV file “SPY.csv” is saved; otherwise, you need to specify the path as well.

  2. Convert the DataFrame to a Panel

The data imported into the Python IDE by the aforementioned methods is saved as a DataFrame. Now we need to convert it into Panel format and modify the major and minor axes.

Zipline accepts [‘Open’, ‘High’, ‘Low’, ‘Close’, ‘Volume’, ‘Price’] data as the minor axis and ‘Date’ as the major axis, in UTC time format. If the Date is not in UTC format, we convert it using “tz_localize(pytz.utc)”.
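A sketch of the UTC step with made-up values (the Panel construction itself is left commented out because `pd.Panel` only exists on older pandas versions):

```python
import pandas as pd
import pytz

dates = pd.to_datetime(["2017-01-03", "2017-01-04"])
df = pd.DataFrame({"open": [225.0, 225.6], "close": [225.2, 226.6]}, index=dates)

# Zipline wants the Date (major) axis in UTC; localize a naive index to UTC
df.index = df.index.tz_localize(pytz.utc)
print(str(df.index.tz))    # UTC

# On an older pandas (< 0.25), the final conversion would be, roughly:
# panel = pd.Panel({"SPY": df})   # items = stocks, major = dates, minor = fields
```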

Now ‘panel’ is the dataset ‘data’ saved in Panel format. This is what a Panel looks like:

Panel format

  3. Use the new Panel data structure to run your strategy

We use this new data structure, ‘Panel’, to run our strategy with no changes in the “initialize” or “handle_data” sections. The strategy logic and code remain the same; we just plug in the new data structure when running the strategy.

That’s it! Now you can easily run the previously explained Moving Average Crossover strategy on a CSV data file! Go on, give it a try!

You can fetch Quandl (US) data and try generating signals on the same.

Backtesting on Zipline

In the previous post, we backtested a simple Moving Average Crossover strategy and plotted cash and PnL for each trading day. Now, we will calculate the PnL and the total number of trades for the entire trading period.

Recall that the results are automatically saved in ‘perf_manual’. Using the same, we can calculate any performance ratios or numbers that we need.
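To illustrate the idea without re-running the backtest, here is a sketch that works on a toy stand-in for such a results DataFrame; the column names and numbers are illustrative, not Zipline's exact output:

```python
import pandas as pd

# A toy stand-in for the 'perf_manual' DataFrame that run() returns; the real
# one has many more columns (returns, positions, orders, transactions, ...)
perf = pd.DataFrame({
    "portfolio_value": [100000.0, 99200.0, 98500.0, 99100.0],
    "transactions": [[], [{"amount": 10}], [], [{"amount": -10}]],
})

# Total PnL: final portfolio value minus the starting one
pnl = perf["portfolio_value"].iloc[-1] - perf["portfolio_value"].iloc[0]
# Total trades: count the filled transactions across all trading days
n_trades = sum(len(t) for t in perf["transactions"])

print(pnl)        # -900.0
print(n_trades)   # 2
```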

Looks like this strategy lost more than 50% of initial capital!

To change the initial capital and other parameters and optimize your backtesting results, you need to initialize TradingAlgorithm() accordingly. ‘capital_base’ is used to define the initial cash, and ‘data_frequency’ is used to define the data frequency. For example:

(By default the capital is 100000.0.)

Go through the official documentation of the TradingAlgorithm() function to learn more!

Next Step

If you are serious about writing advanced trading strategies and executing them through Python, read more about our Executive Programme in Algorithmic Trading. Over 250 hours of intensive training, with customized learning solutions, interactions with industry experts, traders, quants and two months of practical project work under Algo & HFT traders is what you get at throw-away prices! The new batch is starting from 27th August! Enroll now!

Download Data File

  • mac_excel_ zipline.txt



Introduction to Zipline in Python


Introduction to Zipline in Python

By Priyanka Sah


Python has emerged as one of the most popular languages for programmers in financial trading, due to its ease of availability, user-friendliness, and the presence of a rich set of scientific libraries like Pandas, NumPy, PyAlgoTrade, Pybacktest and more.

Python serves as an excellent choice for automated trading when the trading frequency is low/medium, i.e. for trades which last longer than a few seconds. It has multiple APIs/libraries that can be linked to it, making it optimal and cheaper, and allowing greater exploratory development of multiple trade ideas.


It is due to these reasons that Python has a very interactive online community of users, who share, reshare, and critically review each other’s work or codes. The two current popular web-based backtesting systems are Quantopian and QuantConnect.

Quantopian makes use of Python (and Zipline) while QuantConnect utilises C#. Both provide a wealth of historical data. Quantopian currently supports live trading with Interactive Brokers, while QuantConnect is working towards live trading.

Zipline is a Python library for trading applications that powers the Quantopian service mentioned above. It is an event-driven system that supports both backtesting and live-trading.

In this article, we will learn how to install Zipline, how to implement a Moving Average Crossover strategy, and how to calculate P&L, portfolio value, etc.

This article is divided into the following four sections:

  • Benefits of Zipline
  • Installation (how to install Zipline on local)
  • Structure (format to write code in Zipline)
  • Coding Moving average crossover strategy with Zipline

Benefits of Zipline

  • Ease of use
  • Zipline comes “batteries included” as many common statistics like moving average and linear regression can be readily accessed from within a user-written algorithm.
  • Input of historical data and output of performance statistics are based on Pandas DataFrames to integrate nicely into the existing PyData ecosystem
  • Statistic and machine learning libraries like matplotlib, scipy, statsmodels, and sklearn support development, analysis, and visualization of state-of-the-art trading systems


Using pip

Assuming you have all required non-Python dependencies, you can install Zipline with pip via:
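The original post showed this step as a screenshot; per the Zipline documentation of the time, the command was simply:

```shell
pip install zipline
```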

Using conda

Another way to install Zipline is via the conda package manager, which comes as part of Anaconda or can be installed via pip install conda.

Once set up, you can install Zipline from the Quantopian channel:
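Again, the screenshot is missing here; per the Zipline documentation of the time, the conda command was:

```shell
conda install -c Quantopian zipline
```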


Basic structure

Zipline provides a particular structure for the code, which includes defining a few functions that run the algorithm over a dataset, as mentioned below.

First, we have to import some functions we will need in the code. Every Zipline algorithm consists of two functions you have to define:

  • initialize(context)
  • handle_data(context, data)

Before the start of the algorithm, Zipline calls the initialize() function and passes in a context variable. context is a global variable that allows you to store variables you need to access from one algorithm iteration to the next.

After the algorithm has been initialized, Zipline calls the handle_data() function once for each event. At every call, it passes the same context variable and an event-frame called data containing the current trading bar with open, high, low, and close (OHLC) prices as well as volume for each stock.

All the functions commonly used in an algorithm can be found in the zipline.api module. Here we are using order(arg1, arg2), which takes two arguments: a security object, and a number specifying how many shares you would like to order (if negative, order() will sell/short shares). In this case, we want to order 10 shares of Apple at each iteration.

The second method, record(), allows you to save the value of a variable at each iteration. You provide it with a name for the variable together with the variable itself. After the algorithm has finished running, you can access all the variables you recorded; we will learn how to do that.

To run the algorithm, you need to call TradingAlgorithm(), which takes two arguments: the initialize function and the handle_data function. Then, call the run() method with the data on which the algorithm will run as its argument (data is a pandas data structure that stores the stock prices).

run() first calls the initialize() function, and then streams the historical stock price day by day through handle_data(). After each call to handle_data() we instruct Zipline to order 10 shares of AAPL.

How to code a Moving Average Crossover strategy with Zipline

Moving Averages

A moving average is the simple average of a security’s price over a defined number of time periods.


Moving average crossovers are a common way traders use moving averages. A crossover occurs when a faster moving average (i.e. a shorter-period moving average) crosses a slower moving average (i.e. a longer-period moving average): crossing above it is considered a bullish crossover, while crossing below it is considered a bearish crossover.
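The crossover idea can be sketched with plain pandas, outside Zipline, on a made-up price series:

```python
import pandas as pd

# Toy price series: flat, then trending up, so the short SMA eventually
# crosses above the long SMA
prices = pd.Series([10, 10, 10, 10, 10, 11, 12, 13, 14, 15], dtype=float)
short_mavg = prices.rolling(window=3).mean()
long_mavg = prices.rolling(window=5).mean()

# Bullish crossover: short was at or below the long yesterday, above it today
bullish = (short_mavg > long_mavg) & (short_mavg.shift(1) <= long_mavg.shift(1))
print(bullish.idxmax())   # index of the first bullish crossover -> 5
```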

Now we will learn how to implement this strategy using Zipline, starting by importing the libraries and initializing the variables that will be used in the algorithm.

The code is divided into 5 parts

  • Initialization
  • Initialize method
  • handle_data method
  • Strategy logic
  • Run Algo



load_bars_from_yahoo() is the function that takes the stock and the time period for which you want to fetch the data. Here we are using the SPY stock between 2011 and 2012; you can change this as per your requirement.

Initialize method

Now we define the initialize function, which specifies the stock that we are dealing with; in our case it is SPY.

handle_data method

handle_data() contains all the operations we want to perform, i.e. the main code for our algorithm. We need to calculate moving averages for different windows; Zipline provides an inbuilt function, mavg(), that takes an integer to define the window size.

Also, Zipline automatically calculates current_price, portfolio_value, etc., so we can just call these variables. In this algorithm we track current_positions, price, cash, portfolio_value and pnl.

Strategy logic

Now, the logic that will place a buy or sell order depending on the comparison of the moving averages:

  1. If the short moving average is greater than the longer one and your current_positions is 0, then you need to calculate the number of shares and place an order.
  2. If the short moving average is smaller than the longer one and your current_positions is not 0, then you need to sell all the shares that you currently hold.
  3. If neither condition is satisfied, do nothing; just record the variables you need to save.
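The three rules above can be sketched as a plain-Python function; the names (current_positions, cash, price) are illustrative here, not Zipline's own API — inside handle_data() the returned quantity would be passed to order():

```python
def signal(short_mavg, long_mavg, current_positions, cash, price):
    """Return the number of shares to order (0 means do nothing)."""
    if short_mavg > long_mavg and current_positions == 0:
        return int(cash // price)      # rule 1: go long with the available cash
    if short_mavg < long_mavg and current_positions != 0:
        return -current_positions      # rule 2: exit the whole position
    return 0                           # rule 3: do nothing, just record

print(signal(105.0, 100.0, 0, 10000.0, 50.0))   # bullish and flat -> 200
print(signal(95.0, 100.0, 200, 0.0, 50.0))      # bearish and long -> -200
```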


To run this algorithm, you need the following code:


You can also plot the graph using the plot() method.

Graph for the strategy

graph of moving crossover strategy using zipline

Snapshot of the screen using Zipline

Snapshot of screen in Zipline


We hope that you found this introduction to Zipline, and the strategy implemented with it, useful. In our next article, we will show you how to import and backtest data in CSV format using Zipline. For building technical indicators using Python, here are a few examples.

If you are a coder or a tech professional looking to start your own automated trading desk, learn automated trading from live interactive lectures by daily practitioners. The Executive Programme in Algorithmic Trading covers training modules like Statistics & Econometrics, Financial Computing & Technology, and Algorithmic & Quantitative Trading. Enroll now!


Automated Trading on Oanda platform by Dr. Yves Hilpisch


Automated Trading on the Oanda Platform by Dr. Yves Hilpisch

Python has emerged as one of the most popular languages for coding in Algorithmic Trading, owing to its ease of installation, free usage, easy structure, and the availability of a variety of modules. Globally, algo traders and quant researchers are extensively using Python for prototyping, backtesting, building their proprietary risk and order management systems, as well as for optimisation of testing modules.

This blog post highlights some of the key steps involved in Algorithmic Trading using Python as the programming language. The screenshots are taken from the webinar of Dr. Yves Hilpisch, conducted in collaboration with QuantInsti. Dr. Hilpisch is a world-renowned authority in the world of Python and is the founder of The Python Quants GmbH, with several books on the subject under his belt. He also serves as a faculty member at QuantInsti, one of Asia’s pioneering education and training firms in Algorithmic Trading.

All examples shown are based on the platform and API of Oanda. Background information about Python and the libraries used can be found in the O’Reilly book: Hilpisch, Yves (2014): “Python for Finance – Analyze Big Financial Data”. The post is divided into two parts. The current post highlights the basics of connecting to the Oanda platform using Python and backtesting trading strategies. The next post will cover working with streaming data as well as automated trading in real time.



Write Covered Call Strategy in Python



Traders in the derivatives market often use one of the following: a call option or a put option.

A “call option” is a financial contract between a buyer and a seller, whereby the buyer has the right, but not the obligation, to buy an agreed quantity of a financial instrument from the seller of the option at a certain time for a certain price (the strike price). The “put option” serves the opposite purpose.

In a “Covered Call”, the seller of the call options owns the corresponding amount of the underlying instrument.

A Covered Call is an income generating option strategy which involves two legs:

  • Buying a stock
  • Selling an Out of the money (OTM) call option

If the call is sold simultaneously along with the stock purchase, the strategy is referred to as a “buy-write” strategy.

In a Covered Call, the trader holds a neutral to bullish outlook. A Covered Call is a net debit transaction because you pay for the stock and receive only a small premium for the call option sold.

The idea of this blog post is to elaborate on the covered call strategy by an example, and to plot its payoff using Python. The post also highlights “Calendar Call” as it is a modification of the Covered Call strategy.
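As a taste of the payoff calculation (the plotting itself is left out), here is a sketch with hypothetical numbers: buy the stock at 100 and sell a call struck at 105 for a premium of 2:

```python
import numpy as np

# Hypothetical numbers for the two legs of the covered call
stock_price, strike, premium = 100.0, 105.0, 2.0
s_T = np.arange(80.0, 121.0, 1.0)      # terminal stock prices at expiry

long_stock = s_T - stock_price                          # stock leg P&L
short_call = premium - np.maximum(s_T - strike, 0.0)    # sold-call leg P&L
payoff = long_stock + short_call

print(payoff.max())   # upside capped at strike - cost + premium = 7.0
print(payoff[0])      # at 80: -20 + 2 = -18.0
```

Feeding `s_T` and `payoff` into matplotlib’s plot() would reproduce the familiar capped-upside payoff diagram.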

