## Essential Books on Algorithmic Trading

**How do I learn algorithmic trading?**

**What are the steps to start Algo trading?**

**What is the best-recommended starter guide, or book, for Algo Trading?**

Want to land a job as a quant or trader? These are the areas you should focus on to get your dream job.

**1) Equity Derivatives/Options**

Derivatives are highly traded instruments. Knowledge of option pricing models, the Greeks, volatility, hedging, and various option strategies is a must.

**2) Programming**

Sound programming skills are required for backtesting and for writing low-latency, highly efficient code.

**3) Statistics & Probability**

Probability and statistics form a key part of trading. Basic statistics, time series analysis, multivariate analysis, and similar tools are used for formulating strategies and managing risk.

**4) Markets and the Economy**

You need a good knowledge of how markets and the economy work.

**5) Numerical & Brain Teasers**

Numerical and thinking questions test the ability to work out the answer with sound reasoning.

**6) Questions about You**

These are asked to determine if you are a good fit for the job.

**7) Job Awareness Questions**

Job Awareness questions evaluate your understanding of the job profile.

**Get Free Career Advice from a leading HFT firm’s Head of Technology and a panel of quants/traders on 25th Jan at 6 PM IST by registering here:** http://bit.do/algowebinar

- Different types of roles and jobs in the Quant/Algo trading space
- What are the skill sets required to become an Algo trader?
- What does a quant developer do?
- How to get hired as a developer in an HFT firm?
- What are the questions asked in an interview for a Quant/trader role?
- Points to keep in mind while building your team for your Algo trading desk.

This webinar offers a unique chance for attendees to interact one-to-one with a team of quants and HFT developers and ask any career-related questions they might have.

Share your career-related questions and we will try our best to take them up in the webinar!

*Head of Technology – iRageCapital Broking, Mumbai (a leading HFT firm in Asia)*

Sunith is an expert in the fields of evolutionary algorithms and unconventional models of computing. During his bachelor’s program in Computer Science at IIT Madras, Sunith was involved in some path-breaking research in protein computing and DNA computing. His work has been presented at the ‘Symposium of Unconventional Models of Computing’.

He then went on to work with Yahoo R&D where he designed some really large scalable platforms. He also has two patent applications pending.

*Director – Master Trust, a leading Brokerage house in India*

Mr. Puneet Singhania M.B.A., C.F.A. serves as a Whole Time Director of Master Capital Services Limited. Mr. Singhania is involved in new initiatives in the group and assists other Directors in corporate strategy.

Prior to joining Master Trust Limited, he was working with ING Investment Management in India in their equity fund management department.

*Financial Engineer and Risk Manager – Futures Business Development, Jeddah (a leading IT company in Saudi)*

Gopinath Ramkumar is a data analyst with vast experience in different domains, including the IT sector and algorithmic trading. A trained engineer, Gopinath started his career as a software engineer before joining Manchester Business School for a degree in Computational Finance. While in the UK, he started pursuing his certification in Algorithmic Trading with QuantInsti. On the successful completion of EPAT™, with the assistance of QuantInsti’s placement team, Gopinath joined Motilal Oswal, which has its algorithmic trading prop desk in India.

A quant analyst, Gopinath has been contributing to different sectors and fields with his quantitative knowledge and background.

This webinar will be very beneficial for the job seekers in high frequency trading jobs, quant jobs, and algorithmic trading jobs. The session will be ideal for:

- Job seekers in Algo/Quant/HFT domain
- Entrepreneurs who are building their teams for Algo/HFT desks
- Existing Algo/quant traders and developers who are looking for professional growth

Quant Analysts

iRageCapital leverages its strengths in technology and quantitative finance to design cutting edge high-frequency trading systems and strategies. The introduction of DMA in the Indian markets in 2008 opened a multitude of possibilities within the domain of algorithmic trading in India. iRageCapital was formed by quantitative trading professionals in 2009 to explore possibilities within this domain.

By Dyutiman Das

**This article is the final project submitted by the author as a part of his coursework in the Executive Programme in Algorithmic Trading (EPAT™) at QuantInsti. Do check our Projects page and have a look at what our students are building.**

Some stocks move in tandem because the same market events affect their prices. However, idiosyncratic noise might make them temporarily deviate from the usual pattern and a trader could take advantage of this apparent deviation with the expectation that the stocks will eventually return to their long term relationship. Two stocks with such a relationship form a “pair”. We have talked about the statistics behind pairs trading in a previous article.

This article describes a trading strategy based on such stock pairs. The rest of the article is organized as follows. We first cover the basics of trading an individual pair, then the overall strategy that chooses which pairs to trade, and then present some preliminary results. In the end, we describe possible strategies for improving the results.

**Pair trading**

Let us consider two stocks, $x$ and $y$, such that

$$y = \alpha + \beta x + e$$

where $\alpha$ and $\beta$ are constants and $e$ is white noise. The parameters $\{\alpha, \beta\}$ can be obtained from a linear regression of the prices of the two stocks, with the resulting spread

$$e_{t} = y_{t} - (\alpha + \beta x_{t})$$

Let the standard deviation of this spread be $\sigma_{t}$. The z-score of this spread is

$$z_{t} = e_{t}/\sigma_{t}$$

The trading strategy is that when the **z-score is above a threshold**, say 2, **the spread can be shorted**, i.e. sell 1 unit of $y$ and buy $\beta$ units of $x$. We expect that the relationship between $x$ and $y$ will hold in the future and that eventually the z-score will come down to zero, or even go negative, at which point the position can be closed. By selling the spread when it is high and closing out the position when it is low, the strategy hopes to be statistically profitable. Conversely, if the z-score is below a lower threshold, say -2, the strategy will go long the spread, i.e. buy 1 unit of $y$ and sell $\beta$ units of $x$, and when the z-score rises to zero or above the position can be closed, realizing a profit.
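The entry rule above can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions: the prices are synthetic, the regression is a single ordinary least squares fit over the whole sample, and the spread's standard deviation is treated as constant rather than time-varying. The function names `spread_zscore` and `signal` are hypothetical.

```python
import numpy as np

def spread_zscore(y, x):
    """Regress y on x by ordinary least squares and return (alpha, beta, z)."""
    beta, alpha = np.polyfit(x, y, 1)   # polyfit returns slope first, then intercept
    e = y - (alpha + beta * x)          # spread e_t
    z = e / e.std(ddof=1)               # z-score, using one constant sigma (a simplification)
    return alpha, beta, z

def signal(z_t, threshold=2.0):
    """Return -1 to short the spread, +1 to go long the spread, 0 for no entry."""
    if z_t > threshold:
        return -1   # sell 1 unit of y, buy beta units of x
    if z_t < -threshold:
        return 1    # buy 1 unit of y, sell beta units of x
    return 0

# Synthetic prices (an assumption, not market data): y = 1 + 2x + noise
rng = np.random.default_rng(0)
x = 50 + np.cumsum(rng.normal(size=500))
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=500)

alpha, beta, z = spread_zscore(y, x)
signals = np.array([signal(v) for v in z])
print(f"beta = {beta:.3f}, entries = {np.count_nonzero(signals)}")
```

The threshold of 2 follows the example in the text; in practice it would be one of the parameters to optimize.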

There are a couple of issues which make this simple strategy difficult to implement in practice:

- The constants $\alpha$ and $\beta$ are not constant in practice and vary over time. They are not market observables and hence have to be estimated, with some estimates being more profitable than others.
- The long-term relationship can break down: the spread can move from one equilibrium to another, such that the changing $\{\alpha, \beta\}$ gives an “open short” signal and the spread keeps rising to a new equilibrium, so that when the “close” signal comes the spread is above the entry value, resulting in a loss.

Both of these facts are unavoidable and the strategy has to account for them.

The parameters $\{\alpha, \beta\}$ can be estimated from the intercept and slope of a linear regression of the prices of $y$ against the prices of $x$. Note that linear regression is not reversible, i.e. the parameters from regressing $y$ on $x$ are not simply the inverse of those from regressing $x$ on $y$. So the pair $(x, y)$ is not the same as $(y, x)$. While most authors use ordinary least squares regression, some use total least squares since they assume that the prices have some intraday noise as well. However, the main issue with this approach is that we have to pick an arbitrary lookback window.

In this article, we have used a Kalman filter, which is related to an exponential moving average. This is an adaptive filter which updates itself iteratively and produces $\alpha$, $\beta$, $e$ and $\sigma$ simultaneously. We use the Python package pykalman, whose EM method calibrates the covariance matrices over the training period.
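To make the idea concrete, here is a minimal hand-written Kalman filter for a time-varying regression, in plain NumPy. It is not the author's code and it does not use pykalman or its EM calibration. The random-walk state model, the process-noise setting `delta`, and the observation variance `obs_var` are all assumptions chosen for illustration, and the data are synthetic.

```python
import numpy as np

def kalman_regression(x, y, delta=1e-4, obs_var=1.0):
    """Track y_t = alpha_t + beta_t * x_t with a random-walk state [beta, alpha].

    Returns the filtered states, the forecast errors e_t and their
    standard deviations sigma_t.
    """
    n = len(x)
    theta = np.zeros(2)                       # state estimate: [beta, alpha]
    P = np.eye(2)                             # state covariance
    Q = delta / (1.0 - delta) * np.eye(2)     # process noise (assumed value)
    thetas = np.zeros((n, 2))
    e = np.zeros(n)
    sigma = np.zeros(n)
    for t in range(n):
        H = np.array([x[t], 1.0])             # observation vector
        P = P + Q                             # predict step (random-walk state)
        e[t] = y[t] - H @ theta               # forecast error (the spread)
        S = H @ P @ H + obs_var               # variance of the forecast error
        sigma[t] = np.sqrt(S)
        K = P @ H / S                         # Kalman gain
        theta = theta + K * e[t]              # update state
        P = P - np.outer(K, H) @ P            # update covariance
        thetas[t] = theta
    return thetas, e, sigma

# Synthetic prices (an assumption): y = 1 + 2x + noise
rng = np.random.default_rng(0)
x = 50 + np.cumsum(rng.normal(size=500))
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=500)

thetas, e, sigma = kalman_regression(x, y)
z = e / sigma                                 # z-score of the spread at each step
print(f"final beta estimate = {thetas[-1, 0]:.3f}")
```

Because the filter updates one observation at a time, no lookback window is needed for the regression itself; the price paid is that the two noise settings have to be chosen or calibrated.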

Another question that comes up is whether to regress prices or returns. The latter strategy requires holding equal dollar amounts in both long and short positions, i.e. the portfolio would have to be rebalanced every day, increasing the costs from transactions, slippage, and the bid/ask spread. Hence we have chosen to use prices, which is justified in the next subsection.

The stability of the long-term relationship is assessed by testing whether the pair is co-integrated. Note that even if the pair is not co-integrated outright, it might be for the proper choice of the leverage ratio. Once the parameters have been estimated as above, the spread time series $e_{t}$ is tested for stationarity with the augmented Dickey-Fuller (ADF) test. In Python, we obtain this from the adfuller function in the statsmodels module. The result gives the test statistic together with critical values at the 1%, 5%, and 10% significance levels. We found that not many pairs were being chosen at the 1% level, so we chose 10% as our threshold.

One drawback is that to perform the ADF test we have to choose a lookback period which reintroduces the parameter we avoided using the Kalman filter.

The trading strategy deploys an initial amount of capital. To diversify the investment five sectors will be chosen: financials, biotechnology, automotive etc. A training period will be chosen and the capital allocated to each sector is decided based on a minimum variance portfolio approach. Apart from the initial investment, each sector is traded independently and hence the discussion below is limited to a single sector, namely financials.

Within the financial sector, we choose about $n = 47$ names based on large market capitalization. We are looking for stocks with high liquidity, a small bid/ask spread, the ability to short the stock, etc. Once the stock universe is defined we can form $n(n-1)$ pairs, since as mentioned above $(x, y)$ is not the same as $(y, x)$. In our financial portfolio, we would like to maintain up to five pairs at any given time. On any day that we want to enter into a position (for example the starting date) we run a screen on all the $n(n-1)$ pairs and select the top pair(s) according to certain criteria, some of which are discussed next.
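The size of the screen is easy to check. The short sketch below enumerates the ordered pairs for a universe of 47 names; the ticker labels are placeholders, not the stocks used by the author.

```python
from itertools import permutations

# Placeholder tickers (hypothetical names) for a universe of n = 47 stocks
tickers = [f"STOCK{i:02d}" for i in range(47)]

# Ordered pairs, because regressing y on x differs from regressing x on y
pairs = list(permutations(tickers, 2))
print(len(pairs))  # 2162, i.e. 47 * 46
```

Each of these ordered pairs would then go through the entry-point check and, if it passes, the ADF test.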

For each pair, the signal is obtained from the Kalman filter and we check if $|e| > n_z \sigma$, where $n_z$ is the z-score threshold to be optimized. This ensures that the pair has an entry point. We perform this test first since it is inexpensive. If the pair has an entry point, then we choose a lookback period and perform the ADF test.

The main goal of this procedure is not only to determine the list of pairs which meet the standards but also to rank them according to metrics which relate to the expected profitability of the pairs.

Once the ranking is done we enter into the positions corresponding to the top pairs until we have a total of five pairs in our portfolio.

In the following, we calibrated the Kalman filter over Cal11 and then used the calibrated parameters to trade in Cal12. We kept only one stock pair in the portfolio.

In the tests shown we kept the maximum allowed drawdown per trade at 9%, but allowed a maximum loss of 6% in one strategy and only 1% in the other. As the results show, performance improves with the tightening of the maximum allowed loss per trade. The Sharpe ratio (assuming zero index) was 0.64 and 0.81 respectively, while the total P&L was 9.14% and 14%.
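For reference, a Sharpe ratio of this kind can be computed as sketched below. The article does not say how its figures were calculated, so several things here are assumptions: "zero index" is read as a benchmark return of zero, returns are taken to be daily, and the result is annualized with 252 trading days. The sample returns are made up.

```python
import numpy as np

def sharpe_ratio(daily_returns, benchmark=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of returns in excess of a constant benchmark."""
    excess = np.asarray(daily_returns, dtype=float) - benchmark
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

# Made-up daily strategy returns, for illustration only
returns = np.array([0.010, -0.005, 0.002, 0.004, -0.001, 0.003])
print(f"Sharpe ratio = {sharpe_ratio(returns):.2f}")
```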

The thresholds were chosen based on the simulation in the training period.

- Develop better screening criteria to identify the pairs with the best potential. I already have several ideas and this will be ongoing research.
- Optimize the lookback window and the buy/sell z-score thresholds.
- Gather more detailed statistics in the training period. At present, I am gathering statistics of only the top 5 pairs (based on my selection criteria). However, in future, I should record statistics of all pairs that pass. This will indicate which trades are most profitable.
- In the training period, I am measuring profitability by the total P&L of the trade, from entry until the exit signal is reached. However, I should also record the maximum profit so that I could determine an earlier exit threshold.
- Run the simulation for several years, i.e. calibrate on one year and then test on the next year. This will generate several years’ worth of out-of-sample tests. Another window to optimize is the length of the training period and how frequently the Kalman filter has to be recalibrated.
- Expand the methodology to other sectors beyond financials.
- Explore other filters instead of just Kalman filter.

If you are a coder or a tech professional looking to start your own automated trading desk, learn automated trading from live interactive lectures by daily practitioners. The Executive Programme in Algorithmic Trading (EPAT™) covers training modules like Statistics & Econometrics, Financial Computing & Technology, and Algorithmic & Quantitative Trading. Enroll now!

**Note:**

The work presented in the article has been developed by the author, Mr. Dyutiman Das. The underlying codes which form the basis for this article are not being shared with the readers. For readers who are interested in further readings on implementing pairs trading using Kalman Filter, please find the article below.

Link: Statistical Arbitrage Using the Kalman Filter by Jonathan Kinlay

The performance of a trading strategy is measured with a set of parameters. For example, if you are trading in equity then your returns are compared against the benchmark index. The consistency of returns of the strategy also proves to be a significant factor.

The year 2016 has been exciting on many fronts for QuantInsti™. One of Asia’s pioneering algorithmic trading research and training institutes, QuantInsti™ celebrated its 6th anniversary in 2016. Here’s a quick snapshot of our achievements during the bygone year.

Market impact cost, a very important component of trading costs, is closely tracked by portfolio managers as it can make or break a fund’s performance. In this post, we will throw some light on market impact cost, and identify its sources and the different means adopted by portfolio managers to mitigate it.

As 2016 nears its finish line, here we are with the list of recommended reading on our blog with the top-rated blog posts, as voted by you! Enjoy the last few days doing what you love most! Read on.

This one is straight out of a lecture in the curriculum of QuantInsti’s Executive Programme in Algorithmic Trading (EPAT™). It compares the traditional trading structure with algorithmic trading architecture and highlights the complexities in the latter. The post explains the three core components of the trading server: the Complex Event Processing Engine (the brain), the Order Management System (the limbs) and the Data Storage component. The life cycle of the entire system is also explained so that readers understand what happens when a data packet is received from the exchange, where trading decisions happen, how risk is monitored and how orders are managed.

- List of available platforms in C/C++/R/Python/Matlab
- Platforms and libraries for programming in Python

There are many platforms out there, and beginners often find it confusing to pick the one most relevant for them. The posts list the USPs of the available platforms so that you can make an informed choice before you start using a platform for backtesting. It is important to make this decision carefully, as you will need to spend enough time on one platform to get comfortable with it!

In this highly insightful article, QuantInsti’s EPAT™ graduate Jacques Joubert shares his project work on Statistical Arbitrage in the R programming language. Readers who are more comfortable in Excel can download a pair trading model in Excel here to get started. He talks briefly about the history of Statistical Arbitrage before moving on to the strategy and its markdown in the R programming language.

What are the different Algo Trading Strategies? What are the strategy paradigms and modelling ideas associated with each strategy? How do we build an Algo trading strategy? These are some of the key questions answered in this in-depth article. QuantInsti’s article on Algorithmic Trading Strategies covers the following:

- Momentum based strategies
- Arbitrage
- Statistical Arbitrage
- Market Making
- Machine Learning Based

- Introduction to Zipline, an open-sourced platform for US Stocks
- Learn to build technical indicators in python
- Benefits of learning python as a trading tool

Python has emerged as one of the most popular programming languages for algorithmic traders. In this set of articles, we have talked about Zipline, building technical indicators and the benefits of learning Python for trading. The articles came to light during the webinar on Automated trading using Python conducted by Dr. Yves Hilpisch. This year, we also had Dr. Hui Liu conducting a webinar on implementing Python in Interactive Brokers’ C++ based API. Both Dr. Yves and Dr. Hui, who are two of the renowned names in the field of automated trading, have joined QuantInsti’s impressive line-up of outstanding faculty for EPAT™.

- Free Resources to get started
- An open sourced machine learning strategy with cloud based automation
- A downloadable strategy model in ML to trade in forex markets

Machine Learning and Artificial Intelligence are the most sought-after streams of technology in this era. As trading has become automated, Machine Learning has become critical for staying competitive in the market. From fetching historical information to placing orders to buy or sell in the market, machine learning is an integral part of automated trading and we have covered it in detail on our blog.

As algorithmic trading picks up pace in India, more and more conventional traders and beginners want to know about this lucrative field. However, owing to a shortage of resources in the market, QuantInsti decided to put together a very basic article for amateurs who want to step into the world of algorithmic trading. Explained in basic language, this article covers all the things one needs to know before starting algorithmic trading.

We would love to hear from you – why you liked any or all. If you would like to read something specific in 2017, all suggestions are welcome!

As the race to zero latency continues, high frequency data, a key component in HFT remains under the scanner of researchers and quants across markets. Beginners to algorithmic trading often find the words high frequency trading (HFT), latency, market microstructure, noise etc. being tossed around on numerous algorithmic trading sites, in research papers, and quant literature. This post aims to unravel some of these terms for our readers. In this post, we will take a brief overview of the features of high frequency data, some of which include:

- Irregular time intervals between observations
- Market microstructure noise
- Non-normal asset return distributions (e.g. fat tail distributions)
- Volatility clustering and long memory in absolute values of returns
- High computations loads and related “Big data” problems

On any given trading day, liquid markets generate thousands of ticks which form the high frequency data. By nature, this data is irregularly spaced in time and is humongous compared to the regularly spaced end-of-the-day (EOD) data.
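The difference between irregular tick data and regularly spaced bars can be illustrated with pandas. The ticks below are synthetic (exponentially distributed waiting times and a random-walk price, both assumptions), and taking the last tick in each one-minute bin is just one common way to regularize the series.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Synthetic ticks: random waiting times make the timestamps irregular
gaps = rng.exponential(scale=2.0, size=1000)            # seconds between ticks
times = pd.Timestamp("2017-01-02 09:15:00") + pd.to_timedelta(np.cumsum(gaps), unit="s")
prices = 100 + np.cumsum(rng.normal(scale=0.01, size=1000))
ticks = pd.Series(prices, index=times)

# Regularize: last traded price in each one-minute bin, carried forward if a bin is empty
bars = ticks.resample("1min").last().ffill()

print(f"{len(ticks)} irregular ticks -> {len(bars)} one-minute bars")
```

Regularizing discards information within each bin, which is one reason HFT models often work with the irregular series directly.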

High frequency trading (HFT) involves analyzing this data to formulate trading strategies which are implemented with very low latencies. It is therefore essential for mathematical tools and models to incorporate the features of high frequency data, such as irregular time series and the others outlined below, to arrive at the right trading decisions. Let us cover some of the other features that define high frequency data.

Market microstructure noise is a phenomenon observed in high frequency data that relates to the deviation of the observed price from the base price. The presence of noise makes high frequency estimates of some parameters, like realized volatility, very unstable. Noise in high frequency data can result from various factors including:

- Bid-Ask Bounce
- Asymmetry of information
- Discreteness of price changes
- Order arrival latency

Let us look at the concept of Bid-Ask Bounce, which is one of the causes of Noise.

**Bid-Ask bounce** – Bid-Ask bounce occurs when the price for a stock keeps changing from the bid price to the ask price (or vice versa). The stock price movement takes place only inside the bid-ask spread, which gives rise to the bounce effect. This occurrence of bid-ask bounce gives rise to high volatility readings even if the price stays within the bid-ask window.
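A small simulation shows the effect. In this sketch the quotes never move (the bid and ask values are arbitrary assumptions), yet trades that randomly hit the bid or the ask still produce a non-zero realized volatility.

```python
import numpy as np

rng = np.random.default_rng(3)

bid, ask = 99.95, 100.05          # quotes held fixed (assumed values)

# Each trade executes at the bid or at the ask with equal probability
trades = np.where(rng.random(10_000) < 0.5, bid, ask)

# Realized volatility: square root of the sum of squared log returns
returns = np.diff(np.log(trades))
realized_vol = np.sqrt(np.sum(returns ** 2))

print(f"mid price is constant at {(bid + ask) / 2:.2f}")
print(f"realized volatility from bounce alone = {realized_vol:.4f}")
```

Since the mid price is constant, all of the measured volatility here is noise from the bounce.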

High frequency data exhibit fat tail distributions. To understand fat tails we need to first understand a normal distribution. A normal distribution is symmetric, so values are distributed equally above and below the mean. About 99.7% of all values fall within three standard deviations of the mean, and therefore there is only a 0.3% chance of an extreme event occurring.

Many financial models such as Modern Portfolio Theory, Efficient Markets, and the Black-Scholes option pricing model assume normality. However, real market events in the past have shown us that unpredictable human behavior makes the marketplace less than perfect. This gives rise to extreme events and consequently to the fat tail distribution and the associated risks.

By definition, a fat tail is a probability distribution which predicts movements of three or more standard deviations more frequently than a normal distribution. Quant analysts doing HFT need to model the tail risks to avoid big losses, and hence tail risk hedging assumes importance in HFT.
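The definition can be checked numerically. The sketch below compares the probability of a move beyond three standard deviations under a normal distribution with the same probability under a Student's t distribution rescaled to unit variance. The t distribution with 4 degrees of freedom is only a convenient stand-in for a fat-tailed distribution, not a model proposed in this post.

```python
import math
import numpy as np

# P(|Z| > 3) for a standard normal, computed exactly from the error function
normal_tail = math.erfc(3 / math.sqrt(2))

# Fat-tailed stand-in: Student's t with 4 degrees of freedom (an assumption)
rng = np.random.default_rng(4)
df = 4
samples = rng.standard_t(df, size=1_000_000)
samples = samples / math.sqrt(df / (df - 2))   # rescale to unit variance
t_tail = np.mean(np.abs(samples) > 3)          # empirical tail frequency

print(f"normal:    P(|move| > 3 sd) = {normal_tail:.4%}")
print(f"student-t: P(|move| > 3 sd) = {t_tail:.4%}")
```

Both distributions have the same variance, so the larger tail frequency of the second comes purely from its shape.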

The plot shown below illustrates a fat tail distribution vis-à-vis a normal distribution.

High frequency data exhibits volatility clustering and long memory effects in absolute values of returns.

**Volatility Clustering** – In finance, volatility clustering refers to the observation, as noted by Mandelbrot (1963), that “large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes.”

**Long-range dependence (Long memory)** – Long-range dependence (LRD), also called long memory or long-range persistence, is a phenomenon that may arise in the analysis of spatial or time series data. It relates to the rate of decay of statistical dependence between two points as the time interval or spatial distance between them increases. A phenomenon is usually considered to have long-range dependence if the dependence decays more slowly than an exponential decay, typically following a power-like decay.
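Volatility clustering shows up as positive autocorrelation in absolute returns even when the returns themselves are uncorrelated. The sketch below illustrates this with a simulated GARCH(1,1) process; the model and its parameters are assumptions chosen for illustration. Note that GARCH(1,1) dependence decays exponentially, so this example demonstrates clustering only, not long memory.

```python
import numpy as np

def simulate_garch(n, omega=1e-6, a=0.10, b=0.85, seed=5):
    """Simulate returns whose variance follows var = omega + a * r^2 + b * var."""
    rng = np.random.default_rng(seed)
    r = np.zeros(n)
    var = omega / (1 - a - b)              # start at the unconditional variance
    for t in range(n):
        r[t] = np.sqrt(var) * rng.normal()
        var = omega + a * r[t] ** 2 + b * var
    return r

def autocorr(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

r = simulate_garch(20_000)
acf_raw = autocorr(r)
acf_abs = autocorr(np.abs(r))
print(f"lag-1 autocorrelation of returns:          {acf_raw:.3f}")
print(f"lag-1 autocorrelation of absolute returns: {acf_abs:.3f}")
```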

HFT players rely on microsecond/nanosecond latency and have to deal with enormous data. Utilizing big data for HFT comes with its own set of problems. HFT firms need to have the latest state-of-the-art hardware and latest software technology to deal with big data, which otherwise can increase the processing time beyond the acceptable standards.

These were some of the features underlying high frequency data that HFT models need to take into account. If you want to learn various aspects of Algorithmic trading then check out the Executive Programme in Algorithmic Trading (EPAT™). The course covers training modules like Algorithmic & Quantitative Trading, Statistics & Econometrics, and Financial Computing & Technology. Enroll now!

In our previous blog we talked about Data Visualization in Python using Bokeh. Now, let’s take our series on Python data visualization forward, and cover another cool data visualization Python package. In this post we will use the Python Seaborn package to create Heatmaps which can be used for various purposes, including by traders for tracking markets.

Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive statistical graphics. Because seaborn is built on top of Matplotlib, the graphics can be further tweaked using Matplotlib tools and rendered with any of the Matplotlib backends to generate publication-quality figures. [1]

Types of plots that can be created using seaborn include:

- Distribution plots
- Regression plots
- Categorical plots
- Matrix plots
- Timeseries plots

The plotting functions operate on pandas DataFrames and arrays containing a whole dataset, and internally perform the necessary aggregation and statistical model-fitting to produce informative plots.[2]

**Source: seaborn.pydata.org**

A heatmap is a two-dimensional graphical representation of data where the individual values that are contained in a matrix are represented as colors. The seaborn package allows for creation of annotated heatmaps which can be tweaked using Matplotlib tools as per the creator’s requirement.

**Annotated Heatmap**

We will create a seaborn heatmap for a group of 30 pharmaceutical company stocks listed on the National Stock Exchange of India Ltd (NSE). The heatmap will display the stock symbols and their respective single-day percentage price changes.

We collate the required market data on pharma stocks and construct a comma-separated values (CSV) file with the stock symbols and their respective percentage price changes in the first two columns.

Since we have 30 Pharma companies in our list, we will create a heatmap matrix of 6 rows and 5 columns. Further, we want our heatmap to display the percentage price change for the stocks in a descending order. To that effect we arrange the stocks in a descending order in the CSV file and add two more columns which indicate the position of each stock on X & Y axis of our heatmap.

We import the following Python packages:

We read the dataset using the read_csv function from pandas, and display the first ten rows using the print function.

Since we want to construct a 6 x 5 matrix, we create an n-dimensional array of the same shape for “Symbol” and the “Change” columns.

The pivot function is used to create a new derived table from the given dataframe object “df”. The function takes three arguments; index, columns, and values. The cell values of the new table are taken from column given as the values parameter, which in our case is the “Change” column.
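The original code for these steps is not reproduced here, so the following is a reconstruction under stated assumptions. The position columns are given the hypothetical names "Row" and "Col", the symbols and price changes are synthetic, and the CSV is built in memory instead of being read from the downloadable file.

```python
import io
import numpy as np
import pandas as pd

# Build a stand-in for the CSV file: 30 stocks sorted by descending change,
# with hypothetical "Row" and "Col" columns giving each stock's heatmap position
rng = np.random.default_rng(6)
change = np.sort(rng.normal(scale=2.0, size=30))[::-1].round(2)
lines = ["Symbol,Change,Row,Col"]
for i, c in enumerate(change):
    lines.append(f"PHARMA{i + 1:02d},{c},{i // 5 + 1},{i % 5 + 1}")

df = pd.read_csv(io.StringIO("\n".join(lines)))
print(df.head(10))

# Reshape the two data columns into 6 x 5 arrays
symbol = df["Symbol"].to_numpy().reshape(6, 5)
percentage = df["Change"].to_numpy().reshape(6, 5)

# Pivot: rows and columns come from the position columns, cells from "Change"
result = df.pivot(index="Row", columns="Col", values="Change")
print(result)
```

Because the file is already sorted in descending order, the largest gain ends up in the top-left cell of `result` and the largest loss in the bottom-right.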

In this step we create an array which will be used to annotate the heatmap. We call the flatten method on the “symbol” and “percentage” arrays to collapse each into a one-dimensional array, and the zip function pairs up their elements, returning an iterator. We loop over the pairs with a Python for loop and use the format function to format the stock symbol and the percentage price change value as per our requirement.
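A minimal sketch of this labeling step is shown below, using a tiny 2 x 2 example with placeholder symbols. The exact label layout (symbol on one line, change with two decimals on the next) is an assumption about the format the author chose.

```python
import numpy as np

# Placeholder inputs standing in for the 6 x 5 "symbol" and "percentage" arrays
symbol = np.array([["AAA", "BBB"], ["CCC", "DDD"]])
percentage = np.array([[1.25, 0.40], [-0.10, -2.05]])

# Pair each symbol with its change, format the label, and restore the grid shape
labels = np.asarray(
    ["{0}\n{1:.2f}%".format(s, p)
     for s, p in zip(symbol.flatten(), percentage.flatten())]
).reshape(symbol.shape)

print(labels[0, 0])
```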

We create an empty Matplotlib plot and define the figure size. We also add the title to the plot and set the title’s font size, and its distance from the plot using set_position method.

We wish to display only the stock symbols and their respective single-day percentage price change. Hence, we hide the ticks for the X & Y axis, and also remove both the axes from the heatmap plot.

In the final step, we create the heatmap using the heatmap function from the Python seaborn package. The heatmap function takes the following arguments:

**data** – 2D dataset that can be coerced into an ndarray. If a Pandas DataFrame is provided, the index/column information will be used to label the columns and rows.

**annot** – an array of same shape as data which is used to annotate the heatmap.

**cmap** – a matplotlib colormap name or object. This maps the data values to the color space.

**fmt** – string formatting code to use when adding annotations.

**linewidths** – sets the width of the lines that will divide each cell.
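Putting the arguments together, a call to the heatmap function might look like the sketch below. The data and symbols are synthetic, and the colormap, figure size and title are illustrative choices, not necessarily the ones used for the original chart. Since the annotations are strings, `fmt` is set to an empty string. The Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic 6 x 5 grid of percentage changes in descending order
rng = np.random.default_rng(7)
data = np.sort(rng.normal(scale=2.0, size=30))[::-1].reshape(6, 5)
labels = np.array(
    [f"SYM{i + 1:02d}\n{v:.2f}%" for i, v in enumerate(data.flatten())]
).reshape(6, 5)

# Empty plot with a title; ticks and axes are hidden as described above
fig, ax = plt.subplots(figsize=(12, 7))
ax.set_title("Pharma sector heatmap (synthetic data)", fontsize=18)
ax.set_xticks([])
ax.set_yticks([])
ax.axis("off")

# annot carries the text labels, so fmt must be an empty string
sns.heatmap(data, annot=labels, fmt="", cmap="RdYlGn", linewidths=0.30, ax=ax)
```

In an interactive session, calling `plt.show()` afterwards displays the figure; a diverging colormap such as "RdYlGn" makes gains and losses easy to tell apart.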

Here’s our final output of the seaborn heatmap for the chosen group of pharmaceutical companies. Looks pretty neat and clean, doesn’t it? A quick glance at this heatmap and one can easily make out how the market is faring for the period.

**Download the Python Heatmap Code**

Readers can download the entire Python code plus the Excel file using the download button provided below and create their own custom heatmaps. With a little tweak to the Python code you can create heatmaps of any size, for any market index, or for any period. The heatmap can be used in live markets by connecting a real-time data feed to the Excel file that is read in the Python code.

As illustrated from the heatmap example above, seaborn is easy to use and one can tweak the seaborn plots to one’s requirement. You can refer to the documentation of seaborn for creating other impressive charts that you can put to use for analyzing the markets.

Python Data Visualization is just one of the elements covered in the vast domain of Algorithmic Trading. To understand the patterns, one must be well-versed in the basics. Want to know more about Algorithmic trading? You should click here and check out more about Algorithmic Trading.

Download Python Code:

**Data Visualization using Seaburn.rar**

- Pharma Heatmap using Seaburn.py
- Pharma Heatmap.data