Converting tick-by-tick data to OHLC data using pandas

Using Python to convert tick by tick data to end of the day data

Using Python to convert tick by tick data to OHLC data : A Pandas Tutorial

by Abhishek Kulkarni

In this post, we will explore a feature of Python pandas package. We usually find queries about converting tick-by-tick data into OHLC (Open, High, Low and Close) frequently. This can be accomplished with minimal effort using pandas package. The OHLC data is used for performing technical analysis of price movement over a unit of time (1 day, 1 hour etc.). We have already seen How OHLC data is used to calculate pivot points which traders use to identify key areas where reversal of price movement is possible, using which they can ideate their investment strategy.

untitled-1

The first step

The first step involves fetching sample data. Let us download sample tick by tick data. For this tutorial, we will use the January data for AUD/JPY (Australian Dollar/Japanese Yen) pair that was downloaded from Pepperstone.

Pepperstone that provides free historical tick data for various currency pairs. The .csv file contains top of the book, tick-by-tick market data, with fractional pip spreads in millisecond details. This data is more than sufficient for our analysis.

The data that we downloaded will look like this:

converting tick by tick data to ohlc

 

Steps on Python

As you can see the data is without any header. We will include the header and accomplish the required task programmatically.

Import pandas package

 import pandas as pd

Load the data

Data is stored in my working directory with a name ‘AUDJPY-2016-01.csv’. As we saw earlier, the data is without a header. Hence we would add header to the data while importing it. Thus importing and adding header take place in the same line of code.

 data_frame = pd.read_csv('AUDJPY-2016-01.csv', names=['Symbol', 'Date_Time', 'Bid', 'Ask'],

 index_col=1, parse_dates=True)
data_frame.head()

This is how the data frame looks like:-

converting tick by tick data to ohlc

We use the resample attribute of pandas data frame. The resample attribute allows to resample a regular time-series data. We shall resample the data every 15 minutes and divide it into OHLC format. If you want to resample for smaller time frames (milliseconds/microseconds/seconds), use L for milliseconds, U for microseconds, and S for seconds.

data_ask =  data_frame['Ask'].resample('15Min').ohlc()

data_bid =  data_frame['Bid'].resample('15Min').ohlc()

A snapshot of tick-by-tick data converted into OHLC format can be viewed with the following commands:-

data_ask.head()

data_bid.head()

converting tick by tick data to ohlc

converting tick by tick data to ohlc pandas tutorial

You may concatenate ask price and bid price to have a combined data frame

 data_ask_bid=pd.concat([data_ask, data_bid], axis=1, keys=['Ask', 'Bid'])

 

The Complete Code

 import pandas as pd

data_frame = pd.read_csv('AUDJPY-2016-01.csv', names=['Symbol', 'Date_Time', 'Bid', 'Ask'],

 index_col=1, parse_dates=True)

data_frame.head()

data_ask =  data_frame['Ask'].resample('15Min').ohlc()

data_bid =  data_frame['Bid'].resample('15Min').ohlc()

data_ask.head()

data_bid.head()

data_ask_bid=pd.concat([data_ask, data_bid], axis=1, keys=['Ask', 'Bid'])

epat

Conclusion

This was a quick way of computing the OHLC using TBT data. This can be applied across assets and one can devise different strategies based on the OHLC data. We can also plot charts based on OHLC, and generate trade signals. Some other ways in which the data can be used is to build technical indicators in python or to compute risk-adjusted returns.

Next Step

Since you will need to be analytical & quantitative while getting into or upgrading to algorithmic trading it is imperative to learn programming (some if not all) and build foolproof systems and execute right strategies. To enable a fine learning experience QuantInsti offers a comprehensive course called Executive Programme in Algorithmic Trading (EPAT) with lecture recordings and life time access and support.

4 thoughts on “Converting tick-by-tick data to OHLC data using pandas

  1. November 28, 2016

    WackyTacky Reply

    In order to do back testing , I can do it on many platforms where this kind of data is readily available.
    But for Live Trading, if strategy demands for some interval viz- 5, 10, 15,minutes data, is it compulsory to convert TBT data to OHLC or is there any other way to do away with without the conversion?

    • November 29, 2016

      K Harikumar Reply

      If the strategy requires candlesticks data then converting into OHLC is necessary.
      Some brokers/data vendors do provide 5x candlestick data otherwise it is necessary to convert into OHLC format.

      • December 1, 2016

        WackyTacky Reply

        How do I convert streaming data to OHLC ?
        Does the above method help me? [The above method is for the data of past and not streaming]

        Secondly the tick data is not in proper format. for instance I get a dictionary of key and values in one single cell of CSV that too without datetime
        I have attached a file as to show the sample of how I receive data

        Can you help me on this?

    • December 6, 2016

      admin Reply

      If the strategy requires candlesticks data then converting into OHLC is necessary.
      Some brokers/data vendors do provide 5x candlestick data otherwise, it is necessary to convert into OHLC format.

Leave a Reply

Your email address will not be published. Required fields are marked *