Introduction To Machine Learning K-Nearest Neighbors (KNN) Algorithm In Python

By Vibhu Singh

Machine Learning is one of the most popular approaches in Artificial Intelligence. Over the past decade, Machine Learning has become an integral part of our lives. It is used in tasks as simple as recognizing human handwriting and as complex as powering self-driving cars. It is also expected that within a couple of decades, many mechanical, repetitive tasks will be automated. With increasing amounts of data becoming available, there is good reason to believe that Machine Learning will become even more prevalent as a necessary element of technological progress. ML is making a huge impact in many key industries, such as financial services, delivery, marketing and sales, and health care. Here, however, we will discuss the implementation and usage of Machine Learning in trading.

In this blog, we will give you an overview of the K-Nearest Neighbors (KNN) algorithm and walk through a step-by-step implementation of a trading strategy using K-Nearest Neighbors in Python.

K-Nearest Neighbors (KNN) is one of the simplest algorithms used in Machine Learning. KNN stores the training data and classifies new data points based on a similarity measure (e.g., a distance function). Classification is done by a majority vote of a point's neighbors: a new data point is assigned to the class that is most common among its k nearest neighbors. The value of k, the number of neighbors consulted, affects accuracy. A larger k smooths out noise and may improve accuracy, but too large a k blurs the boundary between classes, so accuracy does not simply keep rising as k grows.
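The following is a minimal from-scratch sketch of the majority vote in plain Python. It is not the scikit-learn implementation used later in this post. It assumes Euclidean distance as the similarity measure. The function name `knn_predict` and the toy points are illustrative only.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Assumes Euclidean distance as the similarity measure.
    """
    # Distance from the query to every training point, paired with that point's label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Two small, well-separated toy clusters labelled +1 and -1
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = [1, 1, 1, -1, -1, -1]

print(knn_predict(points, labels, (1.8, 1.8), k=3))
print(knn_predict(points, labels, (6.2, 6.4), k=3))
```

For a two-class problem such as buy/sell, choosing an odd k avoids tied votes.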

Now, let us walk through how to use K-Nearest Neighbors in Python to create a trading strategy.

1. Import the Libraries

We will start by importing the necessary libraries. We will import the pandas library to use its powerful DataFrame, the numpy library for scientific computing, and the matplotlib.pyplot library for plotting graphs. From scikit-learn, we will import KNeighborsClassifier from sklearn.neighbors to implement the k-nearest neighbors vote, and accuracy_score from sklearn.metrics to compute the classification accuracy score. We will also import the fix_yahoo_finance package to fetch data from Yahoo.

Import the libraries

2. Fetch the Data

We will fetch the S&P 500 data from Yahoo Finance using ‘pandas_datareader’ and store it in a DataFrame ‘df’. After this, we will drop all the missing values from the data using the ‘dropna’ function and print the first five rows of the columns ‘Open’, ‘High’, ‘Low’ and ‘Close’.

Fetch the data

Fetch the data - Output
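The download step depends on a live data feed. As the comments at the end of this post note, Yahoo's original historical data API was decommissioned. The fix_yahoo_finance package has since been renamed yfinance. For that reason, the sketch below builds a synthetic random-walk OHLC DataFrame as a hypothetical stand-in for the S&P 500 data; this is an assumption, not the author's code. It injects one missing row and then applies the same ‘dropna’ and column selection described above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for downloaded S&P 500 data:
# a synthetic random-walk close with Open/High/Low built around it
rng = np.random.default_rng(seed=0)
dates = pd.bdate_range("2012-01-02", periods=300)
close = 1300 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates))))
open_ = close * (1 + rng.normal(0, 0.003, len(dates)))
high = np.maximum(open_, close) * (1 + np.abs(rng.normal(0, 0.003, len(dates))))
low = np.minimum(open_, close) * (1 - np.abs(rng.normal(0, 0.003, len(dates))))

df = pd.DataFrame({"Open": open_, "High": high, "Low": low, "Close": close}, index=dates)
df.iloc[5] = np.nan  # simulate a missing row, as real data feeds sometimes have

# Drop rows with missing values, then inspect the first five rows
df = df.dropna()
print(df[["Open", "High", "Low", "Close"]].head())
```

With network access, the real index data could instead be fetched with yfinance's `download` function (for example, with the ticker "^GSPC"). Check the returned columns before use, since their layout has changed across yfinance versions.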

3. Define Predictor Variable

A predictor variable, also known as an independent variable, is used to determine the value of the target variable. We use ‘Open-Close’ and ‘High-Low’ as predictor variables. We will drop the NaN values and store the predictor variables in ‘X’.

Define Predictor Variable

Define Predictor Variable - Output
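The predictor step can be sketched as follows. It uses a small OHLC table with made-up prices in place of the downloaded data. The column names follow the text.

```python
import pandas as pd

# Small example OHLC table (hypothetical prices, for illustration only)
df = pd.DataFrame({
    "Open":  [100.0, 102.0, 101.0, 103.0],
    "High":  [103.0, 104.0, 102.5, 106.0],
    "Low":   [99.0, 101.0, 100.0, 102.0],
    "Close": [102.0, 101.5, 102.0, 105.0],
})

# Predictor 1: open minus close (positive on days the price fell)
df["Open-Close"] = df["Open"] - df["Close"]
# Predictor 2: the intraday trading range
df["High-Low"] = df["High"] - df["Low"]

df = df.dropna()
X = df[["Open-Close", "High-Low"]]
print(X)
```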


4. Define Target Variables

The target variable, also known as the dependent variable, is the variable whose values are to be predicted by the predictor variables. Here, the target variable is whether the S&P 500 price will close up or down on the next trading day. The logic is that if tomorrow’s closing price is greater than today’s closing price, we will buy the S&P 500; otherwise, we will sell the S&P 500. We will store +1 for the buy signal and -1 for the sell signal, and store the target variable in a variable ‘Y’.

Define target Variables
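Here is a minimal sketch of the +1/-1 target. It assumes the closing prices sit in a pandas Series; the prices are made up. `shift(-1)` moves tomorrow's close onto today's row so the two can be compared directly.

```python
import numpy as np
import pandas as pd

close = pd.Series([100.0, 101.0, 100.5, 100.5, 102.0])

# +1 (buy) if the next day's close is higher than today's close, else -1 (sell).
# shift(-1) aligns tomorrow's close with today's row. The last row has no
# "tomorrow" (NaN), and a comparison with NaN is False, so it gets -1 here.
Y = np.where(close.shift(-1) > close, 1, -1)
print(Y)
```

The final row has no next-day close, so its label is not meaningful; in practice you would drop that row before training. Days on which the close is unchanged are labelled -1 (sell), matching the "otherwise" branch of the rule above.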

5. Split the Dataset

Now, we will split the dataset into a training dataset and a test dataset. We will use 70% of our data to train and the remaining 30% to test. To do this, we will create a split parameter that divides the DataFrame in a 70-30 ratio. You can change the split percentage as you choose, but it is advisable to use at least 60% of the data for training for good results.

‘X_train’ and ‘Y_train’ form the training dataset; ‘X_test’ and ‘Y_test’ form the test dataset.

Split the dataset
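A sketch of the 70-30 split follows, using a hypothetical 10-day predictor table. It slices by position so the training rows all come before the test rows. This matters for price data. scikit-learn's `train_test_split` shuffles by default, which would let the model train on days that come after the ones it is tested on.

```python
import numpy as np
import pandas as pd

# Hypothetical predictor table and target for 10 days, in time order
X = pd.DataFrame({"Open-Close": np.arange(10.0), "High-Low": np.arange(10.0) + 1.0})
Y = np.where(np.arange(10) % 2 == 0, 1, -1)

split_percentage = 0.7
split = int(split_percentage * len(X))

# Chronological split: no shuffling, so the model is trained only on data
# from before the test period
X_train, Y_train = X.iloc[:split], Y[:split]
X_test, Y_test = X.iloc[split:], Y[split:]

print(len(X_train), len(X_test))
```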

6. Instantiate KNN Model

After splitting the dataset into training and test datasets, we will instantiate the k-nearest neighbors classifier. Here we are using k = 15; you may vary the value of k and notice the change in the results. Next, we fit the model to the training data using the ‘fit’ function. Then, we will calculate the train and test accuracy using the ‘accuracy_score’ function.

Instantiate KNN model

Instantiate KNN model - Output

Here, we see an accuracy of about 50% on the test dataset, which means the model's predictions were correct about half of the time, roughly what random guessing would achieve on an up/down call.
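The sketch below runs the same steps with scikit-learn's `KNeighborsClassifier` and `accuracy_score`. The original data is not reproduced here, so the sketch assumes purely random synthetic features and labels. With no real signal to learn, its test accuracy should land near 50%. That gives a baseline for judging the result above.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the predictors and the +1/-1 target (random, no real signal)
rng = np.random.default_rng(seed=42)
X = rng.normal(size=(500, 2))
Y = np.where(rng.random(500) > 0.5, 1, -1)

# Chronological 70-30 split, as in the previous step
split = int(0.7 * len(X))
X_train, X_test = X[:split], X[split:]
Y_train, Y_test = Y[:split], Y[split:]

# k = 15 neighbors, as in the text
knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(X_train, Y_train)

accuracy_train = accuracy_score(Y_train, knn.predict(X_train))
accuracy_test = accuracy_score(Y_test, knn.predict(X_test))
print(f"Train accuracy: {accuracy_train:.2f}")
print(f"Test accuracy: {accuracy_test:.2f}")
```

Train accuracy will usually exceed test accuracy here, because each training point is among its own nearest neighbors.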

7. Create trading strategy using the model

Our trading strategy is simply to buy or sell. We will predict the buy or sell signal using the ‘predict’ function. Then, we will calculate the cumulative S&P 500 returns for the test dataset, followed by the cumulative strategy returns based on the signals the model predicts for the test dataset. Finally, we will plot the cumulative S&P 500 returns and the cumulative strategy returns to visualize the performance.

Create trading strategy using the model

Create trading strategy using the model - Output

It is clear from the graph that the cumulative S&P 500 returns from 01-Jan-2012 to 01-Jan-2017 are around 10%, while the cumulative strategy returns over the same period are around 25%.
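Here is a minimal sketch of the return calculation. It uses made-up closing prices and a hypothetical series of model signals. It treats a -1 signal as a short position for the next day and compounds simple returns. The original code may use a different convention, such as summing log returns, so treat this as one reasonable choice rather than the author's exact method. The plotting step is omitted.

```python
import pandas as pd

# Made-up closing prices for the test period and hypothetical model signals
close = pd.Series([100.0, 102.0, 101.0, 103.0, 104.0])
predicted_signal = pd.Series([1, -1, 1, 1, -1])  # +1 buy, -1 sell

# Return from today's close to tomorrow's close. The signal predicted today is
# applied to this next-day return, so the strategy does not use future data.
next_day_return = close.shift(-1) / close - 1
strategy_return = predicted_signal * next_day_return

# Compound daily returns into cumulative returns (last row is NaN: no next day)
cumulative_market = (1 + next_day_return.dropna()).cumprod() - 1
cumulative_strategy = (1 + strategy_return.dropna()).cumprod() - 1
print(cumulative_market.iloc[-1], cumulative_strategy.iloc[-1])
```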


8. Sharpe Ratio

The Sharpe ratio is the return earned in excess of the risk-free rate per unit of volatility. First, we will calculate the standard deviation of the cumulative returns and use it to calculate the Sharpe ratio.
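For reference, the conventional definition is shown below. Here $\bar{R}_p$ is the mean portfolio return, $R_f$ the risk-free rate, and $\sigma_p$ the standard deviation of the portfolio's excess returns. The annualized form assumes daily returns and the common convention of 252 trading days per year, with $\bar{r}$ the mean daily return, $r_f$ the daily risk-free rate, and $\sigma_r$ the standard deviation of daily excess returns.

```latex
S = \frac{\bar{R}_p - R_f}{\sigma_p},
\qquad
S_{\text{annual}} \approx \sqrt{252}\,\frac{\bar{r} - r_f}{\sigma_r}
```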

Sharpe ratio

Sharpe ratio - Output

The Sharpe ratio of our strategy is 0.78.
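The sketch below shows the conventional annualized Sharpe ratio computed from periodic (here, daily) returns, assuming 252 trading days per year. It uses the standard deviation of the periodic excess returns rather than of the cumulative returns, so its values need not match the 0.78 reported above. The function name `annualized_sharpe` and the sample returns are hypothetical.

```python
import numpy as np


def annualized_sharpe(daily_returns, daily_risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from a sequence of periodic (e.g. daily) returns."""
    excess = np.asarray(daily_returns, dtype=float) - daily_risk_free
    # Mean excess return divided by the sample standard deviation (ddof=1),
    # scaled by sqrt(periods per year) to annualize
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)


# Made-up daily strategy returns for illustration
returns = [0.01, -0.005, 0.007, 0.002, -0.003]
print(round(annualized_sharpe(returns), 3))
```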

Now, it is your turn!

You can tweak the code in the following ways.

  1. You can try the model on a different dataset.
  2. You can create your own predictor variables using different indicators that could improve the accuracy of the model.
  3. You can change the value of k and experiment with it.
  4. You can change the trading strategy as you wish.
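For the third suggestion, here is a sketch that compares test accuracy across several values of k. The data is synthetic, with a weak relationship between the first feature and the target made up for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic predictors with a weak, made-up relationship to the +1/-1 target
rng = np.random.default_rng(seed=7)
X = rng.normal(size=(600, 2))
Y = np.where(X[:, 0] + rng.normal(scale=2.0, size=600) > 0, 1, -1)

# Chronological 70-30 split
split = int(0.7 * len(X))
X_train, X_test, Y_train, Y_test = X[:split], X[split:], Y[:split], Y[split:]

# Compare test accuracy across several odd values of k (odd k avoids tied votes)
results = {}
for k in (1, 5, 15, 31, 61):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, Y_train)
    results[k] = accuracy_score(Y_test, model.predict(X_test))
    print(f"k={k:>2}: test accuracy = {results[k]:.3f}")
```

Picking k by its test-set accuracy tends to overfit to the test period. Holding out a separate validation period, or using scikit-learn's `TimeSeriesSplit`, gives a more honest estimate.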

Next Step

If you want to learn various aspects of algorithmic trading, check out the Executive Programme in Algorithmic Trading (EPAT™). The course covers training modules like Statistics & Econometrics, Financial Computing & Technology, and Algorithmic & Quantitative Trading. EPAT™ equips you with the skill sets required to be a successful trader. Enroll now!

Or you can sign up for our short course series on Machine Learning for Trading on Quantra. The 3-course bundle ‘Trading With Machine Learning’ covers Regression, Classification and SVM concepts, along with their practical implementation in trading strategies, with the help of sample strategies and ample exercises. The bundle offers a 30% discount.

2 thoughts on “Introduction To Machine Learning K-Nearest Neighbors (KNN) Algorithm In Python”

  1. January 30, 2018

    jj edwards

    isn’t yahoo data feed shut down?

    • January 30, 2018

      admin

      Thanks for your comment!

      Yahoo! Finance has decommissioned their historical data API, causing many programs that relied on it to stop working.
      fix-yahoo-finance offers a temporary fix to the problem by scraping the data from Yahoo! Finance and returning a Pandas DataFrame/Panel in the same format as pandas_datareader’s get_data_yahoo().
      By basically “hijacking” that method, fix-yahoo-finance’s implementation is easy to use and only requires you to import fix_yahoo_finance into your code.
