Gold Price Prediction Using Machine Learning In Python

Machine Learning For Trading – Gold Price Prediction Using Regression In Python

By Ishan Shah

Is it possible to predict where the Gold price is headed?

Yes, let’s use machine learning regression techniques to predict the price of one of the most important precious metals: gold.

We will create a machine learning linear regression model that takes information from the past Gold ETF (GLD) prices and returns a prediction of the Gold ETF price the next day.

GLD is the largest ETF to invest directly in physical gold. (source: http://www.etf.com/GLD)

Steps to predict gold prices using machine learning in Python

  1. Import the libraries and read the Gold ETF data
  2. Define explanatory variables
  3. Define dependent variable
  4. Split the data into train and test dataset
  5. Create a linear regression model
  6. Predict the Gold ETF prices

Import the libraries and read the Gold ETF data

First things first: import all the necessary libraries which are required to implement this strategy.

# LinearRegression is scikit-learn's ordinary least squares linear regression class
from sklearn.linear_model import LinearRegression

# pandas and numpy are used for data manipulation
import pandas as pd
import numpy as np

# matplotlib and seaborn are used for plotting graphs
import matplotlib.pyplot as plt
import seaborn

# yfinance (formerly published as fix_yahoo_finance) is used to fetch data
import yfinance as yf

Then, we read 10 years of daily Gold ETF price data (2008 through 2017) and store it in Df. We remove the columns which are not relevant and drop NaN values using the dropna() function. Then, we plot the Gold ETF close price.

# Read data
Df = yf.download('GLD', '2008-01-01', '2017-12-31')

# Only keep the Close column
Df = Df[['Close']]

# Drop rows with missing values
Df = Df.dropna()

# Plot the closing price of GLD
Df.Close.plot(figsize=(10, 5))
plt.ylabel("Gold ETF Prices")
plt.show()

Output:

[Plot: Gold ETF close price]

Define explanatory variables

An explanatory variable is an input that the model uses to estimate the Gold ETF price for the next day. Put simply, explanatory variables are the features we use to predict the Gold ETF price. In this strategy they are the moving averages of the closing price over the past 3 days and the past 9 days. Each average is shifted by one day, so it uses only prices that were available before the day being predicted. We drop the NaN values using the dropna() function and store the feature variables in X.

However, you can add more variables to X which you think are useful to predict the prices of the Gold ETF. These variables can be technical indicators, the price of another ETF such as Gold miners ETF (GDX) or Oil ETF (USO), or US economic data.

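# Moving averages of the previous 3 and 9 closing prices
Df['S_3'] = Df['Close'].shift(1).rolling(window=3).mean()
Df['S_9'] = Df['Close'].shift(1).rolling(window=9).mean()

# The first rows have no full window yet, so drop them
Df = Df.dropna()

X = Df[['S_3', 'S_9']]
X.head()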

Output:

[Table: first five rows of X, with columns S_3 and S_9]
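To see what the one-day shift does, here is a minimal sketch on a handful of made-up prices (the numbers and the names `close` and `s_3` are hypothetical, not GLD data). It suggests that with shift(1) each moving average is built only from closes that precede the row's date, so the feature for a given day does not contain that day's own price.

```python
import pandas as pd

# Hypothetical closing prices, for illustration only (not real GLD data)
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="Close")

# shift(1) moves every price down one row, so each 3-day window
# ends at the previous day's close rather than the current one
s_3 = close.shift(1).rolling(window=3).mean()

# Row 3 (close = 13.0) gets the mean of 10, 11 and 12
print(pd.DataFrame({"Close": close, "S_3": s_3}))
```

The first three rows are NaN because a full window of earlier prices is not yet available, which is why the article calls dropna() before building X.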

Define dependent variable

Similarly, the dependent variable depends on the values of the explanatory variables. Simply put, it is the Gold ETF price which we are trying to predict. We store the Gold ETF price in y.

y = Df['Close']
y.head()

Output:

Date
2008-02-08    91.000000
2008-02-11    91.330002
2008-02-12    89.330002
2008-02-13    89.440002
2008-02-14    89.709999
Name: Close, dtype: float64

Split the data into train and test dataset

In this step, we split the predictors and output data into train and test data. The training data is used to create the linear regression model, by pairing the input with expected output. The test data is used to estimate how well the model has been trained.

[Figure: Historical gold ETF]

  1. The first 80% of the data is used for training and the remaining 20% for testing
  2. X_train and y_train are the training dataset
  3. X_test and y_test are the test dataset

# Fraction of the data used for training
t = .8

# Convert the fraction into a row position
t = int(t*len(Df))

# Train dataset
X_train = X[:t]
y_train = y[:t]

# Test dataset
X_test = X[t:]
y_test = y[t:]
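The split above is chronological rather than random: the earliest rows train the model and the latest rows test it. A minimal sketch on hypothetical data (the `_demo` names and the ten synthetic rows are assumptions for illustration only) shows the resulting sizes and confirms that every training date precedes every test date.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 daily observations (not real GLD prices)
dates = pd.date_range("2020-01-01", periods=10, freq="D")
X_demo = pd.DataFrame({"S_3": np.arange(10.0), "S_9": np.arange(10.0)}, index=dates)
y_demo = pd.Series(np.arange(10.0), index=dates, name="Close")

# Position of the first test row: the first 80% of rows go to training
split = int(0.8 * len(X_demo))

# Positional slicing keeps the time order intact
X_train_demo, X_test_demo = X_demo.iloc[:split], X_demo.iloc[split:]
y_train_demo, y_test_demo = y_demo.iloc[:split], y_demo.iloc[split:]

print(len(X_train_demo), len(X_test_demo))
print(X_train_demo.index.max() < X_test_demo.index.min())
```

Keeping the order matters for price data, since shuffling rows before splitting would let the model train on days that come after the ones it is tested on.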

Create a linear regression model

We will now create a linear regression model. But, what is linear regression?

Linear regression captures a mathematical relationship between the ‘x’ and ‘y’ variables that “best” explains the observed values of ‘y’ in terms of the observed values of ‘x’ by fitting a line through a scatter plot of the data.

[Figure: dependent and independent variable]

To break it down further, regression explains the variation in a dependent variable in terms of independent variables. The dependent variable – ‘y’ is the variable that you want to predict. The independent variables – ‘x’ are the explanatory variables that you use to predict the dependent variable.  The following regression equation describes that relation:

Y = m1 * X1 + m2 * X2 + c

Gold ETF price = m1 * 3 days moving average + m2 * 9 days moving average + c
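As a rough illustration of what fitting such an equation means, the sketch below builds synthetic data from known values of m1, m2 and c and checks that LinearRegression recovers them. The data, the `_true` values and the `_syn` names are all assumptions made for this example; they are unrelated to the Gold ETF.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic inputs: 200 rows, two explanatory variables (hypothetical data)
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 2))

# Known coefficients and constant, chosen arbitrarily for the illustration
m1_true, m2_true, c_true = 1.5, -0.5, 2.0

# Noise-free target built from the regression equation
y_syn = m1_true * X_syn[:, 0] + m2_true * X_syn[:, 1] + c_true

# Ordinary least squares should recover the values used above
model = LinearRegression().fit(X_syn, y_syn)
print(model.coef_, model.intercept_)
```

Because the synthetic target contains no noise, the fitted coefficients match the chosen ones up to floating-point error. Real price data is noisy, so the fitted values there are only estimates.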

Then we use the fit method to fit the independent and dependent variables (x’s and y’s) and generate the coefficients and the constant for the regression.

linear = LinearRegression().fit(X_train, y_train)

print("Gold ETF Price =", round(linear.coef_[0], 2),
      "* 3 Days Moving Average", round(linear.coef_[1], 2),
      "* 9 Days Moving Average +", round(linear.intercept_, 2))

Output:

Gold ETF Price = 1.2 * 3 Days Moving Average – 0.2 * 9 Days Moving Average + 0.39
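To make the fitted equation concrete, the sketch below plugs two made-up moving averages into it. The coefficients are the rounded values from the output above; the inputs 120.0 and 118.0 are hypothetical and chosen only for illustration.

```python
# Coefficients and constant as printed in the output above (rounded)
m1, m2, c = 1.2, -0.2, 0.39

# Hypothetical moving averages, chosen only for illustration
s_3_value = 120.0
s_9_value = 118.0

# Apply the regression equation: price = m1 * S_3 + m2 * S_9 + c
predicted = m1 * s_3_value + m2 * s_9_value + c
print(round(predicted, 2))
```

With these inputs the equation gives 144.00 - 23.60 + 0.39 = 120.79. The prediction sits close to the 3-day average and is nudged upward because the shorter average is above the longer one.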

Predicting the Gold ETF prices

Now, it’s time to check if the model works on the test dataset. We predict the Gold ETF prices using the linear model created from the train dataset. The predict method returns the predicted Gold ETF price (y) for the given explanatory variables X.

predicted_price = linear.predict(X_test)
predicted_price = pd.DataFrame(predicted_price, index=y_test.index, columns=['price'])
predicted_price.plot(figsize=(10, 5))
y_test.plot()
plt.legend(['predicted_price', 'actual_price'])
plt.ylabel("Gold ETF Price")
plt.show()

Output:

[Plot: predicted and actual Gold ETF price]

The graph shows the predicted and actual price of the Gold ETF.

Now, let’s compute the goodness of the fit using the score() function.

# R-squared on the test dataset, expressed as a percentage
r2 = linear.score(X_test, y_test) * 100
print("{0:.2f}%".format(r2))

Output:

95.81%

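As can be seen, the R-squared of the model is 95.81%. R-squared cannot exceed 100%, and on out-of-sample data such as this test set it can fall below 0% when the model predicts worse than a constant equal to the mean of the actual prices. A score close to 100% indicates that the model explains the Gold ETF prices well.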
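For readers who want to see where the score comes from, here is a minimal sketch of the R-squared calculation, one minus the ratio of the residual sum of squares to the total sum of squares. The arrays and the helper name `r_squared` are hypothetical; the result is compared with scikit-learn's r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

def r_squared(actual, predicted):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual values and three sets of predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_close = np.array([2.5, 5.5, 7.0, 9.5])         # small errors
y_mean = np.full_like(y_true, y_true.mean())     # always predicts the mean

print(r_squared(y_true, y_true))    # perfect predictions
print(r_squared(y_true, y_close))   # close predictions
print(r_squared(y_true, y_mean))    # mean-only predictions
print(r2_score(y_true, y_close))    # scikit-learn's value for comparison
```

Perfect predictions score 1.0 and always predicting the mean scores 0.0, so the value measures how much better the model does than that simple baseline.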

Congrats! You just learned a fundamental yet strong machine learning technique. Thanks for reading!

Next Step

Are you keen to learn various aspects of Algorithmic trading to enhance your existing skill set or to start trading on your own? Check out the Executive Programme in Algorithmic Trading (EPAT). The course covers training modules like Statistics & Econometrics, Financial Computing & Technology, and Algorithmic & Quantitative Trading. EPAT™ equips you with the required skill sets to be a successful trader. Enroll now to begin your career in Algorithmic Trading.

Or you can sign up for our short course series on Machine Learning for Trading on Quantra. The 3-course bundle ‘Trading With Machine Learning’ covers Regression, Classification and SVM concepts along with their practical implementation in trading strategy with the help of sample strategy and ample exercises. The bundle offers a 30% discount, click here to know more.

One thought on “Gold Price Prediction Using Machine Learning In Python”

  1. January 25, 2018

    Geo

    “GLD is the largest ETF to invest directly in physical gold.”

    Note that this is questionable at best. Paper gold GLD claims to be fully backed by physical gold bullion, yet it refuses to give retail investors the right to redeem any of this ‘claimed’ gold bullion. This fact alone would mean GLD shares are nothing more than paper at the end of the day. Furthermore, GLD’s prospectus is chock-full of weasel clauses and legal loopholes that allow the fund to get away without the full physical gold backing. One good example of this is the clause that states GLD has no right to audit subcustodial gold holdings. To this day, I have not heard of a single good reason for the existence of this audit loophole. I’ve also verified the following to be true and welcome everyone else to do so:

    “Did anyone try calling the GLD hotline at (866) 320 4053 in search of numerical details on GLD’s insurance? The prospectus vaguely states “The Custodian maintains insurance with regard to its business on such terms and conditions as it considers appropriate which does not cover the full amount of gold held in custody.” When I asked about how much of the gold was insured, the representative proceeded to act as if he didn’t know and said they were just the “marketing agent” for GLD. What kind of marketing agent would not know such basic information about a product they are marketing? It seems like they are deliberately hiding information from investors.”

    “I remember there was a well documented visit by CNBC’s Bob Pisani to GLD’s gold vault. This visit was organized by GLD’s management to prove the existence of GLD’s gold but the gold bar held up by Mr. Pisani had the serial number ZJ6752 which did not appear on the most recent bar list at that time. It was later discovered that this “GLD” bar was actually owned by ETF Securities.”
