R Weekly Bulletin Vol – V

This week’s R bulletin will cover topics like how to avoid for-loops, add to or shorten an existing vector, and play a beep sound in R. We will also cover functions like new.env, readSeries, and the with and within functions. Hope you like this R weekly bulletin. Enjoy reading!

Shortcut Keys

1. To stop debugging – Shift+F8
2. To quit an R session (desktop only) – Ctrl+Q
3. To restart an R Session – Ctrl+Shift+0

Problem Solving Ideas

Avoiding For Loop by using “with” function

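For loops can be slow to execute when we are dealing with large data sets. For faster execution, one can replace the loop with a vectorized expression that operates on whole columns at once. The “with” function is a convenient way to write such expressions, because it lets us refer to the columns of a data frame by name. The syntax of the with function is given below: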

with(data, expr)

where “data” is typically a data frame, and “expr” stands for one or more expressions to be evaluated using the contents of the data frame. If there is more than one expression, the expressions need to be wrapped in curly braces.
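To make the syntax concrete, here is a small sketch on a hypothetical three-row data frame; the OPEN and CLOSE values are copied from the NIFTY rows shown in the reader comments at the end of this bulletin. It shows a single expression, and a braced block of several expressions, where with returns the value of the last one.

```r
# Hypothetical mini data set; values copied from the NIFTY rows in the comments
prices <- data.frame(
  OPEN  = c(9220.60, 9264.40, 9245.80),
  CLOSE = c(9237.85, 9265.15, 9261.95)
)

# Single expression: columns are referred to by name, without prices$
day_change <- with(prices, CLOSE - OPEN)

# Several expressions must be wrapped in braces; with() returns the value of
# the last one, and helper variables such as prev_close stay inside with()
gap_pct <- with(prices, {
  prev_close <- c(NA, CLOSE[-length(CLOSE)])
  round((OPEN - prev_close) / prev_close * 100, 2)
})

print(day_change)
print(gap_pct)
```

The gap values for the second and third rows come out as 0.29 and -0.21, the same figures that appear in the comment output.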

Example: Consider the NIFTY 1-year price series. Let us find the gap opening for each day using both methods and time them using the system.time function. Note the time taken to execute the For Loop versus the time taken by the vectorized version, which uses the with function together with a user-defined lagpad function.

library(quantmod)

# Using FOR Loop
system.time({

df = read.csv("NIFTY.csv")
df = df[,c(1,3:6)]

df$GapOpen = double(nrow(df))
for ( i in 2:nrow(df)) {
    df$GapOpen[i] = round(Delt(df$CLOSE[i-1],df$OPEN[i])*100,2)
}

print(head(df))

})

# Using with function + lagpad, instead of FOR Loop
system.time({

dt = read.csv("NIFTY.csv")
dt = dt[,c(1,3:6)]

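# lagpad shifts x down by k positions and pads the start with NA
lagpad = function(x, k) {
  c(rep(NA, k), x)[1 : length(x)]
}

dt$PrevClose = lagpad(dt$CLOSE, 1)
# Inside with(), columns are referred to by name, without the dt$ prefix
dt$GapOpen_ = with(dt, round(Delt(PrevClose, OPEN)*100, 2))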
print(head(dt))

})
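The speed difference in this comparison comes from vectorization (operating on whole columns at once) rather than from with itself, as a reader also points out in the comments. The sketch below makes the comparison self-contained: it uses hypothetical random-walk prices instead of NIFTY.csv, and writes out the percentage change, (OPEN - previous CLOSE) / previous CLOSE * 100, directly in place of quantmod's Delt. Timings vary by machine, so no figures are shown; the loop version should typically take noticeably longer.

```r
# Hypothetical synthetic prices stand in for NIFTY.csv
set.seed(1)
n <- 10000
close_px <- 1000 + cumsum(rnorm(n))
prices <- data.frame(OPEN = close_px + rnorm(n), CLOSE = close_px)

# Loop version: computes one gap per iteration
gap_loop <- function(d) {
  out <- rep(NA_real_, nrow(d))
  for (i in 2:nrow(d)) {
    out[i] <- round((d$OPEN[i] - d$CLOSE[i - 1]) / d$CLOSE[i - 1] * 100, 2)
  }
  out
}

# Vectorized version: works on whole columns at once, written with with()
gap_vec <- function(d) {
  with(d, {
    prev_close <- c(NA, CLOSE[-length(CLOSE)])  # same shift as lagpad(CLOSE, 1)
    round((OPEN - prev_close) / prev_close * 100, 2)
  })
}

# Both versions should give the same numbers
stopifnot(isTRUE(all.equal(gap_loop(prices), gap_vec(prices))))

print(system.time(gap_loop(prices)))
print(system.time(gap_vec(prices)))
```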

Adding to an existing vector or shortening it

Adding to or shortening an existing vector can be done by assigning a new length to the vector. When we shorten a vector, the values at the end are removed; when we extend an existing vector, missing values (NA) are added at the end.

Example:

# Shorten an existing vector
even = c(2,4,6,8,10,12)
length(even)
[1] 6

# Set the length to the number of elements to keep
length(even) = 3
print(even)
[1] 2 4 6

# Add to an existing vector
odd = c(1,3,5,7,9,11)
length(odd)
[1] 6

# Set the length to the new total; the extra positions are filled with NA
length(odd) = 8
odd[c(7,8)] = c(13,15)
print(odd)
[1] 1 3 5 7 9 11 13 15
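In the example above the two new positions are filled straight away, so the NA padding is never visible. The sketch below prints the vector in between, and also shows that assigning to an index beyond the current end grows the vector in the same way.

```r
# Extending with length<- pads the end with NA
odd <- c(1, 3, 5, 7, 9, 11)
length(odd) <- 8
print(odd)              # positions 7 and 8 are NA until they are filled
odd[7:8] <- c(13, 15)

# Assigning to an index past the end also grows the vector;
# any skipped position (here 9) is filled with NA
odd[10] <- 19
print(odd)
```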

Make R beep/play a sound

If you want R to play a sound or beep when code finishes executing, you can use the “beepr” package. Its beep function plays a sound each time it is called. The “beepr” package relies on the “audio” package, which install.packages() normally installs automatically as a dependency; installing it explicitly, as below, does no harm.

install.packages("beepr")
install.packages("audio")
library(beepr)
beep()

Different sounds can be selected through the “sound” argument, which takes either a number or a sound name such as "ping", "coin", or "fanfare".

beep(sound = 9)

One can also make beepr repeat a sound until the user interrupts it, as illustrated in the example below (source: Stack Overflow).

Example:

work_complete <- function() {
  cat("Work complete. Press esc to sound the fanfare!!!\n")
  on.exit(beepr::beep(3))

  while (TRUE) {
    beepr::beep(4)
    Sys.sleep(1)
  }
}

work_complete()

One can also use the beep function to play a sound if an error occurs during the code execution.

options(error = function() {beep(sound =5)})
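The two ideas above, on.exit in work_complete and a sound on error, can be combined into a small wrapper that signals when a long computation ends, whether it finishes or fails. with_notify is a hypothetical helper, not part of beepr. Its default notify calls beepr::beep and therefore needs beepr installed, while the notify argument lets you swap in any other function, which also makes the wrapper easy to try without audio.

```r
# Hypothetical helper (not part of beepr): evaluate `expr`, then call
# `notify()` when evaluation ends, whether it finished normally or with an error.
with_notify <- function(expr, notify = function() beepr::beep(sound = 1)) {
  on.exit(notify(), add = TRUE)  # runs on normal exit and on error
  expr                           # lazy evaluation: expr is evaluated here
}

# Usage with the default beepr sound (requires beepr):
# total <- with_notify(sum(sqrt(seq_len(1e7))))

# Usage with a non-audio notifier, here a console message:
total <- with_notify(sum(1:10), notify = function() message("Done!"))
print(total)
```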

Functions Demystified

new.env function

Environments act as storehouses for variables. When we create variables at the R command prompt, they are stored in R’s global environment. To list the variables stored in the global environment, one can use the following expression:

head(ls(envir = globalenv()), 15)
[1] "df"  "dt"  "even"  "i"  "lagpad"  "odd"

If we want to store variables in a specific environment, we can assign them to an existing environment or create a new environment to hold them. A new environment is created with the new.env function.

Example:

my_environment = new.env()

Once we have created a new environment, a variable can be assigned to it in several ways. Following are some of the methods:

Examples:

# By using double square brackets
my_environment[["AutoCompanies"]] = c("MARUTI", "TVSMOTOR", "TATAMOTORS")

# By using dollar sign operator
my_environment$AutoCompanies = c("MARUTI", "TVSMOTOR", "TATAMOTORS")

# By using the assign function
assign("AutoCompanies", c("MARUTI", "TVSMOTOR", "TATAMOTORS"), my_environment)

The ls function lists the names of the variables that exist in an environment, and the get function returns the value of a given variable.

Example:

ls(envir = my_environment)
[1] "AutoCompanies"

get("AutoCompanies", my_environment)
[1] "MARUTI"  "TVSMOTOR"  "TATAMOTORS"
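The sketch below puts these pieces together. It checks that a variable assigned into my_environment does not appear in the global environment, confirms that the $, [[ ]] and get styles return the same value, and then removes the variable again using rm with its envir argument. It assumes no global variable named AutoCompanies already exists.

```r
# Create an environment and store a variable in it
my_environment <- new.env()
assign("AutoCompanies", c("MARUTI", "TVSMOTOR", "TATAMOTORS"), envir = my_environment)

# The variable lives only in my_environment, not in the global environment
in_env    <- exists("AutoCompanies", envir = my_environment, inherits = FALSE)
in_global <- exists("AutoCompanies", envir = globalenv(), inherits = FALSE)

# The three access styles return the same value
v1 <- my_environment$AutoCompanies
v2 <- my_environment[["AutoCompanies"]]
v3 <- get("AutoCompanies", envir = my_environment)

# rm() with envir removes the variable from that environment only
rm(list = "AutoCompanies", envir = my_environment)
remaining <- ls(envir = my_environment)   # character(0)
```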

readSeries function

The readSeries function is part of the timeSeries package, and it reads a file in table format and creates a timeSeries object from it. The main arguments of the function are:

readSeries(file, header = TRUE, sep = ";", format)

where,
file: the filename of a spreadsheet dataset from which to import the data records.
header: a logical value indicating whether the file contains the names of the variables as its first line.
sep: the field separator used in the spreadsheet file to separate columns. By default, it is set as ";".
format: a character string with the format in POSIX notation specifying the timestamp format.

Example:

library(timeSeries)

# Reading the NIFTY data using read.csv
df = read.csv(file = "NIFTY.csv")
print(head(df))

# Reading the NIFTY data and creating a time series object using readSeries
# function
df = readSeries(file = "NIFTY.csv", header = TRUE, sep = ",", format = "%Y%m%d")
print(head(df))
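readSeries can only build the time index if format matches the timestamps in the file. One way to check a candidate format string before calling readSeries is to try it on a few dates with base R's as.Date, which uses the same POSIX-style conversion codes. The date strings below are hypothetical: one set uses the compact YYYYMMDD layout that format = "%Y%m%d" expects, and the other uses the DD-Mon-YYYY layout seen in the NSE download quoted in the comments.

```r
# Hypothetical date strings in two common layouts
ymd <- c("20170403", "20170405")
dmy <- c("03-Apr-2017", "05-Apr-2017")

# %b (abbreviated month name) depends on the locale, so use the C locale here
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

d1 <- as.Date(ymd, format = "%Y%m%d")    # format matches the strings
d2 <- as.Date(dmy, format = "%d-%b-%Y")  # format matches the strings
d3 <- as.Date(dmy, format = "%Y%m%d")    # mismatched format: every entry is NA

invisible(Sys.setlocale("LC_TIME", old_locale))
print(d1)
print(d2)
print(d3)
```

If a trial conversion returns NA, the same format string would not parse the file's timestamps either.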

with and within functions

The with and within functions evaluate an expression in the context of a given data set, so that its columns can be referred to directly by name. The with function returns the value of the expression, whereas the within function keeps track of the changes made, including adding or deleting elements, and returns a modified copy of the data set with these revised contents. The syntax for these two functions is given as:

with(data, expr)
within(data, expr)

where,
data – typically a list or data frame, although other options exist for with.
expr – one or more expressions to evaluate using the contents of data; if there is more than one expression, they must be wrapped in braces.

# Consider the NIFTY data
df = as.data.frame(read.csv("NIFTY.csv"))
print(head(df, 3))

# Example of with function:
df$Average = with(df, apply(df[3:6], 1, mean))
print(head(df, 3))

# Example of within function:
df = within(df, {
   # Assign by column name; df$Average here would modify a local copy of df
   Average = apply(df[3:6], 1, mean)
})
print(head(df, 3))
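Because the examples above depend on NIFTY.csv, here is a self-contained sketch of the difference between the two functions on a hypothetical three-row data frame, with values copied from the NIFTY rows in the reader comments. Columns are referred to by name, which is the main convenience of both functions, and rm inside within shows a column being deleted from the returned copy.

```r
# Hypothetical mini data set; values copied from the NIFTY rows in the comments
prices <- data.frame(
  OPEN  = c(9220.60, 9264.40, 9245.80),
  HIGH  = c(9245.35, 9273.90, 9267.95),
  LOW   = c(9192.40, 9215.40, 9218.85),
  CLOSE = c(9237.85, 9265.15, 9261.95)
)

# with(): returns only the value of the expression; prices is unchanged
avg <- with(prices, (OPEN + HIGH + LOW + CLOSE) / 4)

# within(): returns a modified copy of the data frame. Variables created in
# the braces become new columns, and rm() drops a column from the copy.
prices2 <- within(prices, {
  Average <- (OPEN + HIGH + LOW + CLOSE) / 4
  Range   <- HIGH - LOW
  rm(LOW)
})

print(names(prices))   # original columns, still including LOW
print(names(prices2))  # LOW removed; Average and Range added (order may vary)
```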


Next Step

We hope you liked this bulletin. In the next weekly bulletin, we will cover more interesting techniques and R functions for our readers.

2 thoughts on “R Weekly Bulletin Vol – V”

  1. April 22, 2017

    Mark Schutt

    There is a slight problem with the with() example. The code as presented is:

    dt$GapOpen_ = with(dt, round(Delt(dt$PrevClose,dt$OPEN)*100,2))

    However, the point of the with command is to not require naming the data.frame. The code you have is functionally equivalent to

    dt$GapOpen_ = round(Delt(dt$PrevClose,dt$OPEN)*100,2)

    In this case the second approach is actually slightly faster, less user time, than when using with. with() is a convenience function which takes advantage of Non Standard Evaluation in R, well described here by Hadley Wickham (http://adv-r.had.co.nz/)

    Here are my test results. Note I downloaded the data set I used from https://www.nseindia.com/products/content/equities/indices/historical_index_data.htm. It contained 4/1/2017 – 4/22/2017 data.

    > library(quantmod)
    > library(dplyr)
    >
    > system.time({
    + read.csv('NIFTY.csv') %>%
    + select(Date, Close, High, Low, Open) -> df
    +
    +
    + lagpad = function(x, k) {
    + c(rep(NA, k), x)[1 : length(x)]
    + }
    +
    + df$PrevClose = lagpad(df$Close, 1)
    +
    + df$GapOpen = round(Delt(df$PrevClose, df$Open)*100,2)
    +
    + print(head(df))
    + }) -> r1
    Date Close High Low Open PrevClose Delt.0.arithmetic
    1 03-Apr-2017 9237.85 9245.35 9192.40 9220.60 NA NA
    2 05-Apr-2017 9265.15 9273.90 9215.40 9264.40 9237.85 0.29
    3 06-Apr-2017 9261.95 9267.95 9218.85 9245.80 9265.15 -0.21
    4 07-Apr-2017 9198.30 9250.50 9188.10 9223.70 9261.95 -0.41
    5 10-Apr-2017 9181.45 9225.65 9174.85 9225.60 9198.30 0.30
    6 11-Apr-2017 9237.00 9242.70 9172.85 9184.55 9181.45 0.03
    > print(r1)
    user system elapsed
    0.008 0.000 0.009
    >
    > system.time({
    + read.csv('NIFTY.csv') %>%
    + select(Date, Close, High, Low, Open) -> df
    +
    +
    + lagpad = function(x, k) {
    + c(rep(NA, k), x)[1 : length(x)]
    + }
    +
    + df$PrevClose = lagpad(df$Close, 1)
    +
    + df$GapOpen = with(df, round(Delt(PrevClose, Open)*100,2))
    +
    + print(head(df))
    + }) -> r2
    Date Close High Low Open PrevClose Delt.0.arithmetic
    1 03-Apr-2017 9237.85 9245.35 9192.40 9220.60 NA NA
    2 05-Apr-2017 9265.15 9273.90 9215.40 9264.40 9237.85 0.29
    3 06-Apr-2017 9261.95 9267.95 9218.85 9245.80 9265.15 -0.21
    4 07-Apr-2017 9198.30 9250.50 9188.10 9223.70 9261.95 -0.41
    5 10-Apr-2017 9181.45 9225.65 9174.85 9225.60 9198.30 0.30
    6 11-Apr-2017 9237.00 9242.70 9172.85 9184.55 9181.45 0.03
    > print(r2)
    user system elapsed
    0.008 0.000 0.008
    >
    > system.time({
    + read.csv('NIFTY.csv') %>%
    + select(Date, Close, High, Low, Open) -> df
    +
    +
    + lagpad = function(x, k) {
    + c(rep(NA, k), x)[1 : length(x)]
    + }
    +
    + df$PrevClose = lagpad(df$Close, 1)
    +
    + df$GapOpen = with(df, round(Delt(df$PrevClose, df$Open)*100,2))
    +
    + print(head(df))
    + }) -> r3
    Date Close High Low Open PrevClose Delt.0.arithmetic
    1 03-Apr-2017 9237.85 9245.35 9192.40 9220.60 NA NA
    2 05-Apr-2017 9265.15 9273.90 9215.40 9264.40 9237.85 0.29
    3 06-Apr-2017 9261.95 9267.95 9218.85 9245.80 9265.15 -0.21
    4 07-Apr-2017 9198.30 9250.50 9188.10 9223.70 9261.95 -0.41
    5 10-Apr-2017 9181.45 9225.65 9174.85 9225.60 9198.30 0.30
    6 11-Apr-2017 9237.00 9242.70 9172.85 9184.55 9181.45 0.03
    > print(r3)
    user system elapsed
    0.012 0.000 0.009

  2. April 22, 2017

    Mark Schutt

    I wanted to follow up on why the faster result is faster. R likes vectors and is optimized to use vectors and vector math. The Delt function is effectively doing the calculation (current open - previous close) / previous close to get the % difference. Each of these values is a vector, so R is optimized to do this math, i.e. subtract the vector with the previous close from the vector with the current open, then divide the resulting vector by the vector with the previous close. This is fast, like really fast in R. Much faster than a loop over each record in the data.frame or a list.

    So, when doing basic math operations on vectors, let R do the heavy lifting; it’s optimized to do so. It’s also a lot less typing.
