R Best Practices: R you writing the R way!

By Milind Paradkar

Any programmer inevitably writes tons of codes in his daily work. However, not all programmers inculcate the habit of writing clean codes which can be easily be understood by others. One of the reasons can be the lack of awareness among programmers of the best practices followed in writing a program. This is especially the case for novice programmers. In this post, we list some of the R programming best practices which will lead to improved code readability, consistency, and repeatability. Read on!

Best practices of writing in R

1) Describe your code – When you start coding describe what the R code does in the very first line. For subsequent blocks of codes follow the same method of describing the blocks. This makes it easy for other people to understand and use the code.


# This code captures the 52-week high effect in stocks
# Code developed by Milind Paradkar

2) Load Packages – After describing your code in the first line, use the library function to list and load all the relevant packages needed to execute your code.


library(quantmod);  library(zoo); library(xts);
library(PerformanceAnalytics); library(timeSeries); library(lubridate);

3) Use Updated Packages – While writing your code ensure that you are using the latest updated R packages. To check the version of any R package you can use the packageVersion function.


[1] ‘0.23.1’

4) Organize all source files in the same directory – Store all the necessary files that will be used/sourced in your code in the same directory. You can use the respective relative path to access them.


# Reading file using relative path
df = read.csv(file = "NIFTY.csv", header = TRUE)

# Reading file using full path
df =  read.csv(file = "C:/Users/Documents/NIFTY.csv", header = TRUE)

5) Use a consistent style for data structure types – R programming language allows for different data structures like vectors, factors, data frames, matrices, and lists. Use a similar naming for a particular type of data structure. This will make it easy to recognize the similar data structures used in the code and to spot the problems easily.

You can name all different data frames used in your code by adding .df as the suffix.

aapl.df   = as.data.frame(read.csv(file = "AAPL.csv", header = TRUE))
amzn.df = as.data.frame(read.csv(file = "AMZN.csv", header = TRUE))
csco.df  = as.data.frame(read.csv(file = "CSCO.csv", header = TRUE))

6) Indent your code – Indentation makes your code easier to read, especially, if there are multiple nested statements like For-loop and If statement.


# Computing the Profit & Loss (PL) and the Equity
dt$PL = numeric(nrow(dt))
for (i in 1:nrow(dt)){
   if (dt$Signal[i] == 1) {dt$PL[i+1] = dt$Close[i+1] - dt$Close[i]}
   if (dt$Signal[i] == -1){dt$PL[i+1] = dt$Close[i] - dt$Close[i+1]}

7) Remove temporary objects – For long codes, running in thousands of lines, it is a good practice to remove temporary objects after they have served their purpose in the code. This can ensure that R does not into memory issues.

8) Time the code – You can time your code using the system.time function. You can also use the same function to find out the time taken by different blocks of code. The function returns the amount of time taken in seconds to evaluate the expression or a block of code. Timing codes will help to figure out any bottlenecks and help speed up your codes by making the necessary changes in the script.

To find the time taken for different blocks we wrapped them in curly braces within the call to the system.time function.

The two important metrics returned by the function include:
i) User time – time charged to the CPU(s) for the code
ii) Elapsed time – the amount of time elapsed to execute the code in entirety


# Generating random numbers
mean_1 = rnorm(1e+06, mean = 0, sd = 0.8)

user    system    elapsed
0.40      0.00       0.45

9) Use vectorization – Vectorization results in faster execution of codes, especially when we are dealing with large data sets. One can use statements like the ifelse statement or the with function for vectorization.

Consider the NIFTY 1-year price series. Let us find the gap opening for each day using both the methods (using for-loop and with function) and time them using the system.time function. Note the time taken to execute the for-loop versus the time to execute the with function in combination with the lagpad function.

# Using FOR Loop
df = read.csv("NIFTY.csv")
df = df[,c(1,3:6)]
df$GapOpen = double(nrow(df))
for ( i in 2:nrow(df)) {
df$GapOpen[i] = round(Delt(df$CLOSE[i-1],df$OPEN[i])*100,2)

# Using with function + lagpad, instead of FOR Loop
df = read.csv("NIFTY.csv")
df = dt[,c(1,3:6)]

lagpad = function(x, k) {
c(rep(NA, k), x)[1 : length(x)]

df$PrevClose = lagpad(df$CLOSE, 1)
df$GapOpen_ = with(df, round(Delt(df$PrevClose,df$OPEN)*100,2))

10) Folding codes – Folding codes is a way wherein the R programmer can fold a code of line or code sections. This allows hiding blocks of code whenever required, and makes it easier to navigate through lengthy codes. Code folding can be done in two ways:
i) Automatic folding of codes
ii) User-defined folding of codes

Automatic folding of codes: RStudio automatically provides the flexibility to fold the codes. When a coder writes a function or conditional blocks, RStudio automatically creates foldable codes.

User-defined folding of codes: 
One can also fold a random group of codes by using Edit -> Folding -> Collapse or by simply selecting the group of codes and pressing Alt+L key.

User-defined folding can also be done via Code Sections:
To insert a new code section you can use the Code -> Insert Section command. Alternatively, any comment line which includes at least four trailing dashes (-), equal signs (=) or pound signs (#) automatically creates a code section.

11) Review and test your code rigorously – Once your code is ready, ensure that you test it code rigorously on different input parameters. Ensure that the logic used in statements like for-loop, if statement, ifelse statement is correct. It is a nice idea to get your code reviewed by your colleague to ensure that the work is of high quality.

12) Don’t save your workspace When you want to exit R it checks if you want to save your workspace. It is advisable to not save the workspace and start in a clean workspace for your next R session. Objects from the previous R sessions can lead to errors which can be hard to debug.

These were some of the best practices of writing in R that one can follow to make your codes easy to read, debug and to ensure consistency.

 Next Step

 If you want to learn various aspects of Algorithmic trading then check out the Executive Programme in Algorithmic Trading (EPAT™). The course covers training modules like Statistics & Econometrics, Financial Computing & Technology, and Algorithmic & Quantitative Trading. EPAT™ equips you with the required skill sets to be a successful trader. Enroll now!

3 thoughts on “R Best Practices: R you writing the R way!

  1. April 14, 2017

    Mike Anderson Reply

    A good habit would be to eliminate the use of the word “codes” when describing a programming script, e.g. “writing clean codes”. Unless of course, you’re writing the same program in several different languages, simultaneously. Otherwise, it jars, and gives the impression that your codes-ing experience is a bit thin.

    My meaningful naming conventions may differ, and I do cling to the <- assignment operator, but otherwise, spot on! Hadn't caught that trailing —- trick for code folding; thanks for the tip.

    • April 15, 2017

      admin Reply

      Thanks, Mike. Nice practice that programmers should remember about eliminating the use of word “codes” in describing a script.

  2. April 17, 2017

    Michael Theobald Reply

    I did not see two related concepts in your write-up, and I use them frequently. These are 1) organize your script into functions and 2) strive to create reproducible lines of instructions that can be passed from one script to another. For what it is worth, I tend to like the command line more than I like IDEs, because I can use operating system facilities for cut and past. On Mac OS X or Linux use cat -n to display your source file with temporary line numbers, use the ‘/’ operator to find things, perhaps by using function names as search keys. Once the start and end line numbers have been found use head and tail to isolate the code with a pipe to ‘less’ if needed. Transfer the lines from one script to another by using >>, that’s two greater than signs, to append the lines to an existing file, but not replace it. Use your favorite text editor to place the added lines into the target source file where they are needed.

Leave a Reply

Your email address will not be published. Required fields are marked *