This week’s R bulletin covers how to parse milliseconds in R, how to format dates, and how to extract specific parts of a date and time.
We will also cover the %within% and %m+% operators, along with the to.period, period.max, and period.min functions. Hope you like this R weekly bulletin. Enjoy reading!
Shortcut Keys
1. Indent – press the Tab key at the beginning of the line
2. Outdent – Shift+Tab
3. Go to a specific line – Shift+Alt+G
Problem Solving Ideas
How to parse milliseconds in R
When R reads dates from a text or spreadsheet file, it will typically store them as a character vector or a factor. To convert them to dates, we need to parse these strings. Parsing can be done using the strptime function, which returns POSIXlt dates.
To parse the dates, you must tell strptime which bits of the string correspond to which bits of the date. The date format is specified using a string, with components specified with a percent symbol followed by a letter. These components are combined with other fixed characters, such as colons in times or dashes and slashes in dates, to form a full specification. See the example below.
strptime("25/06/2016 09:50:24", "%d/%m/%Y %H:%M:%S")
"2016-06-25 09:50:24 IST"
If a string does not match the format in the format string, it takes the value NA. For example, specifying dashes instead of slashes makes the parsing fail:
strptime("25-06-2016 09:50:24", "%d/%m/%Y %H:%M:%S")
NA
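When several strings are parsed at once, the NA values make it easy to find the ones that did not match. Here is a small sketch of that check; the input vector raw is made up for illustration, and tz = "UTC" is set only so that the result does not depend on the local time zone.

```r
# Hypothetical input: the second string uses dashes and will not match the format
raw <- c("25/06/2016 09:50:24", "25-06-2016 09:50:24")

parsed <- strptime(raw, "%d/%m/%Y %H:%M:%S", tz = "UTC")

# is.na() flags the elements that failed to parse
failed <- is.na(parsed)

# Show the offending input strings
print(raw[failed])
```

Checking for NA right after parsing is usually cheaper than discovering the missing dates later in an analysis.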
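To parse milliseconds in R, use %OS in place of %S in the format string. Setting the digits.secs option beforehand makes the fractional seconds visible when the result is printed:
op = options(digits.secs = 3)
strptime("25-06-2016 09:50:24.975", "%d-%m-%Y %H:%M:%OS")
"2016-06-25 09:50:24.975 IST"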
The options function allows the user to set and examine a variety of global options which affect the way in which R computes and displays its results. The argument “digits.secs” to the options function controls the maximum number of digits to print when formatting time values in seconds. Valid values are 0 to 6 with default 0.
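Because digits.secs is a global option, it can be worth restoring the previous value once the fractional seconds are no longer needed. A minimal sketch, assuming a fixed time zone of UTC (the time zone is our choice, not part of the example above):

```r
# options() returns the previous settings, so they can be restored later
old <- options(digits.secs = 3)

x <- strptime("25-06-2016 09:50:24.975", "%d-%m-%Y %H:%M:%OS", tz = "UTC")
print(x)

# The parsed seconds keep their fractional part regardless of the print setting
frac_sec <- x$sec
print(frac_sec)

# Restore the earlier value of digits.secs
options(old)
```

The option only affects how the value is displayed; the fractional part is stored in the sec component either way.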
How to format dates
We can format a date as per our requirement using the strftime (“string format time”) function. This function works on both POSIXct and POSIXlt date classes, and it turns a date variable into a string.
now_time = Sys.time()
# Let us first check the class of the now_time variable by calling the class function.
class(now_time)
"POSIXct" "POSIXt"
As can be seen, it is a “POSIXct” variable. To correctly format a date, we need to use the right naming conventions. For example, %B signifies the full name of the month, %p signifies the AM/PM time indicator, and so on. For the full list of the conventions, check the help page for the function. The required formatting conventions are placed in quotes and form the second argument of the function. An example of the strftime function usage is shown below.
strftime(now_time, "%I:%M%p %A, %d %B %Y")
"09:16AM Saturday, 29 April 2017"
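To try out a few more conventions without depending on the current clock, here is a short sketch that formats a fixed timestamp. The variable names are ours, and tz = "UTC" is passed explicitly so that the output does not depend on the machine's local time zone. Only numeric conventions are used, since month and weekday names vary with the locale.

```r
# A fixed timestamp in UTC, used instead of Sys.time() for repeatability
stamp <- as.POSIXct("2017-04-29 09:16:06", tz = "UTC")

iso_date  <- strftime(stamp, "%Y-%m-%d", tz = "UTC")  # year-month-day
clock_24h <- strftime(stamp, "%H:%M", tz = "UTC")     # 24-hour clock
day_of_yr <- strftime(stamp, "%j", tz = "UTC")        # day of the year

print(c(iso_date, clock_24h, day_of_yr))
# → "2017-04-29" "09:16" "119"
```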
Extract specific parts of a date and time
R has two standard date-time classes, POSIXct and POSIXlt. The POSIXct class stores dates as the number of seconds since the start of 1970, while the POSIXlt stores dates as a list, with components for seconds, minutes, hours, day of month, etc. One can use list indexing to access individual components of a POSIXlt date. These components can be viewed using the unclass function.
Example: In this example, we use the Sys.time function, which gives the current date and time. We convert this into POSIXlt class using the as.POSIXlt function. Now we can extract the required components.
time_now = as.POSIXlt(Sys.time())
print(time_now)
"2017-04-29 09:16:06 IST"
# To extract minutes
time_now$min
16
# To extract day of the month
time_now$mday
29
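Two of the POSIXlt components are easy to misread: year is stored as years since 1900, and mon counts from 0. The sketch below uses a fixed timestamp (our choice, in UTC) to show the adjustments.

```r
# A fixed timestamp so the values are repeatable
stamp <- as.POSIXlt("2017-04-29 09:16:06", tz = "UTC")

year_full   <- stamp$year + 1900  # year is stored as years since 1900 -> 2017
month_num   <- stamp$mon + 1      # mon runs from 0 (January) to 11 -> 4
day_num     <- stamp$mday         # day of the month -> 29
weekday_num <- stamp$wday         # 0 is Sunday, so Saturday is 6

print(c(year_full, month_num, day_num, weekday_num))
```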
%within% and %m+% functions
%within% function: This function from the lubridate package checks whether a date-time object falls in a given date-time interval. It returns a logical TRUE or FALSE as the output. The function requires a date-time interval, which is created using the interval function. We use the ymd function to convert a non-date-time object into a date-time object.
library(lubridate)
dates = interval("2016-03-03", "2016-06-03")
d = ymd("2016-04-21")
d %within% dates
TRUE
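The same check also works on a whole vector of dates, which is handy for filtering. A minimal sketch, assuming the lubridate package is installed; the candidate dates are made up for illustration.

```r
library(lubridate)

# The same interval as above, built from parsed dates
window <- interval(ymd("2016-03-03"), ymd("2016-06-03"))

# Hypothetical dates: one before, one inside, and one after the interval
candidates <- ymd(c("2016-02-28", "2016-04-21", "2016-06-04"))

# %within% is vectorised and returns one logical value per date
inside <- candidates %within% window
print(inside)
# → FALSE TRUE FALSE

# Keep only the dates that fall inside the interval
print(candidates[inside])
```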
%m+% function: This function is used to add or subtract months to/from a given date-time object.
# To add a month to a date-time object
library(lubridate)
d = ymd("2016-04-21")
d %m+% months(1)
"2016-05-21"
# To create a sequence of months from a given date-time object
library(lubridate)
d = ymd("2016-04-21")
d %m+% months(1:3)
"2016-05-21" "2016-06-21" "2016-07-21"
# To subtract a year from a given date-time object
d = ymd("2016-04-21")
d %m+% years(-1)
"2015-04-21"
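The reason to prefer %m+% over a plain + shows up at month ends. The sketch below, which assumes lubridate is installed and uses a date of our choosing, compares the two for 31 January.

```r
library(lubridate)

jan31 <- ymd("2016-01-31")

# Plain addition lands on 31 February, which does not exist, so the result is NA
plain <- jan31 + months(1)

# %m+% rolls back to the last valid day of the target month instead
rolled <- jan31 %m+% months(1)

print(plain)
print(rolled)
```

Since 2016 is a leap year, the rolled-back date here is 29 February.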
to.period function
The to.period function is part of the xts package. It converts an OHLC or univariate object to a specified periodicity lower than the given data object.
For example, the function can convert a daily series to a monthly series, or a monthly series to a yearly one, or a one minute series to an hourly series.
library(quantmod)
data = getSymbols("AAPL", src = "yahoo", from = "2016-01-01", to = "2016-01-15", auto.assign = FALSE)
nrow(data)
10
# Convert the above daily data series to weekly data.
to.period(data, period = "weeks")
Valid period character strings include: “seconds”, “minutes”, “hours”, “days”, “weeks”, “months”, “quarters”, and “years”.
To convert the daily data to monthly data, the syntax will be:
df = to.period(data, period = "months")
The result will contain the open and close for the given period, as well as the maximum and minimum over the new period, reflected in the new high and low, respectively. If volume for a period was available, the new volume will also be calculated.
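To see the open, high, low and close being derived without downloading anything, here is a sketch on a small made-up daily series. The prices and dates are invented for illustration (ten weekdays starting Monday, 4 January 2016), and the xts package is assumed to be installed.

```r
library(xts)

# Hypothetical daily prices for two Monday-to-Friday weeks
dates  <- as.Date("2016-01-04") + c(0:4, 7:11)
prices <- xts(c(10, 12, 11, 15, 14, 13, 9, 16, 18, 17), order.by = dates)

# A univariate series is converted to one OHLC row per week
weekly <- to.period(prices, period = "weeks")
print(weekly)

# First week: open 10, high 15, low 10, close 14
# Second week: open 13, high 18, low 9, close 17
```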
period.max and period.min functions
The period.max and period.min functions are part of the xts package. period.max is used to calculate a maximum value per period given an arbitrary index of sections to be calculated over. The period.min function works in a similar manner to compute the minimum values.
The syntax is given as:
period.max(x, INDEX)
period.min(x, INDEX)
x – represents a univariate data object
INDEX – represents a numeric vector of endpoints
library(xts)
# compute the period maximum
period.max(c(1, 1, 4, 2, 2, 6, 7, 8, -1, 20), c(0, 3, 5, 8, 10))
3 5 8 10
4 2 8 20
# compute the period minimum
period.min(c(1, 1, 4, 2, 2, 6, 7, 8, -1, 20), c(0, 3, 5, 8, 10))
3 5 8 10
1 2 6 -1
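With time series data, the vector of endpoints is usually not typed by hand but generated with the endpoints function from xts. A minimal sketch on a made-up daily series (the prices and dates are invented for illustration):

```r
library(xts)

# Hypothetical daily prices for two Monday-to-Friday weeks
dates  <- as.Date("2016-01-04") + c(0:4, 7:11)
prices <- xts(c(10, 12, 11, 15, 14, 13, 9, 16, 18, 17), order.by = dates)

# endpoints() returns the row number that closes each week, starting with 0
ep <- endpoints(prices, on = "weeks")
print(ep)

# Weekly maximum and minimum over those endpoints
weekly_max <- period.max(prices, INDEX = ep)
weekly_min <- period.min(prices, INDEX = ep)

print(weekly_max)  # 15 for the first week, 18 for the second
print(weekly_min)  # 10 for the first week, 9 for the second
```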
We hope you liked this bulletin. In the next weekly bulletin, we will cover more methods and R functions for our readers.
We have noticed that some users are facing challenges while downloading market data from the Yahoo and Google Finance platforms. If you are looking for an alternative source of market data, you can use Quandl instead.