In this post we briefly discuss sentiment analysis and then present a basic sentiment analysis model in R. Sentiment analysis uses natural language processing tools to analyze the feelings (i.e. attitudes, emotions and opinions) expressed in news reports, blog posts, Twitter messages and similar text.
Natural language processing (NLP), in simple terms, refers to the use of computers to process sentences/text in a natural language such as English. The objective here is to extract information from the unstructured or semi-structured data found in these tweets/blogs/articles. To do this, NLP draws on artificial intelligence, computational linguistics, and computer science.
Using NLP models, hundreds of text documents can be processed in seconds to ascertain their sentiment. Sentiment analysis is currently a hot topic and has found wide application in areas such as business intelligence, politics, finance and policy making.
Sentiment analysis in Trading – Sentiments can often drive the direction of the markets. Hence, traders and other participants in the financial markets seek to gauge the sentiment expressed in news reports/tweets/blog posts. Traders build automatic trading systems which extract the sentiment from natural language. These trading systems take long/short positions in the markets based on the trading signals generated. The trading systems can also be combined with other trading systems. The objective at the end of the day is to generate superior returns from the extracted information.
There are various methods and models for sentiment analysis. Let us take a look at a very basic model in R for sentiment analysis.
Sentiment analysis model in R
In this model we implement the “Bag-of-words” approach to sentiment analysis. The process identifies positive and negative words (or strings of words) within an article. For this it makes use of a large dictionary of words that carry sentiment. Each word in this dictionary can be assigned a weight. The final sentiment score generated by the model is the total for the positive words minus the total for the negative words.
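To make the weighting idea concrete, here is a minimal base-R sketch. The dictionary entries, the weights and the word list are all made up for illustration and are not taken from the dictionary used in this post.

```r
# Hypothetical mini-dictionaries: names are terms, values are weights.
positive_terms <- c(growth = 1, strong = 1, record = 2)
negative_terms <- c(decline = 1, shutdown = 2)

# Hypothetical bag of words from an already cleaned text.
words <- c("strong", "growth", "despite", "shutdown", "record", "volumes")

# Look up the weight of every word that appears in each dictionary and add them up.
positive_score <- sum(positive_terms[words[words %in% names(positive_terms)]])
negative_score <- sum(negative_terms[words[words %in% names(negative_terms)]])

# Positive total minus negative total.
sentiment_score <- positive_score - negative_score
```

With all weights set to 1, this reduces to counting positive and negative words.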
We will test our model on the management commentary text taken from the latest earnings call transcript of Eicher Motors Ltd. Eicher Motors is a leading Indian automaker which owns Royal Enfield Motors. The objective of our model will be to gauge the opinion expressed in their fourth quarter 2015 earnings call.
To build this model we are using the “tm” and “RWeka” packages in R. We load the libraries and then read the two documents which contain the positive and the negative terms. To prepare these documents we went through the four conference call transcripts preceding the fourth quarter of 2015 and picked the positive/negative words from them to populate our dictionary. In addition to these words we have also added some general positive/negative words that relate to the motorcycle industry.
We will be considering only the management’s commentary in our sentiment analysis model. We load the text document (fourth quarter 2015) containing the CEO’s prepared commentary into R using the Corpus function. For this we have stored the commentary document in the TextMining folder in R’s working directory.
The next step is to clean the text. We convert all words to lowercase, remove punctuation, remove numbers, and strip the extra whitespace. The writeLines function lets us see the text after cleaning.
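Reading a dictionary file might look like the sketch below. The layout of the real term files is not shown in this post, so a single column named `term` is an assumption, and a temporary file stands in for the actual CSV so that the snippet runs on its own.

```r
# Stand-in for the positive-terms CSV; the column name "term" is assumed.
positive_file <- tempfile(fileext = ".csv")
writeLines(c("term", "Growth", "strong demand ", "record"), positive_file)

positive_dict <- read.csv(positive_file, stringsAsFactors = FALSE)

# Lowercase and trim the entries so they match text that has been cleaned the same way.
positive_terms <- unique(trimws(tolower(positive_dict$term)))
```

The negative terms would be read the same way from their own file.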
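The cleaning steps can be sketched in base R as follows. This is a stand-in and not the original code: with the tm package the same steps are usually applied to a corpus through tm_map with transformations such as removePunctuation, removeNumbers and stripWhitespace. The sample sentence is made up.

```r
# Made-up sample sentence, not taken from the transcript.
raw_text <- "Demand was STRONG in 2015;  margins improved, too!"

clean_text <- tolower(raw_text)                      # convert to lowercase
clean_text <- gsub("[[:punct:]]", "", clean_text)    # remove punctuation
clean_text <- gsub("[[:digit:]]", "", clean_text)    # remove numbers
clean_text <- trimws(gsub("\\s+", " ", clean_text))  # collapse repeated whitespace

# Print the cleaned text for inspection.
writeLines(clean_text)
```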
In the code below, we tokenize the text which was cleaned above. Tokenization is the process of breaking a stream of text into words or a string of words. We are using the NGramTokenizer function here. This creates N-grams of text.
N-grams are basically sets of co-occurring words within a given text. For example, consider the sentence “The food is delicious”. If n = 2, then the n-grams would be:
- the food
- food is
- is delicious
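The example above can be reproduced with a few lines of base R. The helper `make_ngrams` is a hypothetical name used only for this illustration. It is not the NGramTokenizer function from RWeka, although it produces the same kind of output for a simple sentence.

```r
# Build word n-grams by pasting together n consecutive tokens.
make_ngrams <- function(text, n) {
  tokens <- strsplit(tolower(text), "\\s+")[[1]]
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

bigrams <- make_ngrams("The food is delicious", 2)
print(bigrams)
```

Calling the helper with n = 1 returns the individual words, so unigrams and bigrams can be combined with `c()` when the dictionary holds both single words and two-word phrases.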
Thereafter we create a term document matrix (called “terms” in the code), which is a matrix that lists all occurrences of words in the corpus.
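To show what such a matrix contains, here is a small base-R sketch with two made-up documents. In the tm package the TermDocumentMatrix function builds this structure from a corpus. The code below only mimics the result by hand.

```r
# Two made-up, already cleaned documents.
docs <- c(
  doc1 = "the food is delicious",
  doc2 = "the service is slow the food is cold"
)

tokenize <- function(x) strsplit(x, "\\s+")[[1]]
vocab <- sort(unique(unlist(lapply(docs, tokenize))))

# Rows are terms, columns are documents, cells are occurrence counts.
terms <- vapply(
  docs,
  function(d) as.integer(table(factor(tokenize(d), levels = vocab))),
  integer(length(vocab))
)
rownames(terms) <- vocab

print(terms)
```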
Below we check if the positive/negative words in the dictionary are present in the text document.
Now we extract all the positive/negative words from the text document that matched the words in our dictionary.
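The check and the extraction can both be done with `%in%`. The sketch below assumes the term counts for the document are available as a named vector. The counts and dictionary entries are hypothetical.

```r
# Hypothetical term counts for one document: names are terms, values are counts.
term_counts <- c("strong" = 3, "growth" = 2, "floods" = 1,
                 "strong growth" = 1, "quarter" = 4)

# Hypothetical dictionary entries.
positive_terms <- c("strong", "growth", "strong growth", "record")
negative_terms <- c("floods", "shutdown", "decline")

# Check: which dictionary entries occur in the document?
positive_present <- positive_terms %in% names(term_counts)

# Extract: keep the matched terms together with their counts.
positive_matches <- term_counts[names(term_counts) %in% positive_terms]
negative_matches <- term_counts[names(term_counts) %in% negative_terms]
```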
The code lines below compute the positive/negative score, and finally the sentiment score.
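A sketch of the scoring step is given below. Whether each matched word is counted once or once per occurrence is a design choice, so both variants are shown. The matched terms and counts are hypothetical.

```r
# Hypothetical matched terms with their occurrence counts.
positive_matches <- c(strong = 3, growth = 2, record = 1)
negative_matches <- c(floods = 1)

# Variant 1: count each matched term once.
score_distinct <- length(positive_matches) - length(negative_matches)

# Variant 2: count every occurrence of a matched term.
score_occurrences <- sum(positive_matches) - sum(negative_matches)
```

Counting occurrences gives more influence to words that management repeats, while counting distinct terms is less sensitive to repetition.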
Final result – Sentiment score
The model found 14 positive words and 4 negative words, and the final sentiment score was 10. This tells us that the quarterly result for Q4 2015 was good from the management’s perspective. The word cloud below shows some of the positive/negative words that were picked from the text document on which we ran the model.
Validate our sentiment analysis model – let us check the quarterly performance numbers to confirm the positive sentiment score generated by our model. As can be seen, Eicher Motors posted a strong quarter. EBIT growth was around 72% y/y on a strong sales volume of 125,690 motorcycles. The strong results came despite a production shutdown of a few days caused by floods at its production facility during the quarter.
The chart on the right shows the stock market’s reaction to Eicher Motors’ strong results on the day of the earnings announcement. The stock opened at around Rs. 17,100, made a big move to touch an intraday high of around Rs. 18,500, and finally closed at Rs. 18,175.
This was a basic introduction to sentiment analysis. The model above can be made more robust and fine-tuned further. In future posts we will try to cover other sentiment analysis approaches and attempt to build a model around them.
QuantInsti has been actively participating in conferences on sentiment analysis, and was one of the lead marketing and education partners at the recently held “Sentiment analysis in Finance” conference in Singapore, 2016. Rajib Ranjan Borah, Co-founder & Director of iRageCapital Advisory Pvt. Ltd, & QuantInsti, was one of the esteemed panelists for the session “New Paradigms for Sentiment Analysis Applied to Finance” at the conference.
To know more about QuantInsti and the Executive Programme in Algorithmic Trading (EPAT) course offered by QuantInsti, check our website and the EPAT course page. Feel free to contact our team at firstname.lastname@example.org for queries on EPAT.
Download Data File
- Sentiment analysis in Trading – Files.rar
- Eicher Motors Sentiment Analysis – R Code.txt
- Negative terms.csv
- Positive Terms.csv