A natural starting point for this site is my master’s thesis, in which I explored sentiment and linguistic hedging in financial documents.
For now, we’ll focus on sentiment analysis, sometimes called opinion mining, which we define as the study of the emotion expressed in text. Sentiment has recently become something of a buzzword in business intelligence and machine learning, and it has been mined from corpora ranging from movie reviews to Twitter feeds. My work focused on finance, specifically on 10-Ks, the annual reports that companies file with the Securities and Exchange Commission (SEC). These documents vary widely in length and style, but they all share some common content related to the discussion of firm performance. For the curious, a list of the required items can be found here.
Several popular sentiment dictionaries are available for general analysis, but the one my work is based on was developed specifically for finance. Loughran & McDonald (L&M) built six sentiment lists from the Harvard Psychosociological Dictionary, including lists for positive and negative sentiment, which consist of 354 and 2329 words, respectively. Though extensive, these lists fail to capture one important aspect of sentiment-rich language: sentiment is not binary. For example, the words bad and catastrophic are both negative, but the latter is far more negative. To capture this gradient, I collected annotations from more than 300 undergraduate business and pre-business students. The result is a dictionary mapping 1317 unique words to a “sentiment score” ranging from -2 to +2, based on the average score assigned by the student annotators and on how well they agreed on that score.
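To make the annotation step concrete, here is a minimal Python sketch of how per-word ratings might be summarized. It is an illustration under stated assumptions, not the procedure from the thesis: the ratings are made up, the function name `word_score` is hypothetical, and agreement is measured with a simple rescaled standard deviation. How the average and the agreement were actually combined into a single score is not described here, so the sketch just reports them side by side.

```python
from statistics import mean, pstdev

# Largest possible population standard deviation for ratings confined to
# [-2, 2] (half the annotators at -2, half at +2). Used only to rescale
# the spread into a 0..1 agreement value; this choice is an assumption.
MAX_SPREAD = 2.0

def word_score(ratings):
    """Return (average rating, agreement in [0, 1]) for one word."""
    avg = mean(ratings)
    agreement = 1.0 - pstdev(ratings) / MAX_SPREAD
    return avg, agreement

# Invented ratings on the -2..+2 scale, purely for illustration.
ratings = {
    "bad": [-1, -1, -1, -2],
    "catastrophic": [-2, -2, -2, -2],
}

scores = {word: word_score(r) for word, r in ratings.items()}
for word, (avg, agreement) in scores.items():
    print(f"{word}: average={avg:.2f}, agreement={agreement:.2f}")
```

With ratings like these, catastrophic comes out both more negative and more consistently rated than bad, which is the kind of distinction a binary word list cannot express.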
With this completed dictionary in hand, I computed a weighted average sentiment score for each financial document and correlated it with two measures of change in stock price on the filing date: year-to-year and open-to-close. I computed a comparable measure for L&M, assigning scores of +1 and -1 to positive and negative words, respectively. For both price changes, my gradient dictionary was more strongly correlated than the L&M equivalent, providing strong evidence for a gradient characterization of sentiment, at least in this domain.
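The document-level comparison can be sketched in a few lines of Python. This is a rough illustration rather than the thesis code: the tiny lexicon and its scores are invented, `document_score` is a hypothetical name, and weighting each word by how often it appears (and averaging over matched words only) is an assumption about what "weighted average" means here.

```python
import re
from collections import Counter

def document_score(text, lexicon):
    """Frequency-weighted average of lexicon scores over matched words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t in lexicon)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no sentiment-bearing words found
    return sum(lexicon[w] * n for w, n in counts.items()) / total

# Hypothetical gradient scores on the -2..+2 scale.
gradient = {"bad": -1.0, "catastrophic": -2.0, "strong": 1.0}
# Binary counterpart: collapse every score to +1 or -1, as for the L&M lists.
binary = {w: (1.0 if s > 0 else -1.0) for w, s in gradient.items()}

text = "Results were strong despite a catastrophic quarter and bad weather."
gradient_score = document_score(text, gradient)
binary_score = document_score(text, binary)
print(gradient_score, binary_score)
```

The same sentence scores lower under the gradient lexicon than under the binary one because catastrophic counts for twice as much as bad. Scores like these, computed per filing, would then be correlated with the price changes.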
If you would like more detail on my methods, feel free to email me for a copy of the working paper.