Let’s pretend we’re back in 2016 for a moment. It’s the morning of Tuesday, February 9th. You are ready to take on the day right after you check the Reuters news. A bearish headline catches your eye: “SE Asia Stocks-Fall on weak Asia, risk aversion; Singapore falls sharply” https://reut.rs/2K7h7wg. It’s 8:05 am and your coffee still hasn’t kicked in. You have time for another quick article before work. Another ominous headline appears: “IEA sees global oil glut worsening, OPEC deal unlikely” https://reut.rs/2qRl9Qh. You start wondering if it’s time to sell some holdings. The article “MIDEAST STOCKS-Markets may weaken as global shares slump” https://reut.rs/2HU6kow seals the deal. In a panic, you call your investment manager to get out of the market.

These were not the only examples of pessimism in the headlines that day. Reuters headlines for February 6, 2016 had the lowest sentiment over the six-year period studied (2012-2017). The next day stocks plummeted, and the day after that the S&P hit roughly a two-year low. Was the sentiment an early indicator of the slide? The efficient market hypothesis says no: by the time a story breaks, prices already reflect the publicly available information behind it. Put another way, one won’t avoid a market correction by acting on publicly available information. But rather than take that on faith, I tested for a linear relationship between today’s headlines and tomorrow’s stock price. The data set contained 4,148,808 Reuters headlines from 01/01/2012 to 12/31/2017.

I used the Bing Liu lexicon for scoring words in each headline.  It's built into the tidytext R package.
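
For reference, the lexicon is one get_sentiments() call away:

```r
library(tidytext)

# The Bing Liu lexicon ships with tidytext as a two-column table:
# one row per word, labeled "positive" or "negative".
get_sentiments("bing")
```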

[Figure: example of scoring a headline with the Bing Liu lexicon]

I compared each word in the text against the lexicon and counted the instances of negative and positive words. The difference between the counts yielded a sentiment score. The headline “SE Asia Stocks-Fall on weak Asia, risk aversion; Singapore falls sharply” had zero positive words, six negative words, and six neutral words (words absent from the lexicon, marked NA). Thus it scored negative six (0 - 6 = -6).
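
A minimal sketch of this scoring step (my reconstruction, not the exact code from the analysis):

```r
library(dplyr)
library(tidytext)

# Tokenize a headline, keep only words found in the Bing Liu lexicon,
# and score positives minus negatives. Words not in the lexicon drop
# out of the join -- these are the "neutral"/NA words.
score_headline <- function(headline) {
  tibble(text = headline) %>%
    unnest_tokens(word, text) %>%
    inner_join(get_sentiments("bing"), by = "word") %>%
    summarise(score = sum(sentiment == "positive") -
                      sum(sentiment == "negative")) %>%
    pull(score)
}

# Per the example above, this headline scores -6.
score_headline("SE Asia Stocks-Fall on weak Asia, risk aversion; Singapore falls sharply")
```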

I applied the function to all headlines. The chart below shows sentiment rolled up to a monthly level.
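
A sketch of that roll-up, assuming a data frame scored (a name I'm introducing) with one row per headline and columns date and score:

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Aggregate per-headline scores to a monthly mean.
monthly <- scored %>%
  mutate(month = floor_date(date, "month")) %>%
  group_by(month) %>%
  summarise(mean_score = mean(score))

# Trend the monthly sentiment over time.
ggplot(monthly, aes(month, mean_score)) +
  geom_line() +
  labs(x = NULL, y = "Mean headline sentiment")
```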


Correlation Analysis

I visualized the correlation between the daily return, computed from Yahoo Finance stock price data, and the previous day’s sentiment score. The scatter plot showed no strong linear relationship between the variables. A t-test on the correlation, however, indicated marginal statistical significance.

The absolute value of the calculated t-statistic exceeded the critical value at the 90% confidence level (1.65) but not at the 95% level (1.96).
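
A sketch of the test, assuming a data frame daily (hypothetical name) holding each day's return alongside the prior day's sentiment score:

```r
# Assumed columns: return (day t close-to-close return) and
# lag_score (day t-1 headline sentiment).
cor.test(daily$return, daily$lag_score)

# The same t-statistic by hand, compared against the two-sided
# critical values cited above (~1.65 at 90%, ~1.96 at 95%):
r <- cor(daily$return, daily$lag_score)
n <- nrow(daily)
t_calc <- r * sqrt(n - 2) / sqrt(1 - r^2)
```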


Regression Analysis

Reuters headline sentiment explained only 0.002053 of the variation in daily return. The regression estimated a slope coefficient of 0.0004 that was statistically different from zero only at the 90% level. Despite this hint of statistical significance, the model would be useless for making economic decisions.
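
The fit itself is a one-liner under the same assumptions as above:

```r
# Regress day-t return on day t-1 sentiment.
fit <- lm(return ~ lag_score, data = daily)
summary(fit)  # reports the slope, its t-statistic, and R-squared
```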

At this point I had answered whether one could predict stock prices from Reuters headlines. I could have stopped there, but for academic purposes I continued analyzing the model. I checked for heteroskedastic errors to determine whether the coefficient t-statistics were reliable, starting by plotting the residuals against the sentiment score.
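
A base-R sketch of that residual plot:

```r
# Residuals against the predictor, to eyeball heteroskedasticity.
plot(daily$lag_score, resid(fit),
     xlab = "Previous-day sentiment score", ylab = "Residual")
abline(h = 0, lty = 2)
```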

I couldn't detect heteroskedasticity from a visual inspection of the scatter plot; the errors appeared random. I performed a Breusch-Pagan (BP) test to confirm. The first step of the BP test regresses the squared residuals on the fitted values of the model. R-squared times the number of observations (.003 x 1,490 = 4.47) exceeded the critical value of 3.84 (chi-squared distribution at the .05 level of significance). The model suffered from heteroskedasticity, so the coefficient t-values were unreliable.
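
A sketch of both routes: the manual first step described above, and the packaged test from the lmtest package:

```r
library(lmtest)

# Manual BP step: regress squared residuals on the fitted values,
# then compare n * R-squared to qchisq(0.95, df = 1), about 3.84.
aux <- lm(resid(fit)^2 ~ fitted(fit))
lm_stat <- summary(aux)$r.squared * nobs(fit)
lm_stat > qchisq(0.95, df = 1)

# Packaged version (note: studentized by default).
bptest(fit)
```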

I adjusted the model to account for heteroskedasticity using robust standard errors. The R package commarobust enabled a substitution of the coefficients' standard errors. After the substitution, the t-statistic lost significance even at the .10 level. I could no longer conclude that the slope coefficient differed from zero; the model provided no predictive or explanatory power whatsoever.
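
commarobust is one route; the same substitution can be sketched with the widely used sandwich and lmtest packages (HC2 is my assumption for the estimator type):

```r
library(sandwich)
library(lmtest)

# Re-test the slope with heteroskedasticity-consistent standard errors.
coeftest(fit, vcov = vcovHC(fit, type = "HC2"))
```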


Conclusion

I used the Bing Liu lexicon simply because it is integrated with the tidytext package. Future studies could use richer lexicons that might better proxy for headline sentiment. The Bing Liu lexicon holds only 6,788 words, while lexicons like the MSOL lexicon by Mohammad et al. (2009) contain over 76,000 terms. Another weakness of the Bing Liu lexicon is that it can only assign three possible classes: positive, negative, or neutral. Some lexicons differentiate among emotions such as negativity, sadness, anticipation, fear, and disgust, and there might be predictive power in those emotions. Additionally, it does not measure intensity within a class. The headline “Stock performance was bad” has the same sentiment score as “Stock performance was the worst ever”.
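
To see the intensity limitation concretely, using the score_headline() sketch from earlier:

```r
# "bad" and "worst" are both simply labeled negative in the Bing Liu
# lexicon, so both headlines receive the same score of -1.
score_headline("Stock performance was bad")
score_headline("Stock performance was the worst ever")
```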

One could expand the study to include other news sources. A key assumption of the model was that Reuters is a good proxy for the news. Other sources are likely to differ from Reuters in sentiment; this variation may stem from political slant or readership demographics.

This analysis offered empirical evidence suggesting news headlines cannot predict stock price movements. That knowledge is quite practical: just because the news reports gloom doesn’t mean it’s time to liquidate your 401(k). I also see this analysis as a caution against questionable investing advice. If you hear a "hot tip" about a stock, the chance to profit from the information has probably already passed.