Stock Headline Sentiment Accuracy Classifier
- Category: ML Model
- Client: Durham University
- Project date: May 2021
An investigation into how accurately the sentiment of stock news headlines reflects the stock's real-world performance. Implemented using Python, scikit-learn, Pandas and Requests.
- Situation: With record numbers of new investors, I wanted to examine how accurately the sentiment of analyst reports on specific stocks reflected those stocks' performance.
- Task: To examine the accuracy of stock article headlines’ sentiments, I would first need to classify the sentiment of a dataset of headlines using natural-language processing. Then, I would need to access a dataset of price histories for the relevant stocks and compare sentiment against performance over varying timeframes.
- Action: First, I explored a Kaggle dataset of millions of headlines, labelled with the relevant stock tickers, and cleaned the data to ensure its quality. I then used the Marketstack API to automatically gather the last decade of price data for all 100 stocks. Next, I compared each headline's sentiment with the price change over time, labelling it as correct or incorrect. Finally, I trained a support vector machine on the labelled headlines to see whether consistent trends in inaccurate headlines could be identified.
- Result: Both of the models I created classified headlines as having correct or incorrect sentiment at better-than-random rates. This result suggests a relationship between headline language and sentiment accuracy, which I wish to explore further.
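
The comparison step described under Action can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names, the +1/-1 sentiment encoding, and the representation of price history as parallel sorted lists of dates and closing prices are all assumptions made for the example. It labels a headline's sentiment as correct when the direction of the price move over a chosen horizon matches the sign of the sentiment.

```python
from bisect import bisect_left
from datetime import date, timedelta


def close_on_or_after(dates, closes, day):
    """Return the close on the first trading day on or after `day`, or None."""
    i = bisect_left(dates, day)
    return closes[i] if i < len(dates) else None


def sentiment_was_correct(sentiment, dates, closes, published, horizon_days):
    """Label a headline as correct (True), incorrect (False) or unusable (None).

    Assumed encoding (hypothetical): sentiment is +1 for positive and -1 for
    negative; 0 (neutral) cannot be judged against a price move.
    `dates` must be sorted ascending and aligned with `closes`.
    """
    if sentiment == 0:
        return None
    start = close_on_or_after(dates, closes, published)
    end = close_on_or_after(dates, closes, published + timedelta(days=horizon_days))
    if start is None or end is None or start == end:
        return None  # no price data for the window, or no movement
    went_up = end > start
    return (sentiment > 0) == went_up


# Made-up price history for one stock (weekend of 8-9 May has no trading days).
dates = [date(2021, 5, 3), date(2021, 5, 4), date(2021, 5, 5),
         date(2021, 5, 6), date(2021, 5, 7), date(2021, 5, 10)]
closes = [100.0, 102.0, 101.0, 99.0, 98.0, 103.0]

# Evaluate one positive headline published on 3 May over several horizons.
labels = {h: sentiment_was_correct(1, dates, closes, date(2021, 5, 3), h)
          for h in (1, 4, 5, 30)}
print(labels)
```

The same headline can be labelled correct at one horizon and incorrect at another, which is why the timeframe has to be fixed before labels like these are used to train a classifier.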