Analyzing Stock Price News Sentiment with Machine Learning Models in Python

Sahaj Godhani
5 min readJun 4, 2023

--

Introduction: In today’s fast-paced stock market, news sentiment plays a crucial role in shaping investor behavior and influencing stock prices. By analyzing the sentiment behind stock-related news articles, investors can gain valuable insights to make informed trading decisions. In this blog, we will explore how machine learning models in Python can be utilized to analyze stock price news sentiment. We will walk through the process of building and training ML models to predict sentiment, enabling investors to stay ahead of market trends.

News Sentiment For Stock Market
  1. Understanding the Role of News Sentiment in Stock Markets: Sentiment analysis refers to the process of determining the emotional tone or sentiment of a text. In the context of stock markets, sentiment analysis helps gauge the positive, negative, or neutral sentiment of news articles related to stocks. By understanding the sentiment behind such news, investors can uncover potential market trends and sentiment-driven price movements.
  2. Collecting Stock Price News Data: To begin, we need a reliable source of stock price news data. One popular option is to leverage APIs provided by financial news platforms, such as Alpha Vantage or Yahoo Finance. These APIs offer access to a wide range of news articles related to specific stocks. In this blog, we will use the Alpha Vantage API to fetch the news data.
  3. Preparing the Data for Sentiment Analysis: Once we have collected the news data, we need to preprocess it before feeding it into our ML models. Key preprocessing steps include:
  • Text cleaning: Removing unnecessary characters, punctuation, and special symbols.
  • Tokenization: Breaking down the text into individual words or tokens.
  • Stop-word removal: Removing common words that do not carry significant sentiment.
  • Lemmatization or stemming: Reducing words to their root form for better analysis.

4. Building a Sentiment Analysis Model: In this section, we will build an ML model to classify news sentiment. We can employ various ML algorithms, such as Naive Bayes, Support Vector Machines (SVM), or even deep learning models like Recurrent Neural Networks (RNNs) or Transformers. For this blog, we will use a simple yet effective algorithm called Multinomial Naive Bayes, which is well-suited for text classification tasks.

5. Training and Evaluating the Model: To train the sentiment analysis model, we need labeled data, where each news article is tagged with its corresponding sentiment (positive, negative, or neutral). This labeled data serves as the ground truth for training the model. We split the data into training and testing sets, and then train the model using the training set. Finally, we evaluate the model’s performance on the testing set using evaluation metrics such as accuracy, precision, recall, and F1-score.

6. Predicting Sentiment for New News Articles: Once our model is trained and evaluated, we can utilize it to predict sentiment for new, unseen news articles. We apply the same preprocessing steps to these articles and then feed them into the trained model to obtain sentiment predictions.

7. Visualizing Sentiment Results: To gain better insights and interpret the sentiment analysis results, we can create visualizations. Python offers various visualization libraries such as Matplotlib or Plotly. We can generate visualizations like bar charts or word clouds to showcase sentiment distribution or highlight key sentiment-related terms in news articles

8. Incorporating Sentiment Analysis in Trading Strategies: Sentiment analysis can serve as a valuable component of trading strategies. By incorporating sentiment predictions into trading models, investors can make more informed decisions. For instance, positive sentiment may trigger a bullish stance, while negative sentiment may indicate a bearish outlook.

9. Conclusion and Future Directions: Analyzing stock price news sentiment using ML models in Python empowers investors to leverage the power of data-driven decision-making. With further advancements in ML algorithms and techniques, such as deep learning and transformers, we can expect even more accurate sentiment analysis models in the future. By staying updated with the latest developments in sentiment analysis, investors can maximize their chances of success in the stock market.

Fundamental Ratios and Interactive Charts for AAPL Stock

Finally, the data is ready to be manipulated and viewed in an appealing manner. For each ticker in the inputted list, a new DataFrame will be created that includes its headlines and their respective scores. Last, a final data frame will be created that includes each ticker’s mean sentiment value over all the recent news parsed.

# Import libraries
import pandas as pd
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
from urllib.request import urlopen, Request
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Parameters
n = 3
tickers = ['AAPL', 'TSLA', 'AMZN']

# Get Data
finwiz_url = 'https://finviz.com/quote.ashx?t='
news_tables = {}

for ticker in tickers:
url = finwiz_url + ticker
req = Request(url=url,headers={'user-agent': 'my-app/0.0.1'})
resp = urlopen(req)
html = BeautifulSoup(resp, features="lxml")
news_table = html.find(id='news-table')
news_tables[ticker] = news_table

try:
for ticker in tickers:
df = news_tables[ticker]
df_tr = df.findAll('tr')

print ('\n')
print ('Recent News Headlines for {}: '.format(ticker))

for i, table_row in enumerate(df_tr):
a_text = table_row.a.text
td_text = table_row.td.text
td_text = td_text.strip()
print(a_text,'(',td_text,')')
if i == n-1:
break
except KeyError:
pass

# Iterate through the news
parsed_news = []
for file_name, news_table in news_tables.items():
for x in news_table.findAll('tr'):
text = x.a.get_text()
date_scrape = x.td.text.split()

if len(date_scrape) == 1:
time = date_scrape[0]

else:
date = date_scrape[0]
time = date_scrape[1]

ticker = file_name.split('_')[0]

parsed_news.append([ticker, date, time, text])

# Sentiment Analysis
analyzer = SentimentIntensityAnalyzer()

columns = ['Ticker', 'Date', 'Time', 'Headline']
news = pd.DataFrame(parsed_news, columns=columns)
scores = news['Headline'].apply(analyzer.polarity_scores).tolist()

df_scores = pd.DataFrame(scores)
news = news.join(df_scores, rsuffix='_right')


# View Data
news['Date'] = pd.to_datetime(news.Date).dt.date

unique_ticker = news['Ticker'].unique().tolist()
news_dict = {name: news.loc[news['Ticker'] == name] for name in unique_ticker}

values = []
for ticker in tickers:
dataframe = news_dict[ticker]
dataframe = dataframe.set_index('Ticker')
dataframe = dataframe.drop(columns = ['Headline'])
print ('\n')
print (dataframe.head())

mean = round(dataframe['compound'].mean(), 2)
values.append(mean)

df = pd.DataFrame(list(zip(tickers, values)), columns =['Ticker', 'Mean Sentiment'])
df = df.set_index('Ticker')
df = df.sort_values('Mean Sentiment', ascending=False)
print ('\n')
print (df)

Work Portfolio: https://sahajgodhani.in/

--

--

Sahaj Godhani

Co-Founder at Sahaj || Full Stack Developer || Data Scientist || AI / ML Innovate || Web and Mobile apps