Model Card for Sentiment Analysis on Financial News
Overview
This dataset contains sentiments for financial news headlines from the perspective of a retail investor. The data is derived from the research by Malo et al. (2014), which focuses on detecting semantic orientations in economic texts.
Dataset Details
Source: Malo, P., Sinha, A., Takala, P., Korhonen, P., and Wallenius, J. (2014). “Good debt or bad debt: Detecting semantic orientations in economic… See the full description on the dataset page: https://huggingface.co/datasets/mltrev23/financial-sentiment-analysis.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Dataset Description
The Twitter Financial News dataset is an English-language dataset containing an annotated corpus of finance-related tweets. This dataset is used to classify finance-related tweets for their sentiment.
The dataset holds 11,932 documents annotated with 3 labels:
sentiments = { "LABEL_0": "Bearish", "LABEL_1": "Bullish", "LABEL_2": "Neutral" }
The data was collected using the Twitter API. The current dataset supports the multi-class classification… See the full description on the dataset page: https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains news headlines relevant to key forex pairs: AUDUSD, EURCHF, EURUSD, GBPUSD, and USDJPY. The data was extracted from reputable platforms Forex Live and FXstreet over a period of 86 days, from January to May 2023. The dataset comprises 2,291 unique news headlines. Each headline includes an associated forex pair, timestamp, source, author, URL, and the corresponding article text. Data was collected using web scraping techniques executed via a custom service on a virtual machine. This service periodically retrieves the latest news for a specified forex pair (ticker) from each platform, parsing all available information. The collected data is then processed to extract details such as the article's timestamp, author, and URL. The URL is further used to retrieve the full text of each article. This data acquisition process repeats approximately every 15 minutes.
To ensure the reliability of the dataset, we manually annotated each headline for sentiment. Instead of solely focusing on the textual content, we ascertained sentiment based on the potential short-term impact of the headline on its corresponding forex pair. This method recognizes the currency market's acute sensitivity to economic news, which significantly influences many trading strategies. As such, this dataset could serve as an invaluable resource for fine-tuning sentiment analysis models in the financial realm.
We used three categories for annotation: 'positive', 'negative', and 'neutral', which correspond to bullish, bearish, and hold sentiments, respectively, for the forex pair linked to each headline. The following Table provides examples of annotated headlines along with brief explanations of the assigned sentiment.
Examples of Annotated Headlines
Forex Pair
Headline
Sentiment
Explanation
GBPUSD
Diminishing bets for a move to 12400
Neutral
Lack of strong sentiment in either direction
GBPUSD
No reasons to dislike Cable in the very near term as long as the Dollar momentum remains soft
Positive
Positive sentiment towards GBPUSD (Cable) in the near term
GBPUSD
When are the UK jobs and how could they affect GBPUSD
Neutral
Poses a question and does not express a clear sentiment
JPYUSD
Appropriate to continue monetary easing to achieve 2% inflation target with wage growth
Positive
Monetary easing from Bank of Japan (BoJ) could lead to a weaker JPY in the short term due to increased money supply
USDJPY
Dollar rebounds despite US data. Yen gains amid lower yields
Neutral
Since both the USD and JPY are gaining, the effects on the USDJPY forex pair might offset each other
USDJPY
USDJPY to reach 124 by Q4 as the likelihood of a BoJ policy shift should accelerate Yen gains
Negative
USDJPY is expected to reach a lower value, with the USD losing value against the JPY
AUDUSD
<p>RBA Governor Lowe’s Testimony High inflation is damaging and corrosive </p>
Positive
Reserve Bank of Australia (RBA) expresses concerns about inflation. Typically, central banks combat high inflation with higher interest rates, which could strengthen AUD.
Moreover, the dataset includes two columns with the predicted sentiment class and score as predicted by the FinBERT model. Specifically, the FinBERT model outputs a set of probabilities for each sentiment class (positive, negative, and neutral), representing the model's confidence in associating the input headline with each sentiment category. These probabilities are used to determine the predicted class and a sentiment score for each headline. The sentiment score is computed by subtracting the negative class probability from the positive one.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Daniel-ML/sentiment-analysis-for-financial-news-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset (FinancialPhraseBank) contains the sentiments for financial news headlines from the perspective of a retail investor.
The dataset contains two columns, "Sentiment" and "News Headline". The sentiment can be negative, neutral or positive.
Malo, P., Sinha, A., Korhonen, P., Wallenius, J., & Takala, P. (2014). Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology, 65(4), 782-796.
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curosity, I just cleaned the .csv files to perform a sentiment analysis. So both the .csv files in this dataset are created by me.
Anything you read in the description is written by David Wallach and using all this information, I happen to perform my first ever sentiment analysis.
"I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my RaspberryPi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries.
I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."
This dataset has all the publicly traded companies (tickers and company names) that were used as input to fill the tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith'
The data used here is gathered from a project I developed : https://github.com/dwallach1/StockerBot
I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency)
https://brightdata.com/licensehttps://brightdata.com/license
Stay informed with our comprehensive Financial News Dataset, designed for investors, analysts, and businesses to track market trends, monitor financial events, and make data-driven decisions.
Dataset Features
Financial News Articles: Access structured financial news data, including headlines, summaries, full articles, publication dates, and source details. Market & Economic Indicators: Track financial reports, stock market updates, economic forecasts, and corporate earnings announcements. Sentiment & Trend Analysis: Analyze news sentiment, categorize articles by financial topics, and monitor emerging trends in global markets. Historical & Real-Time Data: Retrieve historical financial news archives or access continuously updated feeds for real-time insights.
Customizable Subsets for Specific Needs Our Financial News Dataset is fully customizable, allowing you to filter data based on publication date, region, financial topics, sentiment, or specific news sources. Whether you need broad coverage for market research or focused data for investment analysis, we tailor the dataset to your needs.
Popular Use Cases
Investment Strategy & Risk Management: Monitor financial news to assess market risks, identify investment opportunities, and optimize trading strategies. Market & Competitive Intelligence: Track industry trends, competitor financial performance, and economic developments. AI & Machine Learning Training: Use structured financial news data to train AI models for sentiment analysis, stock prediction, and automated trading. Regulatory & Compliance Monitoring: Stay updated on financial regulations, policy changes, and corporate governance news. Economic Research & Forecasting: Analyze financial news trends to predict economic shifts and market movements.
Whether you're tracking stock market trends, analyzing financial sentiment, or training AI models, our Financial News Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Fine-grained financial sentiment analysis on news headlines is a challenging task requiring human-annotated datasets to achieve high performance. Limited studies have tried to address the sentiment extraction task in a setting where multiple entities are present in a news headline. In an effort to further research in this area, we make publicly available SEntFiN 1.0, a human-annotated dataset of 10,700+ news headlines with entity-sentiment annotations, of which 2,800+ headlines contain multiple entities, often with conflicting sentiments.
Acknowledgements Sinha, A., Kedas, S., Kumar, R., & Malo, P. (2022). SEntFiN 1.0: Entity‐aware sentiment analysis for financial news. Journal of the Association for Information Science and Technology. DOI: https://doi.org/10.1002/asi.24634
Please refer to the above paper for further details about dataset creation and analysis. We propose a framework that enables the extraction of entity-relevant sentiments using a feature-based approach rather than an expression-based approach. For sentiment extraction, we utilize 12 different learning schemes utilizing lexicon-based and pretrained sentence representations and five classification approaches. Our experiments indicate that overall, RoBERTa was the best performer with other BERT-based models as close competitors.
In our study, we have also validated the effect of news sentiments on aggregate market movements.
Original Data Source: Aspect based Sentiment Analysis for Financial News
Dataset Card for Dataset Name
The FinancialNewsSentiment_26000 dataset comprises 26,000 rows of financial news articles related to the Indian market. It features four columns: URL, Content (scrapped content), Summary (generated using the T5-base model), and Sentiment Analysis (gathered using the GPT add-on for Google Sheets). The dataset is designed for sentiment analysis tasks, providing a comprehensive view of sentiments expressed in financial news.
Dataset… See the full description on the dataset page: https://huggingface.co/datasets/kdave/Indian_Financial_News.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset consists of financial news articles collected from HuffPost, spanning from 2012 to 2022. The data is structured in JSON format, containing headlines, article links, short descriptions, authors, categories, and publication dates. This dataset supports applications in financial NLP, time-series analysis, and sentiment-based trading strategies.
Finance news labeled by their sentiment. Can be used for NLP.
Here are the data operations made on the texts:
Nulls removal Duplicates removal Balancing (so there are as many texts of each sentiment) Stripping (remove any leading and trailing white spaces and new lines) URL removal Contractions Expansion (e.g. converting "it's" to "it is") Shuffling This dataset still needs some data cleaning operations:
Fix special characters (display '&' instead of "&")
Remove HTML tags (like "
")
Translate all text to english (some texts are in other languages, but only a few)
Also, note that emojis are present in some texts. I let you decide if you want to process them for your sentiment analysis.
This dataset is the cleaned concatenation of multiple finance news sentiments datasets:
https://www.kaggle.com/datasets/yash612/stockmarket-sentiment-dataset https://www.kaggle.com/datasets/borhanitrash/twitter-financial-news-sentiment-dataset https://www.kaggle.com/datasets/sidarcidiacono/news-sentiment-analysis-for-stock-data-by-company https://www.kaggle.com/datasets/ankurzing/sentiment-analysis-for-financial-news Thanks for their work!
CC-BY-NC
Original Data Source: Finance News Sentiments
This dataset was created by Kamlesh Nrupnarayan
This dataset was created by Shaurya Nandecha
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This package contains the datasets and source codes used in the PhD thesis entitled Predicting the Brazilian stock market using sentiment analysis, technical indicators and stock prices. The following files are included: File Labeled.zip - financial news labeled in two classes (Positive and Negative), organized to train Sentiment Analysis models. Part of these news were initially presented in [1]. Besides the news in this file, in the related PhD thesis the training dataset was complemented with the labeled news presented in [2]. File Unlabeled.zip - general unlabeled financial news collected during the period 2010-2020 from the following online sources: G1, Folha de São Paulo and Estadão. This file contains news from the Bovespa index and from the following companies: Banco do Brasil, Itau, Gerdau and Ambev. File Stocks.zip - stock prices from the companies Banco do Brasil, Itau, Gerdau, Ambev, and the Bovespa index. The considered period ranges from 2010 to 2020. File Models.zip - contains the source codes of the models used in the PhD thesis (i.e., Multilayer Perceptron, Long Short-Term Memory, Bidirectional Long Short-Term Memory, Convolutional Neural Network, and Support Vector Machines). File Utils.zip - contains the source codes of the preprocessing step designed for the methodology of this work (i.e., load data and generate the word embeddings), alongside with stocks manipulation, and investment evaluation. [1] Carosia, A. E. D. O., Januário, B. A., da Silva, A. E. A., & Coelho, G. P. (2021). Sentiment Analysis Applied to News from the Brazilian Stock Market. IEEE Latin America Transactions, 100. DOI: 10.1109/TLA.2022.9667151 [2] MARTINS, R. F.; PEREIRA, A.; BENEVENUTO, F. An approach to sentiment analysis of web applications in portuguese. Proceedings of the 21st Brazilian Symposium on Multimedia and the Web, ACM, p. 105–112, 2015. DOI: 10.1145/2820426.2820446
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Financial Sentiment Analysis Dataset
Overview
This dataset is a comprehensive collection of tweets focused on financial topics, meticulously curated to assist in sentiment analysis in the domain of finance and stock markets. It serves as a valuable resource for training machine learning models to understand and predict sentiment trends based on social media discourse, particularly within the financial sector.
Data Description
The dataset comprises tweets… See the full description on the dataset page: https://huggingface.co/datasets/TimKoornstra/financial-tweets-sentiment.
https://sentalyse.com/en/termshttps://sentalyse.com/en/terms
Downloadable company sentiment dataset over time for FactSet Research Systems Inc., based on trusted financial news sources.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This Stock Market Dataset is designed for predictive analysis and machine learning applications in financial markets. It includes 13647 records of simulated stock trading data with features commonly used in stock price forecasting.
🔹 Key Features Date – Trading day timestamps (business days only) Open, High, Low, Close – Simulated stock prices Volume – Trading volume per day RSI (Relative Strength Index) – Measures market momentum MACD (Moving Average Convergence Divergence) – Trend-following momentum indicator Sentiment Score – Simulated market sentiment from financial news & social media Target – Binary label (1: Price goes up, 0: Price goes down) for next-day prediction This dataset is useful for training hybrid deep learning models such as LSTM, CNN, and Attention-based networks for stock market forecasting. It enables financial analysts, traders, and AI researchers to experiment with market trends, technical analysis, and sentiment-based predictions.
Note This dataset is not shuffled The data has over 100000+ rows with sentences and sentiment of each sentence 0 represent that the news is negative or neutral (Therefore likely the stock will go down) 1 represent that the news is positive ( Therefore the likely stock will go up)
Original Data Source: Stock News Sentiment Analysis(Massive Dataset)
https://qlsolutions.synology.me/en/termshttps://qlsolutions.synology.me/en/terms
Downloadable company sentiment dataset over time for Broadridge Financial Solutions Inc., based on trusted financial news sources.
https://qlsolutions.synology.me/en/termshttps://qlsolutions.synology.me/en/terms
Downloadable company sentiment dataset over time for Cboe Global Markets, Inc., based on trusted financial news sources.
Model Card for Sentiment Analysis on Financial News
Overview
This dataset contains sentiments for financial news headlines from the perspective of a retail investor. The data is derived from the research by Malo et al. (2014), which focuses on detecting semantic orientations in economic texts.
Dataset Details
Source: Malo, P., Sinha, A., Takala, P., Korhonen, P., and Wallenius, J. (2014). “Good debt or bad debt: Detecting semantic orientations in economic… See the full description on the dataset page: https://huggingface.co/datasets/mltrev23/financial-sentiment-analysis.