Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
The key arguments for the low utilization of statistical techniques in financial sentiment analysis have been the difficulty of implementation for practical applications and the lack of high-quality training data for building such models. Especially in the case of finance and economic texts, annotated collections are a scarce resource and many are reserved for proprietary use only. To resolve the missing training data problem, we present a collection of ∼5000 sentences to establish human-annotated standards for benchmarking alternative modeling techniques.
The objective of the phrase-level annotation task was to classify each example sentence into a positive, negative or neutral category by considering only the information explicitly available in the given sentence. Since the study is focused only on financial and economic domains, the annotators were asked to consider the sentences from the viewpoint of an investor only, i.e., whether the news may have a positive, negative or neutral influence on the stock price. As a result, sentences whose sentiment is not relevant from an economic or financial perspective are considered neutral.
This release of the financial phrase bank covers a collection of 4840 sentences. The selected collection of phrases was annotated by 16 people with adequate background knowledge on financial markets. Three of the annotators were researchers and the remaining 13 annotators were master’s students at Aalto University School of Business with majors primarily in finance, accounting, and economics.
Given the large number of overlapping annotations (5 to 8 annotations per sentence), there are several ways to define a majority-vote-based gold standard. To provide an objective comparison, we have formed 4 alternative reference datasets based on the strength of majority agreement: all annotators agree, >=75% of annotators agree, >=66% of annotators agree, and >=50% of annotators agree.
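The agreement-based grouping described above can be illustrated with a minimal sketch. It assumes each sentence comes with a plain list of annotator labels; the function names and the subset names (`all_agree`, `75_agree`, and so on) are hypothetical and not taken from the dataset release, and the thresholds are simply the four levels quoted in the description.

```python
from collections import Counter

# Agreement levels quoted in the description; the subset names are hypothetical.
THRESHOLDS = {
    "all_agree": 1.00,
    "75_agree": 0.75,
    "66_agree": 0.66,
    "50_agree": 0.50,
}


def majority_label(annotations):
    """Return the most frequent label and the fraction of annotators who chose it."""
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(annotations)


def agreement_subsets(annotations):
    """Return the majority label and every subset whose threshold the sentence meets."""
    label, fraction = majority_label(annotations)
    subsets = [name for name, minimum in THRESHOLDS.items() if fraction >= minimum]
    return label, subsets


# Example: 4 of 5 annotators chose "positive" (80% agreement).
label, subsets = agreement_subsets(
    ["positive", "positive", "positive", "positive", "neutral"]
)
print(label, subsets)
```

Under this reading the subsets are nested: a sentence on which all annotators agree also belongs to the three weaker-agreement sets. Tie handling (for example, two labels at exactly 50% each) is not specified in the description and is left out of the sketch.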
FinanceMTEB/financial_phrasebank dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for "financial_phrasebank"
64/16/20 split of the sentences_50agree subset of financial_phrasebank, according to the FinBERT paper.
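A 64/16/20 partition of this kind could be produced along the following lines. This is only a sketch: the function name `split_64_16_20`, the seed, and the assumption that the three parts are train, validation, and test (in that order) are illustrative, and the shuffling here will not reproduce the exact partition used in this dataset or in the FinBERT paper.

```python
import random


def split_64_16_20(items, seed=0):
    """Shuffle items and cut them into 64% / 16% / 20% parts.

    Assumption: the parts are treated as train / validation / test.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split

    n_test = round(0.20 * len(shuffled))
    n_val = round(0.16 * len(shuffled))

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Example with 100 placeholder sentence ids.
train, val, test = split_64_16_20(range(100))
print(len(train), len(val), len(test))
```

Fixing the seed keeps the split stable across runs, which matters when results on the held-out part are compared between models.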
descartes100/enhanced-financial-phrasebank dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Financial Sentiment Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sbhatti/financial-sentiment-analysis on 13 February 2022.
--- Dataset description provided by original source is as follows ---
The following data is intended for advancing financial sentiment analysis research. It combines two datasets (FiQA and Financial PhraseBank) into one easy-to-use CSV file. It provides financial sentences with sentiment labels.
Malo, Pekka, et al. "Good debt or bad debt: Detecting semantic orientations in economic texts." Journal of the Association for Information Science and Technology 65.4 (2014): 782-796.
--- Original source retains full ownership of the source dataset ---
vatolinalex/financial_phrasebank dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Dataset Card for financial_phrasebank
Dataset Summary
Polar sentiment dataset of sentences from financial news. The dataset consists of 4840 sentences from English language financial news categorised by sentiment. The dataset is divided by agreement rate of 5-8 annotators.
Supported Tasks and Leaderboards
Sentiment Classification
Languages
English
Dataset Structure
Data Instances
{ "sentence": "Pharmaceuticals group Orion Corp… See the full description on the dataset page: https://huggingface.co/datasets/gtfintechlab/financial_phrasebank_sentences_allagree.
autoevaluate/autoeval-eval-financial_phrasebank-sentences_50agree-d5dbba-47711145221 dataset hosted on Hugging Face and contributed by the HF Datasets community
sooyeon/autotrain-data-flan-t5-large-financial-phrasebank-lora dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Dataset Card for financial_phrasebank
Dataset Description
Auditor review data collected by News Department
Point of Contact: Talked to COE for Auditing
Dataset Summary
Auditor sentiment dataset of sentences from financial news. The dataset consists of *** sentences from English language financial news categorized by sentiment. The dataset is divided by agreement rate of 5-8 annotators.
Supported Tasks and Leaderboards
Sentiment Classification… See the full description on the dataset page: https://huggingface.co/datasets/rajistics/auditor_review.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Dataset Card for German financial_phrasebank
Dataset Description
Dataset Summary
This dataset is a German translation of the financial phrasebank of Malo et al. (2013) with a minimum agreement rate between annotators of 75% (3453 observations in total). The translation was produced by machine with DeepL.
Supported Tasks and Leaderboards
Sentiment Classification
Languages
German
Dataset Structure
Data Instances
{… See the full description on the dataset page: https://huggingface.co/datasets/scherrmann/financial_phrasebank_75agree_german.
Dataset Card for Multilingual Financial Sentiment Analysis
This dataset is based on the combination of two datasets, FiQA and Financial PhraseBank, automatically translated into Spanish, French and German.
Dataset Details
Dataset Sources
KaggleHub: Financial Sentiment Analysis
Paper: Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts
Financial Opinion Mining and Question Answering
Uses
Multilingual financial sentiment… See the full description on the dataset page: https://huggingface.co/datasets/nojedag/financial_phrasebank_multilingual.