16 datasets found

FinancialPhraseBank
huggingface.co
Cite
Luca Massaron, FinancialPhraseBank [Dataset]. https://huggingface.co/datasets/lmassaron/FinancialPhraseBank
Explore at:
Authors
Luca Massaron
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Dataset Card for Financial PhraseBank

Dataset Description

Repository: [Link to the source, e.g., on Kaggle or original paper's site]
Paper: Good debt or bad debt: Detecting semantic orientations in economic texts

This dataset (FinancialPhraseBank) contains the sentiments for 4846 financial news headlines from the perspective of a retail investor. The dataset is labeled with "negative", "neutral", or "positive" sentiments.

Content

The dataset contains two… See the full description on the dataset page: https://huggingface.co/datasets/lmassaron/FinancialPhraseBank.
distilbert-reddit-financial-phrasebank-allagree
kaggle.com
zip
Updated Nov 9, 2021
Cite
Daniel M. (2021). distilbert-reddit-financial-phrasebank-allagree [Dataset]. https://www.kaggle.com/datasets/muniozdaniel0/distilbert-reddit-financial-phrasebank-allagree
Explore at:
Available download formats: zip (2437573921 bytes)
Dataset updated
Nov 9, 2021
Authors
Daniel M.
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Daniel M.

Released under CC0: Public Domain

financial-phrasebank-all-agree-classification
huggingface.co
Cite
ghbacct, financial-phrasebank-all-agree-classification [Dataset]. https://huggingface.co/datasets/ghbacct/financial-phrasebank-all-agree-classification
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Authors
ghbacct
Description
Dataset Card for "financial-phrasebank-all-agree-classification"

More Information needed
Financial Sentiment Analysis
kaggle.com
zip
Updated Feb 19, 2022
Cite
sbhatti (2022). Financial Sentiment Analysis [Dataset]. https://www.kaggle.com/sbhatti/financial-sentiment-analysis
Explore at:
Available download formats: zip (282375 bytes)
Dataset updated
Feb 19, 2022
Authors
sbhatti
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Data

The following data is intended for advancing financial sentiment analysis research. It's two datasets (FiQA, Financial PhraseBank) combined into one easy-to-use CSV file. It provides financial sentences with sentiment labels.
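
As a rough illustration of what "combined into one easy-to-use CSV file" can look like, the sketch below merges two small lists of labeled sentences into a single CSV table. The example rows and the column names `sentence` and `sentiment` are assumptions made for illustration; they are not taken from the actual file.

```python
import csv
import io

# Hypothetical rows standing in for the two sources (not real dataset records).
fiqa_rows = [("Shares jumped after the earnings beat.", "positive")]
phrasebank_rows = [("The company has no plans to move production.", "neutral")]

# Write both sources into one in-memory CSV with a shared header.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["sentence", "sentiment"])  # assumed column names
writer.writerows(fiqa_rows + phrasebank_rows)

# Read the combined table back as a list of dictionaries.
combined = list(csv.DictReader(io.StringIO(buffer.getvalue())))
print(len(combined), "rows")
```

Keeping a single two-column layout means downstream code does not need to know which source a sentence came from.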

Citations

Malo, Pekka, et al. "Good debt or bad debt: Detecting semantic orientations in economic texts." Journal of the Association for Information Science and Technology 65.4 (2014): 782-796.
English Russian Financial Phrasebank
kaggle.com
zip
Updated Jan 7, 2025
Cite
MukhammedAbuSuveilim (2025). English Russian Financial Phrasebank [Dataset]. https://www.kaggle.com/datasets/mukhammedabusuveilim/english-russian-financial-phrasebank/suggestions
Explore at:
Available download formats: zip (297557 bytes)
Dataset updated
Jan 7, 2025
Authors
MukhammedAbuSuveilim
Description
Dataset

This dataset was created by MukhammedAbuSuveilim

financial-classification
huggingface.co
Cite
Nicholas Muchinguri, financial-classification [Dataset]. https://huggingface.co/datasets/nickmuchi/financial-classification
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Authors
Nicholas Muchinguri
Description
Dataset Creation

This dataset combines the Financial PhraseBank dataset with a financial text dataset from Kaggle. Because the Financial PhraseBank dataset does not have a validation split, I thought this might help to validate finance models and also capture the impact of COVID on financial earnings through the more recent Kaggle dataset.
financial-phrasebank-all-agree-clustering
huggingface.co
Updated Jul 15, 2023
Cite
ghbacct (2023). financial-phrasebank-all-agree-clustering [Dataset]. https://huggingface.co/datasets/ghbacct/financial-phrasebank-all-agree-clustering
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 15, 2023
Authors
ghbacct
Description
Dataset Card for "financial-phrasebank-all-agree-clustering"

More Information needed
FinSen Financial Sentiment Dataset
kaggle.com
zip
Updated Oct 29, 2024
Cite
Eagle W H L (2024). FinSen Financial Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/eaglewhl/finsen-financial-sentiment-dataset/code
Explore at:
Available download formats: zip (6549212 bytes)
Dataset updated
Oct 29, 2024
Authors
Eagle W H L
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Enhancing Financial Market Predictions: Causality-Driven Feature Selection

Note: Please give a vote 👍 if you think this FinSen dataset is useful to you. Thanks :)

This paper introduces the FinSen dataset, which revolutionizes financial market analysis by integrating economic and financial news articles from 197 countries with stock market data. The dataset's coverage spans 15 years from 2007 to 2023 with temporal information, offering a rich, global perspective across 160,000 records of financial market news. Our study leverages causally validated sentiment scores and LSTM models to enhance market forecast accuracy and reliability.

Technical Framework

[Image: https://github.com/user-attachments/assets/5df3c4a7-2403-460a-ac7f-2d69572fec2f]

Our FinSen Dataset

This repository contains the dataset for "Enhancing Financial Market Predictions: Causality-Driven Feature Selection" (https://arxiv.org/abs/2408.01005), which has been accepted at ADMA 2024.

If the dataset or the paper has been useful in your research, please add a citation to our work:

@article{liang2024enhancing,
  title={Enhancing Financial Market Predictions: Causality-Driven Feature Selection},
  author={Liang, Wenhao and Li, Zhengyang and Chen, Weitong},
  journal={arXiv e-prints},
  pages={arXiv--2408},
  year={2024}
}

Datasets

FinSen can be downloaded manually from the repository as a CSV file. The sentiment label and its score are generated by the FinBERT model from the Hugging Face Transformers library under the identifier "ProsusAI/finbert". (Araci, Dogu. "Finbert: Financial sentiment analysis with pre-trained language models." arXiv preprint arXiv:1908.10063 (2019).)

We provide only the US data for research use; please contact w.liang@adelaide.edu.au for other countries (197 in total) if necessary.

[Image: https://github.com/user-attachments/assets/f28e670a-7329-409d-81cb-1fe47da22140]

FinSen Data Sample:

[Image: https://github.com/user-attachments/assets/6ab08486-85b7-4cf6-b4fe-7d4294624f91]

We also provide other NLP datasets for text classification tasks here; please cite them accordingly if you use them in your research.

20Newsgroups. Joachims, T., et al.: A probabilistic analysis of the rocchio algorithm with tfidf for text categorization. In: ICML. vol. 97, pp. 143–151. Citeseer (1997)

AG News. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. Advances in neural information processing systems 28 (2015)

Financial PhraseBank. Malo, P., Sinha, A., Korhonen, P., Wallenius, J., Takala, P.: Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology 65(4), 782–796 (2014)

Dataloader for FinSen

We provide the preprocessing file finsen.py for our FinSen dataset under the dataloaders directory for more convenient usage.

Models - Text Classification

DAN-3.

Global Pooling CNN.

Models - Regression Prediction

LSTM

Using Sentiment Scores from FinSen to Predict Results on the S&P 500

[Image: https://github.com/user-attachments/assets/2d9b4dd7-7f59-425c-b812-2cca57719243]

☺ Happy Research!
financial_phrasebank
huggingface.co
Updated Jul 19, 2025
Cite
Avi Trost (2025). financial_phrasebank [Dataset]. https://huggingface.co/datasets/atrost/financial_phrasebank
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 19, 2025
Authors
Avi Trost
Description
Dataset Card for "financial_phrasebank"

64/16/20 Split of the sentences_50agree subset of financial_phrasebank, according to the FinBERT paper.
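
A 64/16/20 split can be obtained by first holding out 20% of the data as a test set and then taking 20% of the remaining 80% as a validation set (0.8 × 0.2 = 0.16, leaving 0.64 for training). The sketch below shows that arithmetic on placeholder sentences. The function name `split_64_16_20`, the seed, and the plain random shuffle are assumptions for illustration; the exact procedure used for this dataset is not described here.

```python
import random

def split_64_16_20(items, seed=0):
    """Shuffle items and return (train, validation, test) in 64/16/20 proportion."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for a given seed
    n = len(shuffled)
    n_test = round(n * 0.20)             # 20% of everything
    n_val = round((n - n_test) * 0.20)   # 20% of the remaining 80%
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Placeholder data standing in for the real sentences.
sentences = [f"sentence {i}" for i in range(100)]
train, val, test = split_64_16_20(sentences)
print(len(train), len(val), len(test))  # prints: 64 16 20
```

A stratified variant that preserves the label distribution in each part may be preferable for imbalanced sentiment labels.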
indonesian-financial-phrasebank
huggingface.co
Cite
Intan Maharani, indonesian-financial-phrasebank [Dataset]. https://huggingface.co/datasets/intanm/indonesian-financial-phrasebank
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Authors
Intan Maharani
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
intanm/indonesian-financial-phrasebank dataset hosted on Hugging Face and contributed by the HF Datasets community
Stock Market News Data in Portuguese
kaggle.com
zip
Updated Jul 7, 2021
Cite
Mateus Picanco (2021). Stock Market News Data in Portuguese [Dataset]. https://www.kaggle.com/mateuspicanco/financial-phrase-bank-portuguese-translation
Explore at:
Available download formats: zip (481703 bytes)
Dataset updated
Jul 7, 2021
Authors
Mateus Picanco
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Stock Market News Data in Portuguese

The Financial Phrase Bank is a dataset originally developed for the paper Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts, made available by researchers from Aalto University and the Indian Institute of Management. The dataset provides a useful benchmark for fine-tuning language models on sentiment analysis tasks.

As the amount of annotated text data in Portuguese (especially about the financial market) is limited, I went ahead and translated the entire dataset so that people can try out sentiment analysis tasks in Portuguese.

Content

The dataset originally contains about 4840 manually annotated financial news items in English and consists of three columns:

1. y: the annotated label for the sentiment of the news text (neutral, positive, negative);
2. text: the original text for each record;
3. text_pt: the translated version of the original record, which I manually validated.
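
The three columns described above can be read with Python's standard `csv` module, as in the minimal sketch below. The two sample rows are made up for illustration and the comma-separated layout is an assumption; the actual file name, delimiter, and encoding are not stated here.

```python
import csv
import io
from collections import Counter

# Made-up sample with the three documented columns (not real dataset rows).
sample = io.StringIO(
    "y,text,text_pt\n"
    "positive,Operating profit rose.,O lucro operacional subiu.\n"
    "neutral,The company is based in Helsinki.,A empresa está sediada em Helsinque.\n"
)

rows = list(csv.DictReader(sample))

# Count sentiment labels and pair each English text with its Portuguese translation.
label_counts = Counter(row["y"] for row in rows)
pairs = [(row["text"], row["text_pt"]) for row in rows]
print(label_counts)
```

For the real file, the same code would apply after replacing the in-memory sample with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.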

Acknowledgments

[1] Malo, P., Sinha, A., Korhonen, P., Wallenius, J., & Takala, P. (2014). Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology, 65(4), 782-796.

Photo by Markus Winkler on Unsplash
financial-phrase-bank-portuguese-translation
kaggle.com
zip
Updated Jan 24, 2024
Cite
Pixel_Dust_64 (2024). financial-phrase-bank-portuguese-translation [Dataset]. https://www.kaggle.com/datasets/pixeldust64/financial-phrase-bank-portuguese-translation/code
Explore at:
Available download formats: zip (480183 bytes)
Dataset updated
Jan 24, 2024
Authors
Pixel_Dust_64
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Pixel_Dust_64

Released under Apache 2.0

financial_reasoning_aggregated
huggingface.co
Updated Nov 27, 2025
Cite
Yi Peng Neo (2025). financial_reasoning_aggregated [Dataset]. https://huggingface.co/datasets/neoyipeng/financial_reasoning_aggregated
Explore at:
Dataset updated
Nov 27, 2025
Authors
Yi Peng Neo
Description
Aggregated Financial Reasoning Dataset for Reinforcement Fine-Tuning (RFT) in Finance

A multi-source NLP dataset combining Financial PhraseBank, FinQA, news headlines, and Twitter data, labeled for sentiment and QA tasks.

I don't own any of the datasets; I am just curating them for my own reasoning experiments and teaching materials.

Dataset Overview

Purpose: This dataset is an aggregation of text sources that have a discrete output, which allows downstream RFT while… See the full description on the dataset page: https://huggingface.co/datasets/neoyipeng/financial_reasoning_aggregated.
financial_phrase_bank_pt_br
kaggle.com
zip
Updated Jan 22, 2024
Cite
Daniele Simas (2024). financial_phrase_bank_pt_br [Dataset]. https://www.kaggle.com/datasets/danielesimas/financial-phrase-bank-pt-br
Explore at:
Available download formats: zip (481703 bytes)
Dataset updated
Jan 22, 2024
Authors
Daniele Simas
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Daniele Simas

Released under MIT

financial_phrasebank_multilingual
huggingface.co
Updated Jun 20, 2025
Cite
Néstor Ojeda González (2025). financial_phrasebank_multilingual [Dataset]. https://huggingface.co/datasets/nojedag/financial_phrasebank_multilingual
Explore at:
Dataset updated
Jun 20, 2025
Authors
Néstor Ojeda González
Description
Dataset Card for Multilingual Financial Sentiment Analysis

This dataset is based on a combination of two datasets, FiQA and Financial PhraseBank, automatically translated into Spanish, French, and German.

Dataset Details

Dataset Sources

KaggleHub: Financial Sentiment Analysis
Paper: Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts
Financial Opinion Mining and Question Answering

Uses

Multilingual financial sentiment… See the full description on the dataset page: https://huggingface.co/datasets/nojedag/financial_phrasebank_multilingual.
financial_phrasebank_75agree_german
huggingface.co
Updated Mar 26, 2025
Cite
Moritz Scherrmann (2025). financial_phrasebank_75agree_german [Dataset]. https://huggingface.co/datasets/scherrmann/financial_phrasebank_75agree_german
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 26, 2025
Authors
Moritz Scherrmann
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Description
Dataset Card for German financial_phrasebank

Dataset Description

Dataset Summary

This dataset is a German translation of the financial phrasebank of Malo et al. (2013) with a minimum agreement rate between annotators of 75% (3453 observations in total). The translation was produced automatically with DeepL.
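
To make the "minimum agreement rate of 75%" concrete, the sketch below keeps a sentence only when at least 75% of its annotators chose the same label. Defining the agreement rate as the share of annotators voting for the majority label is an assumption here, and the annotations are made-up examples rather than records from the dataset.

```python
from collections import Counter

def majority_agreement(labels):
    """Return (majority_label, share of annotators who chose it)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

# Made-up annotations: each sentence maps to the labels given by its annotators.
annotated = {
    "Sales increased.": ["positive", "positive", "positive", "positive"],
    "The deal was announced.": ["neutral", "neutral", "positive", "negative"],
}

# Keep only sentences whose majority label reaches the 75% threshold.
kept = {}
for sentence, labels in annotated.items():
    label, share = majority_agreement(labels)
    if share >= 0.75:
        kept[sentence] = label

print(kept)  # only the first sentence passes (4/4 agree; the second has 2/4)
```

Raising the threshold trades dataset size for label reliability, which is why agreement-filtered subsets are smaller than the full collection.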

Supported Tasks and Leaderboards

Sentiment Classification

Languages

German

Dataset Structure

Data Instances

{… See the full description on the dataset page: https://huggingface.co/datasets/scherrmann/financial_phrasebank_75agree_german.

lmassaron/FinancialPhraseBank: 89 scholarly articles cite this dataset (View in Google Scholar)
