100+ datasets found
  1. Forex News Annotated Dataset for Sentiment Analysis

    • zenodo.org
    • paperswithcode.com
    • +1 more
    csv
    Updated Nov 11, 2023
    Cite
    Georgios Fatouros; Kalliopi Kouroumali (2023). Forex News Annotated Dataset for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.7976208
    Explore at:
    Available download formats: csv
    Dataset updated
    Nov 11, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Georgios Fatouros; Kalliopi Kouroumali
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains news headlines relevant to key forex pairs: AUDUSD, EURCHF, EURUSD, GBPUSD, and USDJPY. The data was extracted from reputable platforms Forex Live and FXstreet over a period of 86 days, from January to May 2023. The dataset comprises 2,291 unique news headlines. Each headline includes an associated forex pair, timestamp, source, author, URL, and the corresponding article text. Data was collected using web scraping techniques executed via a custom service on a virtual machine. This service periodically retrieves the latest news for a specified forex pair (ticker) from each platform, parsing all available information. The collected data is then processed to extract details such as the article's timestamp, author, and URL. The URL is further used to retrieve the full text of each article. This data acquisition process repeats approximately every 15 minutes.

    To ensure the reliability of the dataset, we manually annotated each headline for sentiment. Instead of solely focusing on the textual content, we ascertained sentiment based on the potential short-term impact of the headline on its corresponding forex pair. This method recognizes the currency market's acute sensitivity to economic news, which significantly influences many trading strategies. As such, this dataset could serve as an invaluable resource for fine-tuning sentiment analysis models in the financial realm.

    We used three categories for annotation: 'positive', 'negative', and 'neutral', which correspond to bullish, bearish, and hold sentiments, respectively, for the forex pair linked to each headline. The following table provides examples of annotated headlines along with brief explanations of the assigned sentiment.

    Examples of Annotated Headlines

    | Forex Pair | Headline | Sentiment | Explanation |
    |------------|----------|-----------|-------------|
    | GBPUSD | Diminishing bets for a move to 12400 | Neutral | Lack of strong sentiment in either direction |
    | GBPUSD | No reasons to dislike Cable in the very near term as long as the Dollar momentum remains soft | Positive | Positive sentiment towards GBPUSD (Cable) in the near term |
    | GBPUSD | When are the UK jobs and how could they affect GBPUSD | Neutral | Poses a question and does not express a clear sentiment |
    | USDJPY | Appropriate to continue monetary easing to achieve 2% inflation target with wage growth | Positive | Monetary easing from Bank of Japan (BoJ) could lead to a weaker JPY in the short term due to increased money supply |
    | USDJPY | Dollar rebounds despite US data. Yen gains amid lower yields | Neutral | Since both the USD and JPY are gaining, the effects on the USDJPY forex pair might offset each other |
    | USDJPY | USDJPY to reach 124 by Q4 as the likelihood of a BoJ policy shift should accelerate Yen gains | Negative | USDJPY is expected to reach a lower value, with the USD losing value against the JPY |
    | AUDUSD | RBA Governor Lowe’s Testimony High inflation is damaging and corrosive | Positive | Reserve Bank of Australia (RBA) expresses concerns about inflation. Typically, central banks combat high inflation with higher interest rates, which could strengthen AUD. |

    Moreover, the dataset includes two columns with the predicted sentiment class and score as predicted by the FinBERT model. Specifically, the FinBERT model outputs a set of probabilities for each sentiment class (positive, negative, and neutral), representing the model's confidence in associating the input headline with each sentiment category. These probabilities are used to determine the predicted class and a sentiment score for each headline. The sentiment score is computed by subtracting the negative class probability from the positive one.
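    The score described above can be sketched in a few lines. This is only an illustration: the dictionary keys and the function name are hypothetical, and taking the highest-probability class as the predicted class is an assumption, since the description does not spell out that rule.

```python
def finbert_summary(probs):
    """Return (predicted_class, sentiment_score) from class probabilities.

    Assumption: the predicted class is the one with the highest probability.
    The score follows the description: P(positive) - P(negative).
    """
    predicted = max(probs, key=probs.get)
    score = probs["positive"] - probs["negative"]
    return predicted, score


# Made-up probabilities for one headline (not taken from the dataset).
example = {"positive": 0.70, "negative": 0.10, "neutral": 0.20}
label, score = finbert_summary(example)
```

    Under this definition the score lies in [-1, 1], with values near 0 indicating either a neutral headline or balanced positive and negative probabilities.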

  2. News sentiment analysis datasets for Serbian, Bosnian, Macedonian, Albanian...

    • live.european-language-grid.eu
    • clarin.si
    binary format
    Updated Nov 12, 2024
    Cite
    (2024). News sentiment analysis datasets for Serbian, Bosnian, Macedonian, Albanian and Estonian SADEmma 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/23729
    Explore at:
    Available download formats: binary format
    Dataset updated
    Nov 12, 2024
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    We provide annotated datasets on a three-point sentiment scale (positive, neutral and negative) for Serbian, Bosnian, Macedonian, Albanian, and Estonian. For all languages except Estonian, we include pairs of source URL (where corresponding text can be found) and sentiment label.

    For Estonian, we randomly sampled 100 articles from "Ekspress news article archive (in Estonian and Russian) 1.0" (http://hdl.handle.net/11356/1408).

    The data is organized in Tab-Separated Values (TSV) format. For Serbian, Bosnian, Macedonian, and Albanian, the dataset contains two columns: sourceURL and sentiment. For Estonian, the dataset consists of three columns: text ID (from the CLARIN.SI reference above), body text, and sentiment label.
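    A minimal sketch of reading the two-column layout described above, using Python's standard csv module. The sample rows are invented, and whether the real files carry a header row is an assumption (controlled here by the hypothetical has_header flag).

```python
import csv
import io

# Hypothetical sample in the described two-column layout (sourceURL, sentiment).
sample = (
    "sourceURL\tsentiment\n"
    "https://example.org/a\tpositive\n"
    "https://example.org/b\tnegative\n"
)


def read_url_labels(handle, has_header=True):
    """Read (sourceURL, sentiment) pairs from a tab-separated file object."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:  # assumption: the first row names the columns
        next(reader, None)
    return [(url, label) for url, label in reader]


rows = read_url_labels(io.StringIO(sample))
```

    For the Estonian file, the same reader would need three fields per row (text ID, body text, sentiment label) instead of two.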

  3. Data for manuscript: "Longitudinal Analysis of Sentiment and Emotion in News...

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Sep 13, 2022
    Cite
    Anonymized (2022). Data for manuscript: "Longitudinal Analysis of Sentiment and Emotion in News Media Headlines Using Automated Labelling with Transformer Language Models" [Dataset]. http://doi.org/10.5281/zenodo.5144113
    Explore at:
    Available download formats: bin
    Dataset updated
    Sep 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymized
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains automated sentiment and emotionality annotations of 23 million headlines from 47 news media outlets popular in the United States.

    The set of 47 news media outlets analysed (listed in Figure 1 of the main manuscript) was derived from the AllSides organization 2019 Media Bias Chart v1.1. The human ratings of outlets’ ideological leanings were also taken from this chart and are listed in Figure 2 of the main manuscript.

    News article headlines from the outlets analyzed in the manuscript are available in the outlets’ online domains and/or public cache repositories such as the Internet Archive's Wayback Machine, Google cache and Common Crawl. Headlines were located in the articles’ raw HTML using outlet-specific XPath expressions.
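    To illustrate what an outlet-specific XPath lookup can look like, here is a small sketch using Python's standard library. The outlet name, the XPath expression and the sample page are all hypothetical. xml.etree.ElementTree supports only a subset of XPath and requires well-formed markup, so real-world HTML would likely need a more lenient parser; this is not the authors' extraction code.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-outlet expressions; real ones depend on each site's markup.
OUTLET_XPATHS = {
    "example-outlet": ".//h1[@class='headline']",
}


def extract_headline(markup, outlet):
    """Return the headline text for a page, or None if the XPath finds nothing."""
    root = ET.fromstring(markup)
    node = root.find(OUTLET_XPATHS[outlet])
    if node is None:
        return None
    # Join nested text (e.g. inside <em>) into a single string.
    return "".join(node.itertext()).strip()


page = (
    "<html><body>"
    "<h1 class='headline'>Markets <em>rally</em> on jobs data</h1>"
    "</body></html>"
)
headline = extract_headline(page, "example-outlet")
```

    Returning None when nothing matches makes it easy to count pages where an outlet's expression fails.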

    The temporal coverage of headlines across news outlets is not uniform. For some media organizations, news article availability in online domains or Internet cache repositories becomes sparse for earlier years. Furthermore, some news outlets popular in 2019, such as The Huffington Post or Breitbart, did not exist in the early 2000s. Hence, our data set is sparser in headline sample size and representativeness for earlier years in the 2000-2019 timeline. Nevertheless, 20 outlets in our data set have chronologically continuous partial or full headline data availability since the year 2000. Figure S1 in the SI reports the number of headlines per outlet and per year in our analysis.

    In a small percentage of articles, outlet-specific XPath expressions might fail to properly capture the content of the headline due to the heterogeneity of HTML elements and CSS styling combinations with which article text is arranged in the outlets' online domains. After manual testing, we determined that the percentage of headlines falling into this category is very small. Additionally, our method might miss some articles in the online domains of news outlets. To conclude, in a data analysis of over 23 million headlines, we cannot manually check the correctness of every single data instance, and one hundred percent accuracy at capturing headlines’ content is elusive due to the small number of difficult-to-detect boundary cases such as incorrect HTML markup syntax in online domains. Overall, however, we are confident that our headline set is representative of headlines in print news media content for the studied time period and outlets analyzed.

    The compressed files in this data set are listed next:

    -analysisScripts.rar contains the analysis scripts used in the main manuscript as well as aggregated data of sentiment and emotionality automated annotations of the headlines and human annotations of a subset of headlines sentiment and emotionality used as ground truth.

    -models.rar contains the Transformer sentiment and emotion annotation models used in the analysis. Namely:

    Siebert/sentiment-roberta-large-english from https://huggingface.co/siebert/sentiment-roberta-large-english. This model is a fine-tuned checkpoint of RoBERTa-large (Liu et al. 2019). It enables reliable binary sentiment analysis for various types of English-language text. For each instance, it predicts either positive (1) or negative (0) sentiment. The model was fine-tuned and evaluated on 15 data sets from diverse text sources to enhance generalization across different types of texts (reviews, tweets, etc.). See more information from the original authors at https://huggingface.co/siebert/sentiment-roberta-large-english

    DistilbertSST2.rar is the default sentiment classification model of the HuggingFace Transformers library (https://huggingface.co/). This model is only used to replicate the results of the sentiment analysis with sentiment-roberta-large-english.

    DistilRoberta j-hartmann/emotion-english-distilroberta-base from https://huggingface.co/j-hartmann/emotion-english-distilroberta-base. The model is a fine-tuned checkpoint of DistilRoBERTa-base. The model allows annotation of English text with Ekman's 6 basic emotions, plus a neutral class. The model was trained on 6 diverse datasets. Please refer to the original author at https://huggingface.co/j-hartmann/emotion-english-distilroberta-base for an overview of the data sets used for fine-tuning.

    -headlinesDataWithSentimentLabelsAnnotationsFromSentimentRobertaLargeModel.rar: URLs of the headlines analyzed and the sentiment annotations of the siebert/sentiment-roberta-large-english Transformer model (https://huggingface.co/siebert/sentiment-roberta-large-english).

    -headlinesDataWithSentimentLabelsAnnotationsFromDistilbertSST2.rar: URLs of the headlines analyzed and the sentiment annotations of the default HuggingFace sentiment analysis model fine-tuned on the SST-2 dataset (https://huggingface.co/).

    -headlinesDataWithEmotionLabelsAnnotationsFromDistilRoberta.rar: URLs of the headlines analyzed and the emotion category annotations of the j-hartmann/emotion-english-distilroberta-base Transformer model (https://huggingface.co/j-hartmann/emotion-english-distilroberta-base).

  4. Financial Market News Sentiment Analysis

    • kaggle.com
    Updated Jul 30, 2023
    Cite
    Shaurya Nandecha (2023). Financial Market News Sentiment Analysis [Dataset]. https://www.kaggle.com/shauryanandecha/financial-market-news-sentiment-analysis/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 30, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shaurya Nandecha
    Description

    Dataset

    This dataset was created by Shaurya Nandecha

  5. Stock Sentiment Analysis of News Headlines

    • kaggle.com
    zip
    Updated Dec 9, 2023
    Cite
    Ayushi Duggad (2023). Stock Sentiment Analysis of News Headlines [Dataset]. https://www.kaggle.com/datasets/ayushiduggad/stock-sentiment-analysis-of-news-headlines/code
    Explore at:
    Available download formats: zip (3302823 bytes)
    Dataset updated
    Dec 9, 2023
    Authors
    Ayushi Duggad
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Ayushi Duggad

    Released under CC0: Public Domain

  6. SEN - Sentiment analysis of Entities in News headlines

    • zenodo.org
    • data.niaid.nih.gov
    Updated Oct 15, 2023
    Cite
    Katarzyna Baraniak; Marcin Sydow (2023). SEN - Sentiment analysis of Entities in News headlines [Dataset]. http://doi.org/10.1016/j.procs.2021.09.136
    Explore at:
    Dataset updated
    Oct 15, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Katarzyna Baraniak; Marcin Sydow
    Description

    If you wish to use this data please cite:

    Katarzyna Baraniak, Marcin Sydow,
    A dataset for Sentiment analysis of Entities in News headlines (SEN),
    Procedia Computer Science,
    Volume 192,
    2021,
    Pages 3627-3636,
    ISSN 1877-0509,
    https://doi.org/10.1016/j.procs.2021.09.136.
    (https://www.sciencedirect.com/science/article/pii/S1877050921018755)

    bibtex: users.pja.edu.pl/~msyd/bibtex/sydow-baraniak-SENdataset-kes21.bib

    SEN is a novel publicly available human-labelled dataset for training and testing machine learning algorithms for the problem of entity level sentiment analysis of political news headlines.

    On-line news portals play a very important role in the information society. Fair media should present reliable and objective information. In practice there is an observable positive or negative bias concerning named entities (e.g. politicians) mentioned in the on-line news headlines.
    Our dataset consists of 3819 human-labelled political news headlines coming from several major on-line media outlets in English and Polish.

    Each record contains a news headline, a named entity mentioned in the headline and a human-annotated label (one of “positive”, “neutral”, “negative”). Our SEN dataset package consists of 2 parts: SEN-en (English headlines, split into SEN-en-R and SEN-en-AMT) and SEN-pl (Polish headlines). Each headline-entity pair was annotated by a team of volunteer researchers (the whole SEN-pl dataset and a subset of 1271 English records: the SEN-en-R subset, “R” for “researchers”) or via the Amazon Mechanical Turk service (a subset of 1360 English records: the SEN-en-AMT subset).

    During analysis of the annotations, outlying annotations were identified and removed. A separate version of the dataset without outliers is marked by "noutliers" in the data file name.

    Details of the process of preparing the dataset and presenting its analysis are presented in the paper.

    In case of any questions, please contact one of the authors. Email addresses are in the paper.

  7. Stock Market Sentiment Data: Historical tick-by-tick sentiment data,...

    • datarade.ai
    Cite
    InfoTrie, Stock Market Sentiment Data: Historical tick-by-tick sentiment data, real-time updates, and market indices globally [Dataset]. https://datarade.ai/data-products/stock-market-sentiment-data-historical-tick-by-tick-sentimen-infotrie
    Explore at:
    Available download formats: .json, .xml, .csv, .xls, .sql, .txt
    Dataset provided by
    InfoTrie Financial Solutions
    Authors
    InfoTrie
    Area covered
    South Sudan, Brazil, Sao Tome and Principe, Sierra Leone, Qatar, Libya, Mongolia, United Republic of, Monaco, Lesotho
    Description

    Gain data-driven insights for informed investment decisions. Access market sentiment data since 2013 and customize the API for seamless integration. Maximize your stock market understanding with comprehensive analytics on global stock indices, and public and private companies. Analyze sentiment trends and investor behavior with confidence.

    Sample Dataset - Historical News Sentiment data for your reference.

    Key Features:

    1. Tick-by-Tick Sentiment: Access detailed market dynamics with tick-by-tick sentiment data.
    2. Custom API: Request a customizable API covering over 70,000 tickers, including major FX, commodities, topics, and people.
    3. Proven Expertise: Trust our track record since 2013 for historical data on long-term sentiment patterns.
    4. Uncover Hidden Insights: Gauge investor sentiment and reveal market opportunities with the custom API.
    5. Real-Time Benchmarks: Enhance your strategies with real-time sentiment indicators.
    6. Customizable and Flexible Delivery: Tailor the dataset to your requirements and integrate seamlessly into your workflows.

    Gain a competitive edge with InfoTrie's Historical Tick-by-Tick Stock Market Sentiment Data. Request access now to elevate your investment strategies and make data-driven decisions.

    More information: https://infotrie.com/sentiment-analysis/

  8. BBC datasets for sentiment analysis

    • kaggle.com
    Updated Dec 15, 2024
    Cite
    Alan Turner (2024). BBC datasets for sentiment analysis [Dataset]. https://www.kaggle.com/datasets/amunsentom/article-dataset-2/suggestions
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 15, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alan Turner
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Name: BBC Articles Sentiment Analysis Dataset

    Source: BBC News

    Description: This dataset consists of articles from the BBC News website, containing a diverse range of topics such as business, politics, entertainment, technology, sports, and more. The dataset includes articles from various time periods and categories, along with labels representing the sentiment of the article. The sentiment labels indicate whether the tone of the article is positive, negative, or neutral, making it suitable for sentiment analysis tasks.

    Number of Instances: [Specify the number of articles in the dataset, for example, 2,225 articles]

    Number of Features:
    1. Article Text: The content of the article (string).
    2. Sentiment Label: The sentiment classification of the article. The possible labels are:
       - Positive
       - Negative
       - Neutral

    Data Fields:
    - id: Unique identifier for each article.
    - category: The category or topic of the article (e.g., business, politics, sports).
    - title: The title of the article.
    - content: The full text of the article.
    - sentiment: The sentiment label (positive, negative, or neutral).

    Example:

    | id | category | title | content | sentiment |
    |----|----------|-------|---------|-----------|
    | 1 | Business | "Stock Market Surge" | "The stock market has surged to new highs, driven by strong earnings..." | Positive |
    | 2 | Politics | "Election Results" | "The election results were a mixed bag, with some surprises along the way." | Neutral |
    | 3 | Sports | "Team Wins Championship" | "The team won the championship after a thrilling final match." | Positive |
    | 4 | Technology | "New Smartphone Release" | "The new smartphone release has received mixed reactions from users." | Negative |

    Preprocessing Notes:
    - The text has been preprocessed to remove special characters and any HTML tags that might have been included in the original articles.
    - Tokenization or further text cleaning (e.g., lowercasing, stopword removal) may be necessary depending on the model and method used for sentiment classification.

    Use Case: This dataset is ideal for training and evaluating machine learning models for sentiment classification, where the goal is to predict the sentiment (positive, negative, or neutral) based on the article's text.

  9. Grepsr | Sentiment Analysis of Facebook/Twitter/Instagram posts, News,...

    • datarade.ai
    Updated Jun 25, 2024
    Cite
    Grepsr (2024). Grepsr | Sentiment Analysis of Facebook/Twitter/Instagram posts, News, Product Reviews | Custom and On-demand Sentiment Analysis [Dataset]. https://datarade.ai/data-categories/news-data/apis
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Jun 25, 2024
    Dataset authored and provided by
    Grepsr
    Area covered
    Korea (Democratic People's Republic of), Guadeloupe, Mongolia, Åland Islands, Bosnia and Herzegovina, Taiwan, Congo (Democratic Republic of the), Comoros, Hungary, Saint Martin (French part)
    Description

    Use cases and applications possible with the data:

    Customer feedback analysis: Analyzing customer feedback helps businesses keep customers happy and loyal to the brand, and identify areas to improve.

    Social media monitoring: With sentiment analysis, companies can monitor what's being said about them on social media and use that to figure out how people feel about their products and services and track any new trends.

    Market research: Sentiment analysis can be used to analyze market trends and consumer preferences, which can help companies make informed business decisions and develop effective marketing strategies.

    Financial analysis: You can use sentiment analysis to determine what people say about the stock market through news and social media, which can help you make investing decisions.

    For e-commerce (Amazon, Best Buy, Home Depot and many more), the following data fields can be included: Title, Price, Vendor Name, Ratings, Reviews, Brand, ASIN, URL, sentiment analysis for each review, and other fields on request.

  10. Data Set For Sentiment Analysis On Bengali News Comments

    • narcis.nl
    • data.mendeley.com
    Updated Sep 15, 2019
    Cite
    Chowdhury, M (via Mendeley Data) (2019). Data Set For Sentiment Analysis On Bengali News Comments [Dataset]. http://doi.org/10.17632/n53xt69gnf.2
    Explore at:
    Dataset updated
    Sep 15, 2019
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Chowdhury, M (via Mendeley Data)
    Description

    This is a data set for sentiment analysis of Bangla news comments. Each item was annotated by three different individuals to capture three different perspectives, and the final tag was chosen by majority decision. The data set contains 13802 items in total.
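    The majority rule described above could look roughly like the following sketch. The label strings are illustrative, and returning None when all three annotators disagree is an assumption, since the description does not say how such cases were resolved.

```python
from collections import Counter


def majority_label(annotations):
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(annotations).most_common(1)[0]
    # Assumption: no strict majority means the item is left undecided.
    return label if count > len(annotations) / 2 else None


final = majority_label(["positive", "negative", "positive"])
```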

  11. News Title Sentiment Dataset

    • zenodo.org
    bin
    Updated Mar 24, 2021
    + more versions
    Cite
    Chang Wei Tan; Christoph Bergmeir; Francois Petitjean; Geoffrey I Webb (2021). News Title Sentiment Dataset [Dataset]. http://doi.org/10.5281/zenodo.3902726
    Explore at:
    Available download formats: bin
    Dataset updated
    Mar 24, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chang Wei Tan; Christoph Bergmeir; Francois Petitjean; Geoffrey I Webb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the Monash, UEA & UCR time series regression repository. http://tseregression.org/

    The goal of this dataset is to predict the sentiment score of a news title. This dataset contains 83164 time series obtained from the News Popularity in Multiple Social Media Platforms dataset from the UCI repository. This is a large data set of news items and their respective social feedback on multiple platforms: Facebook, Google+ and LinkedIn. The collected data relates to a period of 8 months, between November 2015 and July 2016, accounting for about 100,000 news items on four different topics: economy, microsoft, obama and palestine. This data set is tailored for evaluative comparisons in predictive analytics tasks, although it also allows for tasks in other research areas such as topic detection and tracking, sentiment analysis in short text, first story detection or news recommendation. The time series has 3 dimensions.

    Please refer to https://archive.ics.uci.edu/ml/datasets/News+Popularity+in+Multiple+Social+Media+Platforms for more details

    Citation request
    Nuno Moniz and Luis Torgo (2018), Multi-Source Social Feedback of Online News Feeds, CoRR

  12. Sentiment Analysis outputs based on the combination of three classifiers for...

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Mar 4, 2022
    Cite
    Caio Mello; Gullal S. Cheema (2022). Sentiment Analysis outputs based on the combination of three classifiers for news headlines and body text [Dataset]. http://doi.org/10.5281/zenodo.6326348
    Explore at:
    Available download formats: bin
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Caio Mello; Gullal S. Cheema
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sentiment Analysis outputs based on the combination of three classifiers for news headlines and body text covering the Olympic legacy of Rio 2016 and London 2012. Data was searched via Google search engine. It is composed of sentiment labels assigned to 1271 news articles in total.

    News outlets:

    • BBC
    • Daily Mail
    • The Telegraph
    • The Guardian
    • Globo
    • Estadao
    • Folha de S. Paulo

    Events covered by the articles:

    • London 2012 Olympic legacy
    • Rio 2016 Olympic legacy

    All classifiers were applied to texts in English. Texts originally published in Portuguese by the Brazilian media were automatically translated.

    Sentiment classifiers used:

    • Vader
    • BERT (Trained on Amazon data)
    • BERT (Trained on twitter data - 140)

    Each document (spreadsheet - xlsx) refers to one outlet and one event (London 2012 or Rio 2016).

    How were labels assigned to the texts?

    These labels are a combination of the three sentiment classifiers listed above. If at least two of them agree on a label, that label is taken as the final one. Otherwise, the label ‘other’ is assigned.

    For news article body text: the proportion of sentences of each sentiment type was used to assign labels to the whole article instead of averaging the sentence scores. For example, if the proportion of sentences with negative labels is greater than 50%, then the article is assigned a negative label.
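    The two labelling rules above can be sketched as follows. The function names are hypothetical. The description gives only the "greater than 50% negative" example, so applying the same threshold to every label and falling back to 'other' when no label exceeds it are assumptions.

```python
from collections import Counter


def combine_classifiers(labels):
    """Return the label that at least two of the three classifiers agree on, else 'other'."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else "other"


def article_label(sentence_labels):
    """Label an article by the share of its sentences carrying each label.

    Assumption: a label wins only if it covers more than 50% of sentences;
    otherwise 'other' is returned.
    """
    total = len(sentence_labels)
    for label, count in Counter(sentence_labels).items():
        if count / total > 0.5:
            return label
    return "other"


title_label = combine_classifiers(["Pos", "Pos", "Neg"])
body_label = article_label(["Neg", "Neg", "Pos"])
```

    Using proportions rather than averaged scores means a few strongly scored sentences cannot outweigh a larger number of mildly scored ones.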

    The documents are composed of the following columns:

    • Rank: the position of the article on Google search ranking
    • Date: date of article's publication (DD/MM/YYYY)
    • Link: article's link
    • Title: article's title
    • Sentiment_Title: final sentiment for article headline
    • Sentiment_Text: final sentiment for article's body text

    PS: Documents do not include articles' body text.

    Sentiment is presented in labels as follows:

    • Pos: Positive
    • Neg: Negative
    • Neutral: Neutral
    • other: inconclusive - if each of the 3 classifiers assigned a different label to the article, the label 'other' was used. Therefore, 'other' identifies contradictory results.

  13. INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 19, 2024
    Cite
    Nafiz Sadman (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Nafiz Sadman
    Kishor Datta Gupta
    Nishat Anjum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, United States
    Description

    Introduction

    There are several works based on Natural Language Processing on newspaper reports. Rameshbhai et al. [1] mined opinions from headlines using Stanford NLP and SVM and compared several algorithms on a small and a large dataset. Rubin et al., in their paper [2], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [3] implemented LDA, a topic modeling approach, to study bias present in online news media.

    However, not much NLP research has been invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight the virus and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Similar work has been done using an LSTM network to classify sentiments from online discussion forums by Jelodar et al. [9]. To the best of our knowledge, the NNK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, which contributed to awareness of the virus.

    2 Data-set Introduction

    2.1 Data Collection

    We accumulated 1000 online newspaper reports from the United States of America (USA) on COVID-19. The newspapers include The Washington Post (USA) and StarTribune (USA). We named this collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named this collection “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure the news was highly relevant to the subject. The newspaper websites had dynamic content with advertisements in no particular order, so online scrapers were likely to collect inaccurate news reports. One challenge in collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some criteria provided as guidelines to the human data-collectors were as follows:

    The headline must have one or more words directly or indirectly related to COVID-19.

    The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

    The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

    Avoid taking duplicate reports.

    Maintain a time frame for the above mentioned newspapers.

    To collect these data we used a Google Form for the USA and BD. Two human editors went through each entry to check for spam or troll entries.

    2.2 Data Pre-processing and Statistics

    Some pre-processing steps performed on the newspaper report dataset are as follows:

    Remove hyperlinks.

    Remove non-English alphanumeric characters.

    Remove stop words.

    Lemmatize text.

    While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.
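    The pre-processing steps listed above can be sketched as follows. This is a minimal illustration, not the authors' script: the stop-word list is a tiny hypothetical sample, "non-English alphanumeric characters" is interpreted here as anything other than English letters, digits, and whitespace, and lemmatization is left out because it needs a third-party lemmatizer (for example from NLTK or spaCy).

```python
import re

# Tiny illustrative stop-word list (an assumption; the authors' list is not given).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on", "for"}

# Matches http(s) URLs and bare www. addresses.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Matches anything that is not an English letter, a digit, or whitespace
# (one possible reading of "non-English alphanumeric characters").
NON_ENGLISH_RE = re.compile(r"[^A-Za-z0-9\s]")


def preprocess(text: str) -> list[str]:
    """Remove hyperlinks, non-English characters, and stop words.

    Lemmatization is intentionally omitted in this sketch.
    """
    text = URL_RE.sub(" ", text)
    text = NON_ENGLISH_RE.sub(" ", text)
    tokens = text.split()
    # Compare case-insensitively but keep the original casing of kept tokens.
    return [t for t in tokens if t.lower() not in STOP_WORDS]


tokens = preprocess("The virus is spreading in Minnesota: see https://example.com/news")
print(tokens)
```

    Hyperlinks are removed before the character filter so that URL fragments do not survive as stray tokens.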

    The primary data statistics of the two datasets are shown in Tables 1 and 2.

    Table 1: Covid-News-USA-NNK data statistics

    No of words per headline: 7 to 20
    No of words per body content: 150 to 2100

    Table 2: Covid-News-BD-NNK data statistics

    No of words per headline: 10 to 20
    No of words per body content: 100 to 1500

    2.3 Dataset Repository

    We used GitHub as our primary data repository under the account name NKK^1. There, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a py script. We provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.

    3 Literature Review

    Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.

    Some well-known applications of NLP include fraud detection on online media sites [ 10 ], authorship attribution in fallback authentication systems [ 11 ], intelligent conversational agents or chatbots [ 12 ], and machine translation as used by Google Translate [ 13 ]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for Natural Language Processing tasks. The two most prominent are BERT [ 14 ], which uses a bidirectional Transformer encoder and achieves near-perfect results on classification and next-sentence prediction tasks, and the GPT-3 models released by OpenAI [ 15 ], which can generate almost human-like text. However, these are generally used as pre-trained models, since training them carries a huge computation cost. Information extraction is the general concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [ 16 ]. Information extraction from text could be identifying named entities, locations, or other essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [ 17 ].

    Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [ 18 ] is an efficient keyword extraction technique that uses graphs to calculate the weight of each word and picks the words with the highest weights.
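    The graph-based weighting described above can be illustrated with a simplified, unweighted TextRank-style sketch. It is not the authors' implementation: the co-occurrence window, damping factor, iteration count, and the function name textrank_keywords are all assumptions chosen for illustration.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by PageRank over an undirected co-occurrence graph."""
    # Link each word to the distinct words within `window` positions after it.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Power iteration of the unweighted PageRank update:
    # score(w) = (1 - d) + d * sum(score(n) / degree(n)) over neighbors n of w.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; alphabetical order breaks exact ties.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


tokens = "virus spread economy virus masks economy virus election".split()
print(textrank_keywords(tokens, top_k=2))
```

    Words that co-occur with many different neighbors accumulate more weight, which is why frequently recurring terms tend to rank first.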

    Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

    4 Our experiments and Result analysis

    We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can point out a few observations:

    In February, both newspapers talked about China and the source of the outbreak.

    StarTribune emphasized Minnesota as the most concerned state. In April, it seemed to be even more concerned.

    Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations, and markets.

    Washington Post discussed global issues more than StarTribune.

    StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

    While both newspapers mentioned the outbreak in China in February, the spread in the United States is highlighted more heavily from March through May, displaying the critical impact caused by the virus.

    We used a script to extract all numbers related to certain keywords such as ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and created a case-count series for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, which rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.

    We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments ranged from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.

    We used the PageRank algorithm to extract keywords from headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,

  14. sentiment_analys_3_combine_ds

    • huggingface.co
    Updated Mar 16, 2025
    Kaushi Gihan (2025). sentiment_analys_3_combine_ds [Dataset]. https://huggingface.co/datasets/KaushiGihan/sentiment_analys_3_combine_ds
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 16, 2025
    Authors
    Kaushi Gihan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The 3 datasets used for fine-tuning are available on Kaggle. You can download them using the links below:

    IMDB dataset (Sentiment analysis) in CSV format link

    Sentiment Analysis Dataset link

    Stock News Sentiment Analysis(Massive Dataset) link

      📊 Data Preprocessing & Visualization link
    

    The dataset is cleaned, preprocessed, and visualized using Pandas, Matplotlib, and Seaborn. Open and run the notebook:

  15. Most informative features extracted from the training dataset.

    • figshare.com
    xls
    Updated Jun 2, 2023
    Caroline Diorio; Michael Afanasiev; Kristen Salena; Stacey Marjerrison (2023). Most informative features extracted from the training dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0209738.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Caroline Diorio; Michael Afanasiev; Kristen Salena; Stacey Marjerrison
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Most informative features extracted from the training dataset.

  16. InfoTrie's News Data – real-time & daily news data via Data/API on 100M+...

    • datarade.ai
    .json, .csv, .xls
    Updated Dec 2, 2020
    InfoTrie (2020). InfoTrie's News Data – real-time & daily news data via Data/API on 100M+ articles, 100K news websites [Dataset]. https://datarade.ai/data-products/finsents-financial-news-and-sentiment-screener-infotrie
    Explore at:
    Available download formats: .json, .csv, .xls
    Dataset updated
    Dec 2, 2020
    Dataset provided by
    InfoTrie Financial Solutions
    Authors
    InfoTrie
    Area covered
    Suriname, Slovenia, El Salvador, Colombia, Bangladesh, Bosnia and Herzegovina, Maldives, France, Cayman Islands, Cocos (Keeling) Islands
    Description

    Access a treasure trove of over 100 million news articles, covering data on more than 200K companies. In today's fast-paced information landscape, news data plays a pivotal role, often influencing market movements as much as fundamentals. Coverage spans thousands of topics, with standard and bespoke taxonomies.

    Our News Data indexes are similar to renowned search engines, and aggregate data from millions of sources, including websites, blogs, analyst reports, and business news publications. Data integration is seamless through low-latency push APIs.

    Key Features:
    1. Extensive Coverage: Access over 100 million news articles.
    2. Coverage: Explore data on more than 200K companies.
    3. Customize and create business taxonomies tailored to business needs from thousands of topics.
    4. Real-time Data and Analysis: Stay abreast of the latest developments.
    5. Seamless & Scalable News API Integration.

    Subscribe to InfoTrie's News Data Today

    More information about the data solution on: https://infotrie.com/finsents-stock-and-sentiment-screener/

  17. CryptoSentiment: A large scale sentiment dataset for cryptocurrencies -...

    • cryptodata.center
    Updated Dec 4, 2024
    + more versions
    cryptodata.center (2024). CryptoSentiment: A large scale sentiment dataset for cryptocurrencies - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/cryptosentiment-a-large-scale-sentiment-dataset-for-cryptocurrencies
    Explore at:
    Dataset updated
    Dec 4, 2024
    Dataset provided by
    CryptoDATA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CryptoSentiment is a dataset containing sentiment information about cryptocurrency assets, gathered from various online sources and analyzed by the FinBERT sentiment extractor. More specifically, we provide a publicly available dataset containing fine-grained (minute-basis) sentiment analysis data about the cryptocurrency market. The CryptoSentiment dataset includes 235,907 sentiment scores for 14 different cryptocurrencies gathered from various online sources such as news articles and social media.

  18. Data from: EMBEDDIA tools output example corpus of Estonian, Croatian and...

    • clarin.si
    • live.european-language-grid.eu
    Updated Feb 10, 2022
    Linda Freienthal; Andraž Pelicon; Matej Martinc; Blaž Škrlj; Ivar Krustok; Marko Pranjić; Luis Adrián Cabrera-Diego; Matthew Purver; Senja Pollak; Hele-Andra Kuulmets; Ravi Shekhar; Boshko Koloski (2022). EMBEDDIA tools output example corpus of Estonian, Croatian and Latvian news articles 1.0 [Dataset]. https://clarin.si/repository/xmlui/handle/11356/1485?locale-attribute=sl
    Explore at:
    Dataset updated
    Feb 10, 2022
    Authors
    Linda Freienthal; Andraž Pelicon; Matej Martinc; Blaž Škrlj; Ivar Krustok; Marko Pranjić; Luis Adrián Cabrera-Diego; Matthew Purver; Senja Pollak; Hele-Andra Kuulmets; Ravi Shekhar; Boshko Koloski
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset contains articles from EMBEDDIA Media partners with various information added by the tools developed within the EMBEDDIA project:

    • 12,390 Estonian articles from 2019 with tags given by Ekspress Meedia. The complete dataset without the output of EMBEDDIA tools is available at http://hdl.handle.net/11356/1408

    • 5,000 Croatian articles from autumn of 2010 with tags given by 24sata. The complete dataset without the output of EMBEDDIA tools is available at http://hdl.handle.net/11356/1410

    • 15,264 Latvian articles from 2019 with tags given by Ekspress Meedia. The complete dataset without the output of EMBEDDIA tools is available at http://hdl.handle.net/11356/1409

    All the articles in the dataset have been analysed with texta-mlp Python package (https://pypi.org/project/texta-mlp/) via the EMBEDDIA Media assistant's Texta Toolkit (https://docs.texta.ee/). The tools used to analyse the articles were the following:

    • Latin1 and Latin2 Named Entity Recognition Tool modules (Cabrera-Diego et al., 2021, both described in https://aclanthology.org/2021.bsnlp-1.12/). The Latin 1 results can be found in the folders annotated_articles_ner_latin1/ and annotated_articles_all_tools/, while the Latin 2 results are in annotated_articles_nerlatin2/ or annotated_articles_all_tools/.

    • RAKUN keyword extractor. RAKUN (Škrlj et al. 2019) is an unsupervised system for keyword extraction, so it can be used for any language. It detects keywords by turning text into a graph; the most important nodes in the graph mostly turn out to be the keywords. It is described in https://link.springer.com/chapter/10.1007/978-3-030-31372-2_26. The keyword annotation results can be found in the folder annotated_articles_rakun/ or annotated_articles_all_tools/.

    • TNT-KID keyword extractor. TNT-KID (Martinc et al. 2021) is a supervised system for automatic keyword extraction. It was trained on a corpus of articles with human-assigned keywords. For Croatian, the annotators were 24sata editors, for Estonian the Ekspress Meedia staff, and for Latvian the Latvian Delfi staff. The system is further documented at https://doi.org/10.1017/S1351324921000127. For Croatian only TNT-KID was applied, while for Estonian and Latvian, TNT-KID with TF-IDF, an extension by Koloski et al. (https://aclanthology.org/2021.hackashop-1.4.pdf), was used. The results of applying this tool are found in the folder annotated_articles_tnt_kid/ or annotated_articles_all_tools/.

    • Sentiment analysis. Our news sentiment analyser (Pelicon et al. 2020) labels a news article as being of positive, negative, or neutral sentiment, using a fine-tuned multilingual BERT model, which was trained on Slovene sentiment-annotated news articles. The system is further documented in https://doi.org/10.3390/app10175993. The results of this tool are found in the folder annotated_articles_sentiment/ or annotated_articles_all_tools/.

    All the data is encoded in "JSON Lines" format. Each folder has its own README file which explains the structure of the files.
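
    As a rough illustration of the "JSON Lines" encoding mentioned above (one JSON object per line), the sketch below reads such a stream with the Python standard library. The field names in the sample are hypothetical; the actual structure is described in each folder's README.

```python
import io
import json


def read_json_lines(stream):
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical records for illustration only; real field names are
# documented in the README file of each folder.
sample = io.StringIO(
    '{"id": 1, "sentiment": "neutral"}\n'
    '{"id": 2, "sentiment": "positive"}\n'
)
records = list(read_json_lines(sample))
print(len(records), records[1]["sentiment"])
```

    For a file on disk, the same function can be given an open file object, for example one returned by open(path, encoding="utf-8").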

  19. auditor_sentiment

    • huggingface.co
    + more versions
    Finance Inc., auditor_sentiment [Dataset]. https://huggingface.co/datasets/FinanceInc/auditor_sentiment
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset authored and provided by
    Finance Inc.
    Description

    Dataset Card for Auditor Sentiment

      Dataset Description
    

    Auditor review sentiment collected by News Department

    Point of Contact: Talked to COE for Auditing, currently sue@demo.org

      Dataset Summary
    

    Auditor sentiment dataset of sentences from financial news. The dataset consists of several thousand sentences from English language financial news categorized by sentiment.

      Supported Tasks and Leaderboards
    

    Sentiment Classification… See the full description on the dataset page: https://huggingface.co/datasets/FinanceInc/auditor_sentiment.

  20. Development of public dynamic spatio-temporal monitoring and analysis tool...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Jul 13, 2024
    Juan C. López; Miguel Jaller (2024). Development of public dynamic spatio-temporal monitoring and analysis tool of supply chain vulnerability, resilience, and sustainability [Dataset]. http://doi.org/10.5061/dryad.qjq2bvqqj
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 13, 2024
    Dataset provided by
    Dryad
    Authors
    Juan C. López; Miguel Jaller
    Description

    News indicators on Supply Chain Vulnerability, Resilience, and Sustainability

    https://doi.org/10.5061/dryad.qjq2bvqqj

    This dataset presents the key features extracted from supply-chain-related news articles. The news articles are gathered based on the following query: (USA or United States) and (supply chain or supply-chain) and (disruption or resilience) and (retailer or warehouse or transportation or factory). The features are extracted using Natural Language Processing (NLP) techniques and include:

    1. Term frequencies and Term Frequency-Inverse Document Frequency (TF-IDF). Term frequencies and Term Frequency-Inverse Document Frequency (TF-IDF) are calculated at unigram and bigram levels. TF-IDF is a widely used metric for measuring the relationship and relevance of words in documents, where tokens with higher TF-IDF values are considered more representative.
    2. Topic share. News articles are classified into eight topics relevant to supp...
Georgios Fatouros; Georgios Fatouros; Kalliopi Kouroumali; Kalliopi Kouroumali (2023). Forex News Annotated Dataset for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.7976208

Forex News Annotated Dataset for Sentiment Analysis

Explore at:
Available download formats: csv
Dataset updated
Nov 11, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Georgios Fatouros; Georgios Fatouros; Kalliopi Kouroumali; Kalliopi Kouroumali
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset contains news headlines relevant to key forex pairs: AUDUSD, EURCHF, EURUSD, GBPUSD, and USDJPY. The data was extracted from reputable platforms Forex Live and FXstreet over a period of 86 days, from January to May 2023. The dataset comprises 2,291 unique news headlines. Each headline includes an associated forex pair, timestamp, source, author, URL, and the corresponding article text. Data was collected using web scraping techniques executed via a custom service on a virtual machine. This service periodically retrieves the latest news for a specified forex pair (ticker) from each platform, parsing all available information. The collected data is then processed to extract details such as the article's timestamp, author, and URL. The URL is further used to retrieve the full text of each article. This data acquisition process repeats approximately every 15 minutes.

To ensure the reliability of the dataset, we manually annotated each headline for sentiment. Instead of solely focusing on the textual content, we ascertained sentiment based on the potential short-term impact of the headline on its corresponding forex pair. This method recognizes the currency market's acute sensitivity to economic news, which significantly influences many trading strategies. As such, this dataset could serve as an invaluable resource for fine-tuning sentiment analysis models in the financial realm.

We used three categories for annotation: 'positive', 'negative', and 'neutral', which correspond to bullish, bearish, and hold sentiments, respectively, for the forex pair linked to each headline. The following Table provides examples of annotated headlines along with brief explanations of the assigned sentiment.

Examples of Annotated Headlines

Forex Pair: GBPUSD
Headline: Diminishing bets for a move to 12400
Sentiment: Neutral
Explanation: Lack of strong sentiment in either direction

Forex Pair: GBPUSD
Headline: No reasons to dislike Cable in the very near term as long as the Dollar momentum remains soft
Sentiment: Positive
Explanation: Positive sentiment towards GBPUSD (Cable) in the near term

Forex Pair: GBPUSD
Headline: When are the UK jobs and how could they affect GBPUSD
Sentiment: Neutral
Explanation: Poses a question and does not express a clear sentiment

Forex Pair: JPYUSD
Headline: Appropriate to continue monetary easing to achieve 2% inflation target with wage growth
Sentiment: Positive
Explanation: Monetary easing from Bank of Japan (BoJ) could lead to a weaker JPY in the short term due to increased money supply

Forex Pair: USDJPY
Headline: Dollar rebounds despite US data. Yen gains amid lower yields
Sentiment: Neutral
Explanation: Since both the USD and JPY are gaining, the effects on the USDJPY forex pair might offset each other

Forex Pair: USDJPY
Headline: USDJPY to reach 124 by Q4 as the likelihood of a BoJ policy shift should accelerate Yen gains
Sentiment: Negative
Explanation: USDJPY is expected to reach a lower value, with the USD losing value against the JPY

Forex Pair: AUDUSD
Headline: RBA Governor Lowe’s Testimony High inflation is damaging and corrosive
Sentiment: Positive
Explanation: Reserve Bank of Australia (RBA) expresses concerns about inflation. Typically, central banks combat high inflation with higher interest rates, which could strengthen AUD.

Moreover, the dataset includes two columns with the predicted sentiment class and score as predicted by the FinBERT model. Specifically, the FinBERT model outputs a set of probabilities for each sentiment class (positive, negative, and neutral), representing the model's confidence in associating the input headline with each sentiment category. These probabilities are used to determine the predicted class and a sentiment score for each headline. The sentiment score is computed by subtracting the negative class probability from the positive one.
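
The score computation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the dataset authors' code: the function name finbert_summary and the example probabilities are hypothetical, and taking the highest-probability class as the predicted class is an assumption, since the description only says the probabilities "are used to determine" it. The score itself follows the description: positive probability minus negative probability.

```python
def finbert_summary(probs):
    """Return (predicted_class, score) from FinBERT-style class probabilities.

    `probs` maps 'positive', 'negative', and 'neutral' to probabilities.
    """
    # Assumption: the predicted class is the one with the highest probability.
    predicted = max(probs, key=probs.get)
    # As described: score = P(positive) - P(negative), so it lies in [-1, 1].
    score = probs["positive"] - probs["negative"]
    return predicted, score


# Hypothetical probabilities for a single headline.
example = {"positive": 0.70, "negative": 0.10, "neutral": 0.20}
label, score = finbert_summary(example)
print(label, round(score, 2))
```

A score near 1 indicates a confidently positive (bullish) headline, a score near -1 a confidently negative (bearish) one, and a score near 0 either a neutral headline or one where the positive and negative probabilities roughly cancel.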
