CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset features financial news headlines collected from leading financial news websites, including CNBC, The Guardian, and Reuters. It provides an overview of the U.S. economy and stock market, primarily reflecting daily market sentiment over several years. The main purpose of this dataset is to facilitate Natural Language Processing (NLP) analyses to explore the correlation between the positivity or negativity of news sentiment and U.S. stock market performance, such as gains and losses. It is ideal for data scientists and analysts keen on understanding market dynamics through textual data.
The dataset typically includes the following columns, though availability may vary slightly by source:
* Headlines: The main title or headline of the financial article.
* Time: The last updated date and time of the article.
* Description: A preview or summary text of the article's content.
The data files are generally provided in CSV format. Specific numbers for rows or records are not available within the provided sources, but the dataset is structured to allow for easy processing and analysis.
This dataset is well-suited for a variety of applications, including: * Sentiment analysis of financial news to predict market movements. * Developing and testing Natural Language Processing (NLP) models. * Data science and analytics projects focused on economic trends and stock market performance. * Research into the impact of media on financial markets.
The dataset covers news related to the U.S. economy and stock market.
* Time Range:
  * CNBC and The Guardian data spans from late December 2017 to 19th July 2020.
  * Reuters data covers from late March 2018 to 19th July 2020.
  * Collectively, the headlines reflect an overview of the U.S. economy and stock market for approximately one to two years from their scraping date.
CC0
This dataset is intended for a range of users, including: * Data Scientists and Analysts performing market sentiment analysis. * Researchers studying economic indicators and financial news impact. * Individuals interested in Natural Language Processing (NLP) and text analysis applications in finance. * Anyone looking to gain insights into the relationship between news sentiment and stock market performance.
Original Data Source: Financial News Headlines Data
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:
Context:
Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.
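As a rough illustration of how market basket analysis uncovers associations between products, the sketch below computes two standard measures for item pairs: support (the share of baskets containing both items) and confidence (the share of baskets with the first item that also contain the second). The function name `pair_rules` and the toy baskets are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Return {(a, b): (support, confidence of a -> b)} for ordered item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicate items within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Confidence is directional, so store both orderings.
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Made-up baskets, for illustration only.
toy_baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
rules = pair_rules(toy_baskets)
support, confidence = rules[("bread", "milk")]
print(f"bread -> milk: support={support:.2f}, confidence={confidence:.2f}")
```

Larger analyses would typically use a dedicated algorithm such as Apriori or FP-Growth; this pairwise count is only meant to show what the two measures capture.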
Inspiration:
The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.
Dataset Information:
The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:
Use Cases:
Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
This dataset provides fine-grained financial sentiment analysis on news headlines. It is a human-annotated collection of over 10,700 news headlines, specifically designed to address the challenging task of sentiment extraction in financial news where multiple entities may be present, often with conflicting sentiments. Over 2,800 headlines within the dataset feature multiple entities. The dataset is balanced, containing more than 4,100 positive entities, 3,200 negative entities, and 4,500 neutral entities. It is ideal for furthering research in entity-aware sentiment analysis and can be used for training models for extracting financial named entities.
The dataset is provided as a data file, typically in CSV format. It contains over 10,700 distinct news headlines. The structure includes human-annotated sentiment labels for financial entities, with sentiments balanced across positive, negative, and neutral categories. The dataset includes headlines with varying word counts, and its sentiment annotations are quite diverse, representing 10,686 unique values for entity sentiments.
This dataset is well-suited for several applications: * Performing Aspect-based Sentiment Analysis on financial texts. * Training machine learning models for the extraction of named financial entities. * Conducting research into fine-grained financial sentiment analysis. * Validating the effect of news sentiments on aggregate market movements. * Developing and evaluating learning schemes, including those utilising lexicon-based and pre-trained sentence representations, and various classification approaches.
The dataset has a global regional coverage. While the listing date is 08/06/2025, the underlying research for SEntFiN 1.0 was published in 2022. The news headlines themselves do not specify a fixed time range within the provided information, focusing instead on the content and its annotations. No specific demographic scope is outlined, but the content is inherently focused on financial markets and related entities.
CC-BY
This dataset is beneficial for a range of users and purposes: * Researchers focusing on financial Natural Language Processing (NLP) and sentiment analysis. * Data Scientists and Analysts working on text classification, entity extraction, and market behaviour prediction. * Anyone involved in Business or Finance seeking to understand or model sentiment from news. * Developers of AI and Machine Learning models requiring human-annotated financial text data.
Original Data Source: Aspect based Sentiment Analysis for Financial News
https://www.datainsightsmarket.com/privacy-policy
The AI training dataset market is experiencing robust growth, driven by the increasing adoption of artificial intelligence across diverse sectors. The market's expansion is fueled by the urgent need for high-quality data to train sophisticated AI models capable of handling complex tasks. Key application areas, such as autonomous vehicles in the automotive industry, advanced medical diagnosis in healthcare, and personalized experiences in retail and e-commerce, are significantly contributing to this market's upward trajectory. The prevalence of text, image/video, and audio data types further diversifies the market, offering opportunities for specialized dataset providers. While the market faces challenges like data privacy concerns and the high cost of data annotation, the overall trajectory remains positive, with a projected Compound Annual Growth Rate (CAGR) exceeding 20% for the forecast period (2025-2033). This growth is further supported by advancements in deep learning techniques that demand increasingly larger and more diverse datasets for optimal performance. Leading companies like Google, Amazon, and Microsoft are actively investing in this space, expanding their dataset offerings and fostering competition within the market. Furthermore, the emergence of specialized data annotation providers caters to the specific needs of various industries, ensuring accurate and reliable data for AI model development. The geographic distribution of the market reveals strong presence in North America and Europe, driven by early adoption of AI technologies and the presence of major technology players. However, Asia Pacific is projected to witness significant growth in the coming years, propelled by increasing digitalization and a burgeoning AI ecosystem in countries like China and India. Government initiatives promoting AI development in various regions are also expected to stimulate demand for high-quality training datasets. 
While challenges related to data security and ethical considerations remain, the long-term outlook for the AI training dataset market is exceptionally promising, fueled by the continued evolution of artificial intelligence and its increasing integration into various aspects of modern life. The market segmentation by application and data type allows for granular analysis and targeted investments for businesses operating in this rapidly expanding sector.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 862,231 labeled tweets and associated stock returns, providing a comprehensive look into the impact of social media on company-level stock market performance. For each tweet, researchers have extracted data such as the date of the tweet and its associated stock symbol, along with metrics such as last price and various returns (1-day return, 2-day return, 3-day return, 7-day return). Also recorded are volatility scores for both 10-day and 30-day intervals. Finally, sentiment scores from both Long Short-Term Memory (LSTM) and TextBlob models have been included to quantify the overall tone in which these messages were delivered. With this dataset you will be able to explore how tweets can affect a company's share price in both the short term and the long term by leveraging all of these data points for analysis!
To use this dataset, apply descriptive statistics (for example, histograms) or regression techniques to relate tweet content and sentiment to the corresponding stock returns, such as the 1-day and 7-day returns.
The primary fields used for analysis are the tweet text (TWEET), stock symbol (STOCK), date (DATE), and closing price at the time of the tweet (LAST_PRICE), along with two volatility measures for each stock, 10-day volatility (VOLATILITY_10D) and 30-day volatility (VOLATILITY_30D), which capture market fluctuation over different periods around the time the tweets were posted. Sentiment polarity scores from two machine learning models, LSTM polarity (LSTM_POLARITY) and TextBlob polarity, indicate whether people are expressing positive or negative sentiment about each company at a given time, which could in turn influence stock prices over shorter horizons such as 1-day returns (1_DAY_RETURN) and 2-day returns (2_DAY_RETURN) or over longer horizons such as 7-day returns. Finally, the MENTION field indicates whether names or acronyms associated with a company were specifically mentioned in each tweet, which shows whether company-specific context was present in the tweet (that is, "company relevancy").
- Analyzing the degree to which tweets can influence stock prices. By analyzing relationships between variables such as tweet sentiment and stock returns, correlations can be identified that could be used to inform investment decisions.
- Exploring natural language processing (NLP) models for predicting future market trends based on textual data such as tweets. Through testing and evaluating different text-based models using this dataset, better predictive models may emerge that can give investors advance warning of upcoming market shifts due to news or other events.
- Investigating the impact of different types of tweets (positive/negative, factual/opinionated) on stock prices over specific time frames. By studying correlations between the sentiment or nature of a tweet and its effect on stocks, insights may be gained into what sort of news or events have a greater impact on markets in general.
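One way to start on the sentiment-versus-returns ideas above is a plain Pearson correlation between a sentiment column and a return column. The sketch below is a minimal, standard-library example: the column names `LSTM_POLARITY` and `1_DAY_RETURN` come from the listing, while the helper names and the toy rows are hypothetical and only stand in for real data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def sentiment_return_correlation(rows, sentiment_col="LSTM_POLARITY",
                                 return_col="1_DAY_RETURN"):
    """Correlate sentiment with returns, skipping rows with missing values."""
    xs, ys = [], []
    for row in rows:
        try:
            x, y = float(row[sentiment_col]), float(row[return_col])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric field
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


# Toy rows for illustration only; real rows could be read with csv.DictReader.
toy_rows = [
    {"LSTM_POLARITY": "1", "1_DAY_RETURN": "0.02"},
    {"LSTM_POLARITY": "-1", "1_DAY_RETURN": "-0.02"},
    {"LSTM_POLARITY": "0", "1_DAY_RETURN": "0.0"},
    {"LSTM_POLARITY": "", "1_DAY_RETURN": "0.01"},  # skipped: no sentiment
]
r = sentiment_return_correlation(toy_rows)
print(f"correlation: {r:.3f}")
```

A correlation found this way describes co-movement only; it does not by itself show that tweets move prices.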
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: reduced_dataset-release.csv

| Column name | Description |
|:------------|:------------|
| TWEET | Text of the tweet. (String) |
| STOCK | Company's stock mentioned in the tweet. (String) |
| DATE | Date the tweet was posted. (Date) |
| LAST_PRICE | Company's last price at the time of tweeting. (Float) |

...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset is designed to advance labelled financial sentiment analysis research. It combines two notable datasets, FiQA and Financial PhraseBank, into a single, easy-to-use CSV file. The primary purpose is to provide financial sentences accompanied by their corresponding sentiment labels, which can be positive, negative, or neutral. This resource is valuable for understanding market and corporate sentiment expressed in textual data.
The dataset is structured with at least two key columns: * Sentence: This column contains the textual financial statement or phrase. * Sentiment Label: This column provides the associated sentiment of the sentence, categorised as 'positive', 'negative', or 'neutral'.
The dataset is provided in a CSV file format. It organises financial sentences with their assigned sentiment labels. Specific details regarding the exact number of rows or records are not available in the provided information.
This dataset is ideal for various applications and use cases, including: * Developing and testing Natural Language Processing (NLP) models for sentiment detection in financial texts. * Conducting data science and analytics projects focused on market dynamics and corporate communications. * Building tools for business intelligence to gauge sentiment from financial news and reports. * Academic research into the nuances of economic language and its emotional tone.
The dataset's regional scope is global. The financial sentences included refer to various companies and market events, with examples from periods such as 2008 and 2010. While a precise time range for all data points is not specified, the content is relevant to corporate financial and market sentiment over several years. There are no specific notes on demographic scope; the focus is on business and financial entities.
CC0
This dataset is particularly suited for: * Researchers keen on exploring financial sentiment analysis techniques and models. * Data Scientists working on machine learning applications for textual data in the finance domain. * Financial Analysts looking to integrate sentiment indicators into their market assessments. * Developers creating applications that require understanding the emotional tone of financial statements.
Original Data Source: Financial Sentiment Analysis
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains text from Federal Reserve FOMC (Federal Open Market Committee) meeting minutes and statements. It was collected by scraping the Federal Reserve's official website. The dataset provides insights into the central bank's monetary policy decisions and discussions, spanning a specific period of time.
The dataset is typically provided in CSV file format. While specific total row or record counts are not detailed, the data contains a substantial volume of text from numerous documents across various dates.
This dataset is ideal for various analytical and research purposes, including: * Analysing the sentiment and tone of FOMC meeting minutes and statements over time. * Identifying key phrases and words that signify shifts in monetary policy. * Developing natural language processing (NLP) models to forecast future policy decisions based on historical data. * Investigating the relationship between FOMC meeting minutes/statements and financial market reactions.
The dataset primarily covers the activities and discussions of the Federal Reserve's Federal Open Market Committee, making its scope relevant to US economic and financial policy. The data spans a specific, extended period of time, though precise start and end dates are not explicitly detailed in the general description. The listing indicates a global region for the dataset's availability.
CC0
This dataset is particularly valuable for: * Economists and financial analysts studying central bank policy and its impact. * Data scientists and NLP practitioners looking to build models based on financial text data. * Academic researchers investigating monetary policy, economic trends, and financial markets. * Journalists reporting on central banking and economic policy.
Original Data Source: Federal Reserve FOMC Minutes & Statements Dataset
https://www.datainsightsmarket.com/privacy-policy
The AI training dataset market is experiencing robust growth, driven by the increasing adoption of artificial intelligence across diverse sectors. The market's expansion is fueled by the need for high-quality, labeled data to train sophisticated AI models capable of handling complex tasks. Applications span various industries, including IT, automotive, healthcare, BFSI (Banking, Financial Services, and Insurance), and retail & e-commerce. The demand for diverse data types—text, image/video, and audio—further fuels market expansion. While precise market sizing is unavailable, considering the rapid growth of AI and the significant investment in data annotation services, a reasonable estimate places the 2025 market value at approximately $15 billion, with a compound annual growth rate (CAGR) of 25% projected through 2033. This growth reflects a rising awareness of the pivotal role high-quality datasets play in achieving accurate and reliable AI outcomes. Key restraining factors include the high cost of data acquisition and annotation, along with concerns around data privacy and security. However, these challenges are being addressed through advancements in automation and the emergence of innovative data synthesis techniques. The competitive landscape is characterized by a mix of established technology giants like Google, Amazon, and Microsoft, alongside specialized data annotation companies like Appen and Lionbridge. The market is expected to see continued consolidation as larger players acquire smaller firms to expand their data offerings and strengthen their market position. Regional variations exist, with North America and Europe currently dominating the market share, although regions like Asia-Pacific are projected to experience significant growth due to increasing AI adoption and investments.
This dataset provides a substantial collection of news sentences paired with their corresponding sentiment, primarily intended for financial analysis and stock prediction. With over 100,000 rows, each entry indicates whether the news is positive (represented by '1') or negative/neutral (represented by '0'), offering insights into potential stock movement. A positive sentiment suggests a likely increase in stock value, while a negative or neutral sentiment indicates a likely decrease [1, 2]. It is noted that the data within this dataset is not shuffled [2].
The dataset typically comes in CSV format [4] and consists of over 100,000 rows of data [2]. It includes two primary columns: 'Sentiment' and 'Sentence' [1]. The data is presented in an unshuffled order [2]. Specific numbers for records are available for each sentiment label: 53,026 rows for sentiment '0' and 55,725 rows for sentiment '1' [3].
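Because the listing notes that the rows are not shuffled, a train/test split taken in file order could end up skewed toward one label. The sketch below uses only the per-label counts quoted above to check the class balance, then shows one possible way to shuffle before splitting. The helper `shuffled_split` and its parameters are hypothetical, not part of the dataset.

```python
import random

# Per-label row counts as quoted in the listing.
label_counts = {"0": 53026, "1": 55725}
total_rows = sum(label_counts.values())
label_shares = {label: count / total_rows for label, count in label_counts.items()}

for label, share in label_shares.items():
    print(f"sentiment {label}: {label_counts[label]} rows ({share:.1%})")


def shuffled_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows with a fixed seed, then split into train/test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


# Illustrative stand-in for the real rows.
train, test = shuffled_split(range(10))
```

The two labels are close to balanced (roughly 49% versus 51%), so shuffling matters here mainly because of the file ordering rather than class imbalance.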
This dataset is ideal for news sentiment analysis and stock prediction [1]. It can be employed to train machine learning models to forecast stock market movements based on news sentiment [1, 2]. Other use cases include developing financial analytics tools, performing large-scale text analysis on financial news, and researching the correlation between media sentiment and economic indicators [2].
The dataset's regional scope is global [5]. The time range of the data is not specified in the provided information. No specific demographic scope is mentioned for the news sources or the subjects of the news.
CC-BY-NC
This dataset is particularly useful for: * Data Scientists and Machine Learning Engineers: For building and training Natural Language Processing (NLP) models to analyse sentiment in text and predict financial outcomes [2]. * Financial Analysts and Researchers: To gain insights into how news sentiment impacts stock performance and for market forecasting [1]. * Developers: To integrate sentiment analysis capabilities into financial applications or trading algorithms. * Academics: For research into financial economics, sentiment analysis, and predictive analytics.
Original Data Source: Stock News Sentiment Analysis(Massive Dataset)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains news headlines relevant to key forex pairs: AUDUSD, EURCHF, EURUSD, GBPUSD, and USDJPY. The data was extracted from reputable platforms Forex Live and FXstreet over a period of 86 days, from January to May 2023. The dataset comprises 2,291 unique news headlines. Each headline includes an associated forex pair, timestamp, source, author, URL, and the corresponding article text. Data was collected using web scraping techniques executed via a custom service on a virtual machine. This service periodically retrieves the latest news for a specified forex pair (ticker) from each platform, parsing all available information. The collected data is then processed to extract details such as the article's timestamp, author, and URL. The URL is further used to retrieve the full text of each article. This data acquisition process repeats approximately every 15 minutes.
To ensure the reliability of the dataset, we manually annotated each headline for sentiment. Instead of solely focusing on the textual content, we ascertained sentiment based on the potential short-term impact of the headline on its corresponding forex pair. This method recognizes the currency market's acute sensitivity to economic news, which significantly influences many trading strategies. As such, this dataset could serve as an invaluable resource for fine-tuning sentiment analysis models in the financial realm.
We used three categories for annotation: 'positive', 'negative', and 'neutral', which correspond to bullish, bearish, and hold sentiments, respectively, for the forex pair linked to each headline. The following Table provides examples of annotated headlines along with brief explanations of the assigned sentiment.
Examples of Annotated Headlines

| Forex Pair | Headline | Sentiment | Explanation |
|:-----------|:---------|:----------|:------------|
| GBPUSD | Diminishing bets for a move to 12400 | Neutral | Lack of strong sentiment in either direction |
| GBPUSD | No reasons to dislike Cable in the very near term as long as the Dollar momentum remains soft | Positive | Positive sentiment towards GBPUSD (Cable) in the near term |
| GBPUSD | When are the UK jobs and how could they affect GBPUSD | Neutral | Poses a question and does not express a clear sentiment |
| JPYUSD | Appropriate to continue monetary easing to achieve 2% inflation target with wage growth | Positive | Monetary easing from Bank of Japan (BoJ) could lead to a weaker JPY in the short term due to increased money supply |
| USDJPY | Dollar rebounds despite US data. Yen gains amid lower yields | Neutral | Since both the USD and JPY are gaining, the effects on the USDJPY forex pair might offset each other |
| USDJPY | USDJPY to reach 124 by Q4 as the likelihood of a BoJ policy shift should accelerate Yen gains | Negative | USDJPY is expected to reach a lower value, with the USD losing value against the JPY |
| AUDUSD | RBA Governor Lowe’s Testimony High inflation is damaging and corrosive | Positive | Reserve Bank of Australia (RBA) expresses concerns about inflation. Typically, central banks combat high inflation with higher interest rates, which could strengthen AUD. |
Moreover, the dataset includes two columns with the predicted sentiment class and score as predicted by the FinBERT model. Specifically, the FinBERT model outputs a set of probabilities for each sentiment class (positive, negative, and neutral), representing the model's confidence in associating the input headline with each sentiment category. These probabilities are used to determine the predicted class and a sentiment score for each headline. The sentiment score is computed by subtracting the negative class probability from the positive one.
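The score definition above can be sketched in a few lines. The subtraction (positive probability minus negative probability) follows the description; picking the highest-probability class as the predicted label is an assumption here, since the description does not spell out the rule. The function name `class_and_score` and the example probabilities are made up for illustration and do not come from the dataset or from FinBERT itself.

```python
def class_and_score(probs):
    """Return (predicted_class, sentiment_score) from class probabilities.

    probs maps 'positive', 'negative' and 'neutral' to probabilities.
    The predicted class is assumed to be the most probable one; the score
    is the positive probability minus the negative probability.
    """
    predicted = max(probs, key=probs.get)
    score = probs["positive"] - probs["negative"]
    return predicted, score


# Made-up probabilities for a single headline.
example_probs = {"positive": 0.70, "negative": 0.10, "neutral": 0.20}
predicted, score = class_and_score(example_probs)
print(predicted, round(score, 2))
```

Defined this way, the score lies between -1 and 1, and a confidently neutral headline lands near 0.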
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset offers a collection of customer reviews and ratings for data science-related books sourced from Amazon. It provides a valuable resource for understanding customer sentiment and the overall reception of various publications within the data science domain. The collection includes 20,647 individual reviews covering 836 distinct data science books. Each entry features the raw review text and a corresponding star rating, ranging from 1 to 5.
The dataset is structured as a CSV file and comprises 20,647 reviews. These reviews relate to 836 unique books identified through "Data Science" searches on Amazon. The star ratings are distributed across several bands:
* 1.00 - 1.20 stars: 1,430 reviews
* 2.00 - 2.20 stars: 929 reviews
* 3.00 - 3.20 stars: 1,431 reviews
* 4.00 - 4.20 stars: 2,894 reviews
* 4.80 - 5.00 stars: 13,963 reviews
The dataset contains unique values for stars from 1 to 5 and for 836 books.
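The band counts quoted above add up to the stated 20,647 reviews, and converting them to shares makes the skew toward 5-star reviews explicit. This is a small arithmetic sketch using only the figures from the listing; the variable names are illustrative.

```python
# Review counts per star band, as quoted in the listing.
band_counts = {
    "1.00 - 1.20": 1430,
    "2.00 - 2.20": 929,
    "3.00 - 3.20": 1431,
    "4.00 - 4.20": 2894,
    "4.80 - 5.00": 13963,
}

total_reviews = sum(band_counts.values())
band_shares = {band: count / total_reviews for band, count in band_counts.items()}

for band, share in band_shares.items():
    print(f"{band} stars: {share:.1%}")
```

With roughly two thirds of reviews in the top band, a sentiment or rating model trained on this data may need class weighting or resampling to avoid simply predicting the majority class.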
This dataset is well-suited for a variety of analytical and developmental purposes, including: * Natural Language Processing (NLP) tasks such as sentiment analysis, text classification, and topic modelling using the review comments. * Machine Learning (ML) model training for tasks like predicting book popularity or building recommendation engines. * Business intelligence applications to gain insights into consumer preferences and market trends for data science literature. * Research into review patterns, user feedback mechanisms, and e-commerce dynamics.
The dataset has a global regional coverage. It was listed on 16/06/2025. No specific demographic or historical time range information beyond the listing date is available in the provided sources.
CC0
This dataset is particularly useful for: * Data Scientists and Machine Learning Engineers engaged in building and testing text-based analytical models. * Academic Researchers and Students focused on computational linguistics, consumer behaviour, or data science education. * Market Analysts and Business Strategists looking to understand product perception and competitive landscapes within the book industry. * Developers creating applications that require user-generated content for training or analysis.
Original Data Source: Amazon Data Science Book Reviews
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset contains information about various products sold on Flipkart. The dataset includes the product name, price, customer rating, review, and a summary of the product. The dataset is sourced from publicly available data on Flipkart's website. This dataset is in .csv file format.
This dataset includes more than 75 different types of products on flipkart.com. It contains 363,260 rows and 5 columns. This dataset could be useful for analyzing customer reviews and ratings for various products on Flipkart. It could be used to identify popular products, price trends, customer preferences, and other insights. You can start with basics such as cleaning the data.
Data collection started in December 2022.
Usages:
* Sentiment analysis: The dataset could be used to perform sentiment analysis on customer reviews and identify positive and negative sentiments associated with various products. This could help businesses to improve their product offerings and customer service.
* Product analysis: The dataset could be used to analyze the popularity and customer ratings of various products on Flipkart. This could help businesses to identify popular products and understand customer preferences and trends.
* Market research: The dataset could be used for market research purposes to gain insights into the e-commerce market in India and identify opportunities for growth and expansion.
* Recommender systems: The dataset could be used to build recommender systems that provide personalized product recommendations to customers based on their preferences and purchase history.
* Price analysis: The dataset could be used to analyze the prices of various products on Flipkart and identify price trends and patterns. This could help businesses to optimize their pricing strategies and remain competitive in the market.
There are many more uses for this dataset, so explore them and share what you find with others.
This dataset is a cleaned collection of 400 financial news articles, each expertly annotated for sentiment. It includes the full article text, synopsis, headline, and category features. The data was annotated by multiple readers possessing expertise in statistics, the Indian stock market, and data science. This particular data snippet was utilised in research titled "On Quantifying Sentiments of Financial News - Are We Doing the Right Things!". It is available as part of a free dataset library.
The dataset is typically provided in CSV format. It comprises 400 financial news articles. The 'Date_published' column contains 399 unique values, while 'Final Status' has 400 unique values. The sentiment distribution is 54% positive and 46% negative.
This dataset is ideal for academic research, particularly for studies on quantifying financial news sentiments. It is also suitable for individuals looking to practise data cleaning and pre-processing techniques. This dataset supports applications in natural language processing (NLP), deep learning, and sentiment analysis tasks.
The dataset covers a time range from 1st April 2022 to 30th June 2022. Its regional scope is global, despite some annotators having expertise in the Indian stock market.
CC-BY-SA
Original Data Source: Sentiment Analysis - Labelled Financial News Data
https://creativecommons.org/publicdomain/zero/1.0/
The dataset comprises property listings scraped from Property24, a leading real estate platform in Kenya. It includes details such as property price, location, type, number of bedrooms, bathrooms, size, description, and status. This dataset can be utilized for various purposes, including price prediction modeling, market trend analysis, and investment decision-making. By analyzing this data, valuable insights can be gained into the dynamics of the Nairobi real estate market.
https://creativecommons.org/publicdomain/zero/1.0/
Scraped from Twitter from 2016-01-01 to 2019-03-29, collecting tweets containing "Bitcoin" or "BTC". Tools used:
Twint, Tweepy
The tweets are in multiple languages and discuss Bitcoin.
Thanks to Alex (https://www.kaggle.com/alaix14) for his dataset (https://www.kaggle.com/alaix14/bitcoin-tweets-20160101-to-20190329). This dataset adds one more dimension: sentiment is analyzed alongside the Bitcoin price change.
This dataset provides finance news labelled by their sentiment, making it highly suitable for Natural Language Processing (NLP) tasks. It is a cleaned amalgamation of multiple finance news sentiment datasets. The dataset features financial news articles in a 'text' column and their corresponding sentiment labels in a 'sentiment' column, ready for analysis and model training.
* sentiment: This column contains the categorised sentiment label for each piece of financial news. It includes 3 unique sentiment values.
* text: This column holds the actual financial news content or article text. It comprises 32,411 unique text entries.
The data file is typically provided in CSV format. The dataset contains 32,411 records, each with a unique text entry, as inferred from the count of unique values in the 'text' column.
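A first look at a file with this layout can be sketched as follows. This is a minimal illustration that assumes only the two column names given above ('text' and 'sentiment'). The sample rows and the label names (positive, negative, neutral) are invented for the example, since the description states only that there are 3 unique values.

```python
import csv
import io
from collections import Counter


def summarize_sentiment_csv(handle):
    """Count sentiment labels and unique texts in a CSV with 'text' and 'sentiment' columns."""
    reader = csv.DictReader(handle)
    label_counts = Counter()
    unique_texts = set()
    for row in reader:
        label_counts[row["sentiment"]] += 1
        unique_texts.add(row["text"])
    return label_counts, len(unique_texts)


# Tiny in-memory stand-in for the real file; rows and label names are hypothetical.
sample = io.StringIO(
    "text,sentiment\n"
    "Shares rallied after strong earnings,positive\n"
    "Profit warning sends stock lower,negative\n"
    "Company schedules annual meeting,neutral\n"
    "Profit warning sends stock lower,negative\n"
)
label_counts, n_unique = summarize_sentiment_csv(sample)
print(label_counts)
print(n_unique)
```

For the real data, pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` points at the downloaded CSV.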
This dataset is ideal for various Natural Language Processing (NLP) applications, particularly for sentiment analysis on financial news. It can be effectively used for training machine learning models to classify sentiment in textual data and understanding market sentiment from news headlines.
The geographic coverage of this dataset is global. There are no specific details available regarding the time range or demographic scope of the underlying data.
CC-BY-NC.
This dataset is suitable for: * Data Scientists and Machine Learning Engineers: For developing, training, and evaluating sentiment analysis models on financial text data. * Researchers: Those studying financial market sentiment, textual analysis in economics, or advancing general NLP methodologies. * Developers: For integrating sentiment analysis capabilities into financial applications, dashboards, or trading algorithms.
Original Data Source: Finance News Sentiments
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Crypto Fear and Greed Index’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/adelsondias/crypto-fear-and-greed-index on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Each day, the website https://alternative.me/crypto/fear-and-greed-index/ publishes this index based on analysis of emotions and sentiments from different sources crunched into one simple number: The Fear & Greed Index for Bitcoin and other large cryptocurrencies.
The crypto market behaviour is very emotional. People tend to get greedy when the market is rising which results in FOMO (Fear of missing out). Also, people often sell their coins in irrational reaction of seeing red numbers. With our Fear and Greed Index, we try to save you from your own emotional overreactions. There are two simple assumptions:
Therefore, we analyze the current sentiment of the Bitcoin market and crunch the numbers into a simple meter from 0 to 100. Zero means "Extreme Fear", while 100 means "Extreme Greed". See below for further information on our data sources.
We are gathering data from the five following sources. Each data point is valued the same as the day before in order to visualize a meaningful progress in sentiment change of the crypto market.
First of all, the current index is for bitcoin only (we offer separate indices for large alt coins soon), because a big part of it is the volatility of the coin price.
But let’s list all the different factors we’re including in the current index:
We’re measuring the current volatility and max. drawdowns of bitcoin and compare it with the corresponding average values of the last 30 days and 90 days. We argue that an unusual rise in volatility is a sign of a fearful market.
Also, we’re measuring the current volume and market momentum (again in comparison with the last 30/90 day average values) and put those two values together. Generally, when we see high buying volumes in a positive market on a daily basis, we conclude that the market acts overly greedy / too bullish.
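The comparison of a current value against its 30-day and 90-day averages, used for both volatility and momentum/volume above, can be sketched roughly as below. This is an illustrative assumption, not the index's actual formula: the source does not say how volatility is computed or whether the current day is part of the average, so the helper `ratio_to_trailing_average` and the sample series are hypothetical.

```python
from statistics import mean


def ratio_to_trailing_average(series, window):
    """Return the latest value divided by the mean of the `window` values before it.

    Assumption: the current day is excluded from the baseline average.
    """
    if len(series) < window + 1:
        raise ValueError("need at least window + 1 observations")
    baseline = mean(series[-(window + 1):-1])
    if baseline == 0:
        raise ValueError("baseline average is zero")
    return series[-1] / baseline


# Hypothetical daily volatility readings: 90 calm days, then a spike.
daily_volatility = [1.0] * 90 + [2.0]
r30 = ratio_to_trailing_average(daily_volatility, 30)
r90 = ratio_to_trailing_average(daily_volatility, 90)
print(r30, r90)
```

A ratio well above 1 for volatility would correspond to the "unusual rise in volatility" that the description reads as a sign of fear. How such ratios are scaled into the 0 to 100 index is not stated in the source.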
While our reddit sentiment analysis is still not in the live index (we’re still experimenting with some market-related key words in the text processing algorithm), our twitter analysis is running. There, we gather and count posts on various hashtags for each coin (publicly, we show only those for Bitcoin) and check how fast and how many interactions they receive in certain time frames. An unusually high interaction rate results in a grown public interest in the coin and, in our eyes, corresponds to a greedy market behaviour.
Together with strawpoll.com (disclaimer: we own this site, too), quite a large public polling platform, we’re conducting weekly crypto polls and ask people how they see the market. Usually, we’re seeing 2,000 - 3,000 votes on each poll, so we do get a picture of the sentiment of a group of crypto investors. We don’t give those results too much attention, but it was quite useful in the beginning of our studies. You can see some recent results here.
The dominance of a coin resembles the market cap share of the whole crypto market. Especially for Bitcoin, we think that a rise in Bitcoin dominance is caused by a fear of (and thus a reduction of) too speculative alt-coin investments, since Bitcoin is becoming more and more the safe haven of crypto. On the other side, when Bitcoin dominance shrinks, people are getting more greedy by investing in more risky alt-coins, dreaming of their chance in next big bull run. Anyhow, analyzing the dominance for a coin other than Bitcoin, you could argue the other way round, since more interest in an alt-coin may conclude a bullish/greedy behaviour for that specific coin.
We pull Google Trends data for various Bitcoin related search queries and crunch those numbers, especially the change of search volumes as well as recommended other currently popular searches. For example, if you check Google Trends for "Bitcoin", you can’t get much information from the search volume. But currently, you can see that there is a +1,550% rise of the query „bitcoin price manipulation“ in the box of related search queries (as of 05/29/2018). This is clearly a sign of fear in the market, and we use that for our index.
This dataset is produced and maintained by the administrators of https://alternative.me/crypto/fear-and-greed-index/.
This published version is an unofficial copy of their data, which can be also collected using their API (e.g., GET https://api.alternative.me/fng/?limit=10&format=csv&date_format=us).
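The request shown above can be assembled programmatically. The sketch below only builds the URL from the three query parameters that appear in the example (`limit`, `format`, `date_format`). The helper name `build_fng_url` is hypothetical, and the response layout is not described here, so no parsing is attempted.

```python
from urllib.parse import urlencode

FNG_ENDPOINT = "https://api.alternative.me/fng/"


def build_fng_url(limit=10, fmt="csv", date_format="us"):
    """Build the query URL using only the parameters shown in the example request."""
    query = urlencode({"limit": limit, "format": fmt, "date_format": date_format})
    return f"{FNG_ENDPOINT}?{query}"


url = build_fng_url()
print(url)

# To fetch the data (needs network access), something like this should work:
#   from urllib.request import urlopen
#   with urlopen(url) as response:
#       body = response.read().decode("utf-8")
```

Keeping the URL construction separate from the network call makes it easy to check the request before sending it.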
--- Original source retains full ownership of the source dataset ---
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The columns in the dataset include index, unit id, golden, unit state, trusted judgments, last judgment at, airline sentiment, airline sentiment confidence, negative reason, negative reason confidence, airline_sentiment_gold and retweet count. There is also text included for each tweet as well as tweet location and user timezone.
Using this dataset, you can get a feel for how customers of various airlines feel about their service. The data includes the airline, the tweet text, the date of the tweet, and various other information. You can use it to analyze trends over time or to compare different airlines. Some research ideas:
- Using airline sentiment to predict the stock market - is there a correlation between how the public perceives an airline and how that airline's stock performs?
- Using negativereason data to help airlines improve their customer service - which negative reasons are mentioned most often? Are there certain airlines that are consistently mentioned for specific reasons?
- Use the tweet data to map out airline hot spots - where do people tend to tweet about certain airlines the most? Is there a geographic pattern to sentiment about specific airlines?
If you use this dataset in your research, please credit Social Media Data
License
License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
- You are free to:
  - Share - copy and redistribute the material in any medium or format for non-commercial purposes only.
  - Adapt - remix, transform, and build upon the material for non-commercial purposes only.
- You must:
  - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
  - ShareAlike - You must distribute your contributions under the same license as the original.
- You may not:
  - Use the material for commercial purposes.
File: Airline-Sentiment-2-w-AA.csv

| Column name | Description |
|:---------------------------|:-----------------------------------------------------------------------------|
| _golden | This column is the gold standard column. (Boolean) |
| _unit_state | This column is the state of the unit. (String) |
| _trusted_judgments | This column is the number of trusted judgments. (Numeric) |
| _last_judgment_at | This column is the timestamp of the last judgment. (String) |
| airline_sentiment | This column is the sentiment of the tweet. (String) |
| negativereason | This column is the negative reason for the sentiment. (String) |
| airline_sentiment_gold | This column is the gold standard sentiment of the tweet. (String) |
| name | This column is the name of the airline. (String) |
| negativereason_gold | This column is the gold standard negative reason for the sentiment. (String) |
| retweet_count | This column is the number of retweets. (Numeric) |
| text | This column is the text of the tweet. (String) |
| tweet_coord | This column is the coordinates of the tweet. (String) |
| tweet_created | This column is the timestamp of the tweet. (String) |
| tweet_location | This column is the location of the tweet. (String) |
| user_timezone | This column is the timezone of the user. (String) |
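The question of which negative reasons are mentioned most often can be approached with a simple tally over the `negativereason` column listed above. The sketch below is a minimal illustration: the sample rows and reason strings are invented, and it assumes that tweets without a negative reason leave that field empty.

```python
import csv
import io
from collections import Counter


def top_negative_reasons(handle, n=3):
    """Return the n most frequent non-empty values of the 'negativereason' column."""
    reader = csv.DictReader(handle)
    counts = Counter(
        row["negativereason"] for row in reader if row.get("negativereason")
    )
    return counts.most_common(n)


# In-memory stand-in for Airline-Sentiment-2-w-AA.csv; the rows are hypothetical.
sample = io.StringIO(
    "airline_sentiment,negativereason,text\n"
    "negative,Late Flight,delayed again\n"
    "negative,Lost Luggage,where is my bag\n"
    "positive,,great crew today\n"
    "negative,Late Flight,two hours late\n"
)
reasons = top_negative_reasons(sample)
print(reasons)
```

Grouping the same tally by airline would answer the per-airline question. Check which column actually holds the airline name in the file before relying on it.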
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Welcome to the Women’s Clothing E-Commerce Reviews dataset, a collection of customer reviews designed for deep textual analysis. This dataset offers a rich environment for parsing review text through its multiple dimensions, making it suitable for a variety of analytical tasks. Given that this is real commercial data, all identifying company references within the review text and body have been meticulously anonymised and replaced with "retailer".
This dataset comprises 23,486 rows and 10 feature variables [1]. Each row corresponds to a single customer review [1]. The data file is typically presented in CSV format [3]. A sample file is to be uploaded to the platform separately [3].
This dataset is ideally suited for tasks involving sentiment analysis, natural language processing (NLP), and classification [1]. Potential applications include developing robust product recommendation systems, identifying emerging trends in customer feedback, and conducting in-depth market research into consumer preferences and satisfaction within the e-commerce fashion sector.
The dataset is globally relevant [4], encompassing a wide range of customer reviews from an e-commerce context. While specific time ranges are not detailed, it includes demographic information such as the reviewer's age [1, 2]. The commercial nature of the data means it has been anonymised to protect privacy [1].
CC0
This dataset is highly beneficial for data scientists, machine learning engineers, and natural language processing specialists aiming to build predictive models or perform textual analysis [1]. Furthermore, market researchers and business analysts can leverage this data to gain insights into customer behaviour, product performance, and overall market sentiment within the clothing e-commerce domain.
Original Data Source: 👗 Women's E-Commerce Clothing Reviews
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Context
Scraped from the official websites of CNBC, the Guardian, and Reuters, the headlines in these datasets reflect an overview of the U.S. economy and stock market every day for the past one to two years.
Content
Data scraped from CNBC contains the headlines, last updated date, and the preview text of articles from the end of December 2017 to July 19th, 2020. Data scraped from the Guardian Business contains the headlines and last updated date of articles from the end of December 2017 to July 19th, 2020, since the Guardian Business does not offer preview text. Data scraped from Reuters contains the headlines, last updated date, and the preview text of articles from the end of March 2018 to July 19th, 2020.
Inspiration
I firmly believe that the sentiment of financial news articles reflects and directs the performance of the U.S. stock market. Therefore, by applying Natural Language Processing (NLP) to these headlines, I can see how the positivity/negativity of the score for each day correlates with the stock market's gains/losses.
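One simple way to check how a daily sentiment score relates to market gains and losses is a Pearson correlation between the two daily series. The sketch below is only an illustration under stated assumptions: the source does not say how scores are computed or which market measure is used, so both sample series are invented and the helper `pearson` is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: one sentiment score and one market return (%) per day.
daily_sentiment = [0.2, -0.1, 0.4, -0.3, 0.1]
daily_return = [0.5, -0.2, 0.9, -0.7, 0.1]
r = pearson(daily_sentiment, daily_return)
print(round(r, 3))
```

A coefficient near +1 or -1 indicates a strong linear relationship. Correlation alone does not show that news sentiment drives the market.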
The cover image was taken from https://hipwallpaper.com/stock-trader-wallpapers/
Original Data Source: Financial News Headlines Data