51 datasets found

Daily Market News Dataset
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). Daily Market News Dataset [Dataset]. https://www.opendatabay.com/data/financial/75f5a0aa-5b18-405b-b673-0af308f23961
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Data Science and Analytics
Description
This dataset features financial news headlines collected from leading financial news websites, including CNBC, The Guardian, and Reuters. It provides an overview of the U.S. economy and stock market, primarily reflecting daily market sentiment over several years. The main purpose of this dataset is to facilitate Natural Language Processing (NLP) analyses to explore the correlation between the positivity or negativity of news sentiment and U.S. stock market performance, such as gains and losses. It is ideal for data scientists and analysts keen on understanding market dynamics through textual data.

Columns

The dataset typically includes the following columns, though availability may vary slightly by source:

* Headlines: The main title or headline of the financial article.
* Time: The last updated date and time of the article.
* Description: A preview or summary text of the article's content.

Distribution

The data files are generally provided in CSV format. Specific numbers for rows or records are not available within the provided sources, but the dataset is structured to allow for easy processing and analysis.

Usage

This dataset is well-suited for a variety of applications, including:

* Sentiment analysis of financial news to predict market movements.
* Developing and testing Natural Language Processing (NLP) models.
* Data science and analytics projects focused on economic trends and stock market performance.
* Research into the impact of media on financial markets.

Coverage

The dataset covers news related to the U.S. economy and stock market.

* Time Range:
  * CNBC and The Guardian data spans from late December 2017 to 19th July 2020.
  * Reuters data covers from late March 2018 to 19th July 2020.
  * Collectively, the headlines reflect an overview of the U.S. economy and stock market for approximately one to two years from their scraping date.

License

CC0

Who Can Use It

This dataset is intended for a range of users, including:

* Data Scientists and Analysts performing market sentiment analysis.
* Researchers studying economic indicators and financial news impact.
* Individuals interested in Natural Language Processing (NLP) and text analysis applications in finance.
* Anyone looking to gain insights into the relationship between news sentiment and stock market performance.

Dataset Name Suggestions

US Financial News Headlines

Stock Market Sentiment News

Financial Article Headlines

Daily Market News Dataset

Economy News Headlines for NLP

Attributes

Original Data Source: Financial News Headlines Data
Retail Transactions Dataset
kaggle.com
Updated May 18, 2024
Cite
Prasad Patil (2024). Retail Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/retail-transactions-dataset
Dataset updated
May 18, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Prasad Patil
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:

Context:

Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.

Inspiration:

The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.

Dataset Information:

The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:

Transaction_ID: A unique identifier for each transaction, represented as a 10-digit number. This column is used to uniquely identify each purchase.

Date: The date and time when the transaction occurred. It records the timestamp of each purchase.

Customer_Name: The name of the customer who made the purchase. It provides information about the customer's identity.

Product: A list of products purchased in the transaction. It includes the names of the products bought.

Total_Items: The total number of items purchased in the transaction. It represents the quantity of products bought.

Total_Cost: The total cost of the purchase, in currency. It represents the financial value of the transaction.

Payment_Method: The method used for payment in the transaction, such as credit card, debit card, cash, or mobile payment.

City: The city where the purchase took place. It indicates the location of the transaction.

Store_Type: The type of store where the purchase was made, such as a supermarket, convenience store, department store, etc.

Discount_Applied: A binary indicator (True/False) representing whether a discount was applied to the transaction.

Customer_Category: A category representing the customer's background or age group.

Season: The season in which the purchase occurred, such as spring, summer, fall, or winter.

Promotion: The type of promotion applied to the transaction, such as "None," "BOGO (Buy One Get One)," or "Discount on Selected Items."

Use Cases:

Market Basket Analysis: Discover associations between products and uncover buying patterns.

Customer Segmentation: Group customers based on purchasing behavior.

Pricing Optimization: Optimize pricing strategies and identify opportunities for discounts and promotions.

Retail Analytics: Analyze store performance and customer trends.
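
As a rough illustration of the market basket use case, the sketch below counts how often two products appear in the same transaction and computes a simple support value. The baskets are made-up stand-ins for the Product column, and the helper names `pair_counts` and `support` are hypothetical, not part of the dataset.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each inner list stands in for one row's Product list.
transactions = [
    ["Milk", "Bread", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Bread"],
    ["Eggs", "Coffee"],
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives each pair one canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def support(pair, baskets):
    """Fraction of baskets that contain both products of the pair."""
    return pair_counts(baskets)[tuple(sorted(pair))] / len(baskets)


counts = pair_counts(transactions)
print(counts.most_common(2))
print(support(("Eggs", "Bread"), transactions))
```

Pair counting is only the first step of association-rule mining; confidence and lift can be derived from the same counts.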

Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
Data from: Financial News Sentiment Analysis Dataset
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). Financial News Sentiment Analysis Dataset [Dataset]. https://www.opendatabay.com/data/dataset/6c0503f5-8003-44c8-a56c-d65bec8e5d40
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
Area covered
Data Science and Analytics
Description
This dataset provides fine-grained financial sentiment analysis on news headlines. It is a human-annotated collection of over 10,700 news headlines, specifically designed to address the challenging task of sentiment extraction in financial news where multiple entities may be present, often with conflicting sentiments. Over 2,800 headlines within the dataset feature multiple entities. The dataset is balanced, containing more than 4,100 positive entities, 3,200 negative entities, and 4,500 neutral entities. It is ideal for furthering research in entity-aware sentiment analysis and can be used for training models for extracting financial named entities.

Columns

S No.: Represents the serial number for each entry.

Title: Contains the news headlines.

Decisions: Provides the sentiment annotations for various financial entities identified within the news headlines.

Words: Indicates the number of words in each news headline.
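
The listing does not say how the Decisions column is encoded. The sketch below assumes, purely for illustration, that each cell holds a JSON object mapping an entity name to its sentiment label; the rows and the helper `entity_sentiments` are hypothetical. Under that assumption, it tallies entity-level labels and picks out multi-entity headlines.

```python
import json
from collections import Counter

# Hypothetical rows. The JSON-object layout of "Decisions" is an assumption,
# not something the listing confirms.
rows = [
    {
        "Title": "Bank A gains while Bank B slips",
        "Decisions": '{"Bank A": "positive", "Bank B": "negative"}',
    },
    {
        "Title": "Steel Co holds steady",
        "Decisions": '{"Steel Co": "neutral"}',
    },
]


def entity_sentiments(row):
    """Decode one Decisions cell into an {entity: sentiment} dict."""
    return json.loads(row["Decisions"])


# Tally sentiment labels per entity, not per headline.
label_counts = Counter(
    label for row in rows for label in entity_sentiments(row).values()
)

# Headlines that mention more than one annotated entity.
multi_entity = [row["Title"] for row in rows if len(entity_sentiments(row)) > 1]

print(label_counts)
print(multi_entity)
```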

Distribution

The dataset is provided as a data file, typically in CSV format. It contains over 10,700 distinct news headlines. The structure includes human-annotated sentiment labels for financial entities, with sentiments balanced across positive, negative, and neutral categories. The dataset includes headlines with varying word counts, and its sentiment annotations are quite diverse, representing 10,686 unique values for entity sentiments.

Usage

This dataset is well-suited for several applications:

* Performing Aspect-based Sentiment Analysis on financial texts.
* Training machine learning models for the extraction of named financial entities.
* Conducting research into fine-grained financial sentiment analysis.
* Validating the effect of news sentiments on aggregate market movements.
* Developing and evaluating learning schemes, including those utilising lexicon-based and pre-trained sentence representations, and various classification approaches.

Coverage

The dataset has a global regional coverage. While the listing date is 08/06/2025, the underlying research for SEntFiN 1.0 was published in 2022. The news headlines themselves do not specify a fixed time range within the provided information, focusing instead on the content and its annotations. No specific demographic scope is outlined, but the content is inherently focused on financial markets and related entities.

License

CC-BY

Who Can Use It

This dataset is beneficial for a range of users and purposes:

* Researchers focusing on financial Natural Language Processing (NLP) and sentiment analysis.
* Data Scientists and Analysts working on text classification, entity extraction, and market behaviour prediction.
* Anyone involved in Business or Finance seeking to understand or model sentiment from news.
* Developers of AI and Machine Learning models requiring human-annotated financial text data.

Dataset Name Suggestions

SEntFiN 1.0

Financial News Entity Sentiment Headlines

Aspect-Based Financial Sentiment Data

Annotated Financial News for NLP

Financial News Sentiment Analysis Dataset

Attributes

Original Data Source: Aspect based Sentiment Analysis for Financial News
AI Training Dataset Report
datainsightsmarket.com
doc, pdf, ppt
Updated Apr 30, 2025
Cite
Data Insights Market (2025). AI Training Dataset Report [Dataset]. https://www.datainsightsmarket.com/reports/ai-training-dataset-1501897
Dataset updated
Apr 30, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The AI training dataset market is experiencing robust growth, driven by the increasing adoption of artificial intelligence across diverse sectors. The market's expansion is fueled by the urgent need for high-quality data to train sophisticated AI models capable of handling complex tasks. Key application areas, such as autonomous vehicles in the automotive industry, advanced medical diagnosis in healthcare, and personalized experiences in retail and e-commerce, are significantly contributing to this market's upward trajectory. The prevalence of text, image/video, and audio data types further diversifies the market, offering opportunities for specialized dataset providers. While the market faces challenges like data privacy concerns and the high cost of data annotation, the overall trajectory remains positive, with a projected Compound Annual Growth Rate (CAGR) exceeding 20% for the forecast period (2025-2033). This growth is further supported by advancements in deep learning techniques that demand increasingly larger and more diverse datasets for optimal performance. Leading companies like Google, Amazon, and Microsoft are actively investing in this space, expanding their dataset offerings and fostering competition within the market. Furthermore, the emergence of specialized data annotation providers caters to the specific needs of various industries, ensuring accurate and reliable data for AI model development. The geographic distribution of the market reveals strong presence in North America and Europe, driven by early adoption of AI technologies and the presence of major technology players. However, Asia Pacific is projected to witness significant growth in the coming years, propelled by increasing digitalization and a burgeoning AI ecosystem in countries like China and India. Government initiatives promoting AI development in various regions are also expected to stimulate demand for high-quality training datasets. 
While challenges related to data security and ethical considerations remain, the long-term outlook for the AI training dataset market is exceptionally promising, fueled by the continued evolution of artificial intelligence and its increasing integration into various aspects of modern life. The market segmentation by application and data type allows for granular analysis and targeted investments for businesses operating in this rapidly expanding sector.
Tweet Sentiment's Impact on Stock Returns
kaggle.com
Updated Jan 16, 2023
Cite
The Devastator (2023). Tweet Sentiment's Impact on Stock Returns [Dataset]. https://www.kaggle.com/datasets/thedevastator/tweet-sentiment-s-impact-on-stock-returns
Dataset updated
Jan 16, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Tweet Sentiment's Impact on Stock Returns

862,231 Labeled Instances

About this dataset

This dataset contains 862,231 labeled tweets and associated stock returns, providing a comprehensive look into the impact of social media on company-level stock market performance. For each tweet, researchers have extracted data such as the date of the tweet and its associated stock symbol, along with metrics such as last price and various returns (1-day, 2-day, 3-day, and 7-day returns). Volatility scores are also recorded for both 10-day and 30-day intervals. Finally, sentiment scores from both Long Short-Term Memory (LSTM) and TextBlob models are included to quantify the overall tone of each message. With this dataset you can explore how tweets may affect a company's share price in both the short term and the long term.

How to use the dataset

To use this dataset, apply descriptive statistics (such as histograms) or regression techniques to relate tweet content and sentiment to the corresponding stock returns, such as the 1-day and 7-day return measurements.

The primary fields used for analysis include the tweet text (TWEET), stock symbol (STOCK), date (DATE), and closing price at the time of the tweet (LAST_PRICE). Two volatility measures, 10-day volatility (VOLATILITY_10D) and 30-day volatility (VOLATILITY_30D), capture how much each stock fluctuated in the periods around the tweet. Sentiment polarity scores from two models, LSTM polarity (LSTM_POLARITY) and TextBlob polarity, indicate whether people were expressing positive or negative sentiment about each company at a given time, which could in turn influence stock prices over shorter horizons such as 1-day returns (1_DAY_RETURN) and 2-day returns (2_DAY_RETURN), or over a longer horizon such as 7-day returns (7_DAY_RETURN). Finally, the MENTION field indicates whether names or acronyms associated with a company were specifically mentioned in each tweet, which shows whether company-specific context was present ("company relevancy").
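
One simple way to relate sentiment to returns is a Pearson correlation between a polarity score and a return column. The sketch below uses the LSTM_POLARITY and 1_DAY_RETURN column names from the description, but the four records are invented for illustration and the `pearson` helper is a plain textbook implementation, not the dataset authors' method.

```python
import math

# Hypothetical records reusing two column names from the description;
# the values are made up and do not come from the dataset.
records = [
    {"LSTM_POLARITY": 1.0, "1_DAY_RETURN": 0.02},
    {"LSTM_POLARITY": -1.0, "1_DAY_RETURN": -0.01},
    {"LSTM_POLARITY": 0.5, "1_DAY_RETURN": 0.01},
    {"LSTM_POLARITY": -0.5, "1_DAY_RETURN": -0.02},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


polarity = [rec["LSTM_POLARITY"] for rec in records]
returns = [rec["1_DAY_RETURN"] for rec in records]
r = pearson(polarity, returns)
print(round(r, 3))
```

A correlation on real data would need far more rows, and a high value would still not establish that tweets cause price moves.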

Research Ideas

Analyzing the degree to which tweets can influence stock prices. By analyzing relationships between variables such as tweet sentiment and stock returns, correlations can be identified that could be used to inform investment decisions.

Exploring natural language processing (NLP) models for predicting future market trends based on textual data such as tweets. Through testing and evaluating different text-based models using this dataset, better predictive models may emerge that can give investors advance warning of upcoming market shifts due to news or other events.

Investigating the impact of different types of tweets (positive/negative, factual/opinionated) on stock prices over specific time frames. By studying correlations between the sentiment or nature of a tweet and its effect on stocks, insights may be gained into what sort of news or events have a greater impact on markets in general.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: reduced_dataset-release.csv

| Column name | Description |
|:------------|:------------|
| TWEET | Text of the tweet. (String) |
| STOCK | Company's stock mentioned in the tweet. (String) |
| DATE | Date the tweet was posted. (Date) |
| LAST_PRICE | Company's last price at the time of tweeting. (Float) |

...
Labelled Market Sentiment Analysis Dataset
opendatabay.com
Updated Jul 5, 2025
Share
Datasimple (2025). Labelled Market Sentiment Analysis Dataset [Dataset]. https://www.opendatabay.com/data/dataset/7992d0e4-378e-43a3-89f7-4d2a87f7d0f4
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Data Science and Analytics
Description
This dataset is designed to advance labelled financial sentiment analysis research. It combines two notable datasets, FiQA and Financial PhraseBank, into a single, easy-to-use CSV file. The primary purpose is to provide financial sentences accompanied by their corresponding sentiment labels, which can be positive, negative, or neutral. This resource is valuable for understanding market and corporate sentiment expressed in textual data.

Columns

The dataset is structured with at least two key columns:

* Sentence: This column contains the textual financial statement or phrase.
* Sentiment Label: This column provides the associated sentiment of the sentence, categorised as 'positive', 'negative', or 'neutral'.

Distribution

The dataset is provided in a CSV file format. It organises financial sentences with their assigned sentiment labels. Specific details regarding the exact number of rows or records are not available in the provided information.

Usage

This dataset is ideal for various applications and use cases, including:

* Developing and testing Natural Language Processing (NLP) models for sentiment detection in financial texts.
* Conducting data science and analytics projects focused on market dynamics and corporate communications.
* Building tools for business intelligence to gauge sentiment from financial news and reports.
* Academic research into the nuances of economic language and its emotional tone.

Coverage

The dataset's regional scope is global. The financial sentences included refer to various companies and market events, with examples from periods such as 2008 and 2010. While a precise time range for all data points is not specified, the content is relevant to corporate financial and market sentiment over several years. There are no specific notes on demographic scope; the focus is on business and financial entities.

License

CC0

Who Can Use It

This dataset is particularly suited for:

* Researchers keen on exploring financial sentiment analysis techniques and models.
* Data Scientists working on machine learning applications for textual data in the finance domain.
* Financial Analysts looking to integrate sentiment indicators into their market assessments.
* Developers creating applications that require understanding the emotional tone of financial statements.

Dataset Name Suggestions

Financial Sentence Sentiment Corpus

Global Financial Sentiment Labeled Data

Market Sentiment Analysis Dataset

Corporate Financial Text Sentiment

Attributes

Original Data Source: Financial Sentiment Analysis
Federal Reserve Policy Transcripts Dataset
opendatabay.com
Updated Jul 4, 2025
Share
Datasimple (2025). Federal Reserve Policy Transcripts Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/869367f6-47a6-4d59-a97b-0d03a2962e93
Dataset updated
Jul 4, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Finance & Banking Analytics
Description
This dataset contains text from Federal Reserve FOMC (Federal Open Market Committee) meeting minutes and statements. It was collected by scraping the Federal Reserve's official website. The dataset provides insights into the central bank's monetary policy decisions and discussions, spanning a specific period of time.

Columns

Date: The date of the FOMC meeting or statement release, provided in the YYYYMMDD format.

Type: An indicator for the document type. A value of '0' signifies a statement, while '1' indicates meeting minutes.

Text: The actual text content of each paragraph from the meeting minutes or statements.
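
The column conventions above (a YYYYMMDD date and a 0/1 document-type flag) can be decoded as in the following sketch. It assumes values arrive as strings, as they would from `csv.DictReader`; the sample rows, the `DOC_TYPES` mapping name, and the `parse_row` helper are hypothetical.

```python
from datetime import datetime

# Mapping taken from the Type column description: 0 = statement, 1 = minutes.
DOC_TYPES = {"0": "statement", "1": "minutes"}


def parse_row(row):
    """Convert one raw CSV row (all strings) into typed fields."""
    return {
        "date": datetime.strptime(row["Date"], "%Y%m%d").date(),
        "doc_type": DOC_TYPES[row["Type"]],
        "text": row["Text"],
    }


# Hypothetical rows with placeholder text; not taken from the dataset.
rows = [
    {"Date": "20190130", "Type": "0", "Text": "Placeholder statement paragraph."},
    {"Date": "20190130", "Type": "1", "Text": "Placeholder minutes paragraph."},
]

parsed = [parse_row(row) for row in rows]
for item in parsed:
    print(item["date"].isoformat(), item["doc_type"])
```

If the file is loaded with a library that infers integer types, the keys of `DOC_TYPES` would need to be integers instead.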

Distribution

The dataset is typically provided in CSV file format. While specific total row or record counts are not detailed, the data contains a substantial volume of text from numerous documents across various dates.

Usage

This dataset is ideal for various analytical and research purposes, including:

* Analysing the sentiment and tone of FOMC meeting minutes and statements over time.
* Identifying key phrases and words that signify shifts in monetary policy.
* Developing natural language processing (NLP) models to forecast future policy decisions based on historical data.
* Investigating the relationship between FOMC meeting minutes/statements and financial market reactions.

Coverage

The dataset primarily covers the activities and discussions of the Federal Reserve's Federal Open Market Committee, making its scope relevant to US economic and financial policy. The data spans a specific, extended period of time, though precise start and end dates are not explicitly detailed in the general description. The listing indicates a global region for the dataset's availability.

License

CC0

Who Can Use It

This dataset is particularly valuable for:

* Economists and financial analysts studying central bank policy and its impact.
* Data scientists and NLP practitioners looking to build models based on financial text data.
* Academic researchers investigating monetary policy, economic trends, and financial markets.
* Journalists reporting on central banking and economic policy.

Dataset Name Suggestions

Federal Reserve FOMC Documents

US Monetary Policy Texts

FOMC Meeting & Statement Archive

Federal Reserve Policy Transcripts

Central Bank Monetary Data

Attributes

Original Data Source: Federal Reserve FOMC Minutes & Statements Dataset
ai training dataset Report
datainsightsmarket.com
doc, pdf, ppt
Updated May 10, 2025
Cite
Data Insights Market (2025). ai training dataset Report [Dataset]. https://www.datainsightsmarket.com/reports/ai-training-dataset-1502524
Dataset updated
May 10, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
CA
Variables measured
Market Size
Description
The AI training dataset market is experiencing robust growth, driven by the increasing adoption of artificial intelligence across diverse sectors. The market's expansion is fueled by the need for high-quality, labeled data to train sophisticated AI models capable of handling complex tasks. Applications span various industries, including IT, automotive, healthcare, BFSI (Banking, Financial Services, and Insurance), and retail & e-commerce. The demand for diverse data types—text, image/video, and audio—further fuels market expansion. While precise market sizing is unavailable, considering the rapid growth of AI and the significant investment in data annotation services, a reasonable estimate places the 2025 market value at approximately $15 billion, with a compound annual growth rate (CAGR) of 25% projected through 2033. This growth reflects a rising awareness of the pivotal role high-quality datasets play in achieving accurate and reliable AI outcomes. Key restraining factors include the high cost of data acquisition and annotation, along with concerns around data privacy and security. However, these challenges are being addressed through advancements in automation and the emergence of innovative data synthesis techniques. The competitive landscape is characterized by a mix of established technology giants like Google, Amazon, and Microsoft, alongside specialized data annotation companies like Appen and Lionbridge. The market is expected to see continued consolidation as larger players acquire smaller firms to expand their data offerings and strengthen their market position. Regional variations exist, with North America and Europe currently dominating the market share, although regions like Asia-Pacific are projected to experience significant growth due to increasing AI adoption and investments.
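
To see what the quoted figures imply, the sketch below compounds the report's approximate $15 billion 2025 estimate at a 25% CAGR through 2033. It assumes the rate applies uniformly each year with 2025 as the base year; the function name `project_value` is hypothetical. Under those assumptions the 2033 figure comes out at roughly $89 billion.

```python
def project_value(start_value, annual_rate, years):
    """Compound start_value at annual_rate for the given number of years."""
    return start_value * (1 + annual_rate) ** years


BASE_YEAR = 2025
BASE_VALUE_BN = 15.0  # approximate 2025 estimate quoted in the description
CAGR = 0.25           # projected rate quoted in the description

# Year-by-year projection in billions of dollars, assuming uniform compounding.
projection = {
    year: project_value(BASE_VALUE_BN, CAGR, year - BASE_YEAR)
    for year in range(BASE_YEAR, 2034)
}

for year, value in projection.items():
    print(year, round(value, 1))
```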
Massive Stock Sentiment Dataset
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). Massive Stock Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/d0828f81-ab19-4e17-9195-b32bad95268c
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
Area covered
Finance & Banking Analytics
Description
This dataset provides a substantial collection of news sentences paired with their corresponding sentiment, primarily intended for financial analysis and stock prediction. With over 100,000 rows, each entry indicates whether the news is positive (represented by '1') or negative/neutral (represented by '0'), offering insights into potential stock movement. A positive sentiment suggests a likely increase in stock value, while a negative or neutral sentiment indicates a likely decrease [1, 2]. It is noted that the data within this dataset is not shuffled [2].

Columns

Sentiment: A numerical label indicating the sentiment of the news sentence. A value of 0 denotes negative or neutral sentiment, suggesting a stock price might go down. A value of 1 denotes positive sentiment, suggesting a stock price might go up [1, 2]. The listing reports 53,026 instances of 0 and 55,725 instances of 1, which sum to 108,751, and separately gives a total of 108,301 entries for this column [3]; the two figures do not agree.

Sentence: The actual text of the news article sentence [1, 2]. This column contains the textual data analysed for sentiment.

Distribution

The dataset typically comes in CSV format [4] and consists of over 100,000 rows of data [2]. It includes two primary columns: 'Sentiment' and 'Sentence' [1]. The data is presented in an unshuffled order [2]. Specific numbers for records are available for each sentiment label: 53,026 rows for sentiment '0' and 55,725 rows for sentiment '1' [3].
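
Because the rows are described as unshuffled, a naive head/tail split could put most of one label in the training set and most of the other in the test set. The sketch below shows one common remedy, a seeded shuffle before splitting. The rows are synthetic placeholders, and `shuffled_split` with its defaults is a hypothetical helper, not something shipped with the dataset.

```python
import random


def shuffled_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows with a fixed seed, then split into train and test."""
    rows = list(rows)  # copy so the caller's list keeps its original order
    random.Random(seed).shuffle(rows)
    test_size = round(len(rows) * test_fraction)
    cut = len(rows) - test_size
    return rows[:cut], rows[cut:]


# Hypothetical unshuffled data: all label-0 rows first, then all label-1 rows.
rows = [(0, f"sentence {i}") for i in range(50)]
rows += [(1, f"sentence {i}") for i in range(50, 100)]

train, test = shuffled_split(rows)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible across runs, which makes model comparisons easier.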

Usage

This dataset is ideal for news sentiment analysis and stock prediction [1]. It can be employed to train machine learning models to forecast stock market movements based on news sentiment [1, 2]. Other use cases include developing financial analytics tools, performing large-scale text analysis on financial news, and researching the correlation between media sentiment and economic indicators [2].

Coverage

The dataset's regional scope is global [5]. The time range of the data is not specified in the provided information. No specific demographic scope is mentioned for the news sources or the subjects of the news.

License

CC-BY-NC

Who Can Use It

This dataset is particularly useful for:

* Data Scientists and Machine Learning Engineers: For building and training Natural Language Processing (NLP) models to analyse sentiment in text and predict financial outcomes [2].
* Financial Analysts and Researchers: To gain insights into how news sentiment impacts stock performance and for market forecasting [1].
* Developers: To integrate sentiment analysis capabilities into financial applications or trading algorithms.
* Academics: For research into financial economics, sentiment analysis, and predictive analytics.

Dataset Name Suggestions

Stock News Sentiment for Market Prediction

Financial News Sentiment Analysis Dataset

Massive Stock Sentiment Data

Market News Sentiment for Stock Forecasting

Attributes

Original Data Source: Stock News Sentiment Analysis(Massive Dataset)
Forex News Annotated Dataset for Sentiment Analysis
zenodo.org
paperswithcode.com
csv
Updated Nov 11, 2023
Cite
Georgios Fatouros; Kalliopi Kouroumali (2023). Forex News Annotated Dataset for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.7976208
Unique identifier
https://doi.org/10.5281/zenodo.7976208
Dataset updated
Nov 11, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Georgios Fatouros; Kalliopi Kouroumali
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains news headlines relevant to key forex pairs: AUDUSD, EURCHF, EURUSD, GBPUSD, and USDJPY. The data was extracted from reputable platforms Forex Live and FXstreet over a period of 86 days, from January to May 2023. The dataset comprises 2,291 unique news headlines. Each headline includes an associated forex pair, timestamp, source, author, URL, and the corresponding article text. Data was collected using web scraping techniques executed via a custom service on a virtual machine. This service periodically retrieves the latest news for a specified forex pair (ticker) from each platform, parsing all available information. The collected data is then processed to extract details such as the article's timestamp, author, and URL. The URL is further used to retrieve the full text of each article. This data acquisition process repeats approximately every 15 minutes.

To ensure the reliability of the dataset, we manually annotated each headline for sentiment. Instead of solely focusing on the textual content, we ascertained sentiment based on the potential short-term impact of the headline on its corresponding forex pair. This method recognizes the currency market's acute sensitivity to economic news, which significantly influences many trading strategies. As such, this dataset could serve as an invaluable resource for fine-tuning sentiment analysis models in the financial realm.

We used three categories for annotation: 'positive', 'negative', and 'neutral', which correspond to bullish, bearish, and hold sentiments, respectively, for the forex pair linked to each headline. The following Table provides examples of annotated headlines along with brief explanations of the assigned sentiment.

Examples of Annotated Headlines

| Forex Pair | Headline | Sentiment | Explanation |
|:-----------|:---------|:----------|:------------|
| GBPUSD | Diminishing bets for a move to 12400 | Neutral | Lack of strong sentiment in either direction |
| GBPUSD | No reasons to dislike Cable in the very near term as long as the Dollar momentum remains soft | Positive | Positive sentiment towards GBPUSD (Cable) in the near term |
| GBPUSD | When are the UK jobs and how could they affect GBPUSD | Neutral | Poses a question and does not express a clear sentiment |
| JPYUSD | Appropriate to continue monetary easing to achieve 2% inflation target with wage growth | Positive | Monetary easing from Bank of Japan (BoJ) could lead to a weaker JPY in the short term due to increased money supply |
| USDJPY | Dollar rebounds despite US data. Yen gains amid lower yields | Neutral | Since both the USD and JPY are gaining, the effects on the USDJPY forex pair might offset each other |
| USDJPY | USDJPY to reach 124 by Q4 as the likelihood of a BoJ policy shift should accelerate Yen gains | Negative | USDJPY is expected to reach a lower value, with the USD losing value against the JPY |
| AUDUSD | RBA Governor Lowe’s Testimony High inflation is damaging and corrosive | Positive | Reserve Bank of Australia (RBA) expresses concerns about inflation. Typically, central banks combat high inflation with higher interest rates, which could strengthen AUD. |

Moreover, the dataset includes two columns with the predicted sentiment class and score as predicted by the FinBERT model. Specifically, the FinBERT model outputs a set of probabilities for each sentiment class (positive, negative, and neutral), representing the model's confidence in associating the input headline with each sentiment category. These probabilities are used to determine the predicted class and a sentiment score for each headline. The sentiment score is computed by subtracting the negative class probability from the positive one.
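The score calculation described above can be sketched in a few lines of Python. This is only an illustration: the probabilities are made-up values, the function name is hypothetical, and taking the most probable class as the predicted class is an assumption (the description says only that the probabilities are "used to determine" it).

```python
def sentiment_from_probs(probs):
    """Return (predicted_class, score) from a dict of class probabilities.

    Assumption: the predicted class is the one with the highest probability.
    The score follows the description: positive minus negative probability.
    """
    predicted = max(probs, key=probs.get)
    score = probs["positive"] - probs["negative"]
    return predicted, score


# Made-up probabilities for a single headline (not taken from the dataset).
example = {"positive": 0.70, "negative": 0.10, "neutral": 0.20}
label, score = sentiment_from_probs(example)
print(label, round(score, 2))
```

Under this definition the score ranges from -1 (confidently negative) to +1 (confidently positive), and a strongly neutral headline lands near 0.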
Data Science Book Customer Ratings Dataset
opendatabay.com
Updated Jul 5, 2025
Cite
Datasimple (2025). Data Science Book Customer Ratings Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/fa468f38-c13a-4388-9e15-6e7acdc99d98
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Reviews & Ratings
Description
This dataset offers a collection of customer reviews and ratings for data science-related books sourced from Amazon. It provides a valuable resource for understanding customer sentiment and the overall reception of various publications within the data science domain. The collection includes 20,647 individual reviews covering 836 distinct data science books. Each entry features the raw review text and a corresponding star rating, ranging from 1 to 5.

Columns

stars: Represents the customer's rating for the book, indicated by a number from 1 to 5, where 5 is the highest possible score.

comment: Contains the textual content of the customer's review.

book_url: Provides the direct web address to the specific book's product page on Amazon.

Distribution

The dataset is structured as a CSV file and comprises 20,647 reviews. These reviews relate to 836 unique books identified through "Data Science" searches on Amazon. The star ratings are distributed across several bands:
* 1.00 - 1.20 stars: 1,430 reviews
* 2.00 - 2.20 stars: 929 reviews
* 3.00 - 3.20 stars: 1,431 reviews
* 4.00 - 4.20 stars: 2,894 reviews
* 4.80 - 5.00 stars: 13,963 reviews

The dataset contains unique values for stars from 1 to 5 and for 836 books.
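As a quick sanity check on the figures in this section, the band counts can be summed and converted to shares. The sketch below uses only the counts listed above; the variable names are illustrative.

```python
# Review counts per star band, copied from the Distribution section.
band_counts = {
    "1.00 - 1.20": 1430,
    "2.00 - 2.20": 929,
    "3.00 - 3.20": 1431,
    "4.00 - 4.20": 2894,
    "4.80 - 5.00": 13963,
}

total = sum(band_counts.values())

# Share of each band as a percentage of all reviews.
shares = {band: 100 * count / total for band, count in band_counts.items()}

print(total)
for band, share in shares.items():
    print(f"{band} stars: {share:.1f}%")
```

The bands add up to the stated 20,647 reviews, and the top band alone accounts for roughly two thirds of them, so a sentiment model trained on this data would likely need to handle class imbalance.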

Usage

This dataset is well-suited for a variety of analytical and developmental purposes, including: * Natural Language Processing (NLP) tasks such as sentiment analysis, text classification, and topic modelling using the review comments. * Machine Learning (ML) model training for tasks like predicting book popularity or building recommendation engines. * Business intelligence applications to gain insights into consumer preferences and market trends for data science literature. * Research into review patterns, user feedback mechanisms, and e-commerce dynamics.

Coverage

The dataset has a global regional coverage. It was listed on 16/06/2025. No specific demographic or historical time range information beyond the listing date is available in the provided sources.

License

CC0

Who Can Use It

This dataset is particularly useful for: * Data Scientists and Machine Learning Engineers engaged in building and testing text-based analytical models. * Academic Researchers and Students focused on computational linguistics, consumer behaviour, or data science education. * Market Analysts and Business Strategists looking to understand product perception and competitive landscapes within the book industry. * Developers creating applications that require user-generated content for training or analysis.

Dataset Name Suggestions

Amazon Data Science Book Reviews

Data Science Book Customer Ratings Dataset

Amazon DS Book Reviews and Ratings

Data Science Book Feedback Collection

Attributes

Original Data Source: Amazon Data Science Book Reviews
Flipkart Products Review Dataset(363K Data)
kaggle.com
Updated Feb 23, 2023
Cite
Nirali vaghani (2023). Flipkart Products Review Dataset(363K Data) [Dataset]. http://doi.org/10.34740/kaggle/dsv/5051290
Unique identifier
https://doi.org/10.34740/kaggle/dsv/5051290
Dataset updated
Feb 23, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nirali vaghani
License
Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
This dataset contains information about various products sold on Flipkart. It includes the product name, price, customer rating, review, and a summary of the product. The data is sourced from publicly available pages on Flipkart's website and is provided in .csv format.

The dataset covers more than 75 different types of products on flipkart.com. It contains 363,260 rows and 5 columns. It could be useful for analyzing customer reviews and ratings for various products on Flipkart, and could be used to identify popular products, price trends, customer preferences, and other insights. Basic data cleaning is a good place to start.

Data collection began in December 2022.

Usages:

Sentiment analysis: The dataset could be used to perform sentiment analysis on customer reviews and identify positive and negative sentiments associated with various products. This could help businesses to improve their product offerings and customer service.

Product analysis: The dataset could be used to analyze the popularity and customer ratings of various products on Flipkart. This could help businesses to identify popular products and understand customer preferences and trends.

Market research: The dataset could be used for market research purposes to gain insights into the e-commerce market in India and identify opportunities for growth and expansion.

Recommender systems: The dataset could be used to build recommender systems that provide personalized product recommendations to customers based on their preferences and purchase history.

Price analysis: The dataset could be used to analyze the prices of various products on Flipkart and identify price trends and patterns. This could help businesses to optimize their pricing strategies and remain competitive in the market.

There are many more uses for this dataset, so explore them and share what you find with others.
Financial News Sentiment Dataset
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). Financial News Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/financial/e47dca0c-f4e1-4a85-b4aa-7cf7c484a402
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
Area covered
Data Science and Analytics
Description
This dataset is a cleaned collection of 400 financial news articles, each expertly annotated for sentiment. It includes the full article text, synopsis, headline, and category features. The data was annotated by multiple readers possessing expertise in statistics, the Indian stock market, and data science. This particular data snippet was utilised in research titled "On Quantifying Sentiments of Financial News - Are We Doing the Right Things!". It is available as part of a free dataset library.

Columns

Date_published: The date when the news article was published.

Headline: The headline of the news article.

Synopsis: A brief summary of the news article.

Full_text: The complete news content contained within the article.

Final Status: The sentiment assigned to each news article.

Distribution

The dataset is typically provided in CSV format. It comprises 400 financial news articles. The 'Date_published' column contains 399 unique values, while 'Final Status' holds a sentiment label for each of the 400 articles. The sentiment distribution is 54% positive and 46% negative.

Usage

Ideal for academic research, particularly for studies on quantifying financial news sentiments. It is also suitable for individuals looking to practise data cleaning and pre-processing techniques. This dataset supports applications in natural language processing (NLP), deep learning, and sentiment analysis tasks.

Coverage

The dataset covers a time range from 1st April 2022 to 30th June 2022. Its regional scope is global, despite some annotators having expertise in the Indian stock market.

License

CC-BY-SA

Who Can Use It

Data Scientists and Analysts: For developing and testing sentiment analysis models.

Researchers: For academic studies on financial market sentiment and NLP.

Students: For practical exercises in data cleaning, pre-processing, and text analysis.

Developers: Building applications requiring financial news sentiment.

Dataset Name Suggestions

Financial News Sentiment Dataset

Labelled Financial News Sentiment

Expert-Annotated Financial News

UK Financial Sentiment Corpus

Financial News Article Sentiment

Attributes

Original Data Source: Sentiment Analysis - Labelled Financial News Data
Nairobi House Prices Dataset
kaggle.com
Updated Nov 21, 2024
Cite
Kibor Cheruiyot (2024). Nairobi House Prices Dataset [Dataset]. https://www.kaggle.com/datasets/destro7/nairobi-house-prices-dataset/code
Dataset updated
Nov 21, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Kibor Cheruiyot
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Nairobi
Description
The dataset comprises property listings scraped from Property24, a leading real estate platform in Kenya. It includes details such as property price, location, type, number of bedrooms, bathrooms, size, description, and status. This dataset can be utilized for various purposes, including price prediction modeling, market trend analysis, and investment decision-making. By analyzing this data, valuable insights can be gained into the dynamics of the Nairobi real estate market.
Bitcoin tweets - Market Sentiment
kaggle.com
Updated Aug 29, 2021
Cite
Gaurav Dutta (2021). Bitcoin tweets - Market Sentiment [Dataset]. https://www.kaggle.com/datasets/gauravduttakiit/bitcoin-tweets-16m-tweets-with-sentiment-tagged
Dataset updated
Aug 29, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Gaurav Dutta
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Scraped from Twitter from 2016-01-01 to 2019-03-29, collecting tweets containing "Bitcoin" or "BTC". Tools used:

Twint
Tweepy

Content

Tweets in multiple languages that mention Bitcoin.

Acknowledgements

Thanks to Alex (https://www.kaggle.com/alaix14) for his dataset (https://www.kaggle.com/alaix14/bitcoin-tweets-20160101-to-20190329). This dataset simply adds a dimension in which sentiment is analyzed alongside the price change of Bitcoin.
Financial News Sentiment Dataset
opendatabay.com
Updated Jul 7, 2025
Cite
Datasimple (2025). Financial News Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/8ab0d872-0605-4c14-b25d-82cc0d71ffc0
Dataset updated
Jul 7, 2025
Dataset authored and provided by
Datasimple
Area covered
Finance & Banking Analytics
Description
This dataset provides finance news labelled by their sentiment, making it highly suitable for Natural Language Processing (NLP) tasks. It is a cleaned amalgamation of multiple finance news sentiment datasets. The dataset features financial news articles in a 'text' column and their corresponding sentiment labels in a 'sentiment' column, ready for analysis and model training.

Columns

sentiment: This column contains the categorised sentiment label for each piece of financial news. It includes 3 unique sentiment values.

text: This column holds the actual financial news content or article text. It comprises 32,411 unique text entries.

Distribution

The data file is typically provided in CSV format. The dataset contains 32,411 records, a figure inferred from the number of unique values in the 'text' column, so each record has a unique text entry.

Usage

This dataset is ideal for various Natural Language Processing (NLP) applications, particularly for sentiment analysis on financial news. It can be effectively used for training machine learning models to classify sentiment in textual data and understanding market sentiment from news headlines.

Coverage

The geographic coverage of this dataset is global. There are no specific details available regarding the time range or demographic scope of the underlying data.

License

CC-BY-NC.

Who Can Use It

This dataset is suitable for: * Data Scientists and Machine Learning Engineers: For developing, training, and evaluating sentiment analysis models on financial text data. * Researchers: Those studying financial market sentiment, textual analysis in economics, or advancing general NLP methodologies. * Developers: For integrating sentiment analysis capabilities into financial applications, dashboards, or trading algorithms.

Dataset Name Suggestions

Financial News Sentiment Dataset

Finance Sentiment Analysis Data

Global Financial News Sentiments

NLP Financial Sentiment Corpus

Market Sentiment News Data

Attributes

Original Data Source: Finance News Sentiments
‘Crypto Fear and Greed Index’ analyzed by Analyst-2
analyst-2.ai
Updated May 28, 2018
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘Crypto Fear and Greed Index’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-crypto-fear-and-greed-index-e01d/latest
Dataset updated
May 28, 2018
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Crypto Fear and Greed Index’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/adelsondias/crypto-fear-and-greed-index on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Crypto Fear and Greed Index

Each day, the website https://alternative.me/crypto/fear-and-greed-index/ publishes this index based on analysis of emotions and sentiments from different sources crunched into one simple number: The Fear & Greed Index for Bitcoin and other large cryptocurrencies.

Why Measure Fear and Greed?

The crypto market behaviour is very emotional. People tend to get greedy when the market is rising, which results in FOMO (fear of missing out). Also, people often sell their coins in an irrational reaction to seeing red numbers. With our Fear and Greed Index, we try to save you from your own emotional overreactions. There are two simple assumptions:

Extreme fear can be a sign that investors are too worried. That could be a buying opportunity.

When Investors are getting too greedy, that means the market is due for a correction.

Therefore, we analyze the current sentiment of the Bitcoin market and crunch the numbers into a simple meter from 0 to 100. Zero means "Extreme Fear", while 100 means "Extreme Greed". See below for further information on our data sources.

Data Sources

We are gathering data from the five following sources. Each data point is valued the same as the day before in order to visualize a meaningful progress in sentiment change of the crypto market.

First of all, the current index is for bitcoin only (we will offer separate indices for large alt coins soon), because a big part of it is the volatility of the coin price.

But let’s list all the different factors we’re including in the current index:

Volatility (25 %)

We’re measuring the current volatility and max. drawdowns of bitcoin and compare it with the corresponding average values of the last 30 days and 90 days. We argue that an unusual rise in volatility is a sign of a fearful market.

Market Momentum/Volume (25%)

Also, we’re measuring the current volume and market momentum (again in comparison with the last 30/90 day average values) and put those two values together. Generally, when we see high buying volumes in a positive market on a daily basis, we conclude that the market acts overly greedy / too bullish.

Social Media (15%)

While our reddit sentiment analysis is still not in the live index (we’re still experimenting with some market-related key words in the text processing algorithm), our twitter analysis is running. There, we gather and count posts on various hashtags for each coin (publicly, we show only those for Bitcoin) and check how fast and how many interactions they receive in certain time frames. An unusually high interaction rate reflects growing public interest in the coin and, in our eyes, corresponds to a greedy market behaviour.

Surveys (15%) currently paused

Together with strawpoll.com (disclaimer: we own this site, too), quite a large public polling platform, we’re conducting weekly crypto polls and ask people how they see the market. Usually, we’re seeing 2,000 - 3,000 votes on each poll, so we do get a picture of the sentiment of a group of crypto investors. We don’t give those results too much attention, but it was quite useful in the beginning of our studies. You can see some recent results here.

Dominance (10%)

The dominance of a coin resembles the market cap share of the whole crypto market. Especially for Bitcoin, we think that a rise in Bitcoin dominance is caused by a fear of (and thus a reduction of) too speculative alt-coin investments, since Bitcoin is becoming more and more the safe haven of crypto. On the other side, when Bitcoin dominance shrinks, people are getting more greedy by investing in more risky alt-coins, dreaming of their chance in the next big bull run. Anyhow, analyzing the dominance for a coin other than Bitcoin, you could argue the other way round, since more interest in an alt-coin may indicate a bullish/greedy behaviour for that specific coin.

Trends (10%)

We pull Google Trends data for various Bitcoin related search queries and crunch those numbers, especially the change of search volumes as well as recommended other currently popular searches. For example, if you check Google Trends for "Bitcoin", you can’t get much information from the search volume. But you can currently see a +1,550% rise of the query "bitcoin price manipulation" in the box of related search queries (as of 05/29/2018). This is clearly a sign of fear in the market, and we use that for our index.
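The factor weights listed above sum to 100%, which suggests a weighted combination of component scores. The source does not publish the exact aggregation, so the following is only a sketch under stated assumptions: each factor is assumed to be already expressed as a 0 to 100 score, the index is assumed to be a weighted average, and the weights are renormalised over whichever factors are available (for example when surveys are paused). The dictionary keys and the function name are hypothetical.

```python
# Weights taken from the factor headings above (they sum to 1.0).
WEIGHTS = {
    "volatility": 0.25,
    "momentum_volume": 0.25,
    "social_media": 0.15,
    "surveys": 0.15,
    "dominance": 0.10,
    "trends": 0.10,
}


def combine(scores, weights=WEIGHTS):
    """Weighted average of 0-100 factor scores.

    Assumption: weights are renormalised over the factors present in
    `scores`, so a missing factor (e.g. paused surveys) is simply skipped.
    """
    used = {name: w for name, w in weights.items() if name in scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no known factors supplied")
    return sum(scores[name] * w for name, w in used.items()) / total_weight


# Made-up factor scores for one day, with surveys omitted.
today = {
    "volatility": 30,
    "momentum_volume": 40,
    "social_media": 55,
    "dominance": 35,
    "trends": 20,
}
print(round(combine(today), 1))
```

A result near 0 would correspond to "Extreme Fear" and a result near 100 to "Extreme Greed", matching the 0 to 100 meter described earlier.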

Copyright disclaimer

This dataset is produced and maintained by the administrators of https://alternative.me/crypto/fear-and-greed-index/.

This published version is an unofficial copy of their data, which can be also collected using their API (e.g., GET https://api.alternative.me/fng/?limit=10&format=csv&date_format=us).
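For readers who prefer to build the request programmatically, the example URL above can be assembled with Python's standard library. This sketch only constructs the URL and does not contact the API; the function name and default values are illustrative, and the parameter names are the ones shown in the example request.

```python
from urllib.parse import urlencode

# Endpoint taken from the example request above.
BASE_URL = "https://api.alternative.me/fng/"


def build_fng_url(limit=10, fmt="csv", date_format="us"):
    """Build the query URL using the parameters shown in the example."""
    query = urlencode({"limit": limit, "format": fmt, "date_format": date_format})
    return f"{BASE_URL}?{query}"


url = build_fng_url()
print(url)

# To fetch the data, one could pass `url` to urllib.request.urlopen;
# that step is left out here so the sketch runs without network access.
```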

--- Original source retains full ownership of the source dataset ---
US Airlines Twitter (Over time)
kaggle.com
Updated Nov 18, 2022
Cite
The Devastator (2022). US Airlines Twitter (Over time) [Dataset]. https://www.kaggle.com/datasets/thedevastator/sentiment-analysis-of-us-airline-twitter-data
Dataset updated
Nov 18, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
US Airlines Twitter (Over time)

Study the trend in customer satisfaction over time

About this dataset

The columns in the dataset include index, unit id, golden, unit state, trusted judgments, last judgment at, airline sentiment, airline sentiment confidence, negative reason, negative reason confidence, airline_sentiment_gold and retweet count. There is also text included for each tweet as well as tweet location and user timezone.

Using this dataset, you can get a feel for how customers of various airlines feel about their service. You can use the data to analyze trends over time or compare different airlines. Some research ideas include using airline sentiment to predict the stock market or using the negativereason data to help airlines improve their customer service

How to use the dataset

Looking at this dataset, you can get a feel for how customers of various airlines feel about their service. The data includes the airline, the tweet text, the date of the tweet, and various other information. You can use this to analyze trends over time or compare different airlines

Research Ideas

Using airline sentiment to predict the stock market - is there a correlation between how the public perceives an airline and how that airline's stock performs?

Using negativereason data to help airlines improve their customer service - which negative reasons are mentioned most often? Are there certain airlines that are consistently mentioned for specific reasons?

Use the tweet data to map out airline hot spots - where do people tend to tweet about certain airlines the most? Is there a geographic pattern to sentiment about specific airlines?

Acknowledgements

If you use this dataset in your research, please credit Social Media Data

License

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

You are free to:
- Share - copy and redistribute the material in any medium or format for non-commercial purposes only.
- Adapt - remix, transform, and build upon the material for non-commercial purposes only.

You must:
- Give appropriate credit
- Provide a link to the license, and indicate if changes were made.
- ShareAlike - You must distribute your contributions under the same license as the original.

You may not:
- Use the material for commercial purposes.

Columns

File: Airline-Sentiment-2-w-AA.csv

| Column name | Description |
|:---------------------------|:-----------------------------------------------------------------------------|
| _golden | This column is the gold standard column. (Boolean) |
| _unit_state | This column is the state of the unit. (String) |
| _trusted_judgments | This column is the number of trusted judgments. (Numeric) |
| _last_judgment_at | This column is the timestamp of the last judgment. (String) |
| airline_sentiment | This column is the sentiment of the tweet. (String) |
| negativereason | This column is the negative reason for the sentiment. (String) |
| airline_sentiment_gold | This column is the gold standard sentiment of the tweet. (String) |
| name | This column is the name of the airline. (String) |
| negativereason_gold | This column is the gold standard negative reason for the sentiment. (String) |
| retweet_count | This column is the number of retweets. (Numeric) |
| text | This column is the text of the tweet. (String) |
| tweet_coord | This column is the coordinates of the tweet. (String) |
| tweet_created | This column is the timestamp of the tweet. (String) |
| tweet_location | This column is the location of the tweet. (String) |
| user_timezone | This column is the timezone of the user. (String) |
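One of the research ideas above, finding which negative reasons come up most often per airline, can be sketched with the standard library alone. The column names follow the table above (`name` is treated as the airline, as the table describes it). The sample rows, the airline names, and the label value "negative" are made up for illustration and are assumptions about the file's contents.

```python
import csv
import io
from collections import Counter

# Made-up sample rows using the column names from the table above.
SAMPLE_CSV = """name,airline_sentiment,negativereason,text
Airline A,negative,Late Flight,example tweet 1
Airline A,negative,Late Flight,example tweet 2
Airline B,negative,Lost Luggage,example tweet 3
Airline B,positive,,example tweet 4
"""


def count_negative_reasons(csv_file):
    """Count (airline, negative reason) pairs among negative tweets."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Assumption: negative tweets carry the label "negative".
        if row["airline_sentiment"] == "negative" and row["negativereason"]:
            counts[(row["name"], row["negativereason"])] += 1
    return counts


counts = count_negative_reasons(io.StringIO(SAMPLE_CSV))
for (airline, reason), n in counts.most_common():
    print(airline, reason, n)
```

To run this against the real file, open Airline-Sentiment-2-w-AA.csv with `open(..., newline="")` and pass the file object in place of the `io.StringIO` wrapper.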
Customer Review Analytics Dataset
opendatabay.com
Updated Jul 4, 2025
Cite
Datasimple (2025). Customer Review Analytics Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/86ee10d3-abf9-44cd-8532-fa4e72cdea06
Dataset updated
Jul 4, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Reviews & Ratings
Description
Welcome to the Women’s Clothing E-Commerce Reviews dataset, a collection of customer reviews designed for deep textual analysis. This dataset offers a rich environment for parsing review text through its multiple dimensions, making it suitable for a variety of analytical tasks. Given that this is real commercial data, all identifying company references within the review text and body have been meticulously anonymised and replaced with "retailer".

Columns

Clothing ID: An integer categorical variable that uniquely refers to the specific clothing item being reviewed [1].

Age: A positive integer variable indicating the age of the reviewer [1, 2].

Title: A string variable capturing the title of the customer's review [1, 2].

Review Text: A string variable containing the main body of the customer's review [1, 2].

Rating: A positive ordinal integer variable representing the product score assigned by the customer, ranging from 1 (Worst) to 5 (Best) [1, 2].

Recommended IND: A binary variable indicating whether the customer recommends the product (1 for recommended, 0 for not recommended) [1, 2].

Positive Feedback Count: A positive integer documenting the number of other customers who found this review helpful or positive [1, 2].

Division Name: A categorical name detailing the high-level division of the product [1, 2].

Department Name: A categorical name specifying the department to which the product belongs [1, 2].

Class Name: A categorical name indicating the product's specific class [1].

Distribution

This dataset comprises 23,486 rows and 10 distinct feature variables [1]. Each row meticulously corresponds to a single customer review [1]. The data file is typically presented in CSV format [3]. A sample file is intended to be updated separately to the platform [3].

Usage

This dataset is ideally suited for tasks involving sentiment analysis, natural language processing (NLP), and classification [1]. Potential applications include developing robust product recommendation systems, identifying emerging trends in customer feedback, and conducting in-depth market research into consumer preferences and satisfaction within the e-commerce fashion sector.

Coverage

The dataset is globally relevant [4], encompassing a wide range of customer reviews from an e-commerce context. While specific time ranges are not detailed, it includes demographic information such as the reviewer's age [1, 2]. The commercial nature of the data means it has been anonymised to protect privacy [1].

License

CC0

Who Can Use It

This dataset is highly beneficial for data scientists, machine learning engineers, and natural language processing specialists aiming to build predictive models or perform textual analysis [1]. Furthermore, market researchers and business analysts can leverage this data to gain insights into customer behaviour, product performance, and overall market sentiment within the clothing e-commerce domain.

Dataset Name Suggestions

Women's E-Commerce Clothing Reviews [1]

Online Fashion Customer Feedback

Apparel Review Sentiment Data

E-commerce Clothing Product Ratings

Customer Review Analytics Dataset

Attributes

Original Data Source: 👗 Women's E-Commerce Clothing Reviews
Financial News Headlines Data
opendatabay.com
Updated Jun 23, 2025
Cite
Datasimple (2025). Financial News Headlines Data [Dataset]. https://www.opendatabay.com/data/dataset/75f5a0aa-5b18-405b-b673-0af308f23961
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Data Science and Analytics
Description
Context

Scraped from the official CNBC, Guardian, and Reuters websites, the headlines in these datasets reflect an overview of the U.S. economy and stock market every day for the past one to two years.

Content

Data scraped from CNBC contains the headlines, last updated date, and the preview text of articles from the end of December 2017 to July 19th, 2020. Data scraped from the Guardian Business contains the headlines and last updated date of articles from the end of December 2017 to July 19th, 2020, since the Guardian Business does not offer preview text. Data scraped from Reuters contains the headlines, last updated date, and the preview text of articles from the end of March 2018 to July 19th, 2020.

Inspiration

I firmly believe that the sentiment of financial news articles reflects and directs the performance of the U.S. stock market. Therefore, by applying Natural Language Processing (NLP) to these headlines, I can see how the positivity/negativity of the score on each day correlates with the stock market's gains/losses.

The cover image was taken from https://hipwallpaper.com/stock-trader-wallpapers/

Original Data Source: Financial News Headlines Data

Cite

Datasimple (2025). Daily Market News Dataset [Dataset]. https://www.opendatabay.com/data/financial/75f5a0aa-5b18-405b-b673-0af308f23961

Daily Market News Dataset


Dataset updated

Jul 3, 2025

Dataset authored and provided by

Datasimple

License

CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically

Area covered

Data Science and Analytics

Description

This dataset features financial news headlines collected from leading financial news websites, including CNBC, The Guardian, and Reuters. It provides an overview of the U.S. economy and stock market, primarily reflecting daily market sentiment over several years. The main purpose of this dataset is to facilitate Natural Language Processing (NLP) analyses to explore the correlation between the positivity or negativity of news sentiment and U.S. stock market performance, such as gains and losses. It is ideal for data scientists and analysts keen on understanding market dynamics through textual data.

Columns

The dataset typically includes the following columns, though availability may vary slightly by source:
* Headlines: The main title or headline of the financial article.
* Time: The last updated date and time of the article.
* Description: A preview or summary text of the article's content.

Distribution

The data files are generally provided in CSV format. Specific numbers for rows or records are not available within the provided sources, but the dataset is structured to allow for easy processing and analysis.
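As a rough sketch of how such a CSV file might be read, the snippet below uses Python's standard `csv` module. It assumes the header row uses the column names listed above (`Headlines`, `Time`, `Description`) and that some sources omit `Description`; the function name `read_headlines` and the sample rows are hypothetical placeholders, and the `Time` value is kept as a plain string because its exact format is not stated here.

```python
import csv
import io


def read_headlines(fileobj):
    """Read rows from a CSV file object into a list of plain dictionaries.

    A missing or empty Description (some sources have no preview text)
    is stored as None.
    """
    records = []
    for row in csv.DictReader(fileobj):
        records.append({
            "headline": (row.get("Headlines") or "").strip(),
            "time": (row.get("Time") or "").strip(),  # kept as text, format unknown
            "description": (row.get("Description") or "").strip() or None,
        })
    return records


# Placeholder content standing in for real files; not actual dataset rows.
with_preview = io.StringIO(
    "Headlines,Time,Description\n"
    "Example headline A,example time A,Example preview A\n"
)
without_preview = io.StringIO(
    "Headlines,Time\n"
    "Example headline B,example time B\n"
)

records_a = read_headlines(with_preview)
records_b = read_headlines(without_preview)
print(records_a[0]["description"], records_b[0]["description"])
```

For a file on disk, the same function could be called with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the download was saved.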

Usage

This dataset is well-suited for a variety of applications, including:
* Sentiment analysis of financial news to predict market movements.
* Developing and testing Natural Language Processing (NLP) models.
* Data science and analytics projects focused on economic trends and stock market performance.
* Research into the impact of media on financial markets.

Coverage

The dataset covers news related to the U.S. economy and stock market.
* Time Range:
  * CNBC and The Guardian data spans from late December 2017 to 19th July 2020.
  * Reuters data covers from late March 2018 to 19th July 2020.
* Collectively, the headlines reflect an overview of the U.S. economy and stock market for approximately one to two years from their scraping date.

License

CC0

Who Can Use It

This dataset is intended for a range of users, including:
* Data Scientists and Analysts performing market sentiment analysis.
* Researchers studying economic indicators and financial news impact.
* Individuals interested in Natural Language Processing (NLP) and text analysis applications in finance.
* Anyone looking to gain insights into the relationship between news sentiment and stock market performance.

Dataset Name Suggestions

US Financial News Headlines
Stock Market Sentiment News
Financial Article Headlines
Daily Market News Dataset
Economy News Headlines for NLP

Attributes

Original Data Source: Financial News Headlines Data

