Attribution-NonCommercial-ShareAlike 2.5 (CC BY-NC-SA 2.5) https://creativecommons.org/licenses/by-nc-sa/2.5/
License information was derived automatically
The news: This dataset consists of 1789 news articles from the British daily newspaper The Guardian, extracted using the content endpoint of The Guardian Open Platform. The first 1689 articles were, at the time of extraction, all the articles published in the business, politics, society and world news sections during January 2013. The remaining 100 articles were randomly selected from the period February 2013 to December 2015. The first set of 1689 articles was used for training and the second set of 100 articles was used for testing in two publications:
* Maisonnave, M., Delbianco, F., Tohmé, F.A. and Maguitman, A.G., 2018, November. A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media. In XIX Simposio Argentino de Inteligencia Artificial (ASAI)-JAIIO 47 (CABA, 2018).
* Maisonnave, M., Delbianco, F., Tohmé, F.A. and Maguitman, A.G., 2019. A Flexible Supervised Term-Weighting Technique and its Application to Variable Extraction and Information Retrieval. Inteligencia Artificial, 22(63), pp.61-80.
The labels: The entire dataset was manually classified into two categories: economically relevant and economically irrelevant. The labelling was carried out by two experts in economics working in collaboration. For each news article, the full text was analyzed to determine the category.
The format: There are two versions of this dataset: the reduced version and the full version. The reduced version consists of a CSV file and a readme file. The CSV file has five columns: "Instance No.", "Title", "Web Publication Date", "web URL" and "Economically Relevant". This version is reduced in columns, as it does not include the full article texts; however, it does include all 1789 instances.
Requesting the full dataset: To gain access to the full version of the dataset (which includes the body of the news articles), please send an email to mariano.maisonnave@cs.uns.edu.ar with a copy to openplatform@theguardian.com requesting authorization and making it clear that the data set will not be used for commercial purposes.
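As a minimal sketch of working with the reduced version, the following Python snippet parses a CSV with the five columns listed above and counts the articles per label. The two sample rows, their URLs, and the label values 1/0 are hypothetical placeholders; the actual encoding of the "Economically Relevant" column should be checked in the readme file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the layout described above; the real file holds 1789 rows.
# The label encoding (1 = relevant, 0 = irrelevant) is an assumption.
SAMPLE = '''"Instance No.","Title","Web Publication Date","web URL","Economically Relevant"
1,"Example headline A","2013-01-01T10:00:00Z","https://example.org/a",1
2,"Example headline B","2013-01-02T11:30:00Z","https://example.org/b",0
'''


def count_labels(csv_file):
    """Count rows per value of the "Economically Relevant" column."""
    reader = csv.DictReader(csv_file)
    return Counter(row["Economically Relevant"] for row in reader)


counts = count_labels(io.StringIO(SAMPLE))
print(counts)
```

To apply the same function to the downloaded file, pass an open file handle instead of the in-memory sample.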
https://brightdata.com/license
Stay informed with our comprehensive Financial News Dataset, designed for investors, analysts, and businesses to track market trends, monitor financial events, and make data-driven decisions.
Dataset Features
Financial News Articles: Access structured financial news data, including headlines, summaries, full articles, publication dates, and source details.
Market & Economic Indicators: Track financial reports, stock market updates, economic forecasts, and corporate earnings announcements.
Sentiment & Trend Analysis: Analyze news sentiment, categorize articles by financial topics, and monitor emerging trends in global markets.
Historical & Real-Time Data: Retrieve historical financial news archives or access continuously updated feeds for real-time insights.
Customizable Subsets for Specific Needs
Our Financial News Dataset is fully customizable, allowing you to filter data based on publication date, region, financial topics, sentiment, or specific news sources. Whether you need broad coverage for market research or focused data for investment analysis, we tailor the dataset to your needs.
Popular Use Cases
Investment Strategy & Risk Management: Monitor financial news to assess market risks, identify investment opportunities, and optimize trading strategies.
Market & Competitive Intelligence: Track industry trends, competitor financial performance, and economic developments.
AI & Machine Learning Training: Use structured financial news data to train AI models for sentiment analysis, stock prediction, and automated trading.
Regulatory & Compliance Monitoring: Stay updated on financial regulations, policy changes, and corporate governance news.
Economic Research & Forecasting: Analyze financial news trends to predict economic shifts and market movements.
Whether you're tracking stock market trends, analyzing financial sentiment, or training AI models, our Financial News Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Gross Domestic Product (GDP) in the United States contracted 0.50 percent in the first quarter of 2025 over the previous quarter. This dataset provides the latest reported value for - United States GDP Growth Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Live Briefs Investor – US
Covering thousands of listed securities and events across 80 news categories, Live Briefs Investor US is specifically designed to keep individual investors and active traders on top of breaking news that is likely to affect their portfolios.
Most of the largest and most respected retail and self-directed brokerage firms in North America rely on MT Newswires to provide their clients with complete coverage of the financial markets. The Investor service includes timely and insightful commentary on equities, commodities, ETFs, economics, forex, options and fixed income assets throughout the day (6:30 am to 6:30 pm EST).
Every story is ticker-tagged and category-coded to allow for seamless platform integration. US Equities – significant events affecting individual public companies in the US: After-hours and pre-market news, trading activity and technical price level indications; Earnings estimate change alerts; Analyst Rating Changes- the most comprehensive view and coverage of rating changes available anywhere; ETF Power Play – daily trends in ETF trading activity; Mini and detailed sector summaries – pre-market, mid-day, and closing; Market Chatter – real-time coverage of trading desk rumors and breaking news; Zero noise: Only premium, original news and event analysis. Never any fillers (press releases, non-market related news, etc.).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Book, Magazine, Newspaper: Taobao Online Sales: Market Share data was reported at 0.150 % in Aug 2020. This records a decrease from the previous number of 0.180 % for Jul 2020. China Book, Magazine, Newspaper: Taobao Online Sales: Market Share data is updated monthly, averaging 0.150 % from Jun 2019 (Median) to Aug 2020, with 15 observations. The data reached an all-time high of 0.220 % in Feb 2020 and a record low of 0.080 % in Dec 2019. China Book, Magazine, Newspaper: Taobao Online Sales: Market Share data remains active status in CEIC and is reported by Moojing Market Intelligence. The data is categorized under China Premium Database’s Consumer Goods and Services – Table CN.HTB: Taobao and Tmall Online Sales: Cultural and Entertainment Article.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Gross Domestic Product (GDP) in the United States expanded 2 percent in the first quarter of 2025 over the same quarter of the previous year. This dataset provides the latest reported value for - United States GDP Annual Growth Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Enhancing Financial Market Predictions: Causality-Driven Feature Selection
This paper introduces the FinSen dataset, which revolutionizes financial market analysis by integrating economic and financial news articles from 197 countries with stock market data. The dataset’s extensive coverage spans 15 years from 2007 to 2023 with temporal information, offering a rich, global perspective 160,000… See the full description on the dataset page: https://huggingface.co/datasets/EagleWHLiang/FinSen.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains news headlines relevant to key forex pairs: AUDUSD, EURCHF, EURUSD, GBPUSD, and USDJPY. The data was extracted from reputable platforms Forex Live and FXstreet over a period of 86 days, from January to May 2023. The dataset comprises 2,291 unique news headlines. Each headline includes an associated forex pair, timestamp, source, author, URL, and the corresponding article text. Data was collected using web scraping techniques executed via a custom service on a virtual machine. This service periodically retrieves the latest news for a specified forex pair (ticker) from each platform, parsing all available information. The collected data is then processed to extract details such as the article's timestamp, author, and URL. The URL is further used to retrieve the full text of each article. This data acquisition process repeats approximately every 15 minutes.
To ensure the reliability of the dataset, we manually annotated each headline for sentiment. Instead of solely focusing on the textual content, we ascertained sentiment based on the potential short-term impact of the headline on its corresponding forex pair. This method recognizes the currency market's acute sensitivity to economic news, which significantly influences many trading strategies. As such, this dataset could serve as an invaluable resource for fine-tuning sentiment analysis models in the financial realm.
We used three categories for annotation: 'positive', 'negative', and 'neutral', which correspond to bullish, bearish, and hold sentiments, respectively, for the forex pair linked to each headline. The following Table provides examples of annotated headlines along with brief explanations of the assigned sentiment.
Examples of Annotated Headlines
| Forex Pair | Headline | Sentiment | Explanation |
| --- | --- | --- | --- |
| GBPUSD | Diminishing bets for a move to 12400 | Neutral | Lack of strong sentiment in either direction |
| GBPUSD | No reasons to dislike Cable in the very near term as long as the Dollar momentum remains soft | Positive | Positive sentiment towards GBPUSD (Cable) in the near term |
| GBPUSD | When are the UK jobs and how could they affect GBPUSD | Neutral | Poses a question and does not express a clear sentiment |
| JPYUSD | Appropriate to continue monetary easing to achieve 2% inflation target with wage growth | Positive | Monetary easing from Bank of Japan (BoJ) could lead to a weaker JPY in the short term due to increased money supply |
| USDJPY | Dollar rebounds despite US data. Yen gains amid lower yields | Neutral | Since both the USD and JPY are gaining, the effects on the USDJPY forex pair might offset each other |
| USDJPY | USDJPY to reach 124 by Q4 as the likelihood of a BoJ policy shift should accelerate Yen gains | Negative | USDJPY is expected to reach a lower value, with the USD losing value against the JPY |
| AUDUSD | RBA Governor Lowe’s Testimony High inflation is damaging and corrosive | Positive | Reserve Bank of Australia (RBA) expresses concerns about inflation. Typically, central banks combat high inflation with higher interest rates, which could strengthen AUD. |
Moreover, the dataset includes two columns with the predicted sentiment class and score as predicted by the FinBERT model. Specifically, the FinBERT model outputs a set of probabilities for each sentiment class (positive, negative, and neutral), representing the model's confidence in associating the input headline with each sentiment category. These probabilities are used to determine the predicted class and a sentiment score for each headline. The sentiment score is computed by subtracting the negative class probability from the positive one.
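The score computation described above can be sketched as follows. This is an illustration only: the function name and the example probabilities are hypothetical, and taking the predicted class to be the most probable one is an assumption, since the description does not state the selection rule.

```python
def sentiment_from_probs(probs):
    """Derive a (class, score) pair from per-class probabilities.

    probs: dict with the keys 'positive', 'negative' and 'neutral'.
    """
    # Assumption: the predicted class is the one with the highest probability.
    predicted = max(probs, key=probs.get)
    # Score as described in the text: positive minus negative probability.
    score = probs["positive"] - probs["negative"]
    return predicted, score


# Hypothetical probabilities for one headline (not taken from the dataset).
example = {"positive": 0.70, "negative": 0.10, "neutral": 0.20}
label, score = sentiment_from_probs(example)
print(label, round(score, 2))
```

Under this definition the score ranges from -1 (all probability mass on the negative class) to 1 (all mass on the positive class), and a confidently neutral headline scores close to 0.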
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data file contains news intensity measures for the UK and US, based on semantic fingerprints of news articles from the New York Times and the respective country. News articles in the following categories are used: Business Day, New York, U.S., World, Technology, Travel, Health, Real Estate, Science, Education, Automobiles, Your Money, Washington, Climate.
https://www.kappasignal.com/p/legal-disclaimer.html
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.
Historical daily stock prices (open, high, low, close, volume)
Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)
Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)
Feature engineering based on financial data and technical indicators
Sentiment analysis data from social media and news articles
Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)
Stock price prediction
Portfolio optimization
Algorithmic trading
Market sentiment analysis
Risk management
Researchers investigating the effectiveness of machine learning in stock market prediction
Analysts developing quantitative trading Buy/Sell strategies
Individuals interested in building their own stock market prediction models
Students learning about machine learning and financial applications
The dataset may include different levels of granularity (e.g., daily, hourly)
Data cleaning and preprocessing are essential before model training
Regular updates are recommended to maintain the accuracy and relevance of the data
Each month we publish independent forecasts of key economic and fiscal indicators for the UK economy. Forecasts before 2010 are hosted by The National Archives.
We began publishing comparisons of independent forecasts in 1986. The first database brings together selected variables from those publications, averaged across forecasters. It includes series for Gross Domestic Product, the Consumer Prices Index, the Retail Prices Index, the Retail Prices Index excluding mortgage interest payments, Public Sector Net Borrowing and the Claimant Count. Our second database contains time series of independent forecasts for GDP growth, private consumption, government consumption, fixed investment, domestic demand and net trade, for 26 forecasters with at least 10 years’ worth of submissions since 2010.
We’d welcome feedback on how you find the database and any extra information that you’d like to see included. Email your comments to Carter.Adams@hmtreasury.gov.uk.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Understanding the mutual relationships between information flows and social activity in society today is one of the cornerstones of the social sciences. In financial economics, the key issue in this regard is understanding and quantifying how news of all possible types (geopolitical, environmental, social, financial, economic, etc.) affects trading and the pricing of firms in organized stock markets. In this article, we seek to address this issue by performing an analysis of more than 24 million news records provided by Thomson Reuters and of their relationship with trading activity for 206 major stocks in the S&P US stock index. We show that the whole landscape of news that affects stock price movements can be automatically summarized via simple regularized regressions between trading activity and news information pieces decomposed, with the help of simple topic modeling techniques, into their “thematic” features. Using these methods, we are able to estimate and quantify the impacts of news on trading. We introduce network-based visualization techniques to represent the whole landscape of news information associated with a basket of stocks. The examination of the words that are representative of the topic distributions confirms that our method is able to extract the significant pieces of information influencing the stock market. Our results show that one of the most puzzling stylized facts in financial economies, namely that at certain times trading volumes appear to be “abnormally large,” can be partially explained by the flow of news. In this sense, our results prove that there is no “excess trading,” when restricting to times when news is genuinely novel and provides relevant financial information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Unemployment Rate in the United States decreased to 4.10 percent in June from 4.20 percent in May of 2025. This dataset provides the latest reported value for - United States Unemployment Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
https://crawlfeeds.com/privacy_policy
We have successfully extracted a comprehensive news dataset from CNBC, covering not only financial updates but also an extensive range of news categories relevant to diverse audiences in Europe, the US, and the UK. This dataset includes over 500,000 records, meticulously structured in JSON format for seamless integration and analysis.
This extensive extraction spans multiple segments, such as:
Each record in the dataset is enriched with metadata tags, enabling precise filtering by region, sector, topic, and publication date.
The comprehensive news dataset provides real-time insights into global developments, corporate strategies, leadership changes, and sector-specific trends. Designed for media analysts, research firms, and businesses, it empowers users to perform:
Additionally, the JSON format ensures easy integration with analytics platforms for advanced processing.
Looking for a rich repository of structured news data? Visit our news dataset collection to explore additional offerings tailored to your analysis needs.
To get a preview, check out the CSV sample of the CNBC economy articles dataset.
https://www.icpsr.umich.edu/web/ICPSR/studies/22680/terms
Many predictions of the economic and social costs of a modern-day pandemic are based on the effects of the influenza pandemic of 1918. Despite killing 675,000 people in the United States and 40 million worldwide, the influenza of 1918 has been nearly forgotten. The purpose of this paper is to provide an overview of the influenza pandemic of 1918 in the United States, its economic effects, and its implications for a modern-day pandemic. The paper provides a brief historical background as well as detailed influenza mortality statistics for cities and states, including those in the Eighth Federal Reserve District, that account for differences in race, income, and place of residence. Information is obtained from two sources: (i) newspaper articles published during the pandemic and (ii) a survey of economic research on the subject.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/5S1HRF
Media outlets provide crucial inputs into the democratic process, yet they face increasingly severe economic challenges. I study how a newly salient manifestation of this pressure, reduced reporting capacity, influences political coverage. Focusing on newspapers in the United States, where industry-wide employment fell over 40% between 2007 and 2015, I use panel data to assess the relationship between reporting capacity and political coverage. Staff cuts substantially decrease the amount of political coverage newspapers provide. Across different samples and measurement approaches, a typical cutback to a newspaper's reporting staff reduces its annual political coverage by between 300 and 500 stories. These political news declines happen against the backdrop of similar reductions in nonpolitical coverage, meaning the share of newspaper articles focused on politics remains stable over this period. This demonstrates that economic pressure affects the political information environment by shaping the media's capacity to cover politics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Gross Domestic Product (GDP) in the United States was worth 29184.89 billion US dollars in 2024, according to official data from the World Bank. The GDP value of the United States represents 27.49 percent of the world economy. This dataset provides - United States GDP - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our dataset encompasses a comprehensive collection of Azerbaijani news texts from the Azertac (https://azertag.az/) State Agency, drawn from a variety of news articles.
Azertac, established on March 1, 1920, was recognized as a pioneering entity within the framework of international information agencies. It has played a pivotal role in the establishment and coordination of various associations, including the Association of National Information Agencies comprising nations affiliated with the Commonwealth of Independent States, the Association of News Agencies representing Turkish-speaking countries, and the Association of National News Agencies associated with countries participating in the Black Sea Economic Cooperation Organization. AZERTAC has engaged in collaborative endeavors with several renowned news agencies to foster global information exchange and cooperation. This extensive network of collaborations underscores Azertac's global reach and influence in international news dissemination.
The dataset comprises approximately three million rows, with each row representing a sentence extracted from diverse Azerbaijani news sources. These sentences cover a wide spectrum of subjects, including but not limited to politics, the economy, culture, sports, technology, and health. The Labeled dataset, which has been posted and publicly shared in the link, is organized to facilitate rigorous analysis and classification tasks, with essential metadata provided for each sentence.
The dataset is enriched with crucial metadata attributes that enhance its utility and applicability to various research tasks:
The dataset is presented in a single CSV file.
This dataset compiles all of the newspaper articles (including news stories, op-eds, and feature stories) on migrant work and migrant workers published in the Toronto Star, Globe & Mail, and National Post from 2010 to 2020 (N=1894). It records the codes ("frames") that the articles used when discussing migrant work and migrant workers.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the appendix, the dataset, and the analysis files for the study "Unpacking the Nuances of Agenda-Setting in the Online Media Environment: An Hourly-Event Approach in the Context of Chinese Economic News."
In addition to the appendix, the repository includes a "Data" folder with 36 CSV files, one for each news event. In each file, the first column, "hour", denotes the hourly interval of the data, and columns 2–6 hold the endogenous variables included in the VAR models (i.e., the raw volume of coverage or discussion in different groups concerning the media, netizens, and other institutions of interest). The datasets have been aggregated by 19-hour lags each day, resulting in 266 lags for the 14-day time window.
The "AnalysisFiles" folder contains the R code and copies of the results of the analysis, organized as follows:
- "TimeSeriesAnalysis" contains the R code for the time-series analysis of this study, together with copies of the results for the VAR models.
- "t-test & ANOVA" contains the results of the 36 separate VAR models and the R code for the t-test and ANOVA of the effect of event features on agenda-setting influence, together with copies of the t-test and ANOVA results.
- "Figure" contains the R code for creating Figure 1 and Figure 2 in the main text of this study, together with copies of these two figures.
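The lag count above is consistent with reading the description as 19 hourly intervals per day over 14 days (19 × 14 = 266). A minimal sketch of checking that a per-event file has the expected shape is given below; the column names other than "hour" and the generated rows are hypothetical stand-ins, not values from the dataset.

```python
import csv
import io

HOURS_PER_DAY = 19  # hourly intervals per day, as read from the description
DAYS = 14           # length of the time window in days
EXPECTED_ROWS = HOURS_PER_DAY * DAYS  # 19 * 14 = 266


def load_event(csv_file):
    """Return the rows of one event file as a list of dicts."""
    return list(csv.DictReader(csv_file))


# Build a stand-in file: "hour" plus five hypothetical variable columns.
header = ["hour", "var1", "var2", "var3", "var4", "var5"]
lines = [",".join(header)]
for h in range(1, EXPECTED_ROWS + 1):
    lines.append(",".join([str(h)] + ["0"] * 5))

rows = load_event(io.StringIO("\n".join(lines)))
print(len(rows) == EXPECTED_ROWS)
```

For a real event file, replace the in-memory stand-in with an open file handle and compare the row count against EXPECTED_ROWS before fitting any model.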