68 datasets found

News Datasets
brightdata.com
.json, .csv, .xlsx
Bright Data, News Datasets [Dataset]. https://brightdata.com/products/datasets/news
Available download formats: .json, .csv, .xlsx
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Stay ahead with our comprehensive News Dataset, designed for businesses, analysts, and researchers to track global events, monitor media trends, and extract valuable insights from news sources worldwide.

Dataset Features

News Articles: Access structured news data, including headlines, summaries, full articles, publication dates, and source details. Ideal for media monitoring and sentiment analysis.

Publisher & Source Information: Extract details about news publishers, including domain, region, and credibility indicators.

Sentiment & Topic Classification: Analyze news sentiment, categorize articles by topic, and track emerging trends in real time.

Historical & Real-Time Data: Retrieve historical archives or access continuously updated news feeds for up-to-date insights.

Customizable Subsets for Specific Needs

Our News Dataset is fully customizable, allowing you to filter data based on publication date, region, topic, sentiment, or specific news sources. Whether you need broad coverage for trend analysis or focused data for competitive intelligence, we tailor the dataset to your needs.

Popular Use Cases

Media Monitoring & Reputation Management: Track brand mentions, analyze media coverage, and assess public sentiment.

Market & Competitive Intelligence: Monitor industry trends, competitor activity, and emerging market opportunities.

AI & Machine Learning Training: Use structured news data to train AI models for sentiment analysis, topic classification, and predictive analytics.

Financial & Investment Research: Analyze news impact on stock markets, commodities, and economic indicators.

Policy & Risk Analysis: Track regulatory changes, geopolitical events, and crisis developments in real time.

Whether you're analyzing market trends, monitoring brand reputation, or training AI models, our News Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
tech-company-news-data-dump
huggingface.co
Updated Jan 16, 2024
HackerNoon (2024). tech-company-news-data-dump [Dataset]. https://huggingface.co/datasets/HackerNoon/tech-company-news-data-dump
Dataset updated
Jan 16, 2024
Dataset authored and provided by
HackerNoon (https://hackernoon.com/)
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
HackerNoon curated the internet's most cited 7M+ tech company news articles and blog posts about the 3k+ most valuable tech companies in 2022 and 2023. These stories were curated to power HackerNoon.com/Companies, where we update daily news on top technology companies like Microsoft, Google, and HuggingFace. Please use this news data freely for your project, and as always anyone is welcome to publish on HackerNoon.
Financial Datasets
brightdata.com
.json, .csv, .xlsx
Updated Dec 5, 2023
Bright Data (2023). Financial Datasets [Dataset]. https://brightdata.com/products/datasets/news/financial
Available download formats: .json, .csv, .xlsx
Dataset updated
Dec 5, 2023
Dataset authored and provided by
Bright Data
License
https://brightdata.com/license
Area covered
Worldwide
Description
Stay informed with our comprehensive Financial News Dataset, designed for investors, analysts, and businesses to track market trends, monitor financial events, and make data-driven decisions.

Dataset Features

Financial News Articles: Access structured financial news data, including headlines, summaries, full articles, publication dates, and source details.

Market & Economic Indicators: Track financial reports, stock market updates, economic forecasts, and corporate earnings announcements.

Sentiment & Trend Analysis: Analyze news sentiment, categorize articles by financial topics, and monitor emerging trends in global markets.

Historical & Real-Time Data: Retrieve historical financial news archives or access continuously updated feeds for real-time insights.

Customizable Subsets for Specific Needs

Our Financial News Dataset is fully customizable, allowing you to filter data based on publication date, region, financial topics, sentiment, or specific news sources. Whether you need broad coverage for market research or focused data for investment analysis, we tailor the dataset to your needs.

Popular Use Cases

Investment Strategy & Risk Management: Monitor financial news to assess market risks, identify investment opportunities, and optimize trading strategies.

Market & Competitive Intelligence: Track industry trends, competitor financial performance, and economic developments.

AI & Machine Learning Training: Use structured financial news data to train AI models for sentiment analysis, stock prediction, and automated trading.

Regulatory & Compliance Monitoring: Stay updated on financial regulations, policy changes, and corporate governance news.

Economic Research & Forecasting: Analyze financial news trends to predict economic shifts and market movements.

Whether you're tracking stock market trends, analyzing financial sentiment, or training AI models, our Financial News Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
Techcrunch news dataset
crawlfeeds.com
csv, zip
Updated May 16, 2025
Crawl Feeds (2025). Techcrunch news dataset [Dataset]. https://crawlfeeds.com/datasets/techcrunch-news-dataset
Available download formats: csv, zip
Dataset updated
May 16, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Get access to a structured dataset of articles from TechCrunch, a top source for startup, technology, and business news. This dataset includes thousands of articles covering topics like venture funding, product launches, AI, crypto, and more.

Perfect for use in:

News aggregation and monitoring

Sentiment or trend analysis

NLP model training

Startup or tech sector research

The data is available in CSV and JSON formats and can be customized by date or topic on request.

👉 Contact us for full access or a filtered sample.
Office for National Statistics - Number of Businesses by Detailed Industry,...
gimi9.com
Office for National Statistics - Number of Businesses by Detailed Industry, 2, 3 and 4 Digit SIC [Dataset]. https://gimi9.com/dataset/london_number-businesses-detailed-industry-2-3-and-4-digit-sic/
Description
Estimates of total businesses broken down by industry (2, 3, 4 digit SIC 2007 codes and industry section). Workplace data units from the Annual Business Inquiry (ABI) for London and Great Britain. Data rounded to the nearest 100. Percentages calculated on unrounded data.

An extract compiled from the Inter-Departmental Business Register (IDBR) recording the number of local units that were live at a reference date in March. Estimates can be broken down by employment size band, detailed industry (5 digit SIC 2007) and legal status. Available from country level down to middle layer super output area and Scottish intermediate zones. A local unit is an individual site (for example a factory or shop) associated with an enterprise. It can also be referred to as a workplace.

Industry is broken down using SIC 2007 codes. Read more about SIC here: http://www.statistics.gov.uk/methods_quality/sic/downloads/SIC2007explanatorynotes.pdf

The ABI is a business survey which collects both employment and financial information. Only employment information for the location of an employee's workplace is available from Nomis. The ABI is based on a sample of approximately 78,000 businesses and is used to provide an estimate of the number of employees.

The difference between the estimate and its true value is known as the sampling error. The actual sampling error for any estimate is unknown, but we can estimate, from the sample, a typical error, known as the standard error. This provides a means of assessing the precision of the estimate; the lower the standard error, the more confident we can be that the estimate is close to the true value. https://www.nomisweb.co.uk/articles/showArticle.asp?title=Information&article=news/071212_abi-stderrors.htm

This dataset excludes farm-based agriculture data contained in SIC class 0100.

Relevant link: https://www.nomisweb.co.uk/Default.asp
GaiaLens News Data: real-time (refreshed daily), covers c.17,000 global...
datarade.ai
GaiaLens, GaiaLens News Data: real-time (refreshed daily), covers c.17,000 global publicly traded companies, tracks 50 ESG themes [Dataset]. https://datarade.ai/data-products/gaialens-news-data-real-time-refreshed-daily-covers-c-17-gaialens
Available download formats: .json, .xml, .csv, .xls, .txt
Dataset authored and provided by
GaiaLens
Area covered
Georgia, Nigeria, Indonesia, Slovenia, Togo, Bahamas, Pakistan, Norway, Croatia, New Zealand
Description
We can offer the news data in two formats: 1) News flow: all news flow for our company coverage including articles and tweets. 2) ESG Incidents: highlights any pressing issues that companies are facing in the news.

1) News flow

Our system executes around 100,000 searches per day across the internet. We search specific websites deemed to be high-quality and informationally additive for news about our whole company coverage.

These include:
• Mainstream publications like Reuters, CNN, CNBC, NBC News, etc.
• NGO websites such as Ethical Consumer and Anti-Slavery International
• Investigative journalist websites like MLex
• National papers like the Japan Times
• Trade publications like Insurance Journal
• Sustainability publications like Edie.net

Each article that we download goes through rigorous processing. This includes cleaning the body of the article and adding its metadata e.g., the date that it was published.

We then calculate our proprietary “relevance” scores. This is a calculation to determine how relevant the article is to the company, CEO, biggest Insider and biggest Outsider.

Natural Language Processing (NLP) techniques are used to calculate the similarity and sentiment scores for each article for each news topic.

We use Twitter’s API to download the latest tweets from Thought Leader Accounts. We track over 100 Thought Leaders such as Ceres and Science Based Targets.

These tweets are then searched to see if any of our company coverage is mentioned.

Afterwards, the same processing and calculation steps are followed as for the news articles.

2) ESG Incidents

ESG Incidents is the second news feed that we display for users. It is designed to show any pressing issues that a company is facing in the news in real-time.

To get ESG Incidents outputs we follow these steps:
1. Choose a time period of news to look at, e.g., 3 months.
2. For each news topic (we have around 50), pick out the article(s) that have the highest relevance to a company and the highest similarity score over that time period. We multiply these two scores together to calculate an “Incidence Score”.
3. Calculate how many times that news topic has come up in the news over the chosen time period as a proportion of the total articles for that company.

We are then able to see emerging trends and incidents for a particular company over a time period and also have the ability to see the most relevant articles for each news topic. This allows investors to see any emerging incidents or scandals for a company in real-time.
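The three ESG Incidents steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not GaiaLens's implementation. `ScoredArticle` and `esg_incidents` are hypothetical names. Each article is assumed to be already assigned to a single news topic with precomputed relevance and similarity scores. Step 2 is read here as picking the article with the largest relevance * similarity product.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class ScoredArticle:
    """Hypothetical record: one article already assigned to one news topic."""
    company: str
    topic: str
    published: date
    relevance: float   # article-to-company relevance (scale assumed)
    similarity: float  # article-to-topic similarity (scale assumed)


def esg_incidents(articles, company, start, end):
    """Per-topic incidence score and share of coverage for one company."""
    # Step 1: keep this company's articles inside the chosen time window.
    window = [a for a in articles
              if a.company == company and start <= a.published <= end]
    if not window:
        return {}
    counts = Counter(a.topic for a in window)
    result = {}
    for topic, n in counts.items():
        on_topic = [a for a in window if a.topic == topic]
        # Step 2 (as interpreted here): pick the article with the largest
        # relevance * similarity product; that product is the incidence score.
        top = max(on_topic, key=lambda a: a.relevance * a.similarity)
        result[topic] = {
            "incidence_score": top.relevance * top.similarity,
            # Step 3: how often the topic came up, as a share of all articles.
            "share_of_coverage": n / len(window),
        }
    return result


articles = [
    ScoredArticle("ACME", "labour", date(2024, 1, 10), 0.9, 0.5),
    ScoredArticle("ACME", "labour", date(2024, 2, 1), 0.5, 0.8),
    ScoredArticle("ACME", "emissions", date(2024, 2, 5), 0.6, 0.5),
    ScoredArticle("ACME", "emissions", date(2023, 6, 1), 1.0, 1.0),  # outside window
]
print(esg_incidents(articles, "ACME", date(2024, 1, 1), date(2024, 3, 31)))
```

Keeping the incidence score and the share of coverage as separate fields reflects the description. One field measures how severe the single most relevant story is. The other measures how often the topic recurs for that company.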
Number of Businesses by Detailed Industry, 2, 3 and 4 Digit SIC
data.wu.ac.at
gimi9.com
csv, xls
Updated Sep 26, 2015
London Datastore Archive (2015). Number of Businesses by Detailed Industry, 2, 3 and 4 Digit SIC [Dataset]. https://data.wu.ac.at/schema/datahub_io/YjhkN2NlMmYtNDkwMC00ZGJiLWIyMmItZDNkZDU1Yzk5NGYy
Available download formats: csv (91064.0), xls (287744.0)
Dataset updated
Sep 26, 2015
Dataset provided by
London Datastore Archive
License
http://reference.data.gov.uk/id/open-government-licence
Description
Estimates of total businesses broken down by industry (2, 3, 4 digit SIC 2007 codes and industry section). Workplace data units from Annual Business Inquiry (ABI) for London and Great Britain.

Data rounded to the nearest 100. Percentages calculated on unrounded data.
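Percentages are calculated on unrounded data. As a result, dividing two published (rounded) figures will not always reproduce the published percentage. The sketch below illustrates this with hypothetical counts. The round-half-up rule in `round_to_nearest_100` is an assumption, because the description does not say how ties are handled.

```python
def round_to_nearest_100(n: int) -> int:
    """Round a non-negative count to the nearest 100 (halves round up: assumed)."""
    return (n + 50) // 100 * 100


def published_figures(count: int, total: int) -> tuple[int, float]:
    """Rounded count plus a percentage computed from the *unrounded* values."""
    return round_to_nearest_100(count), round(100 * count / total, 1)


count, total = 1_249, 4_150  # hypothetical unrounded figures
print(published_figures(count, total))  # (1200, 30.1)
# Recomputing the share from the rounded figures gives a different answer:
print(round(100 * round_to_nearest_100(count) / round_to_nearest_100(total), 1))  # 28.6
```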

Industry is broken down using SIC 2007 codes. Read more about SIC here: http://www.statistics.gov.uk/methods_quality/sic/downloads/SIC2007explanatorynotes.pdf
The ABI is a business survey which collects both employment and financial information. Only employment information for the location of an employee's workplace is available from Nomis.
The ABI is based on a sample of approximately 78,000 businesses and is used to provide an estimate of the number of employees.
The difference between the estimate and its true value is known as the sampling error. The actual sampling error for any estimate is unknown but we can estimate, from the sample, a typical error, known as the standard error. This provides a means of assessing the precision of the estimate; the lower the standard error, the more confident we can be the estimate is close to the true value. https://www.nomisweb.co.uk/articles/showArticle.asp?title=Information&article=news/071212_abi-stderrors.htm
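The standard-error idea above can be illustrated with the textbook formula for a sample mean under simple random sampling, SE = s / sqrt(n), where s is the sample standard deviation. This sketch is for intuition only. The description does not give the ABI's actual sample design or estimator, which produce employee totals rather than simple means. The helper names and the employee counts below are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Textbook SE of a sample mean under simple random sampling: s / sqrt(n)."""
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(sample) / math.sqrt(len(sample))


def approx_95_interval(sample):
    """Normal-approximation 95% interval: mean +/- 1.96 * SE."""
    m = statistics.mean(sample)
    se = standard_error_of_mean(sample)
    return m - 1.96 * se, m + 1.96 * se


# Hypothetical employee counts from eight sampled businesses.
employees = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(standard_error_of_mean(employees), 3))  # 0.756
print(approx_95_interval(employees))
```

A larger sample with the same spread yields a smaller standard error and a narrower interval. This is the sense in which a lower standard error means more confidence that the estimate is close to the true value.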

This dataset excludes farm-based agriculture data contained in SIC class 0100.

Relevant link: https://www.nomisweb.co.uk/Default.asp
Web Scraping News Data | B2B Sentiment Data | Categorized News Events | 19M...
datarade.ai
.json
Updated Jun 27, 2024
PredictLeads (2024). Web Scraping News Data | B2B Sentiment Data | Categorized News Events | 19M Blogs, PR Sites and News Sites | 8.3M+ Records [Dataset]. https://datarade.ai/data-products/predictleads-web-scraping-data-news-data-categorited-new-predictleads
Available download formats: .json
Dataset updated
Jun 27, 2024
Dataset authored and provided by
PredictLeads
Area covered
Namibia, Svalbard and Jan Mayen, Canada, Northern Mariana Islands, Gabon, Sweden, South Africa, Vietnam, Italy, Niger
Description
PredictLeads News Events Data provides real-time market intelligence by capturing business-critical news events, categorizing them for sentiment analysis, company profiling, and competitive tracking. Our dataset leverages advanced web scraping and AI-driven classification, ensuring access to highly relevant insights that help businesses monitor competitors, assess risks, and refine growth strategies.

Use Cases:
✅ Sentiment Analysis – Gauge public perception and market sentiment to refine brand positioning.
✅ Account Profiling – Enrich CRM systems with real-time company event tracking.
✅ Competitive Intelligence – Monitor industry news, mergers, and expansions to anticipate market shifts.
✅ Market Research – Analyze business website updates and categorized news data for trend forecasting.
✅ Risk Assessment – Detect negative sentiment or financial distress indicators in key market players.

Key API Attributes:
- id (string, UUID) – Unique identifier for the news event.
- category (string) – Categorization of the event (e.g., funding, acquisition, leadership change).
- summary (string) – A brief overview of the detected event.
- sentiment_score (float, nullable) – Positive, neutral, or negative sentiment rating for the event.
- found_at (ISO 8601 date-time) – Timestamp when the news event was detected.
- article_sentence (string, nullable) – Extracted key sentence from the news article.
- location (string, nullable) – Geographic relevance of the event (e.g., company HQ, expansion region).
- company (object) – The company associated with the event, including:
  - domain (string) – Company’s website domain.
  - company_name (string) – Official company name.
  - ticker (string, nullable) – Stock ticker (if publicly traded).
- source_url (string, URL) – Link to the original news article or company update.
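The attribute list above maps naturally onto a typed record. The sketch below is a minimal Python parser built on loud assumptions. It treats each event as a flat, already-decoded JSON object with exactly these keys, and it places `source_url` at the top level rather than inside `company`. The real response envelope may differ, so check it against the PredictLeads News Events Dataset docs. `Company`, `NewsEvent` and `parse_news_event` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Company:
    domain: str
    company_name: str
    ticker: Optional[str]  # None if not publicly traded


@dataclass
class NewsEvent:
    id: str
    category: str
    summary: str
    sentiment_score: Optional[float]
    found_at: datetime
    article_sentence: Optional[str]
    location: Optional[str]
    company: Company
    source_url: str


def parse_news_event(raw: dict) -> NewsEvent:
    """Build a NewsEvent from one flat JSON object (assumed layout)."""
    c = raw["company"]
    return NewsEvent(
        id=raw["id"],
        category=raw["category"],
        summary=raw["summary"],
        sentiment_score=raw.get("sentiment_score"),  # nullable
        # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11,
        # so normalise it to an explicit UTC offset first.
        found_at=datetime.fromisoformat(raw["found_at"].replace("Z", "+00:00")),
        article_sentence=raw.get("article_sentence"),
        location=raw.get("location"),
        company=Company(c["domain"], c["company_name"], c.get("ticker")),
        source_url=raw["source_url"],
    )
```

Nullable attributes are read with `dict.get`, so a missing key and an explicit `null` both become `None`. Required attributes raise `KeyError` if absent.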

📌 PredictLeads News Events Data is trusted by market leaders for real-time competitive intelligence, ensuring faster, data-driven decision-making in sales, finance, and strategic planning.

PredictLeads News Events Dataset Docs: https://docs.predictleads.com/v3/guide/news_events_dataset
Complete News Data Extracted from CNBC in JSON Format: Covering Business,...
crawlfeeds.com
json, zip
Updated Jul 6, 2025
Crawl Feeds (2025). Complete News Data Extracted from CNBC in JSON Format: Covering Business, Finance, Technology, and Global Trends for Europe, US, and UK Audiences [Dataset]. https://crawlfeeds.com/datasets/complete-news-data-extracted-from-cnbc-in-json-format-covering-business-finance-technology-and-global-trends-for-europe-us-and-uk-audiences
Available download formats: zip, json
Dataset updated
Jul 6, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Area covered
United States, United Kingdom
Description
We have successfully extracted a comprehensive news dataset from CNBC, covering not only financial updates but also an extensive range of news categories relevant to diverse audiences in Europe, the US, and the UK. This dataset includes over 500,000 records, meticulously structured in JSON format for seamless integration and analysis.

Diverse News Segments for In-Depth Analysis

This extensive extraction spans multiple segments, such as:

Business and Market Analysis: Stay updated on major companies, mergers, and acquisitions.

Technology and Innovation: Explore developments in AI, cybersecurity, and digital transformation.

Economic Forecasts: Access insights into GDP, employment rates, inflation, and other economic indicators.

Geopolitical Developments: Understand the impact of political events and global trade dynamics on markets.

Personal Finance: Learn about saving strategies, investment tips, and real estate trends.

Each record in the dataset is enriched with metadata tags, enabling precise filtering by region, sector, topic, and publication date.

Why Choose This Dataset?

The comprehensive news dataset provides real-time insights into global developments, corporate strategies, leadership changes, and sector-specific trends. Designed for media analysts, research firms, and businesses, it empowers users to perform:

Trend Analysis

Sentiment Analysis

Predictive Modeling

Additionally, the JSON format ensures easy integration with analytics platforms for advanced processing.

Access More News Datasets

Looking for a rich repository of structured news data? Visit our news dataset collection to explore additional offerings tailored to your analysis needs.

Sample Dataset Available

To get a preview, check out the CSV sample of the CNBC economy articles dataset.
AG News Classification Dataset
cubig.ai
Updated Aug 1, 2024
CUBIG (2024). AG News Classification Dataset [Dataset]. https://cubig.ai/store/products/35/ag-news-classification-dataset
Dataset updated
Aug 1, 2024
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The news topic dataset consists of news articles classified into four major categories: World, Sports, Business, and Science/Technology. It is a subset of AG's corpus of news articles, providing a structured dataset for NLP-based text classification tasks.

2) Data Utilization
(1) Characteristics of the news topic data:
• The dataset includes descriptions of articles.
(2) The news topic data can be used for:
• Media Monitoring: Helps media companies and news aggregators categorize articles automatically, improving content management and recommendations.
• Academic Research: Provides data for studies on automatic text classification, topic discovery, and machine learning model performance.
Fox News dataset is for analyzing media trends and narratives
crawlfeeds.com
csv, zip
Updated May 19, 2025
Crawl Feeds (2025). Fox News dataset is for analyzing media trends and narratives [Dataset]. https://crawlfeeds.com/datasets/fox-news-dataset
Available download formats: zip, csv
Dataset updated
May 19, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
The Fox News Dataset is a comprehensive collection of over 1 million news articles, offering an unparalleled resource for analyzing media narratives, public discourse, and political trends. Covering articles up to the year 2023, this dataset is a treasure trove for researchers, analysts, and businesses interested in gaining deeper insights into the topics and trends covered by Fox News.

Key Features of the Fox News Dataset

Extensive Coverage: Contains more than 1 million articles spanning various topics and events up to 2023.

Research-Ready: Perfect for text classification, natural language processing (NLP), and other research purposes.

Format: Provided in CSV format for seamless integration into analytical and research tools.

Why Use This Dataset?

This large dataset is ideal for:

Text Classification: Develop machine learning models to classify and categorize news content.

Natural Language Processing (NLP): Conduct sentiment analysis, keyword extraction, or topic modeling.

Media and Political Research: Analyze media narratives, public opinion, and political trends reflected in Fox News articles.

Trend Analysis: Identify shifts in public discourse and media focus over time.

Explore More News Datasets

Discover additional resources for your research needs by visiting our news dataset collection. These datasets are tailored to support diverse analytical applications, including sentiment analysis and trend modeling.

The Fox News Dataset is a must-have for anyone interested in exploring large-scale media data and leveraging it for advanced analysis. Ready to dive into this wealth of information? Download the dataset now in CSV format and start uncovering the stories behind the headlines.
AI use in newsrooms worldwide 2023
statista.com
Updated Nov 28, 2024
Statista (2024). AI use in newsrooms worldwide 2023 [Dataset]. https://www.statista.com/statistics/1119232/predictions-ai-initiatives-for-publishers/
Dataset updated
Nov 28, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Nov 27, 2023 - Dec 20, 2023
Area covered
Worldwide
Description
According to 56 percent of industry leaders surveyed in December 2023, back-end automation would be the most important use of artificial intelligence in newsrooms in 2024. Additionally, utilizing AI for distribution and recommendations, such as personalized home pages and alerts, was considered necessary for future business operations. News gathering was ranked as the least important use of AI, with only 22 percent of publishers feeling this would be important for their company in 2024.

The ethics of AI in the newsroom

Data from news and media organizations around the world revealed concerns about the ethical implications of AI in the newsroom. More than 80 percent of respondents said they were concerned about the ethics of AI when it came to editorial quality and the industry in general – but readers’ perceptions were less of a worry. Readers themselves, on the other hand, have priorities of their own: a UK study found that the majority of adults believed that media organizations should be required to display how AI was used to create a news article.

The issue of trust

News organizations should be mindful of how their readers feel about the use of AI in news. UK consumers are especially skeptical about the idea of an AI journalist and AI editor working on online news without human assistance. At a time when trust in human journalists is already relatively low, introducing AI into the mix could further damage public trust in the news and those reporting on it.
Replication Data for: Investigating positive/negative bias in Canadian...
search.dataone.org
borealisdata.ca
Updated Dec 28, 2023
Gagnon, Chantal; Boulanger, Pier-Pascale (2023). Replication Data for: Investigating positive/negative bias in Canadian newspapers through translation: A study of “confidence” in a corpus of business news [Dataset]. http://doi.org/10.5683/SP3/JBLRTS
Unique identifier
https://doi.org/10.5683/SP3/JBLRTS
Dataset updated
Dec 28, 2023
Dataset provided by
Borealis
Authors
Gagnon, Chantal; Boulanger, Pier-Pascale
Area covered
Canada
Description
This data was used in our article Investigating positive/negative bias in Canadian newspapers through translation. To conduct our research, we used a subset of the Canadian Press Corpus in Finance (CAPCOF), composed of news items covering the 2007-2008 financial crisis and the years that led up to it (2001-2006). CAPCOF is a bilingual comparable corpus containing texts in English and in French. The 2008 CAPCOF data subset contains 1,357,088 words in French and 1,403,907 words in English. The present Excel file was obtained by running the WordSmith 8.0 concordancer tool (Scott, 2020) on the 2008 CAPCOF data subset and extracting occurrences of “confidence” and “confiance”.
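A concordancer lists each occurrence of a search word with its surrounding context, a display known as keyword in context (KWIC). The sketch below illustrates that idea only. It does not reproduce WordSmith's own implementation or options. The hypothetical `concordance` helper uses a regular expression to find whole-word, case-insensitive matches and returns their left and right context. The sample sentences are made up.

```python
import re


def concordance(text, node_words, width=30):
    """Keyword-in-context lines: (left context, matched word, right context).

    Matches whole words only, case-insensitively.
    """
    alternatives = "|".join(re.escape(w) for w in node_words)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines


corpus = ("Investor confidence fell sharply. "
          "La confiance des consommateurs a chuté. "
          "Overconfidence is not counted.")
for left, node, right in concordance(corpus, ["confidence", "confiance"], width=15):
    print(f"{left:>15} [{node}] {right}")
```

Whole-word matching means "Overconfidence" is not counted as an occurrence of "confidence". A real study may or may not want that behaviour.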
Data from: Uneasy Bedfellows: AI in the News, Platform Companies and the...
tandf.figshare.com
docx
Updated Jun 3, 2023
Felix M. Simon (2023). Uneasy Bedfellows: AI in the News, Platform Companies and the Issue of Journalistic Autonomy [Dataset]. http://doi.org/10.6084/m9.figshare.19803504.v1
Available download formats: docx
Unique identifier
https://doi.org/10.6084/m9.figshare.19803504.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Taylor & Francis
Authors
Felix M. Simon
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Platform companies play an important role in the production and distribution of news. This article analyses this role and questions of control, dependence and autonomy in the light of the ‘AI goldrush’ in the news. I argue that the introduction of AI in the news risks shifting even more control to and increasing the news industry’s dependence on platform companies. While platform companies’ power over news organisations has to date mainly flowed from their control over the channels of distribution, AI potentially allows them to extend this control to the means of production as the technology increasingly permeates all stages of the news-making process. As a result, news organisations risk becoming even more tethered to platform companies in the long run, potentially limiting their autonomy and, by extension, contributing to a restructuring of the public arena as news organisations are re-shaped according to the logics of platform businesses. I conclude by mapping a research agenda that highlights potential implications and spells out areas in need of further exploration.
Economic Relevant News from The Guardian
data.mendeley.com
datosdeinvestigacion.conicet.gov.ar
Updated Dec 10, 2019
Mariano Maisonnave (2019). Economic Relevant News from The Guardian [Dataset]. http://doi.org/10.17632/yt8j2f3hpp.2
Unique identifier
https://doi.org/10.17632/yt8j2f3hpp.2
Dataset updated
Dec 10, 2019
Authors
Mariano Maisonnave
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
The news: The present dataset consists of 1,789 news articles from the British daily newspaper The Guardian, extracted using the content endpoint of The Guardian Open Platform. The articles comprise all the news published, at the time, in the business, politics, society and world news sections during the entire month of January 2013 (1,689 articles), plus an extra set of 100 articles randomly selected from the period February 2013 to December 2015. The first set of 1,689 articles was used for training and the second set of 100 articles for testing in two publications:

* Maisonnave, M., Delbianco, F., Tohmé, F.A. and Maguitman, A.G., 2018, November. A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media. In XIX Simposio Argentino de Inteligencia Artificial (ASAI)-JAIIO 47 (CABA, 2018).
* Maisonnave, M., Delbianco, F., Tohmé, F.A. and Maguitman, A.G., 2019. A Flexible Supervised Term-Weighting Technique and its Application to Variable Extraction and Information Retrieval. Inteligencia Artificial, 22(63), pp.61-80.

The labels: The entire dataset was manually classified into two categories: economically relevant and irrelevant. The labelling process was carried out by two economics experts working in collaboration. For each news article, the full text was analyzed to determine its category.

The format: There are two versions of this dataset: the reduced version and the full version. The reduced version consists of a CSV file and a readme file. The CSV file has five columns: "Instance No.", "Title", "Web Publication Date", "web URL" and "Economically Relevant". This version omits the full article texts but includes all 1,789 instances.
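The reduced CSV can be loaded with Python's standard `csv` module. This sketch rests on several assumptions. The column names come from the description. However, the description does not state how the "Economically Relevant" label is encoded or which date format is used. `TRUE_LABELS` and the ISO 8601 prefix check in the hypothetical `split_train_test` may therefore need adjusting to the actual file. That function separates the January 2013 training articles from the later test articles.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = ["Instance No.", "Title", "Web Publication Date",
           "web URL", "Economically Relevant"]
# Assumption: the label encoding is not documented here; accept common spellings.
TRUE_LABELS = {"1", "yes", "true", "relevant"}


def load_reduced(fileobj):
    """Read the reduced CSV and turn the label column into a bool."""
    reader = csv.DictReader(fileobj)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        label = row["Economically Relevant"].strip().lower()
        row["Economically Relevant"] = label in TRUE_LABELS
        rows.append(row)
    return rows


def split_train_test(rows):
    """January 2013 articles -> training set; later articles -> test set.

    Assumes ISO 8601 dates such as "2013-01-15T09:30:00Z".
    """
    train = [r for r in rows if r["Web Publication Date"].startswith("2013-01")]
    test = [r for r in rows if not r["Web Publication Date"].startswith("2013-01")]
    return train, test


sample = io.StringIO(
    "Instance No.,Title,Web Publication Date,web URL,Economically Relevant\n"
    "1,Example A,2013-01-05T10:00:00Z,https://example.org/a,Yes\n"
    "2,Example B,2014-03-01T09:00:00Z,https://example.org/b,No\n"
)
train, test = split_train_test(load_reduced(sample))
print(len(train), len(test))  # 1 1
```

With the real file, the split should yield 1,689 training rows and 100 test rows. If it does not, the date-format assumption is probably wrong.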

Requesting the full dataset: To gain access to the full version of the dataset (which includes the body of the news articles), please send an email to mariano.maisonnave@cs.uns.edu.ar with a copy to openplatform@theguardian.com requesting authorization and making it clear that the data set will not be used for commercial purposes.
COVID-19 INDIA
kaggle.com
Updated Mar 29, 2020
Aditya Kyatham (2020). COVID-19 INDIA [Dataset]. https://www.kaggle.com/adityakyatham/covid19/activity
Croissant metadata available (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Mar 29, 2020
Dataset provided by
Kaggle
Authors
Aditya Kyatham
Area covered
India
Description
The dataset contains some data that is officially available for COVID-19 research, plus some that I added randomly by referring to various common facts and news articles, because the data required for every feature of my project is not yet available.
Fake-News-Dataset
kaggle.com
Updated Apr 19, 2019
sumanthvrao (2019). Fake-News-Dataset [Dataset]. https://www.kaggle.com/sumanthvrao/fakenewsdataset/metadata
Croissant metadata available (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Apr 19, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
sumanthvrao
Description
Introduction

This dataset comprises two fake news datasets covering seven news domains. The first (FakeNewsAMT) was collected by combining manual and crowdsourced annotation, while the second (Celebrity) was collected directly from the web.

Data collection

The FakeNewsAMT dataset contains news in six domains: technology, education, business, sports, politics, and entertainment. The legitimate news articles were collected from a variety of mainstream news websites, predominantly in the US, such as ABCNews, CNN, USAToday, NewYorkTimes, FoxNews, Bloomberg, and CNET. The fake news articles are fake versions of the legitimate articles in the dataset, written by crowd workers on Amazon Mechanical Turk. More details on the data collection are provided in Section 3 of the paper.

The Celebrity dataset contains news about celebrities (actors, singers, socialites, and politicians). The legitimate news articles were obtained from the entertainment, fashion, and style sections of mainstream news websites and from entertainment magazine websites. The fake news articles were obtained from gossip websites such as Entertainment Weekly, People Magazine, RadarOnline, and other tabloid and entertainment-oriented publications. The articles were collected in pairs, one legitimate and one fake (rumors and false reports). They were manually verified using gossip-checking sites such as "GossipCop.com" and cross-referenced with information from other entertainment news sources on the web.

The data directory contains two fake news datasets:

Celebrity: The fake and legitimate news are provided in two separate folders. The fake and legitimate labels are also provided as part of the filename.

FakeNewsAMT: The fake and legitimate news are provided in two separate folders. Each folder contains 40 news articles from each of six domains: technology, education, business, sports, politics, and entertainment. The file names indicate the news domain: business (biz), education (edu), entertainment (entmt), politics (polit), sports (sports) and technology (tech). The fake and legitimate labels are also provided as part of the filename.
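Because the labels (and, for FakeNewsAMT, the domains) are encoded in the file names, a small parser can recover them when building a labelled corpus. This is a hypothetical sketch: the exact naming pattern is not given in this description, so it assumes that the label appears as the substring "fake" or "legit" in the name and that a FakeNewsAMT name begins with one of the listed domain codes. Names that do not match simply yield `None` for the missing part.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Domain abbreviations used in FakeNewsAMT file names, as listed above.
DOMAIN_CODES = {
    "biz": "business",
    "edu": "education",
    "entmt": "entertainment",
    "polit": "politics",
    "sports": "sports",
    "tech": "technology",
}


def parse_filename(name: str) -> Tuple[Optional[str], Optional[str]]:
    """Infer (domain, label) from a file name; either part may be None.

    Assumptions (hypothetical naming scheme): the label appears as "fake" or
    "legit", and a FakeNewsAMT name starts with its domain code.
    """
    base = Path(name).name.lower()  # drop any directory components
    if "fake" in base:
        label = "fake"
    elif "legit" in base:
        label = "legitimate"
    else:
        label = None
    # Leading run of letters, e.g. "polit" in "polit05.legit.txt".
    match = re.match(r"[a-z]+", base)
    domain = DOMAIN_CODES.get(match.group(0)) if match else None
    return domain, label
```

Celebrity files carry no domain code, so under these assumptions they would parse as `(None, "fake")` or `(None, "legitimate")`.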

Dataset citation:

@inproceedings{Perez-Rosas18Automatic,
  author = {Ver{\'o}nica P{\'e}rez-Rosas and Bennett Kleinberg and Alexandra Lefevre and Rada Mihalcea},
  title = {Automatic Detection of Fake News},
  booktitle = {International Conference on Computational Linguistics (COLING)},
  year = {2018}
}
News Intensity data in "Indirect News Coverage and Economic Policy...
brunel.figshare.com
xlsx
Updated Dec 5, 2024
Fang Xu; Jiaying Wu (2024). News Intensity data in "Indirect News Coverage and Economic Policy Uncertainty" [Dataset]. http://doi.org/10.17633/rd.brunel.27854760.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.17633/rd.brunel.27854760.v1
Dataset updated
Dec 5, 2024
Dataset provided by
Brunel University London
Authors
Fang Xu; Jiaying Wu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data file contains news intensity measures for the UK and the US, based on semantic fingerprints of news articles from the New York Times and the respective country. Articles from the following New York Times categories are used: Business Day, New York, U.S., World, Technology, Travel, Health, Real Estate, Science, Education, Automobiles, Your Money, Washington, Climate.
Business Demography, UK: 2021
gov.uk
Updated Nov 17, 2022
Office for National Statistics (2022). Business Demography, UK: 2021 [Dataset]. https://www.gov.uk/government/statistics/business-demography-uk-2021
Dataset updated
Nov 17, 2022
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
Office for National Statistics
Area covered
United Kingdom
Description
Official statistics are produced impartially and free from political influence.
Data from: From Industry Hype to Emerging Criticism: Analysing Chilean News...
figshare.com
tandf.figshare.com
xlsx
Updated Jul 10, 2025
Matías Valderrama Barragán; Martin Tironi; Dusan Cotoras; Teresa Correa; Mónica Humeres; Claudia López (2025). From Industry Hype to Emerging Criticism: Analysing Chilean News Media Coverage of Artificial Intelligence [Dataset]. http://doi.org/10.6084/m9.figshare.28247879.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.28247879.v1
Dataset updated
Jul 10, 2025
Dataset provided by
Taylor & Francis
Authors
Matías Valderrama Barragán; Martin Tironi; Dusan Cotoras; Teresa Correa; Mónica Humeres; Claudia López
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
AI has become (again) a matter of public interest, and it is crucial to investigate how news media intervene in the hype and publicity around AI in different countries. At the intersection of Media Studies and Science and Technology Studies (STS), this article examines portrayals of AI and related technologies in the Chilean news media. We curated a corpus of nearly 7000 AI-related news articles published between 2008 and 2023 in four Chilean newspapers. We combined LDA topic modelling with a dictionary-based analysis of the key actors and critical issues discussed around AI. The analysis shows the explosive growth of media coverage of AI in recent years, as well as the diversity of topics associated with AI in Chile. We found a high prominence of topics related to industry and technology, high visibility of international actors (mostly U.S. tech companies), and few mentions of critical issues around AI. We also discuss the low coverage of the State’s use of AI, the emerging use of generative AI in tech journalism, and the prominence of topics such as the arts and humanities, which appear as emerging spaces for the problematisation of AI in Chile.
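The dictionary-based part of this kind of analysis can be illustrated with a small counter that records how many articles mention each category of actor or issue. This is a minimal sketch, not the authors' code: the category names and terms in `ACTOR_DICTIONARY` are hypothetical, matching is on single lowercase words only, and the LDA topic-modelling step is not shown.

```python
import re
from collections import Counter

# Hypothetical dictionary: category -> single-word lowercase terms.
# The authors' actual dictionaries are not reproduced here.
ACTOR_DICTIONARY = {
    "us_tech_companies": ["google", "microsoft", "openai"],
    "state": ["gobierno", "ministerio"],
}


def count_mentions(articles, dictionary):
    """Count, per category, how many articles mention at least one term."""
    counts = Counter()
    for text in articles:
        # Python str patterns are Unicode-aware, so \w also matches accented letters.
        tokens = set(re.findall(r"\w+", text.lower()))
        for category, terms in dictionary.items():
            if any(term in tokens for term in terms):
                counts[category] += 1
    return counts
```

Counting articles rather than raw term occurrences keeps a single repetitive article from dominating a category; a raw-frequency variant would sum the matches in each text instead.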
