https://brightdata.com/license
Stay ahead with our comprehensive News Dataset, designed for businesses, analysts, and researchers to track global events, monitor media trends, and extract valuable insights from news sources worldwide.
Dataset Features
News Articles: Access structured news data, including headlines, summaries, full articles, publication dates, and source details. Ideal for media monitoring and sentiment analysis.
Publisher & Source Information: Extract details about news publishers, including domain, region, and credibility indicators.
Sentiment & Topic Classification: Analyze news sentiment, categorize articles by topic, and track emerging trends in real time.
Historical & Real-Time Data: Retrieve historical archives or access continuously updated news feeds for up-to-date insights.
Customizable Subsets for Specific Needs
Our News Dataset is fully customizable, allowing you to filter data based on publication date, region, topic, sentiment, or specific news sources. Whether you need broad coverage for trend analysis or focused data for competitive intelligence, we tailor the dataset to your needs.
Popular Use Cases
Media Monitoring & Reputation Management: Track brand mentions, analyze media coverage, and assess public sentiment.
Market & Competitive Intelligence: Monitor industry trends, competitor activity, and emerging market opportunities.
AI & Machine Learning Training: Use structured news data to train AI models for sentiment analysis, topic classification, and predictive analytics.
Financial & Investment Research: Analyze news impact on stock markets, commodities, and economic indicators.
Policy & Risk Analysis: Track regulatory changes, geopolitical events, and crisis developments in real time.
Whether you're analyzing market trends, monitoring brand reputation, or training AI models, our News Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
HackerNoon curated the internet's most cited 7M+ tech company news articles and blog posts about the 3k+ most valuable tech companies in 2022 and 2023. These stories were curated to power HackerNoon.com/Companies, where we update daily news on top technology companies like Microsoft, Google, and HuggingFace. Please use this news data freely for your project, and as always anyone is welcome to publish on HackerNoon.
https://brightdata.com/license
Stay informed with our comprehensive Financial News Dataset, designed for investors, analysts, and businesses to track market trends, monitor financial events, and make data-driven decisions.
Dataset Features
Financial News Articles: Access structured financial news data, including headlines, summaries, full articles, publication dates, and source details.
Market & Economic Indicators: Track financial reports, stock market updates, economic forecasts, and corporate earnings announcements.
Sentiment & Trend Analysis: Analyze news sentiment, categorize articles by financial topics, and monitor emerging trends in global markets.
Historical & Real-Time Data: Retrieve historical financial news archives or access continuously updated feeds for real-time insights.
Customizable Subsets for Specific Needs
Our Financial News Dataset is fully customizable, allowing you to filter data based on publication date, region, financial topics, sentiment, or specific news sources. Whether you need broad coverage for market research or focused data for investment analysis, we tailor the dataset to your needs.
Popular Use Cases
Investment Strategy & Risk Management: Monitor financial news to assess market risks, identify investment opportunities, and optimize trading strategies.
Market & Competitive Intelligence: Track industry trends, competitor financial performance, and economic developments.
AI & Machine Learning Training: Use structured financial news data to train AI models for sentiment analysis, stock prediction, and automated trading.
Regulatory & Compliance Monitoring: Stay updated on financial regulations, policy changes, and corporate governance news.
Economic Research & Forecasting: Analyze financial news trends to predict economic shifts and market movements.
Whether you're tracking stock market trends, analyzing financial sentiment, or training AI models, our Financial News Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
https://crawlfeeds.com/privacy_policy
Get access to a structured dataset of articles from TechCrunch, a top source for startup, technology, and business news. This dataset includes thousands of articles covering topics like venture funding, product launches, AI, crypto, and more.
Perfect for use in:
News aggregation and monitoring
Sentiment or trend analysis
NLP model training
Startup or tech sector research
The data is available in CSV and JSON formats and can be customized by date or topic on request.
Contact us for full access or a filtered sample.
Estimates of total businesses broken down by industry (2, 3, 4 digit SIC 2007 codes and industry section). Workplace data units from Annual Business Inquiry (ABI) for London and Great Britain. Data rounded to the nearest 100. Percentages calculated on unrounded data.
An extract compiled from the Inter Departmental Business Register (IDBR) recording the number of local units that were live at a reference date in March. Estimates can be broken down by employment size band, detailed industry (5 digit SIC2007) and legal status. Available from country down to mid layer super output area and Scottish intermediate zones.
A local unit is an individual site (for example a factory or shop) associated with an enterprise. It can also be referred to as a workplace.
Industry is broken down using SIC 2007 codes. Read more about SIC here: http://www.statistics.gov.uk/methods_quality/sic/downloads/SIC2007explanatorynotes.pdf
The ABI is a business survey which collects both employment and financial information. Only employment information for the location of an employee's workplace is available from Nomis.
The ABI is based on a sample of approximately 78,000 businesses and is used to provide an estimate of the number of employees.
The difference between the estimate and its true value is known as the sampling error. The actual sampling error for any estimate is unknown, but we can estimate, from the sample, a typical error, known as the standard error. This provides a means of assessing the precision of the estimate: the lower the standard error, the more confident we can be that the estimate is close to the true value. https://www.nomisweb.co.uk/articles/showArticle.asp?title=Information&article=news/071212_abi-stderrors.htm
This dataset excludes farm based agriculture data contained in SIC class 0100.
Relevant link: https://www.nomisweb.co.uk/Default.asp
We can offer the news data in two formats: 1) News flow: all news flow for our company coverage including articles and tweets. 2) ESG Incidents: highlights any pressing issues that companies are facing in the news.
Our system executes around 100,000 searches per day across the internet. We search specific websites deemed to be high-quality and informationally additive for news about our whole company coverage.
These include:
• Mainstream publications like Reuters, CNN, CNBC, NBC News etc.
• NGO websites such as Ethical Consumer and Anti-Slavery International
• Investigative journalist websites like MLex
• National papers like the Japan Times
• Trade publications like Insurance Journal
• Sustainability publications like Edie.net
Each article that we download goes through rigorous processing. This includes cleaning the body of the article and adding its metadata e.g., the date that it was published.
We then calculate our proprietary "relevance" scores. This is a calculation to determine how relevant the article is to the company, CEO, biggest Insider and biggest Outsider.
Natural Language Processing (NLP) techniques are used to calculate the similarity and sentiment scores for each article for each news topic.
We use Twitter's API to download the latest tweets from Thought Leader Accounts. We track over 100 Thought Leaders such as Ceres and Science Based Targets.
These tweets are then searched to see if any of our company coverage is mentioned.
Afterwards, the same processing and calculation steps are followed as for the news articles.
ESG Incidents is the second news feed that we display for users. It is designed to show any pressing issues that a company is facing in the news in real-time.
To get ESG Incidents outputs we follow these steps:
1. Choose a time period of news to look at, e.g., 3 months.
2. For each news topic (we have around 50), pick out the article(s) that have the highest relevance to a company and the highest similarity score over that time period. We multiply these two scores together to calculate an "Incidence Score".
3. Calculate how many times that news topic has come up in the news over the chosen time period as a proportion of the total articles for that company.
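The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the provider's actual implementation: the `ScoredArticle` type and the `esg_incidents` function are hypothetical names, the scores are assumed to be non-negative numbers, and "highest relevance and highest similarity" is interpreted here as the largest relevance × similarity product per topic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScoredArticle:
    """Hypothetical record for one article already scored for one company."""
    topic: str         # news topic the article was matched to
    relevance: float   # assumed: relevance of the article to the company
    similarity: float  # assumed: similarity of the article to the topic


def esg_incidents(articles):
    """Return {topic: (incidence_score, share_of_articles)} for one company.

    `articles` is assumed to be already restricted to the chosen time
    period (step 1) and to a single company.
    """
    total = len(articles)
    best = defaultdict(float)   # step 2: best relevance * similarity per topic
    counts = defaultdict(int)   # step 3: number of articles per topic
    for article in articles:
        score = article.relevance * article.similarity
        best[article.topic] = max(best[article.topic], score)
        counts[article.topic] += 1
    # An empty input yields an empty dict, so there is no division by zero.
    return {topic: (best[topic], counts[topic] / total) for topic in counts}


# Made-up scores, for illustration only.
example = [
    ScoredArticle("topic_a", 0.5, 0.5),
    ScoredArticle("topic_a", 1.0, 0.5),
    ScoredArticle("topic_b", 0.25, 1.0),
    ScoredArticle("topic_a", 0.5, 0.25),
]
incidents = esg_incidents(example)
```

Sorting the result by incidence score would then surface the most pressing topics for the company over the period.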
We are then able to see emerging trends and incidents for a particular company over a time period and also have the ability to see the most relevant articles for each news topic. This allows investors to see any emerging incidents or scandals for a company in real-time.
http://reference.data.gov.uk/id/open-government-licence
Estimates of total businesses broken down by industry (2, 3, 4 digit SIC 2007 codes and industry section). Workplace data units from Annual Business Inquiry (ABI) for London and Great Britain.
Data rounded to the nearest 100. Percentages calculated on unrounded data.
Industry is broken down using SIC 2007 codes. Read more about SIC here http://www.statistics.gov.uk/methods_quality/sic/downloads/SIC2007explanatorynotes.pdf
The ABI is a business survey which collects both employment and financial information. Only employment information for the location of an employee's workplace is available from Nomis.
The ABI is based on a sample of approximately 78,000 businesses and is used to provide an estimate of the number of employees.
The difference between the estimate and its true value is known as the sampling error. The actual sampling error for any estimate is unknown but we can estimate, from the sample, a typical error, known as the standard error. This provides a means of assessing the precision of the estimate; the lower the standard error, the more confident we can be the estimate is close to the true value. https://www.nomisweb.co.uk/articles/showArticle.asp?title=Information&article=news/071212_abi-stderrors.htm
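To make the idea of a standard error concrete, here is a minimal sketch for the simplest case, the standard error of a sample mean (sample standard deviation divided by the square root of the sample size). This is a textbook formula shown for illustration only; it is not the estimator used for the ABI, whose survey design and weighting are not described here, and the function name is hypothetical.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimate the standard error of the mean from a simple random sample."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # Sample standard deviation (n - 1 denominator) scaled by sqrt(n).
    return statistics.stdev(sample) / math.sqrt(n)


# Made-up employee counts for eight sampled businesses.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error_of_mean(sample)
```

A smaller value means the sample mean is likely to be closer to the true mean, which is the sense in which a lower standard error indicates a more precise estimate.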
This dataset excludes farm based agriculture data contained in SIC class 0100.
Relevant link: https://www.nomisweb.co.uk/Default.asp
PredictLeads News Events Data provides real-time market intelligence by capturing business-critical news events, categorizing them for sentiment analysis, company profiling, and competitive tracking. Our dataset leverages advanced web scraping and AI-driven classification, ensuring access to highly relevant insights that help businesses monitor competitors, assess risks, and refine growth strategies.
Use Cases:
- Sentiment Analysis: Gauge public perception and market sentiment to refine brand positioning.
- Account Profiling: Enrich CRM systems with real-time company event tracking.
- Competitive Intelligence: Monitor industry news, mergers, and expansions to anticipate market shifts.
- Market Research: Analyze business website updates and categorized news data for trend forecasting.
- Risk Assessment: Detect negative sentiment or financial distress indicators in key market players.
Key API Attributes:
- id (string, UUID): Unique identifier for the news event.
- category (string): Categorization of the event (e.g., funding, acquisition, leadership change).
- summary (string): A brief overview of the detected event.
- sentiment_score (float, nullable): Positive, neutral, or negative sentiment rating for the event.
- found_at (ISO 8601 date-time): Timestamp when the news event was detected.
- article_sentence (string, nullable): Extracted key sentence from the news article.
- location (string, nullable): Geographic relevance of the event (e.g., company HQ, expansion region).
- company (object): The company associated with the event, including:
  - domain (string): Company's website domain.
  - company_name (string): Official company name.
  - ticker (string, nullable): Stock ticker (if publicly traded).
- source_url (string, URL): Link to the original news article or company update.
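As a rough illustration of how a record with these attributes might be handled, the sketch below parses one event into typed Python objects. The field names come from the attribute list above; everything else is an assumption: the flat JSON layout, the placement of `source_url` at the top level, the `NewsEvent`, `Company` and `parse_news_event` names, and the sample values are all hypothetical, and the real API response may be shaped differently (see the linked docs).

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Company:
    domain: str
    company_name: str
    ticker: Optional[str]  # nullable: only set for publicly traded companies


@dataclass
class NewsEvent:
    id: str
    category: str
    summary: str
    sentiment_score: Optional[float]
    found_at: datetime
    article_sentence: Optional[str]
    location: Optional[str]
    company: Company
    source_url: str


def parse_news_event(raw: dict) -> NewsEvent:
    """Convert one decoded JSON object (assumed flat layout) into a NewsEvent."""
    company = raw["company"]
    return NewsEvent(
        id=raw["id"],
        category=raw["category"],
        summary=raw["summary"],
        sentiment_score=raw.get("sentiment_score"),
        found_at=datetime.fromisoformat(raw["found_at"]),
        article_sentence=raw.get("article_sentence"),
        location=raw.get("location"),
        company=Company(
            domain=company["domain"],
            company_name=company["company_name"],
            ticker=company.get("ticker"),
        ),
        source_url=raw["source_url"],
    )


# Entirely made-up sample record, for illustration only.
SAMPLE = json.loads("""
{
  "id": "00000000-0000-0000-0000-000000000000",
  "category": "funding",
  "summary": "Example Corp raised a new funding round.",
  "sentiment_score": null,
  "found_at": "2024-01-15T09:30:00+00:00",
  "article_sentence": null,
  "location": null,
  "company": {"domain": "example.com", "company_name": "Example Corp", "ticker": null},
  "source_url": "https://example.com/news/funding"
}
""")
event = parse_news_event(SAMPLE)
```

Keeping the nullable attributes as `Optional` makes it explicit which fields downstream code has to check before use.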
PredictLeads News Events Data is trusted by market leaders for real-time competitive intelligence, ensuring faster, data-driven decision-making in sales, finance, and strategic planning.
PredictLeads News Events Dataset Docs: https://docs.predictleads.com/v3/guide/news_events_dataset
https://crawlfeeds.com/privacy_policy
We have successfully extracted a comprehensive news dataset from CNBC, covering not only financial updates but also an extensive range of news categories relevant to diverse audiences in Europe, the US, and the UK. This dataset includes over 500,000 records, meticulously structured in JSON format for seamless integration and analysis.
This extensive extraction spans multiple segments, such as:
Each record in the dataset is enriched with metadata tags, enabling precise filtering by region, sector, topic, and publication date.
The comprehensive news dataset provides real-time insights into global developments, corporate strategies, leadership changes, and sector-specific trends. Designed for media analysts, research firms, and businesses, it empowers users to perform:
Additionally, the JSON format ensures easy integration with analytics platforms for advanced processing.
Looking for a rich repository of structured news data? Visit our news dataset collection to explore additional offerings tailored to your analysis needs.
To get a preview, check out the CSV sample of the CNBC economy articles dataset.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The News topic dataset consists of news articles classified into four major categories: World, Sports, Business, and Science/Technology. It is a subset of AG's corpus of news articles, providing a structured dataset for NLP-based text classification tasks.
2) Data Utilization
(1) News topic data has the following characteristics:
• The dataset includes descriptions of articles.
(2) News topic data can be used for:
• Media Monitoring: Helps media companies and news aggregators categorize articles automatically, improving content management and recommendations.
• Academic Research: Provides data for studies on automatic text classification, topic discovery, and machine learning model performance.
https://crawlfeeds.com/privacy_policy
The Fox News Dataset is a comprehensive collection of over 1 million news articles, offering an unparalleled resource for analyzing media narratives, public discourse, and political trends. Covering articles up to the year 2023, this dataset is a treasure trove for researchers, analysts, and businesses interested in gaining deeper insights into the topics and trends covered by Fox News.
This large dataset is ideal for:
Discover additional resources for your research needs by visiting our news dataset collection. These datasets are tailored to support diverse analytical applications, including sentiment analysis and trend modeling.
The Fox News Dataset is a must-have for anyone interested in exploring large-scale media data and leveraging it for advanced analysis. Ready to dive into this wealth of information? Download the dataset now in CSV format and start uncovering the stories behind the headlines.
According to 56 percent of industry leaders surveyed in December 2023, back-end automation would be the most important use of artificial intelligence in newsrooms in 2024. Additionally, utilizing AI for distribution and recommendations such as personalized home pages and alerts was considered necessary for future business operations. News gathering was ranked as the least important use of AI, with only 22 percent of publishers feeling this would be important for their company in 2024.
The ethics of AI in the newsroom
Data from news and media organizations around the world revealed concerns about the ethical implications of AI in the newsroom. More than 80 percent of respondents said they were concerned about the ethics of AI when it came to editorial quality and the industry in general, but readers' perceptions were less of a worry. Readers themselves, on the other hand, have priorities of their own: a UK study found that the majority of adults believed that media organizations should be required to display the ways AI was used to create a news article.
The issue of trust
News organizations should be mindful of how their readers feel about the use of AI in news: UK consumers are especially skeptical about the idea of an AI journalist and AI editor working on online news without human assistance. At a time when trust in human journalists is already relatively low, introducing AI into the mix could further damage public trust in the news and those reporting on it.
This data was used in our article Investigating positive/negative bias in Canadian newspapers through translation. To conduct our research, we used a subset of the Canadian Press Corpus in Finance (CAPCOF), composed of news items covering the 2007-2008 financial crisis and the years that led up to it (2001-2006). CAPCOF is a bilingual comparable corpus, containing texts in English and in French. The 2008 CAPCOF data subset contains 1,357,088 words in French and 1,403,907 words in English. The present Excel file was obtained by running the WordSmith 8.0 concordancer tool (Scott, 2020) on the 2008 CAPCOF data subset and extracting occurrences of "confidence" and "confiance".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Platform companies play an important role in the production and distribution of news. This article analyses this role and questions of control, dependence and autonomy in the light of the "AI goldrush" in the news. I argue that the introduction of AI in the news risks shifting even more control to and increasing the news industry's dependence on platform companies. While platform companies' power over news organisations has to date mainly flowed from their control over the channels of distribution, AI potentially allows them to extend this control to the means of production as the technology increasingly permeates all stages of the news-making process. As a result, news organisations risk becoming even more tethered to platform companies in the long run, potentially limiting their autonomy and, by extension, contributing to a restructuring of the public arena as news organisations are re-shaped according to the logics of platform businesses. I conclude by mapping a research agenda that highlights potential implications and spells out areas in need of further exploration.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
The news: The present dataset consists of 1789 news articles from the British daily newspaper The Guardian, extracted using the content endpoint of The Guardian Open Platform. The news articles were, at the time, all the news articles in the sections business, politics, society and world news for the entire month of January 2013 (a total of 1689 news articles), plus an extra set of news articles randomly selected from the period February 2013 to December 2015 (100 news articles). The first set of 1689 news articles was used for training and the second set of 100 news articles was used for testing in two publications:
* Maisonnave, M., Delbianco, F., Tohmé, F.A. and Maguitman, A.G., 2018, November. A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media. In XIX Simposio Argentino de Inteligencia Artificial (ASAI)-JAIIO 47 (CABA, 2018).
* Maisonnave, M., Delbianco, F., Tohmé, F.A. and Maguitman, A.G., 2019. A Flexible Supervised Term-Weighting Technique and its Application to Variable Extraction and Information Retrieval. Inteligencia Artificial, 22(63), pp.61-80.
The labels: The entire dataset was manually classified into two possible categories: economically relevant and irrelevant. The labelling process was carried out by two experts in Economy working in collaboration. For each news article, the full text of the article was analyzed to determine the category.
The format: There are two different versions for this dataset: the reduced and the full versions. The former consists of a CSV and a readme file. The CSV file has five columns: "Instance No.", "Title", "Web Publication Date", "web URL" and "Economically Relevant". This version is reduced in columns as it does not include the full article texts; however, it does include all the 1789 instances.
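A minimal sketch of reading the reduced CSV with Python's standard library is shown below. The five column names are taken from the description above; the file contents, the `count_labels` helper, and in particular the way the "Economically Relevant" label is encoded (shown here as "1"/"0") are assumptions made for illustration and should be checked against the readme file.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = [
    "Instance No.",
    "Title",
    "Web Publication Date",
    "web URL",
    "Economically Relevant",
]


def count_labels(csv_file):
    """Count how many rows carry each value of the label column."""
    reader = csv.DictReader(csv_file)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    counts = {}
    for row in reader:
        label = row["Economically Relevant"]
        counts[label] = counts.get(label, 0) + 1
    return counts


# In-memory stand-in for the real file; rows and label encoding are made up.
sample = io.StringIO(
    "Instance No.,Title,Web Publication Date,web URL,Economically Relevant\n"
    "1,Example title A,2013-01-01,https://example.com/a,1\n"
    "2,Example title B,2013-01-02,https://example.com/b,0\n"
    "3,Example title C,2013-01-03,https://example.com/c,1\n"
)
counts = count_labels(sample)
```

With the real file, the same function could be called on `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.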
Requesting the full dataset: To gain access to the full version of the dataset (which includes the body of the news articles), please send an email to mariano.maisonnave@cs.uns.edu.ar with a copy to openplatform@theguardian.com requesting authorization and making it clear that the data set will not be used for commercial purposes.
The dataset contains some data that is officially available for COVID-19 research and some that I added randomly by referring to various common facts and news articles, because the data required for every feature in my project is not yet available.
This describes two fake news datasets covering seven different news domains. One of the datasets is collected by combining manual and crowdsourced annotation approaches (FakeNewsAMT), while the second is collected directly from the web (Celebrity).
The FakeNewsDatabase dataset contains news in six different domains: technology, education, business, sports, politics, and entertainment. The legitimate news included in the dataset were collected from a variety of mainstream news websites predominantly in the US such as the ABCNews, CNN, USAToday, NewYorkTimes, FoxNews, Bloomberg, and CNET among others. The fake news included in this dataset consist of fake versions of the legitimate news in the dataset, written using Mechanical Turk. More details on the data collection are provided in section 3 of the paper.
The Celebrity dataset contains news about celebrities (actors, singers, socialites, and politicians). The legitimate news in the dataset were obtained from entertainment, fashion and style news sections in mainstream news websites and from entertainment magazine websites. The fake news were obtained from gossip websites such as Entertainment Weekly, People Magazine, RadarOnline, and other tabloid and entertainment-oriented publications. The news articles were collected in pairs, with one article being legitimate and the other fake (rumors and false reports). The articles were manually verified using gossip-checking sites such as "GossipCop.com", and also cross-referenced with information from other entertainment news sources on the web.
The data directory contains two fake news datasets:
Celebrity The fake and legitimate news are provided in two separate folders. The fake and legitimate labels are also provided as part of the filename.
FakeNewsAMT The fake and legitimate news are provided in two separate folders. Each folder contains 40 news from six different domains: technology, education, business, sports, politics, and entertainment. The file names indicate the news domain: business (biz), education (edu), entertainment (entmt), politics (polit), sports (sports) and technology (tech). The fake and legitimate labels are also provided as part of the filename.
@article{Perez-Rosas18Automatic, author = {Ver\'{o}nica P\'{e}rez-Rosas, Bennett Kleinberg, Alexandra Lefevre, Rada Mihalcea}, title = {Automatic Detection of Fake News}, journal = {International Conference on Computational Linguistics (COLING)}, year = {2018} }
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data file contains news intensity measures for the UK and US, based on semantic fingerprints of the news articles from New York Times and the respective country. News articles in the following categories are used: Business Day, New York, U.S., World, Technology, Travel, Health, Real Estate, Science, Education, Automobiles, Your Money, Washington, Climate.
Official statistics are produced impartially and free from political influence.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AI has become (again) a matter of public interest, and it is crucial to investigate how the news media intervenes in the hype and publicity around AI in different countries. At the intersection between Media Studies and Science and Technology Studies (STS), this article examines portrayals of AI and related technologies in the Chilean news media. We curated a corpus of nearly 7000 AI-related news articles from 2008 to 2023 from four Chilean newspapers. We combined LDA topic modelling with a dictionary-based analysis of the key actors and critical issues discussed around AI. The analysis shows the explosive growth of the media coverage of AI in recent years, as well as the diversity of topics associated with AI in Chile. We found a high prominence of topics related to industry and technology, a high visibility of international actors, mostly U.S. tech companies, and a low level of mentions of critical issues around AI. Moreover, we also discuss the low coverage of the State's AI use, the emergent use of generative AI in tech journalism, and the prominence of topics such as the arts and humanities that appear as emerging spaces for the problematisation of AI in Chile.