100+ datasets found

Opinions on whether news written by AI is good or bad in the U.S. 2023, by...
statista.com
Updated Nov 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). Opinions on whether news written by AI is good or bad in the U.S. 2023, by age group [Dataset]. https://www.statista.com/statistics/1368583/ai-use-in-news-attitudes/
Explore at:
Dataset updated
Nov 28, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Jan 26, 2023 - Jan 30, 2023
Area covered
United States
Description
A survey revealed that most U.S. adults believed AI-written news articles would be a bad thing, with 78 percent of all respondents saying that they felt this way, according to the results of a January 2023 survey. Younger consumers were the least likely to think this - 19 percent said they thought this would be a good thing, compared to just seven percent of their older peers aged 55 years or older.
Attitudes to the future of news written by AI in the U.S. 2023, by age group...
statista.com
Updated Nov 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). Attitudes to the future of news written by AI in the U.S. 2023, by age group [Dataset]. https://www.statista.com/statistics/1368580/ai-use-in-news-stories/
Explore at:
Dataset updated
Nov 28, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Jan 26, 2023 - Jan 30, 2023
Area covered
United States
Description
A survey held in the United States in early 2023 found that most surveyed adults believe there will be a time where entire news articles are written by artificial intelligence, with 72 percent stating that this was what they expected to happen. Respondents under the age of 55 were marginally surer that solely AI-written news articles will be part of the future of news.
Largest news articles dataset from CNBC
crawlfeeds.com
csv, zip
Updated Jan 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Crawl Feeds (2025). Largest news articles dataset from CNBC [Dataset]. https://crawlfeeds.com/datasets/cnbc-news-dataset
Explore at:
zip, csvAvailable download formats
Dataset updated
Jan 6, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Description
Explore the "Largest News Articles Dataset from CNBC," a comprehensive collection of news articles published by CNBC, one of the leading global news sources for business, finance, and current affairs.

This dataset includes thousands of articles covering a wide range of topics, such as financial markets, economic trends, technology, politics, health, and more. Each article in the dataset provides detailed information, including headlines, publication dates, authors, article content, and categories, offering valuable insights for researchers, data analysts, and media professionals.

Key Features:

Extensive Coverage: Thousands of news articles from CNBC, covering a diverse array of topics including business, finance, technology, and global news.

Detailed Metadata: Each article includes essential details such as headline, publication date, author, content, and category, allowing for in-depth analysis and research.

Ideal for Analysis: Perfect for researchers, data scientists, and content creators looking to analyze trends in news reporting, study media coverage, or develop content strategies.

Up-to-Date Information: Provides a rich source of information on current events and market trends, helping professionals stay informed and make data-driven decisions.

Whether you're conducting research on financial markets, analyzing media trends, or developing new content, the "Largest News Articles Dataset from CNBC" is an invaluable resource that provides detailed insights and comprehensive coverage of the latest news.
a
Online News Popularity Data Set
academictorrents.com
bittorrent
Updated Feb 11, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kelwin Fernandes and Pedro Vinagre and Paulo Cortez and Pedro Sernadela (2016). Online News Popularity Data Set [Dataset]. https://academictorrents.com/details/95d3b03397a0bafd74a662fe13ba3550c13b7ce1
Explore at:
bittorrent(7476401)Available download formats
Dataset updated
Feb 11, 2016
Dataset authored and provided by
Kelwin Fernandes and Pedro Vinagre and Paulo Cortez and Pedro Sernadela
License
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
Description
Data Set Information: * The articles were published by Mashable (www.mashable.com) and their content as the rights to reproduce it belongs to them. Hence, this dataset does not share the original content but some statistics associated with it. The original content be publicly accessed and retrieved using the provided urls. * Acquisition date: January 8, 2015 * The estimated relative performance values were estimated by the authors using a Random Forest classifier and a rolling windows as assessment method. See their article for more details on how the relative performance values were set. ##Attribute Information: Number of Attributes: 61 (58 predictive attributes, 2 non-predictive, 1 goal field) 0. url: URL of the article (non-predictive) 1. timedelta: Days between the article publication and the dataset acquisition (non-predictive) 2. n_tokens_title: Number of words in the title 3. n_tokens_content: Number of words in the content 4. n_unique_tokens: Rate of unique words in the conte
Academic article descriptive statistics.
plos.figshare.com
xls
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Noah Haber; Emily R. Smith; Ellen Moscoe; Kathryn Andrews; Robin Audy; Winnie Bell; Alana T. Brennan; Alexander Breskin; Jeremy C. Kane; Mahesh Karra; Elizabeth S. McClure; Elizabeth A. Suarez (2023). Academic article descriptive statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0196346.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0196346.t002
Dataset updated
May 30, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Noah Haber; Emily R. Smith; Ellen Moscoe; Kathryn Andrews; Robin Audy; Winnie Bell; Alana T. Brennan; Alexander Breskin; Jeremy C. Kane; Mahesh Karra; Elizabeth S. McClure; Elizabeth A. Suarez
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Academic article descriptive statistics.
Social media as a news outlet worldwide 2025
statista.com
Updated Jul 2, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Social media as a news outlet worldwide 2025 [Dataset]. https://www.statista.com/statistics/718019/social-media-news-source/
Explore at:
Dataset updated
Jul 2, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Jan 2025 - Feb 2025
Area covered
Worldwide
Description
During a 2025 survey, ** percent of respondents from Nigeria stated that they used social media as a source of news. In comparison, just ** percent of Japanese respondents said the same. Large portions of social media users around the world admit that they do not trust social platforms either as media sources or as a way to get news, and yet they continue to access such networks on a daily basis. Social media: trust and consumption Despite the majority of adults surveyed in each country reporting that they used social networks to keep up to date with news and current affairs, a 2018 study showed that social media is the least trusted news source in the world. Less than ** percent of adults in Europe considered social networks to be trustworthy in this respect, yet more than ** percent of adults in Portugal, Poland, Romania, Hungary, Bulgaria, Slovakia and Croatia said that they got their news on social media. What is clear is that we live in an era where social media is such an enormous part of daily life that consumers will still use it in spite of their doubts or reservations. Concerns about fake news and propaganda on social media have not stopped billions of users accessing their favorite networks on a daily basis. Most Millennials in the United States use social media for news every day, and younger consumers in European countries are much more likely to use social networks for national political news than their older peers. Like it or not, reading news on social is fast becoming the norm for younger generations, and this form of news consumption will likely increase further regardless of whether consumers fully trust their chosen network or not.
News Datasets
brightdata.com
.json, .csv, .xlsx
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bright Data, News Datasets [Dataset]. https://brightdata.com/products/datasets/news
Explore at:
.json, .csv, .xlsxAvailable download formats
Dataset authored and provided by
Bright Datahttps://brightdata.com/
License
https://brightdata.com/licensehttps://brightdata.com/license
Area covered
Worldwide
Description
Stay ahead with our comprehensive News Dataset, designed for businesses, analysts, and researchers to track global events, monitor media trends, and extract valuable insights from news sources worldwide.

Dataset Features

News Articles: Access structured news data, including headlines, summaries, full articles, publication dates, and source details. Ideal for media monitoring and sentiment analysis. Publisher & Source Information: Extract details about news publishers, including domain, region, and credibility indicators. Sentiment & Topic Classification: Analyze news sentiment, categorize articles by topic, and track emerging trends in real time. Historical & Real-Time Data: Retrieve historical archives or access continuously updated news feeds for up-to-date insights.

Customizable Subsets for Specific Needs Our News Dataset is fully customizable, allowing you to filter data based on publication date, region, topic, sentiment, or specific news sources. Whether you need broad coverage for trend analysis or focused data for competitive intelligence, we tailor the dataset to your needs.

Popular Use Cases

Media Monitoring & Reputation Management: Track brand mentions, analyze media coverage, and assess public sentiment. Market & Competitive Intelligence: Monitor industry trends, competitor activity, and emerging market opportunities. AI & Machine Learning Training: Use structured news data to train AI models for sentiment analysis, topic classification, and predictive analytics. Financial & Investment Research: Analyze news impact on stock markets, commodities, and economic indicators. Policy & Risk Analysis: Track regulatory changes, geopolitical events, and crisis developments in real time.

Whether you're analyzing market trends, monitoring brand reputation, or training AI models, our News Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
h
news-sentiment-data
huggingface.co
Updated Jul 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
amitk17 (2024). news-sentiment-data [Dataset]. https://huggingface.co/datasets/sweatSmile/news-sentiment-data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 8, 2024
Authors
amitk17
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
sweatSmile/news-sentiment-data dataset hosted on Hugging Face and contributed by the HF Datasets community
CNN news dataset
crawlfeeds.com
json, zip
Updated Jun 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Crawl Feeds (2025). CNN news dataset [Dataset]. https://crawlfeeds.com/datasets/cnn-news-dataset
Explore at:
json, zipAvailable download formats
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Description
This dataset contains over 27,000 news articles sourced from CNN.com, including full content, metadata, and media fields. Each article is enriched with publish dates, author information, descriptions, and full raw + cleaned content—perfect for media research, sentiment analysis, topic modeling, and natural language processing (NLP) projects.

Last crawled in July 2021, this collection offers a historical snapshot of CNN’s reporting and editorial content.

Use Cases:

News content analysis

Fake news detection & bias tracking

Topic classification and clustering

Training AI/NLP models

Historical news trend research

Media monitoring tools

Update Frequency:

Archived — no current updates, great for snapshot-based analysis

ISOT Fake News Dataset

kaggle.com

Updated Dec 29, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

Rahul Goel (2024). ISOT Fake News Dataset [Dataset]. https://www.kaggle.com/datasets/rahulogoel/isot-fake-news-dataset/discussion

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Dec 29, 2024

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Rahul Goel

License

MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically

Description

It is trained on data of around 45,000 news articles with a mix of real and fake news articles. The dataset is provided by the University of Victoria.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F21948533%2Fa9c02011dc538fde2c967d56bfdb4778%2Fsubjects.png?generation=1735462720561554&alt=media" alt="distribution of topics">

The dataset contains two types of articles fake and real News. This dataset was collected from realworld sources; the truthful articles were obtained by crawling articles from Reuters.com (News website). As for the fake news articles, they were collected from different sources. The fake news articles were collected from unreliable websites that were flagged by Politifact (a fact-checking organization in the USA) and Wikipedia. The dataset contains different types of articles on different topics, however, the majority of articles focus on political and World news topics.

The dataset consists of two CSV files. The first file named “True.csv” contains more than 12,600 articles from reuter.com. The second file named “Fake.csv” contains more than 12,600 articles from different fake news outlet resources. Each article contains the following information: article title, text, type and the date the article was published on. To match the fake news data collected for kaggle.com, we focused mostly on collecting articles from 2016 to 2017. The data collected were cleaned and processed, however, the punctuations and mistakes that existed in the fake news were kept in the text.

The following table gives a breakdown of the categories and number of articles per category.

News	Size (Number of articles)	Subjects
Real-News	21417	Type	Articles size
		World-News	10145
		Politics-News	11272
Fake-News	23481	Type	Articles size
		Government-News	1570
		Middle-east	778
		US News	783
		Left-news	4459
		Politics	6841
		News	9050

Note- To cite this dataset use the information given by original authors:

Ahmed H, Traore I, Saad S. “Detecting opinion spams and fake news using text classification”, Journal of Security and Privacy, Volume 1, Issue 1, Wiley, January/February 2018.
Ahmed H, Traore I, Saad S. (2017) “Detection of Online Fake News Using N-Gram Analysis and Machine Learning Techniques. In: Traore I., Woungang I., Awad A. (eds) Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments. ISDDC 2017. Lecture Notes in Computer Science, vol 10618. Springer, Cham (pp. 127- 138)

CT-FAN-21 corpus: A dataset for Fake News Detection
zenodo.org
Updated Oct 23, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl (2022). CT-FAN-21 corpus: A dataset for Fake News Detection [Dataset]. http://doi.org/10.5281/zenodo.4714517
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.4714517
Dataset updated
Oct 23, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl
Description
Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use it only for research purposes. Due to these restrictions, the collection is not open data. Please download the Agreement at Data Sharing Agreement and send the signed form to fakenewstask@gmail.com .

Citation

Please cite our work as

@article{shahi2021overview, title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection}, author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas}, journal={Working Notes of CLEF}, year={2021} }

Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English.

Subtask 3A: Multi-class fake news detection of news articles (English) Sub-task A would detect fake news designed as a four-class classification problem. The training data will be released in batches and roughly about 900 articles with the respective label. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:

False - The main claim made in an article is untrue.

Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

True - This rating indicates that the primary elements of the main claim are demonstrably true.

Other- An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes articles in dispute and unproven articles.

Subtask 3B: Topical Domain Classification of News Articles (English) Fact-checkers require background expertise to identify the truthfulness of an article. The categorisation will help to automate the sampling process from a stream of data. Given the text of a news article, determine the topical domain of the article (English). This is a classification problem. The task is to categorise fake news articles into six topical categories like health, election, crime, climate, election, education. This task will be offered for a subset of the data of Subtask 3A.

Input Data

The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:

Task 3a

ID- Unique identifier of the news article

Title- Title of the news article

text- Text mentioned inside the news article

our rating - class of the news article as false, partially false, true, other

Task 3b

public_id- Unique identifier of the news article

Title- Title of the news article

text- Text mentioned inside the news article

domain - domain of the given news article(applicable only for task B)

Output data format

Task 3a

public_id- Unique identifier of the news article

predicted_rating- predicted class

Sample File

public_id, predicted_rating 1, false 2, true

Task 3b

public_id- Unique identifier of the news article

predicted_domain- predicted domain

Sample file

public_id, predicted_domain 1, health 2, crime

Additional data for Training

To train your model, the participant can use additional data with a similar format; some datasets are available over the web. We don't provide the background truth for those datasets. For testing, we will not use any articles from other datasets. Some of the possible source:

Fakenews Classification Datasets

Fake News Detection Challenge KDD 2020

FakeNewsNet

IMPORTANT!

Fake news article used for task 3b is a subset of task 3a.

We have used the data from 2010 to 2021, and the content of fake news is mixed up with several topics like election, COVID-19 etc.

Evaluation Metrics

This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.

Submission Link: https://competitions.codalab.org/competitions/31238

Related Work

Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1.https://arxiv.org/pdf/2010.00502.pdf

G. K. Shahi and D. Nandini, “FakeCovid – a multilingualcross-domain fact check news dataset for covid-19,” inWorkshop Proceedings of the 14th International AAAIConference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14

Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
Z
Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...
data.niaid.nih.gov
zenodo.org
Updated Mar 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Schaer, Philipp (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7682914
Explore at:
Dataset updated
Mar 1, 2023
Dataset provided by
Schaer, Philipp
Haak, Fabian
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

Fabian Haak and Philipp Schaer. 2023. 𝑄𝑏𝑖𝑎𝑠 - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022 as presented in our publication. The AllSides balanced news feature three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, and slant other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship, left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

Dataset 2: Search Query Suggestions (suggestions.csv)

The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags, that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represents the search query suggestions at the respective positions returned by the search engines at the given time of search "datetime". We scraped our data from a US server saved in "location".

We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.

AllSides Scraper

At https://github.com/irgroup/Qbias, we provide a scraping tool, that allows for the automatic retrieval of all available articles at the AllSides balanced news headlines.

We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, that scrapes all available AllSides news articles and gathers available information. By providing the scraper we facilitate access to a recent version of the dataset for other researchers.
Dataset of newspaper
figshare.com
datasetcatalog.nlm.nih.gov
bin
Updated Sep 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
YANRAN Ma; Zhenhua Liu; Dandan MA; Pengxiang Zhai; Kun Guo; Dayong Zhang; Qiang Ji (2023). Dataset of newspaper [Dataset]. http://doi.org/10.6084/m9.figshare.24071310.v3
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24071310.v3
Dataset updated
Sep 1, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
YANRAN Ma; Zhenhua Liu; Dandan MA; Pengxiang Zhai; Kun Guo; Dayong Zhang; Qiang Ji
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The six newspapers are chosen as the main sources for constructing the CCPU index: People’s Daily, Guangming Daily, Economic Daily, Global Times, Science and Technology Daily, and China News Service. The newspaper data are collected from the Wisenews database between January 2000 and December 2022. 1755826 newspaper data are stored in news_six_all.csv.
Trust in online news written and edited by AI vs humans UK 2024
statista.com
Updated Jun 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Trust in online news written and edited by AI vs humans UK 2024 [Dataset]. https://www.statista.com/statistics/1462607/journalism-news-reporting-ai-uk-trust/
Explore at:
Dataset updated
Jun 25, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Jan 13, 2024 - Jan 15, 2024
Area covered
United Kingdom
Description
A survey held on AI and journalism in January 2024 in the United Kingdom found that just ** percent of respondents would trust an online news article written by an AI journalist and edited by an AI editor. This is contrast to ** percent who said the same about content both created and edited by humans. Whilst the results suggest a lack of readiness for news content entirely generated and edited by AI, the data also highlights the general lack of trust in journalists and editors, with close to ** percent saying they would not trust human journalists or editors either.
c
Fake News Detection Dataset
cubig.ai
Updated May 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CUBIG (2025). Fake News Detection Dataset [Dataset]. https://cubig.ai/store/products/259/fake-news-detection-dataset
Explore at:
Dataset updated
May 27, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction • The Fake News Detection dataset is used to analyze news articles in order to solve the problem of fake news. This dataset uses statistical characteristics of news articles to predict whether an article is real or fake. • Key features include word count, sentence length, unique word count, and average word length, and the label indicates whether the article is real (1) or fake (0).

2) Data Utilization (1) Characteristics of the Fake News Detection • This dataset provides various statistical features of news articles, helping to predict the veracity of the articles. • Each feature helps analyze the style and linguistic patterns of the articles, which is useful for comprehensively understanding the characteristics of fake news. • This dataset is useful for training fake news detection models and provides essential foundational data for distinguishing between real and fake news.

(2) Applications of the Fake News Detection • Distinguishing between real and fake news: By analyzing the features of each article, it is possible to predict whether an article is real or fake. • Developing fake news detection models: Machine learning algorithms can be used to train models for fake news detection. • Enhancing media and information reliability: By using this data, a system can be developed to assess the veracity of news, contributing to the improvement of media trustworthiness.
tech-company-news-data-dump
aifasthub.com
huggingface.co
Updated Jan 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HackerNoon (2024). tech-company-news-data-dump [Dataset]. https://aifasthub.com/datasets/HackerNoon/tech-company-news-data-dump
Explore at:
Dataset updated
Jan 16, 2024
Dataset authored and provided by
HackerNoonhttps://hackernoon.com/
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
HackerNoon curated the internet's most cited 7M+ tech company news articles and blog posts about the 3k+ most valuable tech companies in 2022 and 2023. These stories were curated to power HackerNoon.com/Companies, where we update daily news on top technology companies like Microsoft, Google, and HuggingFace. Please use this news data freely for your project, and as always anyone is welcome to publish on HackerNoon.
BBC News Dataset – February 2023 Edition
crawlfeeds.com
csv, zip
Updated Jun 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Crawl Feeds (2025). BBC News Dataset – February 2023 Edition [Dataset]. https://crawlfeeds.com/datasets/bbc-news-dataset-feb-2023
Explore at:
zip, csvAvailable download formats
Dataset updated
Jun 14, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Description
Get access to a comprehensive and structured dataset of BBC News articles, freshly crawled and compiled in February 2023. This collection includes 1 million records from one of the world’s most trusted news organizations — perfect for training NLP models, sentiment analysis, and trend detection across global topics.

💾 Format: CSV (available in ZIP archive)

📢 Status: Published and available for immediate access

Use Cases

Train language models to summarize or categorize news

Detect media bias and compare narrative framing

Conduct research in journalism, politics, and public sentiment

Enrich news aggregation platforms with clean metadata

Analyze content distribution across categories (e.g. health, politics, tech)

This dataset ensures reliable and high-quality information sourced from a globally respected outlet. The format is optimized for quick ingestion into your pipelines — with clean text, timestamps, image links, and more.

Need a filtered dataset or want this refreshed for a later date? We offer on-demand news scraping as well.

👉 Request access or sample now
Z
News headlines of BBC articles published by @BBCBreaking twitter account
data.niaid.nih.gov
Updated Jul 29, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mello, Caio (2022). News headlines of BBC articles published by @BBCBreaking twitter account [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6927799
Explore at:
Dataset updated
Jul 29, 2022
Dataset provided by
Mello, Caio
Lewis, Nick
Istif Inci, Elçin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset consists of a list of news articles headlines retrieved from tweets published by @BBCBreaking profile in specific years (2012, 2015, 2017, 2019 and 2022).

The dataset is in .csv format and is organised as follows:

Columns:

ID (tweet ID)

created_at (tweet publication's date)

url (url of the news article attached to the tweet)

Titles (news headline)

Rows: Each row contains a single news article headline sorted by date of publication (created_at). Total number of entries: 7213.

For more details about data collection refer to Github.
Time spent with mobile news articles in the U.S. by social media source 2015...
statista.com
Updated May 5, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2016). Time spent with mobile news articles in the U.S. by social media source 2015 [Dataset]. https://www.statista.com/statistics/674321/mobile-news-articles-time-engaged-social-media/
Explore at:
Dataset updated
May 5, 2016
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Sep 2015
Area covered
United States
Description
The statistic gives information on the average time engaged with news articles on a smartphone in the United States as of September 2015, sorted by article length and the social media source. According to the source, long-form articles found on Twitter were engaged with for an average of 133 seconds.
Z
CT-FAN: A Multilingual dataset for Fake News Detection
data.niaid.nih.gov
Updated Oct 23, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Michael Wiegand (2022). CT-FAN: A Multilingual dataset for Fake News Detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4714516
Explore at:
Dataset updated
Oct 23, 2022
Dataset provided by
Julia Maria Struß
Gautam Kishore Shahi
Thomas Mandl
Juliane Köhler
Melanie Siegel
Michael Wiegand
Description
By downloading the data, you agree with the terms & conditions mentioned below:

Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.

Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.

We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.

Citation

Please cite our work as

@InProceedings{clef-checkthat:2022:task3, author = {K{"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas}, title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection", year = {2022}, booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum", series = {CLEF~'2022}, address = {Bologna, Italy},}

@article{shahi2021overview, title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection}, author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas}, journal={Working Notes of CLEF}, year={2021} }

Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.

Task 3: Multi-class fake news detection of news articles (English) Sub-task A would detect fake news designed as a four-class classification problem. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and roughly about 1264 articles with the respective label in English language. Our definitions for the categories are as follows:

False - The main claim made in an article is untrue.

Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

True - This rating indicates that the primary elements of the main claim are demonstrably true.

Other- An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.

Cross-Lingual Task (German)

Along with the multi-class task for the English language, we have introduced a task for low-resourced language. We will provide the data for the test in the German language. The idea of the task is to use the English data and the concept of transfer to build a classification model for the German language.

Input Data

The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:

ID- Unique identifier of the news article

Title- Title of the news article

text- Text mentioned inside the news article

our rating - class of the news article as false, partially false, true, other

Output data format

public_id- Unique identifier of the news article

predicted_rating- predicted class

Sample File

public_id, predicted_rating 1, false 2, true

IMPORTANT!

We have used the data from 2010 to 2022, and the content of fake news is mixed up with several topics like elections, COVID-19 etc.

Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498

Related Work

Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1.https://arxiv.org/pdf/2010.00502.pdf

G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14

Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104

Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.

Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeno, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.

Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.

Facebook

Twitter

Click to copy link

Link copied

Cite

Statista (2024). Opinions on whether news written by AI is good or bad in the U.S. 2023, by age group [Dataset]. https://www.statista.com/statistics/1368583/ai-use-in-news-attitudes/

Opinions on whether news written by AI is good or bad in the U.S. 2023, by age group

Explore at:

Dataset updated

Nov 28, 2024

Dataset authored and provided by

Statistahttp://statista.com/

Time period covered

Jan 26, 2023 - Jan 30, 2023

Area covered

United States

Description

A survey revealed that most U.S. adults believed AI-written news articles would be a bad thing, with 78 percent of all respondents saying that they felt this way, according to the results of a January 2023 survey. Younger consumers were the least likely to think this - 19 percent said they thought this would be a good thing, compared to just seven percent of their older peers aged 55 years or older.

Clear search

Close search

Google apps

Main menu

Opinions on whether news written by AI is good or bad in the U.S. 2023, by...

Attitudes to the future of news written by AI in the U.S. 2023, by age group...

Largest news articles dataset from CNBC

Online News Popularity Data Set

Academic article descriptive statistics.

Social media as a news outlet worldwide 2025

News Datasets

news-sentiment-data

CNN news dataset

Use Cases:

Update Frequency:

ISOT Fake News Dataset

Note- To cite this dataset use the information given by original authors:

CT-FAN-21 corpus: A dataset for Fake News Detection

Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...

Dataset of newspaper

Trust in online news written and edited by AI vs humans UK 2024

Fake News Detection Dataset

tech-company-news-data-dump

BBC News Dataset – February 2023 Edition

Use Cases

News headlines of BBC articles published by @BBCBreaking twitter account

Time spent with mobile news articles in the U.S. by social media source 2015...

CT-FAN: A Multilingual dataset for Fake News Detection

Opinions on whether news written by AI is good or bad in the U.S. 2023, by age group