100+ datasets found
  1. Opinions on whether news written by AI is good or bad in the U.S. 2023, by...

    • statista.com
    Updated Nov 28, 2024
    Cite
    Statista (2024). Opinions on whether news written by AI is good or bad in the U.S. 2023, by age group [Dataset]. https://www.statista.com/statistics/1368583/ai-use-in-news-attitudes/
    Dataset updated
    Nov 28, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 26, 2023 - Jan 30, 2023
    Area covered
    United States
    Description

    Most U.S. adults believed that AI-written news articles would be a bad thing, with 78 percent of all respondents saying so, according to a January 2023 survey. Younger consumers were the least likely to think this: 19 percent said it would be a good thing, compared to just seven percent of those aged 55 years or older.

  2. Attitudes to the future of news written by AI in the U.S. 2023, by age group...

    • statista.com
    Updated Nov 28, 2024
    Cite
    Statista (2024). Attitudes to the future of news written by AI in the U.S. 2023, by age group [Dataset]. https://www.statista.com/statistics/1368580/ai-use-in-news-stories/
    Dataset updated
    Nov 28, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 26, 2023 - Jan 30, 2023
    Area covered
    United States
    Description

    A survey held in the United States in early 2023 found that most surveyed adults believed there would be a time when entire news articles are written by artificial intelligence, with 72 percent saying they expected this to happen. Respondents under the age of 55 were marginally more certain that solely AI-written articles would be part of the future of news.

  3. Online News Popularity Data Set

    • academictorrents.com
    bittorrent
    Updated Feb 11, 2016
    Cite
    Kelwin Fernandes and Pedro Vinagre and Paulo Cortez and Pedro Sernadela (2016). Online News Popularity Data Set [Dataset]. https://academictorrents.com/details/95d3b03397a0bafd74a662fe13ba3550c13b7ce1
    Available download formats: bittorrent (7476401)
    Dataset updated
    Feb 11, 2016
    Dataset authored and provided by
    Kelwin Fernandes and Pedro Vinagre and Paulo Cortez and Pedro Sernadela
    License

    https://academictorrents.com/nolicensespecified

    Description

    Data Set Information:
    • The articles were published by Mashable (www.mashable.com), and the rights to reproduce their content belong to Mashable. Hence, this dataset does not share the original content but some statistics associated with it. The original content can be publicly accessed and retrieved using the provided URLs.
    • Acquisition date: January 8, 2015
    • The estimated relative performance values were estimated by the authors using a Random Forest classifier and a rolling window as the assessment method. See their article for more details on how the relative performance values were set.

    Attribute Information:
    Number of Attributes: 61 (58 predictive attributes, 2 non-predictive, 1 goal field)
    0. url: URL of the article (non-predictive)
    1. timedelta: Days between the article publication and the dataset acquisition (non-predictive)
    2. n_tokens_title: Number of words in the title
    3. n_tokens_content: Number of words in the content
    4. n_unique_tokens: Rate of unique words in the conte

  4. Academic article descriptive statistics.

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Noah Haber; Emily R. Smith; Ellen Moscoe; Kathryn Andrews; Robin Audy; Winnie Bell; Alana T. Brennan; Alexander Breskin; Jeremy C. Kane; Mahesh Karra; Elizabeth S. McClure; Elizabeth A. Suarez (2023). Academic article descriptive statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0196346.t002
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Noah Haber; Emily R. Smith; Ellen Moscoe; Kathryn Andrews; Robin Audy; Winnie Bell; Alana T. Brennan; Alexander Breskin; Jeremy C. Kane; Mahesh Karra; Elizabeth S. McClure; Elizabeth A. Suarez
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Academic article descriptive statistics.

  5. News Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Cite
    Bright Data, News Datasets [Dataset]. https://brightdata.com/products/datasets/news
    Available download formats: .json, .csv, .xlsx
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Stay ahead with our comprehensive News Dataset, designed for businesses, analysts, and researchers to track global events, monitor media trends, and extract valuable insights from news sources worldwide.

    Dataset Features

    • News Articles: Access structured news data, including headlines, summaries, full articles, publication dates, and source details. Ideal for media monitoring and sentiment analysis.
    • Publisher & Source Information: Extract details about news publishers, including domain, region, and credibility indicators.
    • Sentiment & Topic Classification: Analyze news sentiment, categorize articles by topic, and track emerging trends in real time.
    • Historical & Real-Time Data: Retrieve historical archives or access continuously updated news feeds for up-to-date insights.

    Customizable Subsets for Specific Needs

    Our News Dataset is fully customizable, allowing you to filter data based on publication date, region, topic, sentiment, or specific news sources. Whether you need broad coverage for trend analysis or focused data for competitive intelligence, we tailor the dataset to your needs.

    Popular Use Cases

    • Media Monitoring & Reputation Management: Track brand mentions, analyze media coverage, and assess public sentiment.
    • Market & Competitive Intelligence: Monitor industry trends, competitor activity, and emerging market opportunities.
    • AI & Machine Learning Training: Use structured news data to train AI models for sentiment analysis, topic classification, and predictive analytics.
    • Financial & Investment Research: Analyze news impact on stock markets, commodities, and economic indicators.
    • Policy & Risk Analysis: Track regulatory changes, geopolitical events, and crisis developments in real time.

    Whether you're analyzing market trends, monitoring brand reputation, or training AI models, our News Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.

  6. news-data

    • huggingface.co
    Updated May 29, 2023
    Cite
    Okite Chimaobi Samuel (2023). news-data [Dataset]. https://huggingface.co/datasets/okite97/news-data
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 29, 2023
    Authors
    Okite Chimaobi Samuel
    License

    https://choosealicense.com/licenses/afl-3.0/

    Description

    Dataset Card for news-data

    Dataset Summary

    The News Dataset is an English-language dataset containing just over 4k unique news articles scraped from AriseTv, one of the most popular news television channels in Nigeria.

    Supported Tasks and Leaderboards

    It supports news article classification into different categories.

    Languages

    English

    Dataset Structure

    Data Instances

    ''' {'Title': 'Nigeria: APC Yet to Zone Party Positions Ahead of… See the full description on the dataset page: https://huggingface.co/datasets/okite97/news-data.

  7. RealNews Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jan 30, 2023
    Cite
    Rowan Zellers; Ari Holtzman; Hannah Rashkin; Yonatan Bisk; Ali Farhadi; Franziska Roesner; Yejin Choi (2023). RealNews Dataset [Dataset]. https://paperswithcode.com/dataset/realnews
    Dataset updated
    Jan 30, 2023
    Authors
    Rowan Zellers; Ari Holtzman; Hannah Rashkin; Yonatan Bisk; Ali Farhadi; Franziska Roesner; Yejin Choi
    Description

    RealNews is a large corpus of news articles from Common Crawl. Data is scraped from Common Crawl, limited to the 5000 news domains indexed by Google News. The authors used the Newspaper Python library to extract the body and metadata from each article. News from Common Crawl dumps from December 2016 through March 2019 were used as training data; articles published in April 2019 from the April 2019 dump were used for evaluation. After deduplication, RealNews is 120 gigabytes without compression.

  8. Time spent with mobile news articles in the U.S. by social media source 2015...

    • statista.com
    Updated May 5, 2016
    Cite
    Statista (2016). Time spent with mobile news articles in the U.S. by social media source 2015 [Dataset]. https://www.statista.com/statistics/674321/mobile-news-articles-time-engaged-social-media/
    Dataset updated
    May 5, 2016
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Sep 2015
    Area covered
    United States
    Description

    The statistic gives information on the average time engaged with news articles on a smartphone in the United States as of September 2015, sorted by article length and the social media source. According to the source, long-form articles found on Twitter were engaged with for an average of 133 seconds.

  9. Social media as a news outlet worldwide 2025

    • statista.com
    Updated Jul 2, 2025
    + more versions
    Cite
    Statista (2025). Social media as a news outlet worldwide 2025 [Dataset]. https://www.statista.com/statistics/718019/social-media-news-source/
    Dataset updated
    Jul 2, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2025 - Feb 2025
    Area covered
    Worldwide
    Description

    During a 2025 survey, ** percent of respondents from Nigeria stated that they used social media as a source of news. In comparison, just ** percent of Japanese respondents said the same. Large portions of social media users around the world admit that they do not trust social platforms either as media sources or as a way to get news, and yet they continue to access such networks on a daily basis.

    Social media: trust and consumption

    Despite the majority of adults surveyed in each country reporting that they used social networks to keep up to date with news and current affairs, a 2018 study showed that social media is the least trusted news source in the world. Less than ** percent of adults in Europe considered social networks to be trustworthy in this respect, yet more than ** percent of adults in Portugal, Poland, Romania, Hungary, Bulgaria, Slovakia and Croatia said that they got their news on social media.

    What is clear is that we live in an era where social media is such an enormous part of daily life that consumers will still use it in spite of their doubts or reservations. Concerns about fake news and propaganda on social media have not stopped billions of users accessing their favorite networks on a daily basis. Most Millennials in the United States use social media for news every day, and younger consumers in European countries are much more likely to use social networks for national political news than their older peers. Like it or not, reading news on social is fast becoming the norm for younger generations, and this form of news consumption will likely increase further regardless of whether consumers fully trust their chosen network or not.

  10. news-sentiment-data

    • huggingface.co
    Updated Jul 8, 2024
    Cite
    amitk17 (2024). news-sentiment-data [Dataset]. https://huggingface.co/datasets/sweatSmile/news-sentiment-data
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 8, 2024
    Authors
    amitk17
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    sweatSmile/news-sentiment-data dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. Data from: COVID-19 News Articles

    • ieee-dataport.org
    Updated May 18, 2022
    Cite
    Piyush Ghasiya (2022). COVID-19 News Articles [Dataset]. https://ieee-dataport.org/documents/covid-19-news-articles
    Dataset updated
    May 18, 2022
    Authors
    Piyush Ghasiya
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    India

  12. CT-FAN-21 corpus: A dataset for Fake News Detection

    • zenodo.org
    Updated Oct 23, 2022
    + more versions
    Cite
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl (2022). CT-FAN-21 corpus: A dataset for Fake News Detection [Dataset]. http://doi.org/10.5281/zenodo.4714517
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl
    Description

    Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use it only for research purposes. Due to these restrictions, the collection is not open data. Please download the Agreement at Data Sharing Agreement and send the signed form to fakenewstask@gmail.com .

    Citation

    Please cite our work as

    @article{shahi2021overview,
     title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
     author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
     journal={Working Notes of CLEF},
     year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English.

    Subtask 3A: Multi-class fake news detection of news articles (English)

    Subtask 3A frames fake news detection as a four-class classification problem. The training data will be released in batches and comprises roughly 900 articles with their respective labels. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:

    • False - The main claim made in an article is untrue.

    • Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    • True - This rating indicates that the primary elements of the main claim are demonstrably true.

    • Other- An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes articles in dispute and unproven articles.

    Subtask 3B: Topical Domain Classification of News Articles (English)

    Fact-checkers require background expertise to identify the truthfulness of an article. The categorisation will help to automate the sampling process from a stream of data. Given the text of a news article, determine the topical domain of the article (English). This is a classification problem. The task is to categorise fake news articles into six topical categories like health, election, crime, climate, election, education. This task will be offered for a subset of the data of Subtask 3A.

    Input Data

    The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:

    Task 3a

    • ID- Unique identifier of the news article
    • Title- Title of the news article
    • text- Text mentioned inside the news article
    • our rating - class of the news article as false, partially false, true, other

    Task 3b

    • public_id- Unique identifier of the news article
    • Title- Title of the news article
    • text- Text mentioned inside the news article
    • domain - domain of the given news article(applicable only for task B)

    Output data format

    Task 3a

    • public_id- Unique identifier of the news article
    • predicted_rating- predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true

    Task 3b

    • public_id- Unique identifier of the news article
    • predicted_domain- predicted domain

    Sample file

    public_id, predicted_domain
    1, health
    2, crime
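
    The output format above can be sketched in a few lines of Python. This is a minimal illustration, not the organisers' tooling: the helper name write_predictions is hypothetical, and it assumes a plain comma separator is acceptable (the sample files above show a space after each comma).

```python
import csv
import io


def write_predictions(rows, value_column, out):
    """Write (public_id, prediction) pairs as CSV with a header row.

    value_column is "predicted_rating" for Task 3a or
    "predicted_domain" for Task 3b. The helper itself is hypothetical.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["public_id", value_column])
    for public_id, prediction in rows:
        writer.writerow([public_id, prediction])


# Reproduce the two sample files in memory.
task_3a = io.StringIO()
write_predictions([(1, "false"), (2, "true")], "predicted_rating", task_3a)

task_3b = io.StringIO()
write_predictions([(1, "health"), (2, "crime")], "predicted_domain", task_3b)

print(task_3a.getvalue())
print(task_3b.getvalue())
```

    To write to disk instead, pass a file opened with open(path, "w", newline="") in place of the StringIO buffer.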

    Additional data for Training

    To train their models, participants can use additional data in a similar format; some datasets are available on the web. We don't provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some possible sources:

    IMPORTANT!

    1. The fake news articles used for task 3b are a subset of those used for task 3a.
    2. We have used data from 2010 to 2021, and the fake news content covers a mix of topics such as elections, COVID-19, etc.

    Evaluation Metrics

    This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.
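
    For readers unfamiliar with the F1-macro measure, the sketch below computes it in plain Python: a per-class F1 score is calculated for each label and the scores are averaged with equal weight. It is an illustration only; the organisers' actual scoring script is not shown here, and the example labels are made up.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up example using the four Task 3a classes.
gold = ["false", "true", "partially false", "other"]
predicted = ["false", "true", "false", "other"]
score = macro_f1(gold, predicted)
print(round(score, 4))
```

    Because every class counts equally, missing a rare class entirely (here "partially false") lowers the score as much as missing a common one.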

    Submission Link: https://competitions.codalab.org/competitions/31238

    Related Work

    • Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf
    • G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
    • Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104

  13. CNN news dataset

    • crawlfeeds.com
    json, zip
    Updated Jun 30, 2025
    Cite
    Crawl Feeds (2025). CNN news dataset [Dataset]. https://crawlfeeds.com/datasets/cnn-news-dataset
    Available download formats: json, zip
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    This dataset contains over 27,000 news articles sourced from CNN.com, including full content, metadata, and media fields. Each article is enriched with publish dates, author information, descriptions, and full raw + cleaned content—perfect for media research, sentiment analysis, topic modeling, and natural language processing (NLP) projects.

    Last crawled in July 2021, this collection offers a historical snapshot of CNN’s reporting and editorial content.

    Use Cases:

    • News content analysis

    • Fake news detection & bias tracking

    • Topic classification and clustering

    • Training AI/NLP models

    • Historical news trend research

    • Media monitoring tools

    Update Frequency:

    Archived — no current updates, great for snapshot-based analysis

  14. Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 1, 2023
    Cite
    Haak, Fabian (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7682914
    Dataset updated
    Mar 1, 2023
    Dataset provided by
    Schaer, Philipp
    Haak, Fabian
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

    Fabian Haak and Philipp Schaer. 2023. Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

    Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

    The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022 as presented in our publication. The AllSides balanced news features three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship: left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

    To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

    Dataset 2: Search Query Suggestions (suggestions.csv)

    The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags, that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

    The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions and their respective positions as returned by the search engines at the time of search ("datetime"). We scraped our data from a US server, which is recorded in "location".

    We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
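
    The query-extension step described above can be sketched as follows. This is an illustrative reading, not the authors' scraper: the function name expand_root_query is hypothetical, and it assumes the 270-suggestion ceiling comes from the bare root query plus 26 letter-extended inputs, each returning up to ten suggestions.

```python
import string

SUGGESTIONS_PER_INPUT = 10  # ten suggestions per query input, as stated above


def expand_root_query(root_term):
    """Return the root term followed by its 26 letter-extended variants."""
    extended = [f"{root_term} {letter}" for letter in string.ascii_lowercase]
    return [root_term] + extended


query_inputs = expand_root_query("democrats")
max_suggestions = len(query_inputs) * SUGGESTIONS_PER_INPUT

print(query_inputs[:3])   # ['democrats', 'democrats a', 'democrats b']
print(max_suggestions)    # 270
```

    Under that assumption, 27 query inputs per root term give the stated maximum of 270 suggestions per topic and search engine.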

    AllSides Scraper

    At https://github.com/irgroup/Qbias, we provide a scraping tool, that allows for the automatic retrieval of all available articles at the AllSides balanced news headlines.

    We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, that scrapes all available AllSides news articles and gathers available information. By providing the scraper we facilitate access to a recent version of the dataset for other researchers.

  15. Global News Articles Dataset

    • opendatabay.com
    Updated Jul 7, 2025
    Cite
    The citation is currently not available for this dataset.
    Dataset updated
    Jul 7, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Government & Civic Records
    Description

    This dataset contains over 90,000 news articles gathered from various free news APIs, offering a valuable resource for text analysis and natural language processing tasks. It includes articles from over 600 sources across 26 countries, categorised into more than 16 topics. The dataset's primary purpose is to provide rich content for tasks such as article classification and deeper content understanding.

    Columns

    The dataset features 9 distinct columns, each providing specific details about the news articles:

    • id: A unique identifier for each news article.
    • title: The headline or title of the news article.
    • link: The URL pointing to the original news article.
    • source: The domain or website from which the article was published.
    • country: The country where the article was published.
    • topic: The category or subject of the article.
    • language: The language in which the article was published.
    • summary: A detailed description or the full content of the article.
    • published_date: The date when the article was published.

    Distribution

    The data files are typically in CSV format. The dataset comprises over 90,000 articles, with unique identifiers for each article. Approximately 36,649 unique article IDs and titles are present, alongside 35,503 unique links. Key sources include yahoo.com (15%) and indiatimes.com (7%). The main topics covered are news (67%) and finance (9%). There is one unique language value indicated. The dataset spans articles published between 26th May 2022 and 6th June 2022.

    Usage

    This dataset is ideal for a range of applications, including:

    • Natural Language Processing (NLP): Training models for text classification, entity recognition, and sentiment analysis.
    • News Aggregation and Recommendation Systems: Developing systems that categorise and suggest news content based on user preferences or trends.
    • Journalism and Media Studies: Analysing news coverage patterns, source reliability, and topic distribution across different regions.
    • Market Research: Identifying trends and insights from news related to specific industries or events.

    Coverage

    The dataset offers a global geographic scope, featuring articles from 26 different countries and over 600 sources. The primary countries represented are the United States (67%) and India (13%). The time range for the data is from 26th May 2022 to 6th June 2022. There are no specific notes on demographic availability.

    License

    CC0

    Who Can Use It

    This dataset is suitable for:

    • Data Scientists and Machine Learning Engineers: For building and testing NLP models.
    • Academic Researchers: For studies in media, communication, and computational linguistics.
    • Developers: Creating news-related applications, such as news aggregators or content analysis tools.
    • Journalists and Analysts: For conducting deep dives into news trends and public sentiment.

    Dataset Name Suggestions

    • Global News Articles Dataset
    • Daily News Corpus
    • Multilingual News Headlines
    • Current Events Data Stream
    • News Article Text Dataset

    Attributes

    Original Data Source: News Articles

  16. Fake and True News Dataset

    • figshare.com
    txt
    Updated Dec 3, 2020
    Cite
    Abu Bakkar Siddik (2020). Fake and True News Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.13325198.v1
    Available download formats: txt
    Dataset updated
    Dec 3, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Abu Bakkar Siddik
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset combines two parts: fake news and true news. The fake news was collected from Kaggle, and some of the true news was collected from IEEE DataPort. More true news was needed to balance the fake news, so I collected additional true news from various trusted online sites. Finally, I concatenated the fake and true news into a single dataset to help researchers who want to work on this topic.

  17. News headlines of BBC articles published by @BBCBreaking twitter account

    • data.niaid.nih.gov
    Updated Jul 29, 2022
    Cite
    Mello, Caio (2022). News headlines of BBC articles published by @BBCBreaking twitter account [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6927799
    Explore at:
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    Lewis, Nick
    Mello, Caio
    Istif Inci, Elçin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of a list of news article headlines retrieved from tweets published by the @BBCBreaking profile in specific years (2012, 2015, 2017, 2019 and 2022).

    The dataset is in .csv format and is organised as follows:

    Columns:

    ID (tweet ID)

    created_at (tweet publication's date)

    url (url of the news article attached to the tweet)

    Titles (news headline)

    Rows: Each row contains a single news article headline sorted by date of publication (created_at). Total number of entries: 7213.

    For more details about data collection, refer to GitHub.
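
The column layout described above can be read with a few lines of Python. This is a minimal sketch, not the authors' code: the sample rows, the `example.org` URLs, the date format, and the helper names `load_headlines` and `headlines_by_year` are all assumptions made for illustration. Only the column names (`ID`, `created_at`, `url`, `Titles`) come from the description.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows. The column names follow the dataset description,
# but the values and the date format are assumptions for illustration only.
SAMPLE_CSV = """ID,created_at,url,Titles
3,2015-06-01 09:30:00,https://example.org/b,Second headline
1,2012-01-15 08:00:00,https://example.org/a,First headline
7,2022-03-10 17:45:00,https://example.org/c,Third headline
"""


def load_headlines(csv_text, date_format="%Y-%m-%d %H:%M:%S"):
    """Parse the CSV text and return rows sorted by publication date."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        # Convert the date string so rows sort chronologically, not lexically.
        row["created_at"] = datetime.strptime(row["created_at"], date_format)
        rows.append(row)
    rows.sort(key=lambda r: r["created_at"])
    return rows


def headlines_by_year(rows):
    """Count how many headlines fall in each publication year."""
    counts = {}
    for row in rows:
        year = row["created_at"].year
        counts[year] = counts.get(year, 0) + 1
    return counts


headlines = load_headlines(SAMPLE_CSV)
for row in headlines:
    print(row["created_at"].date(), row["Titles"])
```

With the real file, the same functions could be applied to the file's contents after adjusting `date_format` to whatever format the `created_at` column actually uses.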

  18. Data from: MN-DS: A Multilabeled News Dataset for News Articles Hierarchical...

    • zenodo.org
    csv, txt
    Updated Dec 4, 2022
    Cite
    Alina Petukhova; Nuno Fachada (2022). MN-DS: A Multilabeled News Dataset for News Articles Hierarchical Classification [Dataset]. http://doi.org/10.5281/zenodo.7394851
    Explore at:
    Available download formats: txt, csv
    Dataset updated
    Dec 4, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alina Petukhova; Nuno Fachada
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset contains 10,917 news articles collected between January 1st, 2019 and December 31st, 2019, each classified into hierarchical news categories using the NewsCodes Media Topic taxonomy. We manually labelled the articles based on a hierarchical taxonomy with 17 first-level and 109 second-level categories.

    This dataset can be used to train machine learning models for automatically classifying news articles by topic. This dataset can be helpful for researchers working on news structuring, classification, and predicting future events based on released news.

    Reproducibility of results

    The technical validation results presented in the research paper "MN-DS: A Multilabeled News Dataset for News Articles Hierarchical Classification" can be reproduced using the functions in the GitHub repository.

    Licenses

    The dataset is made available under a CC-BY 4.0 license (see `LICENSE_DATA.txt`).

  19. Al Jazeera News Articles Dataset

    • opendatabay.com
    Updated Jul 6, 2025
    Cite
    Datasimple (2025). Al Jazeera News Articles Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/dad5e658-b36e-48c6-b4ec-c32ca0b85501
    Explore at:
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Entertainment & Media Consumption
    Description

    This dataset features news articles gathered from Al Jazeera through a web scraping process. It is designed for various analytical and natural language processing applications. The collection primarily covers news content from categories such as Science & Technology, Economics, and Sports. The scraping code was developed in late 2022, and users may need to update it to accommodate any changes in the Al Jazeera website's structure.

    Columns

    • category: This column contains categorical data, represented as strings, indicating the news topic or section.
    • title: This column holds the title of each news article, also as string data.
    • text: This column contains the full textual content of the article. Importantly, newline characters within the text have been specifically replaced with \ to ensure correct preservation and avoid misinterpretation when the data is saved, particularly in CSV format.

    Distribution

    The dataset is typically provided in a CSV file format. While precise total row counts are not available, the dataset includes one unique category, 1409 unique article titles, and 1413 unique article content entries, suggesting a substantial collection of distinct articles.

    Usage

    This dataset is ideal for a wide range of natural language processing (NLP) tasks, including text classification, sentiment analysis, topic modelling, and information extraction. It can be particularly valuable for training machine learning models that require real-world news content for analysis.

    Coverage

    The data consists of news articles scraped from Al Jazeera, a global news provider, indicating a global region of coverage. The articles were collected using code developed in November and December 2022. While initially focused on Science & Technology, Economics, and Sports, the provided scraping code can be adapted to collect content from additional news categories.

    License

    CC-BY-NC

    Who Can Use It

    This dataset is particularly useful for researchers, data scientists, and developers involved in natural language processing, text mining, or media content analysis. Students and academics working on projects related to news data, classification, or large language models can also benefit from this resource.

    Dataset Name Suggestions

    • Al Jazeera News Articles Dataset
    • Global News Text Corpus
    • Al Jazeera Web Scraped News
    • News Article NLP Dataset

    Attributes

    Original Data Source: Aljazeera News Dataset

  20. News Headline Sentiment Dataset

    • zenodo.org
    bin
    Updated Mar 24, 2021
    Cite
    Chang Wei Tan; Christoph Bergmeir; Francois Petitjean; Geoffrey I Webb (2021). News Headline Sentiment Dataset [Dataset]. http://doi.org/10.5281/zenodo.3902718
    Explore at:
    Available download formats: bin
    Dataset updated
    Mar 24, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chang Wei Tan; Christoph Bergmeir; Francois Petitjean; Geoffrey I Webb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the Monash, UEA & UCR time series regression repository. http://tseregression.org/

    The goal of this dataset is to predict the sentiment score of a news headline. It contains 83164 time series obtained from the News Popularity in Multiple Social Media Platforms dataset from the UCI repository. The source is a large data set of news items and their respective social feedback on multiple platforms: Facebook, Google+ and LinkedIn. The collected data covers a period of 8 months, between November 2015 and July 2016, and accounts for about 100,000 news items on four different topics: economy, microsoft, obama and palestine. The data set is tailored for evaluative comparisons in predictive analytics tasks, while also allowing for tasks in other research areas such as topic detection and tracking, sentiment analysis in short text, first story detection or news recommendation. Each time series has 3 dimensions.

    Please refer to https://archive.ics.uci.edu/ml/datasets/News+Popularity+in+Multiple+Social+Media+Platforms for more details

    Citation request
    Nuno Moniz and Luis Torgo (2018), Multi-Source Social Feedback of Online News Feeds, CoRR
