100+ datasets found
  1. Science and tech news dataset

    • ieee-dataport.org
    Updated Oct 27, 2021
    Cite
    Rajat Thakur (2021). Science and tech news dataset [Dataset]. https://ieee-dataport.org/documents/science-and-tech-news-dataset
    Explore at:
    Dataset updated
    Oct 27, 2021
    Authors
    Rajat Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains world news related to science and technology, along with each article's available metadata.

  2. Spanish Fake News Dataset

    • produccioncientifica.ucm.es
    • zenodo.org
    Updated 2025
    Cite
    Tretiakov, Arsenii; D'Antonio Maceiras, Sergio; Martín, Alejandro (2025). Spanish Fake News Dataset [Dataset]. https://produccioncientifica.ucm.es/documentos/685699246364e456d3a66786
    Explore at:
    Dataset updated
    2025
    Authors
    Tretiakov, Arsenii; D'Antonio Maceiras, Sergio; Martín, Alejandro
    Description

    Spanish Fake News Dataset

    This dataset contains a structured and annotated collection of false news items in Spanish (Castilian), gathered and processed for academic research on misinformation.

    Dataset Scope

    The dataset represents most of the recorded false news messages and their variations up to 01.02.2021.

    Content Description

    The dataset includes samples of false information in various formats:

    News articles and headlines

    Tweets and Facebook/Instagram/Telegram posts

    YouTube video captions

    WhatsApp text and voice message transcripts

    Transcribed video/audio fragments with false claims

    Fake government documents

    Captions from photos and memes

    Text extracted from images using OCR

    Only Spanish (Castilian) texts were used, excluding regional variants (e.g., Catalan, Basque, Galician) for consistency.

    Sources

    The data was collected from the following verified fact-checking initiatives:

    Maldito Bulo

    Newtral

    AFP Factual

    Fact-checkers from these organizations provide detailed articles identifying and explaining falsehoods, often including:

    General context of the event

    Quotes or links to false claims

    Analysis and explanation of why the claims are false

    Verified information or corrections

    Collection Method

    The dataset was built using both manual extraction (e.g., identifying and quoting false statements) and automated parsing:

    MyNews service: an archive of Spanish mass media

    Custom scripts: for parsing and extracting structured data

    OCR tools: for extracting text from images (e.g., memes and screenshots)

    Fields Description

    | Column Name | Description |
    | --- | --- |
    | Topic | The thematic category of the news item (e.g., Politics, Health, COVID-19, Crime). Normalized and translated to English. |
    | Link source | URL to the original news piece, fact-check report, or source of the claim. Invalid links were removed. |
    | Media | The platform or outlet where the false claim appeared (e.g., Facebook, YouTube, WhatsApp). Normalized for consistent spelling and language. |
    | Date | Publication or verification date of the news item, in YYYY-MM-DD format. |
    | Author | (Optional) Author of the news or platform source, if available. May be empty. |
    | Headlines | Title or summary of the news item or article containing the false information. |
    | Fake statement | Quoted false claim or misinformation as cited in the verification article. |
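
A minimal sketch of how the field description above could be checked against a local copy of the data. It assumes the dataset is distributed as a CSV file whose header uses exactly the column names listed; the file layout, the `check_rows` helper, and the sample rows are illustrative assumptions, not part of the dataset documentation.

```python
import csv
import io
from datetime import datetime

# Column names as listed in the field description; the CSV layout is assumed.
EXPECTED_COLUMNS = [
    "Topic", "Link source", "Media", "Date",
    "Author", "Headlines", "Fake statement",
]


def check_rows(csv_text):
    """Return (row_number, problem) pairs for rows that break the documented schema."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [(0, f"missing columns: {missing}")]
    problems = []
    for number, row in enumerate(reader, start=1):
        date = row["Date"] or ""
        try:
            # The description states dates use the YYYY-MM-DD format.
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            problems.append((number, f"bad date: {date!r}"))
        # Author may be empty, but a record without a claim is not useful.
        if not (row["Fake statement"] or "").strip():
            problems.append((number, "empty fake statement"))
    return problems


# Invented sample rows, for illustration only.
sample = (
    "Topic,Link source,Media,Date,Author,Headlines,Fake statement\n"
    "Health,https://example.org/a,WhatsApp,2020-03-15,,Example headline,Example claim\n"
    "Politics,https://example.org/b,Facebook,15/03/2020,,Another headline,\n"
)
problems = check_rows(sample)
print(problems)
```

Returning a list of problems rather than raising on the first one makes it easier to survey a whole file before deciding how to clean it.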

    ⚠️ Notes

    The dataset was preprocessed to remove duplicates, invalid links, and non-textual clutter.

    Field values were normalized to support multilingual and cross-platform analysis.

    Only Castilian Spanish was retained for consistency and clarity.

    📚 License & Use

    This dataset is intended for non-commercial academic and research purposes. Please cite the original fact-checking organizations and this dataset if used in publications or analysis.

  3. CT-FAN: A Multilingual dataset for Fake News Detection

    • data.niaid.nih.gov
    Updated Oct 23, 2022
    Cite
    Michael Wiegand (2022). CT-FAN: A Multilingual dataset for Fake News Detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4714516
    Explore at:
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    Gautam Kishore Shahi
    Juliane Köhler
    Thomas Mandl
    Michael Wiegand
    Melanie Siegel
    Julia Maria Struß
    Description

    By downloading the data, you agree with the terms & conditions mentioned below:

    Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.

    Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.

    We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.

    Citation

    Please cite our work as

    @InProceedings{clef-checkthat:2022:task3,
      author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
      title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
      year = {2022},
      booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
      series = {CLEF~'2022},
      address = {Bologna, Italy},
    }

    @article{shahi2021overview,
      title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
      author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
      journal={Working Notes of CLEF},
      year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.

    Task 3: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1,264 English-language articles with their respective labels. Our definitions for the categories are as follows:

    False - The main claim made in an article is untrue.

    Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    True - This rating indicates that the primary elements of the main claim are demonstrably true.

    Other- An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.

    Cross-Lingual Task (German)

    Along with the multi-class task for English, we have introduced a task for a low-resource language. We will provide the test data in German. The idea of the task is to use the English data and the concept of transfer to build a classification model for German.

    Input Data

    The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:

    ID- Unique identifier of the news article

    Title- Title of the news article

    text- Text mentioned inside the news article

    our rating - class of the news article as false, partially false, true, other

    Output data format

    public_id- Unique identifier of the news article

    predicted_rating- predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true
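
The output format above can be produced with a short script. This is a sketch only: the `write_submission` helper is hypothetical, the label spellings are taken from the category list in the task description, and the exact delimiter spacing expected by the organisers is an assumption.

```python
import csv
import io

# The four class labels named in the task description.
VALID_RATINGS = {"false", "partially false", "true", "other"}


def write_submission(predictions, out):
    """Write (public_id, predicted_rating) pairs to a file-like object as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["public_id", "predicted_rating"])
    for public_id, rating in predictions:
        label = rating.strip().lower()
        if label not in VALID_RATINGS:
            raise ValueError(f"unknown rating {rating!r} for id {public_id}")
        writer.writerow([public_id, label])


# Example: two predictions written to an in-memory buffer.
buffer = io.StringIO()
write_submission([(1, "false"), (2, "True")], buffer)
print(buffer.getvalue())
```

Normalising and validating labels before writing catches misspelled classes early, before a file is submitted.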

    IMPORTANT!

    We have used the data from 2010 to 2022, and the content of fake news is mixed up with several topics like elections, COVID-19 etc.

    Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498

    Related Work

    Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf

    G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14

    Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104

    Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.

    Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeno, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.

    Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.

  4. cc_news

    • huggingface.co
    Updated Jul 3, 2018
    Cite
    Vladimir Blagojevic (2018). cc_news [Dataset]. https://huggingface.co/datasets/vblagoje/cc_news
    Explore at:
    Dataset updated
    Jul 3, 2018
    Authors
    Vladimir Blagojevic
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for CC-News

      Dataset Summary
    

    The CC-News dataset contains news articles from news sites all over the world. The data is available on AWS S3 in the Common Crawl bucket at /crawl-data/CC-NEWS/. This version of the dataset has been prepared using news-please, an integrated web crawler and information extractor for news. It contains 708241 English language news articles published between Jan 2017 and December 2019. It represents a small portion of the English… See the full description on the dataset page: https://huggingface.co/datasets/vblagoje/cc_news.

  5. all-the-news

    • huggingface.co
    Updated Aug 17, 2019
    Cite
    TabMaven (2019). all-the-news [Dataset]. https://huggingface.co/datasets/TabMaven/all-the-news
    Explore at:
    Dataset updated
    Aug 17, 2019
    Dataset authored and provided by
    TabMaven
    Description

    TabMaven/all-the-news dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. All-Daily-News

    • huggingface.co
    Updated Sep 3, 2024
    Cite
    Papers With Backtest (2024). All-Daily-News [Dataset]. https://huggingface.co/datasets/paperswithbacktest/All-Daily-News
    Explore at:
    Dataset updated
    Sep 3, 2024
    Dataset authored and provided by
    Papers With Backtest
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Information

    This dataset includes news data for various instruments.

      Instruments Included
    

    Stocks, ETFs, Forex, Cryptocurrencies, Commodities and more.

      Dataset Columns
    

    symbols: The symbols in the news, typically representing stock tickers or other financial instruments mentioned in the article.

    datetime: The date and time when the news article was published, formatted as a string.

    title: The title of the news article, providing a brief and descriptive… See the full description on the dataset page: https://huggingface.co/datasets/paperswithbacktest/All-Daily-News.

  7. All The News text

    • kaggle.com
    Updated Mar 29, 2020
    + more versions
    Cite
    Alexey Voytsekhovskiy (2020). All The News text [Dataset]. https://www.kaggle.com/datasets/alexvoy/all-the-news-text
    Explore at:
    Dataset updated
    Mar 29, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alexey Voytsekhovskiy
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Alexey Voytsekhovskiy

    Released under CC0: Public Domain

    Contents

  8. Covid-19 and vaccine news dataset

    • ieee-dataport.org
    Updated Oct 27, 2021
    + more versions
    Cite
    Rajat Thakur (2021). Covid-19 and vaccine news dataset [Dataset]. https://ieee-dataport.org/documents/covid-19-and-vaccine-news-dataset
    Explore at:
    Dataset updated
    Oct 27, 2021
    Authors
    Rajat Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains world news related to Covid-19 and vaccines, along with each article's available metadata.

  9. hausa_voa_topics

    • huggingface.co
    Updated Mar 25, 2025
    Cite
    LSV @ Saarland University (2025). hausa_voa_topics [Dataset]. https://huggingface.co/datasets/UdS-LSV/hausa_voa_topics
    Explore at:
    Dataset updated
    Mar 25, 2025
    Dataset authored and provided by
    LSV @ Saarland University
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for Hausa VOA News Topic Classification dataset (hausa_voa_topics)

      Dataset Summary
    

    A news headline topic classification dataset, similar to AG-news, for Hausa. The news headlines were collected from VOA Hausa.

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    Hausa (ISO 639-1: ha)

      Dataset Structure
    
    
    
    
    
      Data Instances
    

    An instance consists of a news title sentence and the corresponding topic label.… See the full description on the dataset page: https://huggingface.co/datasets/UdS-LSV/hausa_voa_topics.

  10. Dataset - Male in the news

    • workwithdata.com
    Updated Jun 5, 2025
    Cite
    (2025). Dataset - Male in the news [Dataset]. https://www.workwithdata.com/news?pk=Male
    Explore at:
    Dataset updated
    Jun 5, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset - Male in the news

  11. Data from: News sentiment

    • kaggle.com
    Updated Mar 11, 2021
    Cite
    Kaushik Soni (2021). News sentiment [Dataset]. https://kaggle.com/kaushiksoni10/news-sentiment
    Explore at:
    Dataset updated
    Mar 11, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kaushik Soni
    Description

    Content

    This data contains news titles and headlines from different sources on different topics. The columns are described below:

    | Column | DataType | Description |
    | --- | --- | --- |
    | IDLink | str | Unique identifier of the row |
    | Title | str | Title of the news |
    | Headline | str | Headline of the news |
    | Source | str | Newspaper/news-source |
    | Topic | str | News-topic (values : obama, economy, microsoft, palestine) |
    | PublishDate | Timestamp | publish date & time |
    | Facebook | int | facebook rating |
    | GooglePlus | int | google plus rating |
    | LinkedIn | int | linkedin rating |

    Inspiration

    One of the main tasks that can be performed with this dataset is sentiment analysis: find the sentiment scores for each title and headline of the test data by applying regression analysis.

  12. Checking the news on weekdays in the U.S. 2018, by daypart

    • statista.com
    Updated Jul 10, 2025
    Cite
    Statista (2025). Checking the news on weekdays in the U.S. 2018, by daypart [Dataset]. https://www.statista.com/statistics/816469/check-news-typical-weekday-us-by-daypart/
    Explore at:
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 17, 2018 - Jan 23, 2018
    Area covered
    United States
    Description

    This graph displays the time of day when consumers check the news on a typical weekday in the United States as of ************. During the survey, it was found that ** percent of consumers check the news in the early morning of a typical weekday.

  13. Media Coding Dataset for News Content Analysis

    • zenodo.org
    Updated Jun 29, 2025
    Cite
    Stavros Doropoulos; Elisavet Karapalidou; Polychronis Charitidis; Sophia Karakeva; Stavros Vologiannidis (2025). Media Coding Dataset for News Content Analysis [Dataset]. http://doi.org/10.5281/zenodo.15767938
    Explore at:
    Dataset updated
    Jun 29, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Stavros Doropoulos; Elisavet Karapalidou; Polychronis Charitidis; Sophia Karakeva; Stavros Vologiannidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset accompanies the study Beyond Manual Media Coding: Evaluating Large Language Models and Agents for News Content Analysis.

    It provides a reproducible benchmark for evaluating automated content analysis methods against human-annotated ground truth.

    The dataset includes:

    • articles.csv
      Contains the 200 news articles collected for this study, each with:

      • id: unique identifier

      • url: source URL of the original article

      • content: full text of the news article

    • codebook.json
      A structured JSON file defining the 26-question analysis codebook used for annotation.
      Each question entry specifies:

      • questionId: question ID (e.g., Q1)

      • prompt: annotation question text

      • questionAnswerType: type (SINGLE_CHOICE or MULTI_CHOICE)

      • eligibleQuestionAnswers: list of possible tags / codes

    • annotations.json
      Contains the complete human annotation data.
      For each article id, it provides the list of responses to all 26 codebook questions as determined by an expert annotator, establishing the ground truth labels.
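
As a rough illustration of how the codebook fields described above might be used, the sketch below checks whether a set of answers is admissible for one question. The field names `questionId`, `questionAnswerType`, and `eligibleQuestionAnswers` come from the description; the sample question, its tags, the `is_valid_answer` helper, and the assumption that answers are stored as a list of strings are all hypothetical.

```python
def is_valid_answer(question, answers):
    """Check a list of answer tags against one codebook question entry."""
    allowed = set(question["eligibleQuestionAnswers"])
    # Assumption: every question needs at least one answer from the eligible list.
    if not answers or not set(answers) <= allowed:
        return False
    if question["questionAnswerType"] == "SINGLE_CHOICE":
        return len(answers) == 1
    # MULTI_CHOICE: any non-empty subset of the eligible tags is accepted.
    return True


# Invented codebook entry, for illustration only.
question = {
    "questionId": "Q1",
    "prompt": "What is the main topic of the article?",
    "questionAnswerType": "SINGLE_CHOICE",
    "eligibleQuestionAnswers": ["Politics", "Health"],
}

print(is_valid_answer(question, ["Politics"]))
print(is_valid_answer(question, ["Politics", "Health"]))
```

The same check can be applied to model output before scoring it against the human annotations, so that malformed answers are separated from genuinely wrong ones.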

    Intended use

    • Designed for research purposes including natural language understanding, content classification, and LLM evaluation.

    • Please request access with your academic email.
  14. Websites using News

    • webtechsurvey.com
    csv
    Updated Apr 22, 2024
    + more versions
    WebTechSurvey (2024). Websites using News [Dataset]. https://webtechsurvey.com/technology/news
    Explore at:
    Available download formats: csv
    Dataset updated
    Apr 22, 2024
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the News technology, compiled through global website indexing conducted by WebTechSurvey.

  15. Leading news websites in the U.S. 2025, by monthly visits

    • tokrwards.com
    Updated Jun 25, 2025
    Cite
    The citation is currently not available for this dataset.
    Explore at:
    Dataset updated
    Jun 25, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2024
    Area covered
    United States
    Description

    In April 2025, the news website with the most monthly visits in the United States was nytimes.com, with a total of ***** million monthly visits in that month. In second place was cnn.com with just over *** million visits, followed by foxnews.com with almost a ****** of a million.

    Online news consumption in the U.S.

    Americans get their news in a variety of ways, but social media is an increasingly popular option. A survey on social media news consumption revealed that ** percent of Twitter users regularly used the site for news, and Facebook and Reddit were also popular for news among their users. Interestingly though, social media is the least trusted news source in the United States.

    News and trust

    Trust in news sources has become increasingly important to the American news consumer amidst the spread of fake news, and the public are more vocal about whether or not they have faith in a source to report news correctly. Ongoing discussions about the credibility, accuracy and bias of news networks, anchors, TV show hosts, and news media professionals mean that those looking to keep up to date tend to be more cautious than ever before. In general, news audiences are skeptical. In 2020, just **** percent of respondents to a survey investigating the perceived objectivity of the mass media reported having a great deal of trust in the media to report news fully, accurately, and fairly.

  16. Tweets – PAP News Dataset - Dataset - LDM

    • service.tib.eu
    Updated Dec 3, 2024
    + more versions
    Cite
    (2024). Tweets – PAP News Dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/tweets---pap-news-dataset
    Explore at:
    Dataset updated
    Dec 3, 2024
    Description

    New annotated datasets linking tweets and articles, including Tweets – PAP News Dataset, Tweets – BBC News Dataset, Cascades – PAP News Dataset, and Cascades – BBC News Dataset.

  17. Global News Index and Extracted Features Repository (v.1.2.0)

    • databank.illinois.edu
    Updated Mar 5, 2025
    + more versions
    Cite
    (2025). Global News Index and Extracted Features Repository (v.1.2.0) [Dataset]. http://doi.org/10.13012/B2IDB-5649852_V5
    Explore at:
    Dataset updated
    Mar 5, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence.

    Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically-savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful for users as they refine their queries.

    Additional Resources:

    - Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, please fill out the Archer Access Request Form so we can determine if you are eligible for access or not.

    - Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the Archer User Feedback Form.

    - The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this form to subscribe to it.

    Citation Guidelines:

    1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer) please use the following citation: Cline Center for Advanced Social Research. 2023. Global News Index and Extracted Features Repository [codebook], v1.2.0. Champaign, IL: University of Illinois. June. XX. doi:10.13012/B2IDB-5649852_V5

    2) To cite data from the Global News Index (accessed via Archer or otherwise) please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2023. Global News Index and Extracted Features Repository [database], v1.2.0. Champaign, IL: University of Illinois. Jun. XX. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V5

    *NOTE: V4 is suppressed and V5 is replacing V4 with updated ‘Archer’ documents.

  18. BBC Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Nov 8, 2017
    Cite
    Bright Data (2017). BBC Datasets [Dataset]. https://brightdata.com/products/datasets/bbc
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Nov 8, 2017
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Unlock the full potential of BBC broadcast data with our comprehensive dataset featuring transcripts, program schedules, headlines, topics, and multimedia resources. This all-in-one dataset is designed to empower media analysts, researchers, journalists, and advocacy groups with actionable insights for media analysis, transparency studies, and editorial assessments.

    Dataset Features

    Transcripts: Access detailed broadcast transcripts, including headlines, content, author details, and publication dates. Perfect for analyzing media framing, topic frequency, and news narratives across various programs.

    Program Schedules: Explore program schedules with accurate timing, show names, and related metadata to track news coverage patterns and identify trends.

    Topics and Keywords: Analyze categorized topics and keywords to understand content diversity, editorial focus, and recurring themes in news broadcasts.

    Multimedia Content: Gain access to videos, images, and related articles linked to each broadcast for a holistic understanding of the news presentation.

    Metadata: Includes critical data points like publication dates, last updates, content URLs, and unique IDs for easier referencing and cross-analysis.

    Customizable Subsets for Specific Needs

    Our BBC dataset is fully customizable to match your research or analytical goals. Focus on transcripts for in-depth media framing analysis, extract multimedia for content visualization studies, or dive into program schedules for broadcast trend analysis. Tailor the dataset to ensure it aligns with your objectives for maximum efficiency and relevance.

    Popular Use Cases

    Media Analysis: Evaluate news framing, content diversity, and topic coverage to assess editorial direction and media focus.

    Transparency Studies: Analyze journalistic standards, corrections, and retractions to assess media integrity and accountability.

    Audience Engagement: Identify recurring topics and trends in news content to understand audience preferences and behavior.

    Market Analysis: Track media coverage of key industries, companies, and topics to analyze public sentiment and industry relevance.

    Journalistic Integrity: Use transcripts and metadata to evaluate adherence to reporting practices, fairness, and transparency in news coverage.

    Research and Scholarly Studies: Leverage transcripts and multimedia to support academic studies in journalism, media criticism, and political discourse analysis.

    Whether you are evaluating transparency, conducting media criticism, or tracking broadcast trends, our BBC dataset provides you with the tools and insights needed for in-depth research and strategic analysis. Customize your access to focus on the most relevant data points for your unique needs.

  19. NEWS-COPY-eval

    • huggingface.co
    Updated Mar 18, 2024
    Cite
    Chenghao Mou (2024). NEWS-COPY-eval [Dataset]. https://huggingface.co/datasets/chenghao/NEWS-COPY-eval
    Explore at:
    Dataset updated
    Mar 18, 2024
    Authors
    Chenghao Mou
    License

    https://choosealicense.com/licenses/unknown/

    Description

    NEWS COPY

    This dataset contains the evaluation and test sets for the NEWS COPY dataset. Original source can be found at Github. The license is unclear. It contains the following data:

    Historical Newspapers

    Training datasets can be found at chenghao/NEWS-COPY-train.

      Citation
    

    @inproceedings{silcock-etal-2020-noise, title = "Noise-Robust De-Duplication at Scale", author = "Silcock, Emily and D'Amico-Wong, Luca and Yang, Jinglin and Dell, Melissa", booktitle =… See the full description on the dataset page: https://huggingface.co/datasets/chenghao/NEWS-COPY-eval.

  20. Replication Data for: How the News Media Activates Public Expression and...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Nov 13, 2017
    Cite
    Gary King; Benjamin Schneer (2017). Replication Data for: How the News Media Activates Public Expression and Influences National Agendas [Dataset]. http://doi.org/10.7910/DVN/1EMHTK
    Explore at:
    Dataset updated
    Nov 13, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Gary King; Benjamin Schneer
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/1EMHTK

    Description

    We demonstrate that the news media causes Americans to take public stands on issues, join national policy conversations, and express themselves publicly more often than they would otherwise, all key components of democratic politics. We recruited 48 mostly small media outlets that allowed us to choose groups of outlets to write and publish articles, on subjects we approved, and dates we randomly assigned. We estimate the causal effect on proximal measures, such as website pageviews and Twitter discussion of the articles' specific subjects, and distal ones, such as national Twitter conversation in broad policy areas. Our intervention increased discussion in each broad policy area by approximately 62.7% (relative to a day's volume), accounting for 13,166 additional posts, with similar effects across population subgroups.

Rajat Thakur (2021). Science and tech news dataset [Dataset]. https://ieee-dataport.org/documents/science-and-tech-news-dataset

Science and tech news dataset

Explore at:
21 scholarly articles cite this dataset (View in Google Scholar)
