100+ datasets found
  1. Global News Engagement on Social Media

    • kaggle.com
    zip
    Updated Mar 15, 2024
    Cite
    Kanchana1990 (2024). Global News Engagement on Social Media [Dataset]. https://www.kaggle.com/datasets/kanchana1990/global-news-engagement-on-social-media
    Available download formats: zip (267156 bytes)
    Dataset updated
    Mar 15, 2024
    Authors
    Kanchana1990
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This comprehensive dataset offers a deep dive into the social media engagement metrics of nearly 4,000 posts from four of the world's leading news channels: CNN, BBC, Al Jazeera, and Reuters. Curated to provide a holistic view of global news interaction on social media, the collection stands out for its meticulous assembly and broad spectrum of content.

    Dataset Overview: Spanning various global events, topics, and narratives, this dataset is a snapshot of how news is consumed and interacted with on social media platforms. It serves as a rich resource for analyzing trends, engagement patterns, and the dissemination of information across international borders.

    Data Science Applications: Ideal for researchers and enthusiasts in the fields of data science, media studies, and social analytics, this dataset opens doors to numerous explorations such as engagement analysis, trend forecasting, content strategy optimization, and the study of information flow in digital spaces. It also holds potential for machine learning projects aiming to predict engagement or classify content based on interaction metrics.

    Column Descriptors: Each record in the dataset is detailed with the following columns:

    • text: The title or main content of the post.
    • likes: The number of likes each post has garnered.
    • comments: The number of comments left by viewers.
    • shares: How many times the post has been shared.
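    As a minimal sketch of how the four columns above might be used, the following Python snippet ranks posts by a simple interaction total. Only the column names (text, likes, comments, shares) come from the description; the sample rows, the `total_engagement` metric, and the function names are illustrative assumptions, and the real file name and header spelling are not stated in the listing.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the dataset description.
SAMPLE = """text,likes,comments,shares
Example headline A,120,30,15
Example headline B,45,5,2
"""


def total_engagement(row):
    """Sum likes, comments and shares for one post (an illustrative metric)."""
    return int(row["likes"]) + int(row["comments"]) + int(row["shares"])


def rank_posts(csv_text):
    """Return (text, total) pairs sorted by total engagement, highest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    scored = [(row["text"], total_engagement(row)) for row in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for text, total in rank_posts(SAMPLE):
        print(f"{total:>6}  {text}")
```

    Summing the three counts treats a like, a comment, and a share as equally important; a weighted sum may be more appropriate depending on the analysis.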

    Ethically Mined Data: The collection of this dataset was conducted with the highest ethical standards in mind, ensuring compliance with data privacy laws and platform policies. By anonymizing data where necessary and focusing solely on publicly available information, it respects both individual privacy and intellectual property rights.

    Special thanks are extended to the Facebook platform and the respective news channels for their openness and the rich public data they provide. This dataset not only celebrates the vibrant exchange on social media but also underscores the importance of responsible data use and sharing in fostering understanding and innovation.

  2. Social media as a news outlet worldwide 2024

    • statista.com
    • de.statista.com
    + more versions
    Cite
    Amy Watson, Social media as a news outlet worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Amy Watson
    Description

    During a 2024 survey, 77 percent of respondents from Nigeria stated that they used social media as a source of news. In comparison, just 23 percent of Japanese respondents said the same. Large portions of social media users around the world admit that they do not trust social platforms either as media sources or as a way to get news, and yet they continue to access such networks on a daily basis.

    Social media: trust and consumption

    Despite the majority of adults surveyed in each country reporting that they used social networks to keep up to date with news and current affairs, a 2018 study showed that social media is the least trusted news source in the world. Less than 35 percent of adults in Europe considered social networks to be trustworthy in this respect, yet more than 50 percent of adults in Portugal, Poland, Romania, Hungary, Bulgaria, Slovakia and Croatia said that they got their news on social media.

    What is clear is that we live in an era where social media is such an enormous part of daily life that consumers will still use it in spite of their doubts or reservations. Concerns about fake news and propaganda on social media have not stopped billions of users accessing their favorite networks on a daily basis.

    Most Millennials in the United States use social media for news every day, and younger consumers in European countries are much more likely to use social networks for national political news than their older peers.

    Like it or not, reading news on social is fast becoming the norm for younger generations, and this form of news consumption will likely increase further regardless of whether consumers fully trust their chosen network or not.
    
  3. Same News - Different Sources

    • kaggle.com
    zip
    Updated Oct 28, 2022
    Cite
    The Devastator (2022). Same News - Different Sources [Dataset]. https://www.kaggle.com/datasets/thedevastator/same-news-different-sources
    Available download formats: zip (262582 bytes)
    Dataset updated
    Oct 28, 2022
    Authors
    The Devastator
    Description

    Same News Different Sources

    How different sources report on the same events

    About this dataset

    Do you ever feel like you're being inundated with news from all sides, and you can't keep up? Well, you're not alone. In today's age of social media and 24-hour news cycles, it can be difficult to know what's going on in the world. And with so many different news sources to choose from, it can be hard to know who to trust.

    That's where this dataset comes in. It captures data related to individuals' Sentiment Analysis toward different news sources. The data was collected by administering a survey to individuals who use different news sources. The survey responses were then analyzed to obtain the sentiment score for each news source.

    So if you're feeling overwhelmed by the news, don't worry – this dataset has you covered. With its insights on which news sources are trustworthy and which ones aren't, you'll be able to make informed decisions about what to read – and what to skip

    How to use the dataset

    The Twitter Sentiment Analysis dataset can be used to analyze the impact of social media on news consumption. This data can be used to study how individuals' sentiments towards different news sources vary based on the source they use. The dataset can also be used to study how different factors, such as the time of day or the topic of the news, affect an individual's sentiments

    Research Ideas

    • Identify which news sources are most trusted by the public.
    • Understand what topics are most important to the public.
    • Understand how different news sources report on the same issue

    Columns

    File: news.csv

    | Column name        | Description                                           |
    |:-------------------|:------------------------------------------------------|
    | Title              | The title of the news article. (String)               |
    | Date               | The date the news article was published. (Date)       |
    | Time               | The time the news article was published. (Time)       |
    | Score              | The sentiment score of the news article. (Float)      |
    | Number of Comments | The number of comments on the news article. (Integer) |

    File: news_api.csv

    | Column name | Description                                     |
    |:------------|:------------------------------------------------|
    | Title       | The title of the news article. (String)         |
    | Date        | The date the news article was published. (Date) |
    | Source      | The news source the article is from. (String)   |

    File: politics.csv

    | Column name        | Description                                           |
    |:-------------------|:------------------------------------------------------|
    | Title              | The title of the news article. (String)               |
    | Date               | The date the news article was published. (Date)       |
    | Time               | The time the news article was published. (Time)       |
    | Score              | The sentiment score of the news article. (Float)      |
    | Number of Comments | The number of comments on the news article. (Integer) |

    File: sports.csv

    | Column name        | Description                                           |
    |:-------------------|:------------------------------------------------------|
    | Title              | The title of the news article. (String)               |
    | Date               | The date the news article was published. (Date)       |
    | Time               | The time the news article was published. (Time)       |
    | Score              | The sentiment score of the news article. (Float)      |
    | Number of Comments | The number of comments on the news article. (Integer) |

    File: television.csv

    | Column name        | Description                                           |
    |:-------------------|:------------------------------------------------------|
    | Title              | The title of the news article. (String)               |
    | Date               | The date the news article was published. (Date)       |
    | Time               | The time the news article was published. (Time)       |
    | Score              | The sentiment score of the news article. (Float)      |
    | Number of Comments | The number of comments on the news article. (Integer) |

    File: trending.csv | Column name | Description ...

  4. News Popularity in Multiple Social Media Platforms

    • kaggle.com
    zip
    Updated Oct 28, 2020
    Cite
    Nikhil John (2020). News Popularity in Multiple Social Media Platforms [Dataset]. https://www.kaggle.com/nikhiljohnk/news-popularity-in-multiple-social-media-platforms
    Available download formats: zip (10881978 bytes)
    Dataset updated
    Oct 28, 2020
    Authors
    Nikhil John
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Social media has come to dominate the Internet. People use it to get the latest news, find useful resources, and even meet a life partner. In a world where social media plays a big role in delivering news, news that affects our sentiments is likely to spread like wildfire. Based on the title and headline, the publication date, and the social media platform data, you have to predict the sentiment scores. The columns to predict are “SentimentTitle” and “SentimentHeadline”.

    Content

    This is a subset of the dataset of the same name available in the UCI Machine Learning Repository. The collected data relates to a period of 8 months, between November 2015 and July 2016, accounting for about 100,000 news items on four different topics: economy, microsoft, obama and palestine.

    Dataset Information

    The attributes of each dataset are:

    • IDLink (numeric): Unique identifier of news items
    • Title (string): Title of the news item according to the official media sources
    • Headline (string): Headline of the news item according to the official media sources
    • Source (string): Original news outlet that published the news item
    • Topic (string): Query topic used to obtain the items in the official media sources
    • Publish-Date (timestamp): Date and time of the news items' publication
    • Facebook (numeric): Final value of the news items' popularity according to the social media source Facebook
    • Google-Plus (numeric): Final value of the news items' popularity according to the social media source Google+
    • LinkedIn (numeric): Final value of the news items' popularity according to the social media source LinkedIn
    • SentimentTitle: Sentiment score of the title. The higher the score, the more positive the sentiment, and vice versa. (Target Variable 1)
    • SentimentHeadline: Sentiment score of the text in the news items' headline. The higher the score, the more positive the sentiment. (Target Variable 2)
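    To make the two target variables concrete, here is a minimal sketch of a naive baseline: predicting each target as the mean value observed for the item's topic. The field names Topic, SentimentTitle, and SentimentHeadline follow the attribute list; the sample records and the `mean_by_topic` helper are hypothetical and are not part of the dataset or the competition.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; only the field names come from the attribute list.
ITEMS = [
    {"Topic": "economy", "SentimentTitle": 0.5, "SentimentHeadline": -0.25},
    {"Topic": "economy", "SentimentTitle": -0.25, "SentimentHeadline": 0.25},
    {"Topic": "obama", "SentimentTitle": 0.125, "SentimentHeadline": 0.0},
]


def mean_by_topic(items, field):
    """Average one sentiment field per topic, as a per-topic baseline prediction."""
    groups = defaultdict(list)
    for item in items:
        groups[item["Topic"]].append(item[field])
    return {topic: mean(values) for topic, values in groups.items()}


if __name__ == "__main__":
    for field in ("SentimentTitle", "SentimentHeadline"):
        print(field, mean_by_topic(ITEMS, field))
```

    A baseline like this ignores the text entirely, so it is mainly useful as a floor against which a text-based model can be compared.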

  5. Number of global social network users 2017-2028

    • statista.com
    • de.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Number of global social network users 2017-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    How many people use social media?

    Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.

    Who uses social media?

    Social networking is one of the most popular digital activities worldwide and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as lesser developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.

    How much time do people spend on social media?

    Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. On average, internet users in Latin America had the highest average time spent per day on social media.

    What are the most popular social media platforms?

    Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.
    
  6. Social Media Political Content Analysis Dataset

    • kaggle.com
    zip
    Updated May 13, 2024
    Cite
    Faisal Hameed (2024). Social Media Political Content Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/fysalhameed/impact-of-social-media-on-political-consent
    Available download formats: zip (355107 bytes)
    Dataset updated
    May 13, 2024
    Authors
    Faisal Hameed
    Description

    This dataset contains simulated data for social media users' demographics, behaviors, and perceptions related to political content. It includes features such as age, gender, education level, occupation, social media usage frequency, exposure to political content, and perceptions of accuracy and relevance.

    The features included in the "Social Media Political Content Analysis Dataset" are:

    1. Age: Age of the user.
    2. Gender: Gender identity of the user.
    3. Education Level: Highest level of education attained by the user.
    4. Occupation: Current occupation of the user.
    5. Political Affiliation: Political leaning or affiliation of the user (e.g., Liberal, Conservative, Independent).
    6. Geographic Location: Country or region where the user is located (e.g., USA, UK, Canada, Australia).
    7. Social Media Usage Frequency: Frequency of social media usage by the user (e.g., 0-1 hour, 1-2 hours, 2-4 hours, 4+ hours).
    8. Preferred Social Media: Social media platform preferred by the user (e.g., Facebook, Twitter, Instagram).
    9. Political Content Exposure: Frequency of exposure to political content on social media (e.g., Once a day, Few times a week, Rarely, Several times a day).
    10. Types of Political Content: Types of political content consumed by the user (e.g., News articles, Opinion pieces, Memes).
    11. Sources of Political Content: Sources from which the user obtains political content (e.g., Mainstream media, Political parties, Independent bloggers).
    12. Recency of Exposure: Recency of the user's exposure to political content (e.g., Within the last hour, Within the last 24 hours, Within the last week, Longer than a week ago).
    13. Interactions Frequency: Frequency of user interactions with political content on social media (e.g., Once a day, Few times a week, Rarely, Several times a day).
    14. Political Content Topics: Topics of political content that interest the user (e.g., Economy, Healthcare, Immigration, Environment).
    15. Perception of Accuracy: User's perception of the accuracy of political content on social media (e.g., Very accurate, Somewhat accurate, Not accurate).
    16. Awareness of Algorithms: Whether the user is aware of algorithms that determine their social media feed (e.g., Yes, No).
    17. Perception of Relevance: User's perception of the relevance of political content on social media (e.g., Very relevant, Somewhat relevant, Not relevant).
    18. Personal Impact: User's perception of the personal impact of political content on social media (e.g., Strong impact, Moderate impact, No impact).
    19. Trust in Social Media: User's level of trust in social media as a source of political information (e.g., Trust a lot, Trust somewhat, Do not trust).
    20. Concerns about Algorithms: User's level of concern about algorithms shaping their social media experience (e.g., Very concerned, Somewhat concerned, Not concerned).
    21. Overall Quality of Discourse: User's perception of the overall quality of political discourse on social media (e.g., High quality, Moderate quality, Low quality).
    22. Views on Influence: User's perception of the influence of political content on social media (e.g., Very influential, Somewhat influential, Not influential).
    23. Suggestions for Improvement: User's suggestions for improving the quality or experience of political content on social media (e.g., Increase transparency, Provide more diverse sources, Improve fact-checking, Enhance user controls).
  7. U.S. Facebook data requests from government agencies 2013-2023

    • statista.com
    • de.statista.com
    Cite
    Stacy Jo Dixon, U.S. Facebook data requests from government agencies 2013-2023 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    Facebook received 73,390 user data requests from federal agencies and courts in the United States during the second half of 2023. The social network produced some user data in 88.84 percent of requests from U.S. federal authorities. The United States accounts for the largest share of Facebook user data requests worldwide.

  8. Average daily time spent on social media worldwide 2012-2024

    • statista.com
    • de.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Average daily time spent on social media worldwide 2012-2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    How much time do people spend on social media?

    As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of three hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just two hours and 16 minutes.

    Global social media usage

    Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.

    Global impact of social media

    Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics and heightened everyday distractions.
    
  9. Source based Fake News Classification

    • kaggle.com
    zip
    Updated Mar 19, 2023
    Cite
    Yash Patel (2023). Source based Fake News Classification [Dataset]. https://www.kaggle.com/datasets/yash0956/fakenews
    Available download formats: zip (3166955 bytes)
    Dataset updated
    Mar 19, 2023
    Authors
    Yash Patel
    Description

    Context

    Social media is a vast pool of content, and among all the content available to users, news is accessed most frequently. News can be posted by politicians, news channels, newspaper websites, or even ordinary citizens. These posts have to be checked for authenticity, since the spread of misinformation is a real concern today, and many firms are taking steps to make people aware of the consequences of spreading misinformation. The authenticity of news posted online cannot be measured definitively, since manual classification of news is tedious, time-consuming, and subject to bias.

    Content

    Data preprocessing has been done on the dataset Getting Real about Fake News and skew has been eliminated.

  10. CT-FAN-21 corpus: A dataset for Fake News Detection

    • zenodo.org
    Updated Oct 23, 2022
    + more versions
    Cite
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl (2022). CT-FAN-21 corpus: A dataset for Fake News Detection [Dataset]. http://doi.org/10.5281/zenodo.4714517
    Explore at:
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl
    Description

    Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use it only for research purposes. Due to these restrictions, the collection is not open data. Please download the Agreement at Data Sharing Agreement and send the signed form to fakenewstask@gmail.com .

    Citation

    Please cite our work as

    @article{shahi2021overview,
     title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
     author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
     journal={Working Notes of CLEF},
     year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English.

    Subtask 3A: Multi-class fake news detection of news articles (English). Subtask 3A frames fake news detection as a four-class classification problem. The training data will be released in batches and comprise roughly 900 articles with their respective labels. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:

    • False - The main claim made in an article is untrue.

    • Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    • True - This rating indicates that the primary elements of the main claim are demonstrably true.

    • Other- An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes articles in dispute and unproven articles.

    Subtask 3B: Topical Domain Classification of News Articles (English) Fact-checkers require background expertise to identify the truthfulness of an article. The categorisation will help to automate the sampling process from a stream of data. Given the text of a news article, determine the topical domain of the article (English). This is a classification problem. The task is to categorise fake news articles into six topical categories like health, election, crime, climate, election, education. This task will be offered for a subset of the data of Subtask 3A.

    Input Data

    The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:

    Task 3a

    • ID- Unique identifier of the news article
    • Title- Title of the news article
    • text- Text mentioned inside the news article
    • our rating - class of the news article as false, partially false, true, other

    Task 3b

    • public_id- Unique identifier of the news article
    • Title- Title of the news article
    • text- Text mentioned inside the news article
    • domain - domain of the given news article(applicable only for task B)

    Output data format

    Task 3a

    • public_id- Unique identifier of the news article
    • predicted_rating- predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true

    Task 3b

    • public_id- Unique identifier of the news article
    • predicted_domain- predicted domain

    Sample file

    public_id, predicted_domain
    1, health
    2, crime
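    The two sample files above share one layout: a header row followed by one `public_id, label` row per article. The sketch below writes that layout with Python's standard `csv` module. The function name `write_submission` and the validation step are illustrative assumptions; the listing does not say whether the scorer expects a space after the comma or is case-sensitive, so the exact delimiter handling should be checked against the official instructions.

```python
import csv
import io

# Label spellings taken from the Task 3a column description; treating them as
# the only accepted values is an assumption made for this sketch.
TASK3A_RATINGS = {"false", "partially false", "true", "other"}


def write_submission(rows, value_column, handle, allowed=None):
    """Write a header plus one (public_id, label) row per prediction."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["public_id", value_column])
    for public_id, value in rows:
        # Reject labels outside the allowed set, if one was supplied.
        if allowed is not None and value not in allowed:
            raise ValueError(f"unexpected label: {value!r}")
        writer.writerow([public_id, value])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_submission([(1, "false"), (2, "true")], "predicted_rating", buffer, TASK3A_RATINGS)
    print(buffer.getvalue(), end="")
```

    For Task 3b the same function can be called with "predicted_domain" as the column name and no allowed set, since the full list of domain labels is not spelled out unambiguously in the description.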

    Additional data for Training

    To train their models, participants can use additional data in a similar format; some datasets are available on the web. We don't provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some possible sources:

    IMPORTANT!

    1. Fake news article used for task 3b is a subset of task 3a.
    2. We have used the data from 2010 to 2021, and the content of fake news is mixed up with several topics like election, COVID-19 etc.

    Evaluation Metrics

    This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.
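    For readers unfamiliar with the F1-macro measure, the following is a minimal pure-Python sketch: F1 is computed separately for each class and the per-class scores are averaged with equal weight. Averaging over every label that appears in either the gold or the predicted list is an assumption made here; the official scorer may fix the label set differently. The function name `macro_f1` and the example labels are illustrative.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Assumption: average over labels seen in either list.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["false", "true", "other", "false"]
    pred = ["false", "false", "other", "true"]
    print(macro_f1(gold, pred))
```

    Because every class counts equally, a system that ignores a rare class such as "other" is penalised as heavily as one that ignores a frequent class.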

    Submission Link: https://competitions.codalab.org/competitions/31238

    Related Work

    • Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf
    • G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
    • Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
  11. Data from: Supersharers of fake news on Twitter

    • dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jul 31, 2025
    Cite
    Sahar Baribi-Bartov; Briony Swire-Thompson; Nir Grinberg (2025). Supersharers of fake news on Twitter [Dataset]. http://doi.org/10.5061/dryad.44j0zpcmq
    Explore at:
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Sahar Baribi-Bartov; Briony Swire-Thompson; Nir Grinberg
    Time period covered
    Jan 1, 2024
    Description

    Governments may have the capacity to flood social media with fake news, but little is known about the use of flooding by ordinary voters. In this work, we identify 2107 registered US voters that account for 80% of the fake news shared on Twitter during the 2020 US presidential election by an entire panel of 664,391 voters. We find that supersharers are important members of the network, reaching a sizable 5.2% of registered voters on the platform. Supersharers have a significant overrepresentation of women, older adults, and registered Republicans. Supersharers' massive volume does not seem automated but is rather generated through manual and persistent retweeting. These findings highlight a vulnerability of social media for democracy, where a small group of people distort the political reality for many.

    This dataset contains aggregated information necessary to replicate the results reported in our work on Supersharers of Fake News on Twitter while respecting and preserving the privacy expectations of individuals included in the analysis. No individual-level data is provided as part of this dataset. The data collection process that enabled the creation of this dataset leveraged a large-scale panel of registered U.S. voters matched to Twitter accounts. We examined the activity of 664,391 panel members who were active on Twitter during the months of the 2020 U.S. presidential election (August to November 2020, inclusive), and identified a subset of 2,107 supersharers, which are the most prolific sharers of fake news in the panel that together account for 80% of fake news content shared on the platform. We rely on a source-level definition of fake news, that uses the manually-labeled list of fake news sites by Grinberg et al. 2019 and an updated list based on NewsGuard ratings (commercial...

    Supersharers of Fake News on Twitter

    This repository contains data and code for replication of the results presented in the paper.

    The folders are mostly organized by research questions as detailed below. Each folder contains the code and publicly available data necessary for the replication of results. Importantly, no individual-level data is provided as part of this repository. De-identified individual-level data can be attained for IRB-approved uses under the terms and conditions specified in the paper. Once access is granted, the restricted-access data is expected to be located under ./restricted_data.

    The folders in this repository are the following:

    Preprocessing

    Code under the preprocessing folder contains the following:

    1. source classifier - the code used to train a classifier based on NewsGuard domain flags to match the fake news source labels used in Grinberg et al. 2019.
    2. political classifier - the code used to identify political tweets, i...
  12. CT-FAN: A Multilingual dataset for Fake News Detection

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 23, 2022
    Cite
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel (2022). CT-FAN: A Multilingual dataset for Fake News Detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4714516
    Explore at:
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    University of Applied Sciences Potsdam
    University of Hildesheim
    Darmstadt University of Applied Sciences
    University of Duisburg-Essen
    University of Klagenfurt
    Authors
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel
    Description

    By downloading the data, you agree with the terms & conditions mentioned below:

    Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.

    Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.

    We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.

    Citation

    Please cite our work as

    @InProceedings{clef-checkthat:2022:task3,
      author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
      title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
      year = {2022},
      booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
      series = {CLEF~'2022},
      address = {Bologna, Italy},
    }

    @article{shahi2021overview,
      title = {Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
      author = {Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
      journal = {Working Notes of CLEF},
      year = {2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.

    Task 3: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem: given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1,264 English-language articles with their respective labels. Our definitions for the categories are as follows:

    False - The main claim made in an article is untrue.

    Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    True - This rating indicates that the primary elements of the main claim are demonstrably true.

    Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.

    Cross-Lingual Task (German)

    Along with the multi-class task for the English language, we have introduced a task for a low-resource language. We will provide the test data in German. The idea of the task is to use the English data and the concept of transfer to build a classification model for the German language.

    Input Data

    The data will be provided in the format ID, title, text, rating, domain; the description of the columns is as follows:

    ID - Unique identifier of the news article

    Title - Title of the news article

    text - Text mentioned inside the news article

    our rating - Class of the news article: false, partially false, true, or other

    Output data format

    public_id - Unique identifier of the news article

    predicted_rating - Predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true
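The output format above can be sketched in a few lines of Python. This is only an illustration, not the organisers' tooling: the function name `write_predictions`, the lowercase label spellings, and the exact delimiter (a comma with no trailing space) are assumptions made for this example.

```python
import csv
import io

# Label spellings are an assumption based on the column description above.
VALID_RATINGS = {"true", "partially false", "false", "other"}


def write_predictions(predictions, out):
    """Write (public_id, predicted_rating) rows with a header line."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["public_id", "predicted_rating"])
    for public_id, rating in predictions:
        if rating not in VALID_RATINGS:
            raise ValueError(f"unknown rating: {rating!r}")
        writer.writerow([public_id, rating])


# Reproduce the two rows of the sample file in memory.
buffer = io.StringIO()
write_predictions([(1, "false"), (2, "true")], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

Validating the label before writing keeps a misspelled class from reaching the file; whether an evaluation script would accept other spellings is not stated in the description.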

    IMPORTANT!

    We have used the data from 2010 to 2022, and the content of fake news is mixed up with several topics like elections, COVID-19 etc.

    Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498

    Related Work

    Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf

    G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14

    Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104

    Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.

    Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.

    Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.

  13. Data from: Factors influencing young people’s news consumption in...

    • tandf.figshare.com
    xlsx
    Updated Feb 4, 2024
    Cite
    Nadine Klopfenstein; Valery Wyss; Wibke Weber (2024). Factors influencing young people’s news consumption in Switzerland during normative transitions: A mixed methods study [Dataset]. http://doi.org/10.6084/m9.figshare.24711405.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 4, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Nadine Klopfenstein; Valery Wyss; Wibke Weber
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Switzerland
    Description

    Several media studies have investigated the news consumption of young people and discussed where they get information and what motivates them to consume news. Little is known about the structural factors that influence young people’s news consumption behavior. The aim of this paper is to fill this research gap by focusing on structural factors that play a major role in young people’s news consumption. In a mixed-methods study, we investigated Swiss youth media behavior in news consumption from 2019 to 2020 in Switzerland. The results show that news consumption of people aged 12–20 is determined by three structural factors at home and outside: 1. access to media and internet; 2. regulation by parents and teachers, and 3. routines at home or school. These three factors shape the individual media environment and are related to young people’s news consumption behavior. Changes in news consumption behavior were evident in school transitions where young people not only change teachers and get a new peer group but are often involved in a change of location. These changes can be normative transitions which have an influence on the structural factors of the individual media environment and thus influence the news consumption behavior of young people. Young Swiss people consume news via their smartphones, which are offered to them through news portals, various apps, or via social media feeds, on which they usually come across news by chance and consume it casually in their free time. Structural factors of media environments (i.e., access, regulation, and news consumption routines) play a major role in young people’s news consumption. These structural factors can be influenced by parents, teachers, and peers. For schools in particular, the paradigm that emerges from these findings is to reduce barriers to accessing news content and to rethink certain regulations, and to make recommendations and establish routines that encourage young people to consume news.

  14. Data from: A Data set for Information Spreading over the News

    • live.european-language-grid.eu
    txt
    Updated Nov 28, 2021
    + more versions
    Cite
    (2021). A Data set for Information Spreading over the News [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7719
    Explore at:
    Available download formats: txt
    Dataset updated
    Nov 28, 2021
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    Analyzing the spread of information related to a specific event in the news has many potential applications. Consequently, various systems have been developed to facilitate the analysis of information spreading, such as detection of disease propagation and identification of the spreading of fake news through social media. There are several open challenges in the process of discerning information propagation, among them the lack of resources for training and evaluation. This paper describes the process of compiling a corpus from the EventRegistry global media monitoring system. We focus on information spreading in three domains: sports (i.e. the FIFA World Cup), natural disasters (i.e. earthquakes), and climate change (i.e. global warming). This corpus is a valuable addition to the currently available datasets to examine the spreading of information about various kinds of events.

    Introduction:

    Domain-specific gaps in information spreading are ubiquitous and may exist due to economic conditions, political factors, or linguistic, geographical, time-zone, cultural, and other barriers. These factors potentially contribute to obstructing the flow of local as well as international news. We believe that there is a lack of research studies that examine, identify, and uncover the reasons for barriers in information spreading. Additionally, there is limited availability of datasets containing news text and metadata including time, place, source, and other relevant information. When a piece of information starts spreading, it implicitly raises questions such as:

    - How far does the information in the form of news reach out to the public?
    - Does the content of news remain the same or change to a certain extent?
    - Do the cultural values impact the information, especially when the same news gets translated into other languages?

    Statistics about datasets:

    ----------------------------------------------------------------------------------------------------
    #  Domain            Event Type      Articles Per Language                    Total Articles
    ----------------------------------------------------------------------------------------------------
    1  Sports            FIFA World Cup  983-en, 762-sp, 711-de, 10-sl, 216-pt    2679
    2  Natural Disaster  Earthquake      941-en, 999-sp, 937-de, 19-sl, 251-pt    3194
    3  Climate Changes   Global Warming  996-en, 298-sp, 545-de, 8-sl, 97-pt      1945
    ----------------------------------------------------------------------------------------------------

  15. Fake News data set

    • kaggle.com
    zip
    Updated Dec 17, 2021
    Cite
    Bjørn-Jostein (2021). Fake News data set [Dataset]. https://www.kaggle.com/datasets/bjoernjostein/fake-news-data-set
    Explore at:
    Available download formats: zip (56446259 bytes)
    Dataset updated
    Dec 17, 2021
    Authors
    Bjørn-Jostein
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Today, we are producing more information than ever before, but not all information is true. Some of it is actually malicious and harmful, which makes it harder for us to trust any piece of information we come across. Not only that, bad actors are now able to use language modelling tools like OpenAI's GPT-2 to generate fake news too. Ever since its initial release, there has been discussion of how it could be misused for generating misleading news articles, automating the production of abusive or fake content for social media, and automating the creation of spam and phishing content.

    How do we figure out what is true and what is fake? Can we do something about it?

    Content

    The dataset consists of around 387,000 pieces of text, which have been sourced from various news articles on the web as well as texts generated by OpenAI's GPT-2 language model.

    The dataset is split into train, validation and test such that each of the sets has an equal split of the two classes.

    Acknowledgements

    This dataset was published on AIcrowd in a so-called KIIT AI (mini)Blitz⚡ Challenge. AI Blitz⚡ is a series of educational challenges by AIcrowd that aims to make it easy for anyone to get started with AI. This AI Blitz⚡ challenge was exclusive to the students and faculty of the Kalinga Institute of Industrial Technology.

  16. Replication Data for: How the News Media Activates Public Expression and...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Nov 13, 2017
    Cite
    Gary King; Benjamin Schneer (2017). Replication Data for: How the News Media Activates Public Expression and Influences National Agendas [Dataset]. http://doi.org/10.7910/DVN/1EMHTK
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 13, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Gary King; Benjamin Schneer
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/1EMHTK

    Description

    We demonstrate that the news media causes Americans to take public stands on issues, join national policy conversations, and express themselves publicly more often than they would otherwise, all key components of democratic politics. We recruited 48 mostly small media outlets that allowed us to choose groups of outlets to write and publish articles on subjects we approved and dates we randomly assigned. We estimate the causal effect on proximal measures, such as website pageviews and Twitter discussion of the articles' specific subjects, and distal ones, such as national Twitter conversation in broad policy areas. Our intervention increased discussion in each broad policy area by approximately 62.7% (relative to a day's volume), accounting for 13,166 additional posts, with similar effects across population subgroups.

  17. Cross-Lingual Dataset of Crisis-Related Social Media

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Sep 30, 2023
    Cite
    Fedor Vitiugin; Carlos Castillo (2023). Cross-Lingual Dataset of Crisis-Related Social Media [Dataset]. http://doi.org/10.5281/zenodo.8393148
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 30, 2023
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Fedor Vitiugin; Carlos Castillo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The cross-lingual natural disaster dataset includes public tweets collected using Twitter’s public API, filtering by location-related keywords and date, without using any additional filtering (e.g., we did not restrict the query to specific languages). We considered two disaster events and two long-term natural disasters across Europe (floods and wildfires) that received substantial news coverage internationally.

    Three of the top languages were common to all the studied events: English (ISO 639-1 code: en), Spanish (es), and French (fr). Additionally, we found hundreds of messages for each event in five other languages, including Arabic (ar), German (de), Japanese (ja), Indonesian (id), Italian (it) and Portuguese (pt).

    After collecting the data, we labelled tweets that contained potentially informative factual information. We name this group of tweets “informative messages.” Next, we used crowdsourcing to further categorize the messages into various informational categories. We asked three different workers to label each informative message across languages. The target categories were based on an ontology from TREC-IS 2018, where we grouped some low-level ontology categories into higher-level ones.

  18. Fake News Detection Data

    • kaggle.com
    zip
    Updated Apr 27, 2024
    Cite
    Tasnim Niger (2024). Fake News Detection Data [Dataset]. https://www.kaggle.com/datasets/tasnimniger/fake-news-detection-data
    Explore at:
    Available download formats: zip (55829 bytes)
    Dataset updated
    Apr 27, 2024
    Authors
    Tasnim Niger
    Description

    The internet and social media have led to a major problem—fake news. Fake news is false information presented as real news, often with the goal of tricking or influencing people. It's difficult to identify fake news because it can look very similar to real news. The Fake News detection dataset deals with the problem indirectly by using tabular summary statistics about each news article to attempt to predict whether the article is real or fake. This dataset is in a tabular format and contains features such as word count, sentence length, unique words, average word length, and a label indicating whether the article is fake or real.
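A minimal sketch of how summary statistics of the kind named above could be derived from raw article text. The dataset does not document its exact feature definitions, so the tokenisation rule, the sentence splitter, and the function name `summary_features` are all assumptions made for illustration.

```python
import re


def summary_features(text):
    """Compute simple surface statistics for one article (illustrative definitions)."""
    # Assumed tokenisation: lowercase runs of letters, digits, and apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    # Assumed sentence split: on ., !, or ?; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    word_count = len(words)
    return {
        "word_count": word_count,
        # Mean number of words per sentence.
        "sentence_length": word_count / len(sentences) if sentences else 0.0,
        "unique_words": len(set(words)),
        "avg_word_length": sum(len(w) for w in words) / word_count if word_count else 0.0,
    }


features = summary_features("Fake news spreads fast. Real news spreads too.")
print(features)
```

Features like these describe writing style rather than factual accuracy, which is why the description above calls the approach indirect.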

  19. Facebook users worldwide 2017-2027

    • statista.com
    • de.statista.com
    Cite
    Stacy Jo Dixon, Facebook users worldwide 2017-2027 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista http://statista.com/
    Authors
    Stacy Jo Dixon
    Description

    The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users, a new peak, in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here regarding the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts by persons only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
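The forecast figures quoted above can be cross-checked with a little arithmetic. The sketch below uses only the two stated numbers (391 million and 14.36 percent); the 2023 baseline is not given in the description, so the value computed here is an implied figure, not a published one.

```python
increase_millions = 391.0  # stated absolute increase, 2023 to 2027
growth_rate = 0.1436       # stated relative increase (+14.36 percent)

# Implied starting user base, from: increase = baseline * growth_rate.
baseline_millions = increase_millions / growth_rate
forecast_millions = baseline_millions + increase_millions

print(f"implied 2023 base:  {baseline_millions / 1000:.2f} billion")
print(f"implied 2027 total: {forecast_millions / 1000:.2f} billion")
```

The implied 2027 total rounds to the 3.1 billion users mentioned in the description, so the three quoted figures are mutually consistent.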

  20. Replication Data for: Distorsions of Political Bias in Crowdsourced...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated May 13, 2020
    Cite
    Michele Coscia; Luca Rossi (2020). Replication Data for: Distorsions of Political Bias in Crowdsourced Misinformation Flagging [Dataset]. http://doi.org/10.7910/DVN/Y3CP79
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 13, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Michele Coscia; Luca Rossi
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Many people consume news on social media, yet the production of news items online has come under crossfire due to the common spreading of misinformation. Social media platforms police their content in various ways. Primarily they rely on crowdsourced “flags”: users signal to the platform that a specific news item might be misleading and, if they raise enough of them, the item will be fact-checked. However, real-world data show that the most flagged news sources are also the most popular and – supposedly – reliable ones. In this paper, we show this phenomenon can be explained by the unreasonable assumptions current content policing strategies make about how the online social media environment is shaped. The most realistic assumption is that confirmation bias will prevent a user from flagging a news item if they share the same political bias as the news source producing it. We show, via agent-based simulations, that a model reproducing our current understanding of the social media environment will necessarily result in the most neutral and accurate sources receiving most flags.

Cite
Kanchana1990 (2024). Global News Engagement on Social Media [Dataset]. https://www.kaggle.com/datasets/kanchana1990/global-news-engagement-on-social-media

Global News Engagement on Social Media

Insights from CNN, BBC, Al Jazeera, Reuters

Explore at:
Available download formats: zip (267156 bytes)
Dataset updated
Mar 15, 2024
Authors
Kanchana1990
License

Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically

Description

This comprehensive dataset offers a deep dive into the social media engagement metrics of nearly 4,000 posts from four of the world's leading news channels: CNN, BBC, Al Jazeera, and Reuters. Curated to provide a holistic view of global news interaction on social media, the collection stands out for its meticulous assembly and broad spectrum of content.

Dataset Overview: Spanning various global events, topics, and narratives, this dataset is a snapshot of how news is consumed and interacted with on social media platforms. It serves as a rich resource for analyzing trends, engagement patterns, and the dissemination of information across international borders.

Data Science Applications: Ideal for researchers and enthusiasts in the fields of data science, media studies, and social analytics, this dataset opens doors to numerous explorations such as engagement analysis, trend forecasting, content strategy optimization, and the study of information flow in digital spaces. It also holds potential for machine learning projects aiming to predict engagement or classify content based on interaction metrics.

Column Descriptors: Each record in the dataset is detailed with the following columns:
- text: The title or main content of the post.
- likes: The number of likes each post has garnered.
- comments: The number of comments left by viewers.
- shares: How many times the post has been shared.
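
As a rough illustration of how these four columns might be used, the sketch below ranks posts by a combined engagement score. The sample rows are invented for this example and only mirror the documented column names; the unweighted sum of likes, comments, and shares is an assumption, not a metric defined by the dataset.

```python
import csv
import io

# Made-up rows that only mirror the documented column names.
SAMPLE_CSV = """text,likes,comments,shares
Example headline A,120,30,10
Example headline B,45,5,50
"""


def total_engagement(rows):
    """Return (text, likes + comments + shares) pairs, highest total first."""
    scored = [
        (row["text"], int(row["likes"]) + int(row["comments"]) + int(row["shares"]))
        for row in rows
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = total_engagement(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(ranked)
```

With the real file, the in-memory sample would be replaced by an opened CSV handle; the actual file name inside the zip archive is not stated in the description.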

Ethically Mined Data: The collection of this dataset was conducted with the highest ethical standards in mind, ensuring compliance with data privacy laws and platform policies. By anonymizing data where necessary and focusing solely on publicly available information, it respects both individual privacy and intellectual property rights.

Special thanks are extended to the Facebook platform and the respective news channels for their openness and the rich public data they provide. This dataset not only celebrates the vibrant exchange on social media but also underscores the importance of responsible data use and sharing in fostering understanding and innovation.
