Do you ever feel like you're being inundated with news from all sides, and you can't keep up? Well, you're not alone. In today's age of social media and 24-hour news cycles, it can be difficult to know what's going on in the world. And with so many different news sources to choose from, it can be hard to know who to trust.
That's where this dataset comes in. It captures individuals' sentiment toward different news sources. The data was collected by administering a survey to individuals who use different news sources; the survey responses were then analyzed to obtain a sentiment score for each news source.
So if you're feeling overwhelmed by the news, don't worry – this dataset has you covered. With its insights into which news sources are trustworthy and which ones aren't, you'll be able to make informed decisions about what to read – and what to skip.
The Twitter Sentiment Analysis dataset can be used to analyze the impact of social media on news consumption. It can be used to study how individuals' sentiments towards different news sources vary based on the source they use, and how different factors, such as the time of day or the topic of the news, affect an individual's sentiment.
File: news.csv

| Column name | Description |
|:-----------------------|:------------------------------------------------------|
| Title | The title of the news article. (String) |
| Date | The date the news article was published. (Date) |
| Time | The time the news article was published. (Time) |
| Score | The sentiment score of the news article. (Float) |
| Number of Comments | The number of comments on the news article. (Integer) |

File: news_api.csv

| Column name | Description |
|:--------------|:------------------------------------------------|
| Title | The title of the news article. (String) |
| Date | The date the news article was published. (Date) |
| Source | The news source the article is from. (String) |

File: politics.csv

| Column name | Description |
|:-----------------------|:------------------------------------------------------|
| Title | The title of the news article. (String) |
| Date | The date the news article was published. (Date) |
| Time | The time the news article was published. (Time) |
| Score | The sentiment score of the news article. (Float) |
| Number of Comments | The number of comments on the news article. (Integer) |

File: sports.csv

| Column name | Description |
|:-----------------------|:------------------------------------------------------|
| Title | The title of the news article. (String) |
| Date | The date the news article was published. (Date) |
| Time | The time the news article was published. (Time) |
| Score | The sentiment score of the news article. (Float) |
| Number of Comments | The number of comments on the news article. (Integer) |

File: television.csv

| Column name | Description |
|:-----------------------|:------------------------------------------------------|
| Title | The title of the news article. (String) |
| Date | The date the news article was published. (Date) |
| Time | The time the news article was published. (Time) |
| Score | The sentiment score of the news article. (Float) |
| Number of Comments | The number of comments on the news article. (Integer) |
File: trending.csv | Column name | Description ...
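Since news.csv, politics.csv, sports.csv, and television.csv share the same columns, one small script can summarize any of them. A minimal sketch, assuming only the documented schema; the sample rows below are invented for illustration:

```python
import csv
import io
from collections import defaultdict

# Invented sample rows following the documented schema
# (Title, Date, Time, Score, Number of Comments).
sample = io.StringIO(
    "Title,Date,Time,Score,Number of Comments\n"
    "Market rallies,2023-01-02,09:00,0.63,120\n"
    "Storm warning issued,2023-01-02,14:30,-0.41,45\n"
    "Local team wins,2023-01-03,21:15,0.88,300\n"
)

def mean_score_by_date(fileobj):
    """Average the sentiment Score column per publication Date."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in csv.DictReader(fileobj):
        totals[row["Date"]] += float(row["Score"])
        counts[row["Date"]] += 1
    return {date: totals[date] / counts[date] for date in totals}

print({d: round(v, 3) for d, v in mean_score_by_date(sample).items()})
# {'2023-01-02': 0.11, '2023-01-03': 0.88}
```

The same function works on the real files via `open("news.csv")` in place of the in-memory sample.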
A survey held in the United States in 2023 revealed that 40 percent of responding adults said that what they disliked most about getting news on social media was that the news is inaccurate, an increase of nine percent from 2018. Other reasons given were low-quality news or other people's behavior.

Social media news consumption is complex

With inaccurate news being the main reason consumers dislike news via social networks, the issue of trust also comes into play. While fake and manipulated content can circulate on any platform, social media platforms can exacerbate the matter, with written posts, video footage, and audio easily shared and disseminated at the click of a button. TikTok in particular, with its focus on short-form snappy content, ranked poorly in terms of trusted social networks: 50 percent of U.S. adults responding to a survey considered the platform very untrustworthy.

What are the positives of news found on social media?

Data from 2023 showed that 20 percent of adults in the United States who used social media to get news stated that convenience was their main reason for doing so. Speed and interaction with people were the next two most popular reasons for using social networking platforms as a source of news. Even so, a large share (more than a third) of respondents said they did not know why they liked getting news on social networks, or did not answer. This speaks to the complex relationship the public now has with social media: its convenience, as well as its prevalence in users' everyday lives, means that it can often be difficult to avoid using it. However, when it comes to news, users remain unsure.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Exploring how young people engage with, share, and are influenced by news has long captivated academic interest. It is crucial for comprehending how young people are informed and develop critical thinking skills amid evolving media landscapes, and for predicting potential impacts on the industry and democracy. Given the increasing complexity of the news field, this paper conducts a systematic literature review from 2010 to 2022, focusing on journals within SCImago's top 100 list for journalism, media, and communication. First, this article systematises the geographical origin, methods used, and ages and types of youth studied in the 232 academic papers comprising the final sample. Second, it summarises key findings concerning how the most cited papers frame “youth” and “news”. Last, the article concludes by pointing out research gaps and possible future challenges. The study reveals that user studies are prominent, while production studies on news media reaching young people are scarce. There is a strong Western bias in current research, with a prevalence of U.S. college student survey studies. The terms “youth” and “news” lack in-depth exploration. This article discusses challenges arising from these findings.
According to data gathered in a survey held in 2022, 17 percent of responding U.S. adults said that they got their news from social media on a regular basis, down from 19 percent in the previous year and 23 percent in 2020. After the share of people who claimed to never get news from social media grew from 21 percent in 2020 to 24 percent in 2021, it dropped back to 21 percent in 2022.
The CNN News dataset provides access to structured information about CNN articles, including headlines, authors, topics, publication dates, and multimedia elements like videos and images. Popular use cases include analyzing journalistic trends, tracking content dissemination, and studying the evolution of news topics over time.
The CNN News dataset offers a comprehensive collection of metadata and content attributes for articles published by CNN, making it a valuable resource for understanding modern journalism and media trends. Each entry includes essential fields such as article ID, URL, authorship details, headline, assigned topics for categorization, publication date, and an updated timestamp indicating the most recent modifications. The content field provides the full textual body of the article, complemented by embedded videos and images that enhance the multimedia storytelling experience.
The dataset also links related articles, offering additional context or perspectives on related topics, and includes keywords that highlight the primary themes and subjects of each piece. Ideal for researchers, media analysts, and journalism professionals, this dataset supports studies on news dissemination, audience engagement, and the dynamics of digital reporting. By leveraging the CNN News dataset, users can explore the evolution of news content, analyze media practices, and uncover trends shaping the digital news ecosystem.
Below is a list of the different columns in the dataset along with a brief description of each:

- id: Unique identifier for each article
- url: Web address of the article
- author: Writer or contributor of the article
- headline: Title of the news article
- topics: Subject categories or themes
- publication_date: When the article was first published
- updated_last: Last modification date
- content: Main body text of the article
- videos: Video content associated with the article
- images: Visual media included in the article
- related_articles: Links to connected stories
- keyword: Key terms for categorization
This dataset is valuable for: - Content Analysis: Studying news reporting patterns and editorial focus - Media Research: Analyzing CNN's coverage and reporting style - NLP Applications: Training models for news classification and content analysis - Multimedia Analysis: Studying the integration of text, images, and videos in digital news
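As a quick illustration of content analysis over the documented fields, the sketch below tallies topic frequencies. The records are invented stand-ins mirroring a few of the listed columns (id, headline, topics), not actual CNN data:

```python
from collections import Counter

# Invented records mirroring a few of the documented fields
# (id, headline, topics); not actual CNN data.
articles = [
    {"id": "a1", "headline": "Election results certified", "topics": ["politics", "us"]},
    {"id": "a2", "headline": "New climate report released", "topics": ["climate", "science"]},
    {"id": "a3", "headline": "Senate passes budget bill", "topics": ["politics"]},
]

def topic_counts(records):
    """Tally how often each assigned topic appears across articles."""
    counts = Counter()
    for article in records:
        counts.update(article["topics"])
    return counts

print(topic_counts(articles).most_common(1))  # [('politics', 2)]
```

The same tally, run over the full dataset, would surface CNN's most-covered topics over a given period.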
License: CUSTOM. Please review the respective licenses below:
1. Data Provider's License: Bright Data Master Service Agreement
~Up to $0.0025 per record. Min order $250
Approximately 295K new records are added each month. Approximately 726K records are updated each month. Get the complete dataset each delivery, including all records. Retrieve only the data you need with the flexibility to set Smart Updates.
- New snapshot each month (12 snapshots/year), paid monthly
- New snapshot each quarter (4 snapshots/year), paid quarterly
- New snapshot every 6 months (2 snapshots/year), paid twice a year
- One-time snapshot delivery, paid once
Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes. Due to these restrictions, the collection is not open data. Please fill out the form and upload the Data Sharing Agreement via the Google Form.
Citation
Please cite our work as
@article{shahi2021overview,
  title = {Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
  author = {Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
  journal = {Working Notes of CLEF},
  year = {2021}
}
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.
Subtask 3: Multi-class fake news detection of news articles (English). Sub-task A addresses fake news detection framed as a four-class classification problem. The training data will be released in batches and comprises roughly 900 articles with their respective labels. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
Input Data
The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:
Output data format
Sample File
public_id, predicted_rating
1, false
2, true
Sample file
public_id, predicted_domain
1, health
2, crime
Additional data for Training
To train your model, participants can use additional data in a similar format; some datasets are available on the web. We do not provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some of the possible sources:
IMPORTANT!
Evaluation Metrics
This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs in total (not per day), and only one person from a team is allowed to submit runs.
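For reference, macro-averaged F1 is the unweighted mean of per-class F1 scores, so rare classes such as "other" count as much as frequent ones. A minimal sketch of the measure (the gold/predicted examples are invented; an off-the-shelf implementation such as scikit-learn's `f1_score(average='macro')` should agree):

```python
from collections import defaultdict

def f1_macro(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (the ranking measure)."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the gold label was t
            fn[t] += 1
    f1s = []
    for label in labels:
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

labels = ["true", "partially false", "false", "other"]
gold = ["false", "true", "false", "other"]
pred = ["false", "true", "partially false", "other"]
print(round(f1_macro(gold, pred, labels), 3))  # 0.667
```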
Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498
Submission Link: Coming soon
Related Work
https://www.gesis.org/en/institute/data-usage-terms
TikTok is developing into a key platform for news, advertising, politics, online shopping, and entertainment in Germany, with over 20 million monthly users. Especially among young people, TikTok plays an increasing role in their information environment. We provide a human-coded dataset of over 4,000 TikTok videos from German-speaking news outlets from 2023. The coding includes descriptive variables of the videos (e.g., visual style, text overlays, and audio presence) and theory-derived concepts from the journalism sciences (e.g., news values).
This dataset consists of every second video published in 2023 by major news outlets active on TikTok from Germany, Austria, and Switzerland. The data collection was facilitated with the official TikTok API in January 2024. The manual coding took place between September 2024 and December 2024. For a detailed description of the data collection, validation, annotation and descriptive analysis, please refer to [Forthcoming dataset paper publication].
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The prevalence of bias in the news media has become a critical issue, affecting public perception on a range of important topics such as political views, health, insurance, resource distribution, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to ensure accurate information dissemination and to increase awareness about important issues and the potential risks associated with them. This highlights the need for a solution that can help mitigate the spread of false or misleading information and restore public trust in the media.

Data description: This is a dataset for news media bias covering different dimensions of bias: political, hate speech, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, and spirituality, which makes it a unique contribution. The dataset does not contain any personally identifiable information (PII). The data structure is tabulated as follows:

- Text: The main content.
- Dimension: Descriptive category of the text.
- Biased_Words: A compilation of words regarded as biased.
- Aspect: Specific sub-topic within the main content.
- Label: Indicates the degree of bias; the label is ternary (highly biased, slightly biased, or neutral).
- Toxicity: Indicates the presence (True) or absence (False) of toxicity.
- Identity_mention: Mention of any identity based on word match.

Annotation scheme: The labels and annotations in the dataset are generated through a system of active learning, cycling through manual labeling, semi-supervised learning, and human verification. The scheme comprises:

- Bias Label: Specifies the degree of bias (e.g., no bias, mild, or strong).
- Words/Phrases Level Biases: Pinpoints specific biased terms or phrases.
- Subjective Bias (Aspect): Highlights biases pertinent to content dimensions.

Due to the nuances of semantic match algorithms, certain labels such as 'identity' and 'aspect' may appear distinctively different.

List of datasets used: We curated different news categories (climate crisis news summaries, occupational, spiritual/faith, general) using RSS feeds to capture different dimensions of news media bias. The annotation is performed using active learning to label each sentence (neutral / slightly biased / highly biased) and to pick biased words from the news. We also utilize publicly available data from the following sources, with attribution:

- MBIC (media bias): Spinde, Timo, Lada Rudnitckaia, Kanishka Sinha, Felix Hamborg, Bela Gipp, and Karsten Donnay. "MBIC - A Media Bias Annotation Dataset Including Annotator Characteristics." arXiv preprint arXiv:2105.11910 (2021). https://zenodo.org/records/4474336
- Hyperpartisan news: Kiesel, Johannes, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, and Martin Potthast. "SemEval-2019 Task 4: Hyperpartisan News Detection." In Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 829-839. 2019. https://huggingface.co/datasets/hyperpartisan_news_detection
- Toxic comment classification: Adams, C.J., Jeffrey Sorensen, Julia Elliott, Lucas Dixon, Mark McDonald, Nithum, and Will Cukierski. 2017. "Toxic Comment Classification Challenge." Kaggle. https://kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge
- Jigsaw Unintended Bias: Adams, C.J., Daniel Borkan, Inversion, Jeffrey Sorensen, Lucas Dixon, Lucy Vasserman, and Nithum. 2019. "Jigsaw Unintended Bias in Toxicity Classification." Kaggle. https://kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification
- Age bias: Díaz, Mark, Isaac Johnson, Amanda Lazar, Anne Marie Piper, and Darren Gergle. "Addressing Age-Related Bias in Sentiment Analysis." In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1-14. 2018. Age Bias Training and Testing Data - Age Bias and Sentiment Analysis Dataverse (harvard.edu)
- Multi-dimensional news (Ukraine): Färber, Michael, Victoria Burkard, Adam Jatowt, and Sora Lim. "A Multidimensional Dataset Based on Crowdsourcing for Analyzing and Detecting News Bias." In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3007-3014. 2020. https://zenodo.org/records/3885351#.ZF0KoxHMLtV
- Social biases: Sap, Maarten, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. "Social Bias Frames: Reasoning about Social and Power Implications of Language." arXiv preprint arXiv:1911.03891 (2019). https://maartensap.com/social-bias-frames/

Goal of this dataset: We want to offer open and free access to this dataset, ensuring a wide reach to researchers and AI practitioners across the world. The dataset should be user-friendly, and uploading and accessing data should be straightforward. If you use this dataset, please cite us. Navigating News Narratives: A Media Bias Analysis Dataset © 2023 by Shaina Raza, Vector Institute is licensed under CC BY-NC 4.0.
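A small sketch of how the tabulated structure described above could be queried; the rows are invented placeholders following the described fields, not actual dataset entries:

```python
from collections import Counter

# Invented placeholder rows following the described fields; real entries
# also carry Biased_Words, Aspect, Toxicity, and Identity_mention.
rows = [
    {"Text": "Opinionated claim A", "Dimension": "political", "Label": "highly biased"},
    {"Text": "Factual report B", "Dimension": "political", "Label": "neutral"},
    {"Text": "Loaded phrasing C", "Dimension": "climate change", "Label": "slightly biased"},
]

def label_distribution(records, dimension):
    """Count the ternary bias labels within one dimension."""
    return Counter(r["Label"] for r in records if r["Dimension"] == dimension)

print(label_distribution(rows, "political"))
# Counter({'highly biased': 1, 'neutral': 1})
```

Run per dimension, this kind of tally shows how bias labels are distributed across the dataset's coverage areas.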
By downloading the data, you agree with the terms & conditions mentioned below:
Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.
Summaries, analyses, and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try to identify the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics, or to share it with anyone else.
We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.
Citation
Please cite our work as
@InProceedings{clef-checkthat:2022:task3,
  author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
  title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
  year = {2022},
  booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
  series = {CLEF~'2022},
  address = {Bologna, Italy},
}
@article{shahi2021overview,
  title = {Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
  author = {Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
  journal = {Working Notes of CLEF},
  year = {2021}
}
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.
Task 3: Multi-class fake news detection of news articles (English). Sub-task A addresses fake news detection framed as a four-class classification problem. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1264 English-language articles with their respective labels. Our definitions for the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
Cross-Lingual Task (German)
Along with the multi-class task for English, we have introduced a task for a low-resource language. We will provide test data in German. The idea of the task is to use the English data and transfer learning to build a classification model for German.
Input Data
The data will be provided in the format of Id, title, text, rating, and domain; the description of the columns is as follows:
ID - Unique identifier of the news article
Title - Title of the news article
Text - Text mentioned inside the news article
Our rating - Class of the news article: false, partially false, true, or other
Output data format
public_id - Unique identifier of the news article
predicted_rating - Predicted class
Sample File
public_id, predicted_rating
1, false
2, true
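A submission file in this format can be produced with a few lines of Python. The predictions below are invented placeholders; note that a plain CSV writer omits the space after the comma shown in the sample:

```python
import csv
import io

# Invented predictions keyed by public_id; only the two-column format
# (public_id, predicted_rating) comes from the sample above.
predictions = {1: "false", 2: "true"}

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["public_id", "predicted_rating"])
for public_id, rating in sorted(predictions.items()):
    writer.writerow([public_id, rating])

print(buf.getvalue())
```

Writing to `io.StringIO` keeps the sketch self-contained; in practice you would open a file for writing instead.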
IMPORTANT!
We have used data from 2010 to 2022, and the fake news content covers several topics, such as elections and COVID-19.
Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498
Related Work
Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf
G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.
Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeno, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.
Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.
https://creativecommons.org/publicdomain/zero/1.0/
Social media is a vast pool of content, and among everything available for users to access, news is accessed most frequently. News can be posted by politicians, news channels, newspaper websites, or even ordinary citizens. These posts have to be checked for their authenticity, since spreading misinformation is a real concern today, and many firms are taking steps to make people aware of the consequences of spreading misinformation. The authenticity of news posted online cannot be definitively measured, since the manual classification of news is tedious and time-consuming, and is also subject to bias. Published paper: http://www.ijirset.com/upload/2020/june/115_4_Source.PDF
Data preprocessing has been done on the dataset Getting Real about Fake News and skew has been eliminated.
In an era where fake WhatsApp forwards and Tweets are capable of influencing naive minds, tools and knowledge have to be put to practical use, not only to mitigate the spread of misinformation but also to inform people about the type of news they consume. Practical applications that help users gain insight from the articles they consume, fact-checking websites, built-in plugins, and article parsers can be further refined, made easier to access, and, more importantly, should create more awareness.
Getting Real about Fake News seemed the most promising for preprocessing, feature extraction, and model classification, because all the other datasets lacked the sources from which the article or statement text was produced and published. Citing the sources of article text is crucial for checking the trustworthiness of the news and further helps in labelling the data as fake or untrustworthy.
This is thanks to the dataset's comprehensiveness in citing the source information of the text along with author names, dates of publication, and labels.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract:
Analyzing the spread of information related to a specific event in the news has many potential applications. Consequently, various systems have been developed to facilitate the analysis of information spreading, such as detection of disease propagation and identification of the spread of fake news through social media. There are several open challenges in the process of discerning information propagation, among them the lack of resources for training and evaluation. This paper describes the process of compiling a corpus from the EventRegistry global media monitoring system. We focus on information spreading in three domains: sports (i.e., the FIFA World Cup), natural disasters (i.e., earthquakes), and climate change (i.e., global warming). This corpus is a valuable addition to the currently available datasets for examining the spread of information about various kinds of events.

Introduction:

Domain-specific gaps in information spreading are ubiquitous and may exist due to economic conditions, political factors, or linguistic, geographical, time-zone, cultural, and other barriers. These factors potentially contribute to obstructing the flow of local as well as international news. We believe there is a lack of research studies that examine, identify, and uncover the reasons for barriers in information spreading. Additionally, there is limited availability of datasets containing news text and metadata including time, place, source, and other relevant information. When a piece of information starts spreading, it implicitly raises questions such as:

- How far does the information, in the form of news, reach the public?
- Does the content of the news remain the same or change to a certain extent?
- Do cultural values impact the information, especially when the same news is translated into other languages?
Statistics about datasets:
| # | Domain | Event Type | Articles Per Language | Total Articles |
|---|--------|------------|-----------------------|----------------|
| 1 | Sports | FIFA World Cup | 983-en, 762-sp, 711-de, 10-sl, 216-pt | 2679 |
| 2 | Natural Disaster | Earthquake | 941-en, 999-sp, 937-de, 19-sl, 251-pt | 3194 |
| 3 | Climate Changes | Global Warming | 996-en, 298-sp, 545-de, 8-sl, 97-pt | 1945 |
https://www.icpsr.umich.edu/web/ICPSR/studies/5518/terms
This study consists of three data files -- Channel, Central, and Peripheral -- used in the United Nations Institute for Training and Research (UNITAR) project concerned with the relations between the United Nations (UN) and the news media in 50 nations in 1968. In particular, the study deals with the role of the news media in spreading information on the UN and with coverage of UN policies and activities by the press, radio, and television in these nations. The Channel File (Part 1) contains data for a total of 2,080 news organs for the press, radio, and television. Variables describe the characteristics of the news organs, such as type, place, frequency, size, language, political affiliation, estimated average circulation, and location of publication, as well as sources and issues covered. The Central File (Part 2) provides data for 13,228 news reports containing any discrete pieces of information issued during a scheduled period of observation by the UN, a UN-connected, or a UN-based outlet included in the survey. Variables describe source, languages, types of media report, size of report, UN organs mentioned, and content characteristics. The Peripheral File (Part 3) contains data for 91,195 news reports containing discrete pieces of information referring in some way to the UN system or its components or to UN affairs and events carried by organs of the press, radio, and television during a scheduled period of observation. Variables in this file describe the type, place, medium, date, frequency, format, circulation, distribution, language, and political affiliation of publication, as well as type of programming, broadcast, and network, and duration of broadcast, references to the UN, and UN organs mentioned.
Governments may have the capacity to flood social media with fake news, but little is known about the use of flooding by ordinary voters. In this work, we identify 2,107 registered US voters that account for 80% of the fake news shared on Twitter during the 2020 US presidential election by an entire panel of 664,391 voters. We find that supersharers are important members of the network, reaching a sizable 5.2% of registered voters on the platform. Supersharers have a significant overrepresentation of women, older adults, and registered Republicans. Supersharers' massive volume does not seem automated but is rather generated through manual and persistent retweeting. These findings highlight a vulnerability of social media for democracy, where a small group of people distort the political reality for many.

This dataset contains aggregated information necessary to replicate the results reported in our work on supersharers of fake news on Twitter while respecting and preserving the privacy expectations of individuals included in the analysis. No individual-level data is provided as part of this dataset. The data collection process that enabled the creation of this dataset leveraged a large-scale panel of registered U.S. voters matched to Twitter accounts. We examined the activity of 664,391 panel members who were active on Twitter during the months of the 2020 U.S. presidential election (August to November 2020, inclusive), and identified a subset of 2,107 supersharers, the most prolific sharers of fake news in the panel, who together account for 80% of fake news content shared on the platform. We rely on a source-level definition of fake news that uses the manually-labeled list of fake news sites by Grinberg et al. 2019 and an updated list based on NewsGuard ratings (commercial...).

# Supersharers of Fake News on Twitter
This repository contains data and code for replication of the results presented in the paper.
The folders are mostly organized by research questions as detailed below. Each folder contains the code and publicly available data necessary for the replication of results. Importantly, no individual-level data is provided as part of this repository. De-identified individual-level data can be attained for IRB-approved uses under the terms and conditions specified in the paper. Once access is granted, the restricted-access data is expected to be located under ./restricted_data.
The folders in this repository are the following:
Code under the preprocessing folder contains the following:
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Many people consume news on social media, yet the production of news items online has come under crossfire due to the common spreading of misinformation. Social media platforms police their content in various ways. Primarily they rely on crowdsourced “flags”: users signal to the platform that a specific news item might be misleading and, if they raise enough of them, the item will be fact-checked. However, real-world data show that the most flagged news sources are also the most popular and – supposedly – reliable ones. In this paper, we show this phenomenon can be explained by the unreasonable assumptions current content policing strategies make about how the online social media environment is shaped. The most realistic assumption is that confirmation bias will prevent a user from flagging a news item if they share the same political bias as the news source producing it. We show, via agent-based simulations, that a model reproducing our current understanding of the social media environment will necessarily result in the most neutral and accurate sources receiving most flags.
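The confirmation-bias assumption described above can be illustrated with a toy agent-based sketch (not the paper's actual model): if a user never flags a source that shares their political bias, a perfectly neutral source collects flags from every partisan user:

```python
import random

# Toy illustration of the flagging dynamic: each user flags a source only
# when their political bias differs from the source's bias, so a neutral
# source draws flags from everyone. All biases and counts are hypothetical.

random.seed(0)
sources = {"left_outlet": -1, "neutral_outlet": 0, "right_outlet": 1}
users = [random.choice([-1, 1]) for _ in range(1000)]  # partisan user biases

flags = {name: 0 for name in sources}
for user_bias in users:
    for name, source_bias in sources.items():
        # Confirmation bias: a user never flags a source sharing their bias.
        if user_bias != source_bias:
            flags[name] += 1

print(flags)
```

With 1,000 partisan users, the neutral outlet is flagged by all of them, while each partisan outlet is flagged only by users of the opposite bias; this reproduces the paradox of the most neutral sources receiving the most flags.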
Data from a survey held in August 2022 in the United States revealed that the most popular news source among millennials was social media, with 45 percent of respondents reporting daily news consumption on social networks. This was more than double the share who got their news via radio. When it comes to trust, though, social media does not fare well.
Social media and news consumption
As adults of all ages spend more and more time on social media, news consumption via this avenue is likely to increase, but something which could affect this trend is the lack of trust in the news consumers encounter on social platforms. Although now the preferred option for younger audiences, social networks are among the least trusted news sources in the United States, and concerns about fake news remain prevalent.
Young audiences and fake news
Inaccurate news is a major problem which worsened during the 2016 and 2020 presidential election campaigns and the COVID-19 pandemic. A global study found that most Gen Z and Millennial news consumers ignored fake coronavirus news on social media, but almost 20 percent interacted with such posts in the comments section, and over seven percent shared the content. Younger news consumers in the United States were also the most likely to report feeling overwhelmed by COVID-19 news. As younger audiences were the most likely to get their updates on the outbreak via social media, this also made them the most susceptible to fake news, and younger generations are also the most prone to ‘doomscrolling’, an addictive act where the reader pursues and digests multiple negative or upsetting news articles in one sitting.
During a 2024 survey, 77 percent of respondents from Nigeria stated that they used social media as a source of news. In comparison, just 23 percent of Japanese respondents said the same. Large portions of social media users around the world admit that they do not trust social platforms either as media sources or as a way to get news, and yet they continue to access such networks on a daily basis.
Social media: trust and consumption
Despite the majority of adults surveyed in each country reporting that they used social networks to keep up to date with news and current affairs, a 2018 study showed that social media is the least trusted news source in the world. Less than 35 percent of adults in Europe considered social networks to be trustworthy in this respect, yet more than 50 percent of adults in Portugal, Poland, Romania, Hungary, Bulgaria, Slovakia and Croatia said that they got their news on social media. What is clear is that we live in an era where social media is such an enormous part of daily life that consumers will still use it in spite of their doubts or reservations.
Concerns about fake news and propaganda on social media have not stopped billions of users accessing their favorite networks on a daily basis. Most Millennials in the United States use social media for news every day, and younger consumers in European countries are much more likely to use social networks for national political news than their older peers. Like it or not, reading news on social is fast becoming the norm for younger generations, and this form of news consumption will likely increase further regardless of whether consumers fully trust their chosen network or not.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As news organizations face considerable challenges, many journalists are utilizing alternative platforms and storytelling techniques to reach citizens. Of note, several newsrooms now disseminate news through the video-based social media app, TikTok, using strategies and aesthetics common on the platform (i.e., humor, sketches, trending sounds, etc.). However, the extent to which audiences trust and learn from this news content is unclear. In this study, we conduct a randomized, controlled online experiment (N = 538) to assess two main questions: 1) Do audiences trust and find credible journalists who disseminate news through TikTok? and 2) To what extent do audiences learn from a TikTok news video, especially in comparison to a print article? The results indicate audiences may find journalists less trustworthy, credible, professional, and knowledgeable (but more likable) when they convey information over TikTok, rather than through a print article. Further, participants who viewed a news TikTok demonstrated higher topic knowledge than those who read an article containing the same information, and this relationship was mediated by attention. These results suggest that journalists can successfully convey news information over TikTok; however, this may come at the expense of reduced credibility perceptions.
Following the 2016 US presidential election, many have expressed concern about the effects of false stories ("fake news"), circulated largely through social media. We discuss the economics of fake news and present new data on its consumption prior to the election. Drawing on web browsing data, archives of fact-checking websites, and results from a new online survey, we find: 1) social media was an important but not dominant source of election news, with 14 percent of Americans calling social media their "most important" source; 2) of the known false news stories that appeared in the three months before the election, those favoring Trump were shared a total of 30 million times on Facebook, while those favoring Clinton were shared 8 million times; 3) the average American adult saw on the order of one or perhaps several fake news stories in the months around the election, with just over half of those who recalled seeing them believing them; and 4) people are much more likely to believe stories that favor their preferred candidate, especially if they have ideologically segregated social media networks.
Abstract: Black Lives Matter, Occupy Wall Street, and the Tea Party are among the many movements that have reignited media attention to protest activity. Yet, there is much to learn about what this media coverage conveys. In particular, how much does who is protesting matter for how the media portray protesters and their objectives? In this paper, we draw on an extensive content analysis of cable and broadcast news media coverage of protest activities to demonstrate substantial differences in how protests are covered, depending on the race and objective of the protesters. We find that media are much more likely to depict protests by people of color using language that evokes a sense of threat by using anger- and fear-laden language than comparable coverage of protest activity involving mostly White individuals. Our results demonstrate that racial biases in news coverage are much broader than previously thought. In doing so, our work highlights the powerful role that a protester’s race plays in whether media will condone or challenge their political voice.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data presented in this data project were collected in the context of two H2020 research projects: 'Enhanced migration measures from a multidimensional perspective' (HumMingBird) and 'Crises as opportunities: Towards a level telling field on migration and a new narrative of successful integration' (OPPORTUNITIES). The current survey was fielded to investigate the dynamic interplay between media representations of different migrant groups and the governmental and societal (re)actions to immigration. With these data, we provide more insight into these societal reactions by investigating attitudes rooted in values and worldviews. Through an online survey, we collected quantitative data on:

- Attitudes towards: Immigrants, Refugees, Muslims, Hispanics, Venezuelans
- News Media Consumption
- Trust in News Media and Societal Institutions
- Frequency and Valence of Intergroup Contact
- Realistic and Symbolic Intergroup Threat
- Right-wing Authoritarianism
- Social Dominance Orientation
- Political Efficacy
- Personality Characteristics
- Perceived COVID-threat
- Socio-demographic Characteristics

The survey covered the adult population aged 25 to 65 in seven European countries (Austria, Belgium, Germany, Hungary, Italy, Spain, and Sweden), and ages 18 to 65 in the United States of America and Colombia. The survey in the United States and Colombia was identical to the one in the European countries, although a few extra questions regarding COVID-19 and some region-specific migrant groups (e.g. Venezuelans) were added.

We collected the data in cooperation with Bilendi, a Belgian polling agency, and selected the methodology for its cost-effectiveness in cross-country research. Respondents received an e-mail asking them to participate in a survey without specifying the subject matter, which was essential to avoid priming. Three weeks of fieldwork in May and June of 2021 resulted in a dataset of 13,645 respondents (a little over 1,500 per country).
Sample weights are included in the dataset and can be applied to ensure that the sample is representative of gender and age in each country. The cooperation rate ranged between 12% and 31%, in line with similar online data collections.
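Applying the included sample weights amounts to computing weighted rather than plain averages. A minimal sketch, with hypothetical respondent answers and weights (the real dataset's column names may differ):

```python
# Sketch of applying sample weights when computing a statistic, e.g. average
# trust in news media. Values and weights below are hypothetical.

def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Weighted average: re-weights respondents so the sample matches the
    population distribution (here, of gender and age within a country)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

trust_scores = [4.0, 2.0, 5.0, 3.0]   # hypothetical survey answers
weights = [0.5, 1.5, 1.0, 1.0]        # hypothetical sample weights

print(weighted_mean(trust_scores, weights))  # 3.25
```

Unweighted, the same answers would average 3.5; the weights pull the estimate toward the over-sampled groups' corrected share of the population.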
File: politics.csv

| Column name        | Description                                            |
|:-------------------|:-------------------------------------------------------|
| Title              | The title of the news article. (String)                |
| Date               | The date the news article was published. (Date)        |
| Time               | The time the news article was published. (Time)        |
| Score              | The sentiment score of the news article. (Float)       |
| Number of Comments | The number of comments on the news article. (Integer)  |

File: sports.csv

| Column name        | Description                                            |
|:-------------------|:-------------------------------------------------------|
| Title              | The title of the news article. (String)                |
| Date               | The date the news article was published. (Date)        |
| Time               | The time the news article was published. (Time)        |
| Score              | The sentiment score of the news article. (Float)       |
| Number of Comments | The number of comments on the news article. (Integer)  |

File: television.csv

| Column name        | Description                                            |
|:-------------------|:-------------------------------------------------------|
| Title              | The title of the news article. (String)                |
| Date               | The date the news article was published. (Date)        |
| Time               | The time the news article was published. (Time)        |
| Score              | The sentiment score of the news article. (Float)       |
| Number of Comments | The number of comments on the news article. (Integer)  |
File: trending.csv | Column name | Description ...
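The topic files above (politics, sports, television) share one schema, so a single reader works for all of them. A minimal sketch, with made-up rows, that parses a file's contents and averages the sentiment Score per Date:

```python
import csv
import io

# The topic files share the schema Title, Date, Time, Score, Number of
# Comments; this reads one of them and averages Score per Date.
# The sample rows below are invented for illustration.

sample = """Title,Date,Time,Score,Number of Comments
Budget passes,2024-01-05,09:00,0.42,17
Strike ends,2024-01-05,18:30,0.18,4
Storm warning,2024-01-06,07:15,-0.35,9
"""

def mean_score_by_date(csv_text: str) -> dict[str, float]:
    """Group rows by Date and return the mean sentiment Score per day."""
    by_date: dict[str, list[float]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_date.setdefault(row["Date"], []).append(float(row["Score"]))
    return {date: sum(vals) / len(vals) for date, vals in by_date.items()}

print(mean_score_by_date(sample))
```

Swapping `io.StringIO(csv_text)` for `open("politics.csv")` applies the same aggregation to any of the files listed above.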