The dataset contains detailed information on some of the most popular English media channels on YouTube. From channel overviews to statistics on the top 50 videos of each channel, here is a description of all the columns of the two datasets.
Mainstream Media Statistics
Top50_viewed_video_from_each_channels
Inspirations
The data was scraped using the YouTube API. Feel free to use it as long as your use complies with YouTube's terms of service. For example, you could analyze which news topics interest people, or watch some of the most viewed news videos in the world to stay in touch with society.
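The description does not say which API calls were used to collect the data. As a rough sketch only, assuming the YouTube Data API v3 `search.list` endpoint ordered by view count, a request for a channel's 50 most viewed videos could be built like this. The helper name `top_videos_url` and the placeholder channel ID and API key are hypothetical and not part of the dataset.

```python
from urllib.parse import urlencode

# Endpoint of the YouTube Data API v3 search resource.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def top_videos_url(channel_id: str, api_key: str, max_results: int = 50) -> str:
    """Build a search.list request URL for a channel's most viewed videos.

    Hypothetical helper: the dataset authors' actual collection code is unknown.
    """
    # The API caps maxResults at 50 per request.
    if not 1 <= max_results <= 50:
        raise ValueError("max_results must be between 1 and 50")
    params = {
        "part": "snippet",
        "channelId": channel_id,
        "type": "video",
        "order": "viewCount",  # sort results from most to least viewed
        "maxResults": max_results,
        "key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder values; substitute a real channel ID and API key.
    print(top_videos_url("CHANNEL_ID", "API_KEY"))
```

The sketch only builds the URL, so it runs without network access or credentials. Fetching the response and reading per-video statistics would need further requests that are not shown here.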
This dataset contains simulated data for social media users' demographics, behaviors, and perceptions related to political content. It includes features such as age, gender, education level, occupation, social media usage frequency, exposure to political content, and perceptions of accuracy and relevance.
The features included in the "Social Media Political Content Analysis Dataset":
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
The BuzzFeed dataset, officially known as the BuzzFeed-Webis Fake News Corpus 2016, comprises content from 9 news publishers over a 7-day period close to the 2016 US election. It was created to analyze the spread of misinformation and hyperpartisan content on social media platforms, particularly Facebook.
Dataset Composition
- News Articles: The dataset includes 1,627 articles from various sources: 826 from mainstream publishers, 256 from left-wing publishers, and 545 from right-wing publishers.
- Facebook Posts: Each article is associated with Facebook post data, including metrics like share counts, reaction counts, and comment counts.
- Comments: The dataset includes nearly 1.7 million Facebook comments discussing the news content.
- Fact-Check Ratings: Each article was fact-checked by professional journalists at BuzzFeed, providing veracity assessments.
Key Features
- Publisher Information: The dataset covers 9 publishers, including 6 hyperpartisan (3 left-wing and 3 right-wing) and 3 mainstream outlets.
- Temporal Aspect: The data was collected over seven weekdays (September 19-23 and September 26-27, 2016).
- Verification Status: All publishers included in the dataset had earned Facebook's blue checkmark, indicating authenticity and elevated status.
- Metadata: Includes various metrics such as publication dates, post types, and engagement statistics.
Potential Applications
The BuzzFeed dataset is valuable for various research and analytical purposes:
- News Veracity Assessment: Researchers can use machine learning techniques to classify articles based on their factual accuracy.
- Social Media Analysis: The dataset allows for studying how news spreads on platforms like Facebook, including engagement patterns.
- Hyperpartisan Content Study: It enables analysis of differences between mainstream and hyperpartisan news sources.
- Content Strategy Optimization: Media companies can use insights from the dataset to refine their content strategies.
- Audience Analysis: The data can be used for demographic analysis and audience segmentation.
This dataset provides a comprehensive snapshot of news dissemination and engagement on social media during a crucial period, making it a valuable resource for researchers, data scientists, and media analysts studying online information ecosystems.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Weibo is one of the mainstream social media platforms in China. Among its features, trending topics, which consist of the most popular search terms at the moment, serve as an important real-time information source for Weibo users. Weibo's official platform does not provide category tags for these trending topics, making it difficult for users to find topics of a specific category. To address this issue, we collected over 6,000 trending-topic entries from November 24th to December 23rd, 2020. Each entry was manually assigned to one of eight major categories: "(时政)Politics", "(科技)Technology", "(科普)Popular Science", "(娱乐)Entertainment", "(体育)Sports", "(社会讨论/话题)Social Discussions/Topics", "(时事)Current Affairs" and "(经济)Economy". This categorization aims to facilitate subsequent applications. In addition, we provide a second dataset of trending topics that is unlabeled.
- Politics: Current political news.
- Technology: News related to high-tech products.
- Popular Science: News topics about popularizing knowledge.
- Entertainment: News related to celebrities or variety shows.
- Sports: News related to sports events or sports celebrities.
- Social Discussions/Topics: Hot topics being discussed by the general public.
- Current Affairs: Current social events.
- Economy: News related to the economy.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data extracted from social media platforms are both large in scale and complex in nature, since they contain unstructured text as well as structured data, such as time stamps and interactions between users. A key task for such platforms is to identify influential users, in the sense that they generate interactions between members of the platform. Common measures used both in the academic literature and by companies that provide analytics services are variants of the popular web-search PageRank algorithm applied to networks that capture connections between users. In this work, we develop a modeling framework using multivariate interacting counting processes to capture the detailed actions that users undertake on such platforms, namely posting original content, reposting and/or mentioning other users’ postings. Based on the proposed model, we also derive a novel influence measure. We discuss estimation of the model parameters through maximum likelihood and establish their asymptotic properties. The proposed model and the accompanying influence measure are illustrated on a dataset covering a five-year period of the Twitter actions of the members of the U.S. Senate, as well as mainstream news organizations and media personalities. Supplementary material is available online including computer code, data, and derivation details.
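For readers unfamiliar with the PageRank-style baseline mentioned above, here is a minimal sketch of plain power-iteration PageRank on a toy interaction network. It is not the counting-process influence measure the authors propose. The edge convention (an edge from A to B means A reposted or mentioned B), the damping factor of 0.85, and the toy user names are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


if __name__ == "__main__":
    # Assumed convention: ("a", "c") means user a reposted or mentioned user c.
    toy_edges = [("a", "c"), ("b", "c"), ("c", "a")]
    for user, score in sorted(pagerank(toy_edges).items()):
        print(user, round(score, 4))
```

In this toy network user c, who is reposted by both other users, receives the largest score. A measure of this kind uses only the static link structure, whereas the framework described above also models the timing and type of each action.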
Percentage of persons aged 15 years and over by trust in media and main source of news, by gender, for Canada, regions and provinces.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To investigate how people assess whether politically consistent news is real or fake, two studies (N = 1,008; N = 1,397) with adult American participants conducted in 2020 and 2022 utilized a within-subjects experimental design to investigate perceptions of news accuracy. When a mock Facebook post with either fake (Study 1) or real (Study 2) news content was attributed to an alternative (vs. a mainstream) news outlet, it was, on average, perceived to be less accurate. Those with beliefs reflecting News Media Literacy demonstrated greater sensitivity to the outlet’s status. This relationship was itself contingent on the strength of the participant’s partisan identity. Strong partisans high in News Media Literacy defended the accuracy of politically consistent content, even while recognizing that an outlet was unfamiliar. These results highlight the fundamental importance of looking at the interaction between user-traits and features of social media news posts when examining learning from political news on social media.
During times of crisis, fear may prompt a greater need for identity confirmation to reduce uncertainty. People find comfort in identifying with an ingroup online, but this could worsen societal division. Our study analyzed Facebook discussions about the COVID-19 pandemic and the Russo-Ukrainian war in order to identify the common patterns of economic and social uncertainties expressed through repeated narratives. We focused on the public Facebook pages of two Czech mainstream TV news outlets during two phases of each crisis and analyzed 1,680 comments with grounded theory’s coding procedures. The findings indicate that polarizing narratives resembling populist discourse are used to construct the identity of “the people” standing against “the elites”. We contribute to studies on social media radicalization by revealing its non-partisan character, as well as by showing that it occurs outside the fringe online spaces, in the online media mainstream. Data cannot be archived or shared as it contains personal information and, due to its nature, cannot be anonymised.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Phishing and disinformation are popular social engineering attacks with attackers invariably applying influence cues in texts to make them more appealing to users. We introduce Lumen, a learning-based framework that exposes influence cues in text: (i) persuasion, (ii) framing, (iii) emotion, (iv) objectivity/subjectivity, (v) guilt/blame, and (vi) use of emphasis. Lumen was trained with a newly developed dataset of 3K texts comprised of disinformation, phishing, hyperpartisan news, and mainstream news. Evaluation of Lumen in comparison to other learning models showed that Lumen and LSTM presented the best F1-micro score, but Lumen yielded better interpretability. Our results highlight the promise of ML to expose influence cues in text, toward the goal of application in automatic labeling tools to improve the accuracy of human-based detection and reduce the likelihood of users falling for deceptive online content.
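The F1-micro score used in the evaluation above pools true positives, false positives, and false negatives over all labels before computing F1. The sketch below shows that calculation, assuming each text may carry several influence-cue labels. The example labels and predictions are made up for illustration and are not taken from the Lumen dataset.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over per-text label sets (multi-label setting)."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)  # cues correctly predicted
        fp += len(pred - truth)  # cues predicted but not present
        fn += len(truth - pred)  # cues present but missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Made-up annotations for three texts; the third carries no cue.
    truth = [{"persuasion", "emotion"}, {"framing"}, set()]
    predicted = [{"persuasion"}, {"framing", "emotion"}, set()]
    print(round(micro_f1(truth, predicted), 4))
```

Because the counts are pooled, frequent labels weigh more heavily in the micro score than rare ones, which is one reason it is often reported alongside a macro average.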