9 datasets found
  1. Top Youtube News Media Statistics

    • kaggle.com
    zip
    Updated Jul 14, 2023
    Cite
    crxxom (2023). Top Youtube News Media Statistics [Dataset]. https://www.kaggle.com/datasets/crxxom/top-youtube-news-media-statistics/code
    Available download formats: zip (901734 bytes)
    Dataset updated
    Jul 14, 2023
    Authors
    crxxom
    Area covered
    YouTube
    Description

    The dataset contains detailed information on some of the most popular English-language media channels on YouTube, from a channel overview to statistics for the top 50 videos of each channel. The columns of the two datasets are described below.

    Mainstream Media Statistics

    1. channelName: Name of the channel on YouTube
    2. id: The channel ID on YouTube
    3. subscribers: Subscriber count (as of 14/7/2023)
    4. total views: Total views across all videos of the channel (as of 14/7/2023)
    5. total videos: Total number of videos on the channel (as of 14/7/2023)
    6. created date: The date when the channel was created
    7. description: The description on the channel's description page
    8. playlistId: The ID of the channel's video list

    Top50_viewed_video_from_each_channels

    1. Video Id: The ID of the video on YouTube
    2. Channel Title: The name of the channel that published the video
    3. Title: Title of the video
    4. publishedAt: When the video was published
    5. categoryId: The YouTube category ID (for reference, see https://mixedanalytics.com/blog/list-of-youtube-video-category-ids/)
    6. description: The description of the video
    7. viewCount: The total number of views of the video (as of 14/7/2023)
    8. likeCount: The total number of likes on the video (as of 14/7/2023)
    9. commentCount: The total number of comments on the video (as of 14/7/2023)
    10. duration: The duration of the video

    Inspirations

    Data is scraped using the YouTube API. Feel free to use the data as long as your use complies with YouTube's terms of use. For example, you could analyze which news topics interest people, or watch some of the most-viewed news videos in the world to stay in touch with society.
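
    The description does not say how the duration column is encoded. The YouTube API reports video length as an ISO 8601 duration string such as PT4M13S, so a minimal sketch for converting such strings to seconds might look like the following. It assumes the column stores the raw API string; iso8601_duration_to_seconds is a hypothetical helper name, not part of the dataset.

```python
import re

# Matches YouTube-style ISO 8601 durations such as PT4M13S, PT1H2M3S or P1DT2H.
# Assumption: the dataset's duration column holds strings in this form.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def iso8601_duration_to_seconds(value: str) -> int:
    """Convert a duration like 'PT4M13S' to a whole number of seconds."""
    match = _DURATION_RE.match(value)
    # Reject non-matching input and the degenerate strings with no components.
    if match is None or value in ("P", "PT"):
        raise ValueError(f"not an ISO 8601 duration: {value!r}")
    # Components that are absent from the string default to zero.
    parts = {name: int(num) for name, num in match.groupdict(default="0").items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )
```

    Converting to plain seconds makes the column easy to sort or to compare against viewCount; if the file turns out to store durations differently, the helper raises ValueError rather than guessing.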

  2. Social Media Political Content Analysis Dataset

    • kaggle.com
    zip
    Updated May 13, 2024
    Cite
    Faisal Hameed (2024). Social Media Political Content Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/fysalhameed/impact-of-social-media-on-political-consent
    Available download formats: zip (355107 bytes)
    Dataset updated
    May 13, 2024
    Authors
    Faisal Hameed
    Description

    This dataset contains simulated data for social media users' demographics, behaviors, and perceptions related to political content. It includes features such as age, gender, education level, occupation, social media usage frequency, exposure to political content, and perceptions of accuracy and relevance.

    The features included in the "Social Media Political Content Analysis Dataset" are:

    1. Age: Age of the user.
    2. Gender: Gender identity of the user.
    3. Education Level: Highest level of education attained by the user.
    4. Occupation: Current occupation of the user.
    5. Political Affiliation: Political leaning or affiliation of the user (e.g., Liberal, Conservative, Independent).
    6. Geographic Location: Country or region where the user is located (e.g., USA, UK, Canada, Australia).
    7. Social Media Usage Frequency: Frequency of social media usage by the user (e.g., 0-1 hour, 1-2 hours, 2-4 hours, 4+ hours).
    8. Preferred Social Media: Social media platform preferred by the user (e.g., Facebook, Twitter, Instagram).
    9. Political Content Exposure: Frequency of exposure to political content on social media (e.g., Once a day, Few times a week, Rarely, Several times a day).
    10. Types of Political Content: Types of political content consumed by the user (e.g., News articles, Opinion pieces, Memes).
    11. Sources of Political Content: Sources from which the user obtains political content (e.g., Mainstream media, Political parties, Independent bloggers).
    12. Recency of Exposure: Recency of the user's exposure to political content (e.g., Within the last hour, Within the last 24 hours, Within the last week, Longer than a week ago).
    13. Interactions Frequency: Frequency of user interactions with political content on social media (e.g., Once a day, Few times a week, Rarely, Several times a day).
    14. Political Content Topics: Topics of political content that interest the user (e.g., Economy, Healthcare, Immigration, Environment).
    15. Perception of Accuracy: User's perception of the accuracy of political content on social media (e.g., Very accurate, Somewhat accurate, Not accurate).
    16. Awareness of Algorithms: Whether the user is aware of algorithms that determine their social media feed (e.g., Yes, No).
    17. Perception of Relevance: User's perception of the relevance of political content on social media (e.g., Very relevant, Somewhat relevant, Not relevant).
    18. Personal Impact: User's perception of the personal impact of political content on social media (e.g., Strong impact, Moderate impact, No impact).
    19. Trust in Social Media: User's level of trust in social media as a source of political information (e.g., Trust a lot, Trust somewhat, Do not trust).
    20. Concerns about Algorithms: User's level of concern about algorithms shaping their social media experience (e.g., Very concerned, Somewhat concerned, Not concerned).
    21. Overall Quality of Discourse: User's perception of the overall quality of political discourse on social media (e.g., High quality, Moderate quality, Low quality).
    22. Views on Influence: User's perception of the influence of political content on social media (e.g., Very influential, Somewhat influential, Not influential).
    23. Suggestions for Improvement: User's suggestions for improving the quality or experience of political content on social media (e.g., Increase transparency, Provide more diverse sources, Improve fact-checking, Enhance user controls).
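
    Several of the features above are ordered categories. A minimal sketch of mapping one of them to integer ranks follows; it assumes the column holds exactly the example values listed for Social Media Usage Frequency, and the names USAGE_ORDER, USAGE_RANK, and encode_usage are illustrative only.

```python
# Example labels copied from the feature list above, in increasing order.
# Assumption: the real file uses these exact spellings.
USAGE_ORDER = ["0-1 hour", "1-2 hours", "2-4 hours", "4+ hours"]
USAGE_RANK = {label: rank for rank, label in enumerate(USAGE_ORDER)}


def encode_usage(values):
    """Map usage-frequency labels to ranks 0..3; unknown labels become None."""
    return [USAGE_RANK.get(value.strip()) for value in values]
```

    Returning None for unrecognized labels keeps unexpected spellings visible instead of silently assigning them a rank.
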
  3. facebook fact checking dataset

    • figshare.com
    csv
    Updated Nov 11, 2024
    Cite
    mehdi khalil (2024). facebook fact checking dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27645690.v2
    Available download formats: csv
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    mehdi khalil
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    The BuzzFeed dataset, officially known as the BuzzFeed-Webis Fake News Corpus 2016, comprises content from 9 news publishers over a 7-day period close to the 2016 US election. It was created to analyze the spread of misinformation and hyperpartisan content on social media platforms, particularly Facebook.

    Dataset Composition

    - News Articles: The dataset includes 1,627 articles from various sources: 826 from mainstream publishers, 256 from left-wing publishers, and 545 from right-wing publishers.
    - Facebook Posts: Each article is associated with Facebook post data, including metrics like share counts, reaction counts, and comment counts.
    - Comments: The dataset includes nearly 1.7 million Facebook comments discussing the news content.
    - Fact-Check Ratings: Each article was fact-checked by professional journalists at BuzzFeed, providing veracity assessments.

    Key Features

    - Publisher Information: The dataset covers 9 publishers, including 6 hyperpartisan (3 left-wing and 3 right-wing) and 3 mainstream outlets.
    - Temporal Aspect: The data was collected over seven weekdays (September 19-23 and September 26-27, 2016).
    - Verification Status: All publishers included in the dataset had earned Facebook's blue checkmark, indicating authenticity and elevated status.
    - Metadata: Includes various metrics such as publication dates, post types, and engagement statistics.

    Potential Applications

    The BuzzFeed dataset is valuable for various research and analytical purposes:

    - News Veracity Assessment: Researchers can use machine learning techniques to classify articles based on their factual accuracy.
    - Social Media Analysis: The dataset allows for studying how news spreads on platforms like Facebook, including engagement patterns.
    - Hyperpartisan Content Study: It enables analysis of differences between mainstream and hyperpartisan news sources.
    - Content Strategy Optimization: Media companies can use insights from the dataset to refine their content strategies.
    - Audience Analysis: The data can be used for demographic analysis and audience segmentation.

    This dataset provides a comprehensive snapshot of news dissemination and engagement on social media during a crucial period, making it a valuable resource for researchers, data scientists, and media analysts studying online information ecosystems.
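
    The composition figures quoted in this description can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the description; the variable names are illustrative.

```python
# Article counts by publisher orientation, as stated in the description.
articles = {"mainstream": 826, "left-wing": 256, "right-wing": 545}

# Publisher counts by orientation, as stated in the description.
publishers = {"mainstream": 3, "left-wing": 3, "right-wing": 3}

total_articles = sum(articles.values())
total_publishers = sum(publishers.values())

# Share of each group among all articles, in percent, to one decimal place.
shares = {group: round(100 * n / total_articles, 1) for group, n in articles.items()}

# The per-group counts should add up to the quoted totals.
assert total_articles == 1627
assert total_publishers == 9
```

    The per-group counts add up to the 1,627 articles and 9 publishers quoted in the description, and the mainstream outlets account for roughly half of the articles.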

  4. MicroBlog-Hot-Search-Labeled

    • kaggle.com
    zip
    Updated May 19, 2024
    Cite
    ChaneMo (2024). MicroBlog-Hot-Search-Labeled [Dataset]. https://www.kaggle.com/datasets/chanemo/weibo-hot-searchlabeled
    Available download formats: zip (4152220 bytes)
    Dataset updated
    May 19, 2024
    Authors
    ChaneMo
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Weibo is one of the mainstream social media platforms in China. Among its features, trending topics serve as an important real-time information source for Weibo users, consisting of the most popular search terms at the moment. Weibo's official platform does not provide tag information for these trending topics, making it difficult for users to find topics in a specific category. To address this issue, we collected over 6,000 trending-topic entries from November 24th to December 23rd, 2020. Each entry was manually assigned to one of eight major categories: "(时政)Politics", "(科技)Technology", "(科普)Popular Science", "(娱乐)Entertainment", "(体育)Sports", "(社会讨论/话题)Social Discussions/Topics", "(时事)Current Affairs" and "(经济)Economy". This categorization aims to facilitate subsequent applications. We also provide a second dataset of hot-search entries that are unlabeled.

    - Politics: Political news happening now.
    - Technology: News related to high-tech products.
    - Popular Science: News topics about popularizing knowledge.
    - Entertainment: News related to celebrities or variety shows.
    - Sports: News related to sports events or sports celebrities.
    - Social Discussions/Topics: Hot topics being discussed by the general public.
    - Current Affairs: Social events happening now.
    - Economy: News related to the economy.

  5. Data from: Measuring Influence of Users in Twitter Ecosystems Using a...

    • tandf.figshare.com
    zip
    Updated May 30, 2023
    Cite
    Donggeng Xia; Shawn Mankad; George Michailidis (2023). Measuring Influence of Users in Twitter Ecosystems Using a Counting Process Modeling Framework [Dataset]. http://doi.org/10.6084/m9.figshare.2068272
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Donggeng Xia; Shawn Mankad; George Michailidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data extracted from social media platforms are both large in scale and complex in nature, since they contain both unstructured text, as well as structured data, such as time stamps and interactions between users. A key question for such platforms is to determine influential users, in the sense that they generate interactions between members of the platform. Common measures used both in the academic literature and by companies that provide analytics services are variants of the popular web-search PageRank algorithm applied to networks that capture connections between users. In this work, we develop a modeling framework using multivariate interacting counting processes to capture the detailed actions that users undertake on such platforms, namely posting original content, reposting and/or mentioning other users’ postings. Based on the proposed model, we also derive a novel influence measure. We discuss estimation of the model parameters through maximum likelihood and establish their asymptotic properties. The proposed model and the accompanying influence measure are illustrated on a dataset covering a five-year period of the Twitter actions of the members of the U.S. Senate, as well as mainstream news organizations and media personalities. Supplementary material is available online including computer code, data, and derivation details.

  6. Trust in media and main source of news by gender and province

    • www150.statcan.gc.ca
    • open.canada.ca
    • +1 more
    Updated May 16, 2024
    Cite
    Government of Canada, Statistics Canada (2024). Trust in media and main source of news by gender and province [Dataset]. http://doi.org/10.25318/4510010201-eng
    Dataset updated
    May 16, 2024
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Area covered
    Canada
    Description

    Percentage of persons aged 15 years and over by trust in media and main source of news, by gender, for Canada, regions and provinces.

  7. Data_Sheet_1_True, justified, belief? Partisanship weakens the positive...

    • figshare.com
    • frontiersin.figshare.com
    docx
    Updated Sep 26, 2023
    Cite
    Daniel Jeffrey Sude; Gil Sharon; Shira Dvir-Gvirsman (2023). Data_Sheet_1_True, justified, belief? Partisanship weakens the positive effect of news media literacy on fake news detection.docx [Dataset]. http://doi.org/10.3389/fpsyg.2023.1242865.s001
    Available download formats: docx
    Dataset updated
    Sep 26, 2023
    Dataset provided by
    Frontiers
    Authors
    Daniel Jeffrey Sude; Gil Sharon; Shira Dvir-Gvirsman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To investigate how people assess whether politically consistent news is real or fake, two studies (N = 1,008; N = 1,397) with adult American participants conducted in 2020 and 2022 utilized a within-subjects experimental design to investigate perceptions of news accuracy. When a mock Facebook post with either fake (Study 1) or real (Study 2) news content was attributed to an alternative (vs. a mainstream) news outlet, it was, on average, perceived to be less accurate. Those with beliefs reflecting News Media Literacy demonstrated greater sensitivity to the outlet’s status. This relationship was itself contingent on the strength of the participant’s partisan identity. Strong partisans high in News Media Literacy defended the accuracy of politically consistent content, even while recognizing that an outlet was unfamiliar. These results highlight the fundamental importance of looking at the interaction between user-traits and features of social media news posts when examining learning from political news on social media.

  8. SYRI Facebook discussions about COVID-19 pandemic and the Russo-Ukrainian...

    • demo-b2find.dkrz.de
    Updated Sep 20, 2025
    Cite
    (2025). SYRI Facebook discussions about COVID-19 pandemic and the Russo-Ukrainian war 2023 - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/1f0673f4-3afc-5c34-bf96-207484d49b8a
    Dataset updated
    Sep 20, 2025
    Area covered
    Ukraine
    Description

    During times of crisis, fear may prompt a greater need for identity confirmation to reduce the uncertainty. People find comfort in identifying with an ingroup online, but this could worsen societal division. Our study analyzed Facebook discussions about COVID-19 pandemic and the Russo-Ukrainian war in order to identify the common patterns of economic and social uncertainties expressed through repeated narratives. We focused on the public Facebook pages of two Czech mainstream TV news outlets during two phases of each crisis and analyzed 1,680 comments with grounded theory’s coding procedures. The findings indicate that polarizing narratives resembling populist discourse are used to construct the identity of “the people” standing against “the elites”. We contribute to studies on social media radicalization by revealing its non-partisan character, as well as by showing that it occurs outside the fringe online spaces, in the online media mainstream. Data cannot be archived or shared as it contains personal information and, due to its nature, cannot be anonymised.

  9. Data_Sheet_1_Lumen: A machine learning framework to expose influence cues in...

    • frontiersin.figshare.com
    pdf
    Updated Jun 16, 2023
    Cite
    Hanyu Shi; Mirela Silva; Luiz Giovanini; Daniel Capecci; Lauren Czech; Juliana Fernandes; Daniela Oliveira (2023). Data_Sheet_1_Lumen: A machine learning framework to expose influence cues in texts.PDF [Dataset]. http://doi.org/10.3389/fcomp.2022.929515.s001
    Available download formats: pdf
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    Frontiers
    Authors
    Hanyu Shi; Mirela Silva; Luiz Giovanini; Daniel Capecci; Lauren Czech; Juliana Fernandes; Daniela Oliveira
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Phishing and disinformation are popular social engineering attacks with attackers invariably applying influence cues in texts to make them more appealing to users. We introduce Lumen, a learning-based framework that exposes influence cues in text: (i) persuasion, (ii) framing, (iii) emotion, (iv) objectivity/subjectivity, (v) guilt/blame, and (vi) use of emphasis. Lumen was trained with a newly developed dataset of 3K texts comprised of disinformation, phishing, hyperpartisan news, and mainstream news. Evaluation of Lumen in comparison to other learning models showed that Lumen and LSTM presented the best F1-micro score, but Lumen yielded better interpretability. Our results highlight the promise of ML to expose influence cues in text, toward the goal of application in automatic labeling tools to improve the accuracy of human-based detection and reduce the likelihood of users falling for deceptive online content.


Top Youtube News Media Statistics

Contains detailed statistics of the top 43 English media channels on YouTube
