26 datasets found
  1. Social Media Grievance: Year- and Month-wise Number of Reports Received and...

    • dataful.in
    Updated Apr 4, 2025
    + more versions
    Cite
    Dataful (Factly) (2025). Social Media Grievance: Year- and Month-wise Number of Reports Received and Action Taken by Twitter [Dataset]. https://dataful.in/datasets/18629
    Explore at:
    xlsx, csv, application/x-parquet (available download formats)
    Dataset updated
    Apr 4, 2025
    Dataset authored and provided by
    Dataful (Factly)
    License

    https://dataful.in/terms-and-conditions

    Area covered
    India
    Variables measured
    Twitter Grievances
    Description

    High Frequency Indicator: The dataset contains year- and month-wise compiled data from 2021 to date on the number of different types of grievances (complaints) that Twitter received from users and the action it took. The data is compiled from the monthly transparency reports published by Twitter in accordance with Rule 4(1)(d) of the Information Technology (Intermediary Guidelines and Digital Media Ethics Code) Rules, 2021 (IT Rules, 2021).

    The types of grievances received by Twitter include illegal activities, IP-related infringements and other issues such as Abuse, Harassment, Child Sexual Exploitation, Defamation, Hateful Conduct, Impersonation, Misinformation, etc. The action taken by Twitter on the basis of these reports includes the number of URLs actioned

  2. Social Media Users 2021

    • kaggle.com
    Updated Feb 26, 2021
    Cite
    Margaretha Martinez (2021). Social Media Users 2021 [Dataset]. https://www.kaggle.com/datasets/margarethamartinez/socialmedia2021
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 26, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Margaretha Martinez
    Description
  3. Just Another Day on Twitter: A Complete 24 Hours of Twitter Data

    • search.gesis.org
    Updated Oct 16, 2022
    Cite
    Pfeffer, Jürgen (2022). Just Another Day on Twitter: A Complete 24 Hours of Twitter Data [Dataset]. https://search.gesis.org/research_data/SDN-10.7802-2516
    Explore at:
    Dataset updated
    Oct 16, 2022
    Dataset provided by
    GESIS, Köln
    GESIS search
    Authors
    Pfeffer, Jürgen
    License

    https://www.gesis.org/en/institute/data-usage-terms

    Description

    At the end of October 2022, Elon Musk concluded his acquisition of Twitter. In the weeks and months before that, several questions were publicly discussed that were not only of interest to the platform's future buyers, but also of high relevance to the Computational Social Science research community. For example, how many active users does the platform have? What percentage of accounts on the site are bots? And, what are the dominating topics and sub-topical spheres on the platform? In a globally coordinated effort of 80 scholars to shed light on these questions, and to offer a dataset that will equip other researchers to do the same, we have collected 375 million tweets published within a 24-hour time period starting on September 21, 2022. To the best of our knowledge, this is the first complete 24-hour Twitter dataset that is available for the research community. With it, the present work aims to accomplish two goals. First, we seek to answer the aforementioned questions and provide descriptive metrics about Twitter that can serve as references for other researchers. Second, we create a baseline dataset for future research that can be used to study the potential impact of the platform's ownership change.

  4. Instagram accounts with the most followers worldwide 2024

    • statista.com
    • davegsmith.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram accounts with the most followers worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    Cristiano Ronaldo has one of the most popular Instagram accounts as of April 2024.

    The Portuguese footballer is the most-followed person on the photo-sharing platform, with 628 million followers. Instagram's own account was ranked first with roughly 672 million followers.

    How popular is Instagram?

    Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States and experts project this figure to surpass 127 million users in 2023.

    Who uses Instagram?

    Instagram audiences are predominantly young: recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media platforms for teens and one of the social networks with the biggest reach among teens in the United States.

    Celebrity influencers on Instagram

    Many celebrities and athletes are brand spokespeople and generate additional income with social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first again, as the average media value of one of his Instagram posts was 985,441 U.S. dollars.
    
  5. Twitter bot profiling

    • figshare.com
    • researchdata.smu.edu.sg
    • +1 more
    pdf
    Updated May 31, 2023
    + more versions
    Cite
    Living Analytics Research Centre (2023). Twitter bot profiling [Dataset]. http://doi.org/10.25440/smu.12062706.v1
    Explore at:
    pdf (available download formats)
    Dataset updated
    May 31, 2023
    Dataset provided by
    SMU Research Data Repository (RDR)
    Authors
    Living Analytics Research Centre
    License

    http://rightsstatements.org/vocab/InC/1.0/

    Description

    This dataset comprises a set of Twitter accounts in Singapore that are used for social bot profiling research conducted by the Living Analytics Research Centre (LARC) at Singapore Management University (SMU). Here a bot is defined as a Twitter account that generates contents and/or interacts with other users automatically (at least according to human judgment). In this research, Twitter bots have been categorized into three major types:

    • Broadcast bot. This bot aims at disseminating information to a general audience by providing, e.g., benign links to news, blogs or sites. Such a bot is often managed by an organization or a group of people (e.g., bloggers).
    • Consumption bot. The main purpose of this bot is to aggregate contents from various sources and/or provide update services (e.g., horoscope reading, weather update) for personal consumption or use.
    • Spam bot. This type of bot posts malicious contents (e.g., to trick people by hijacking certain accounts or redirecting them to malicious sites), or aggressively promotes harmless but invalid/irrelevant contents.

    This categorization is general enough to cater for new, emerging types of bot (e.g., chatbots can be viewed as a special type of broadcast bot). The dataset was collected from 1 January to 30 April 2014 via the Twitter REST and streaming APIs. Starting from popular seed users (i.e., users having many followers), their follow, retweet, and user mention links were crawled. The data collection proceeded by adding those followers/followees, retweet sources, and mentioned users who state Singapore in their profile location. Using this procedure, a total of 159,724 accounts were collected. To identify bots, the first step was to check active accounts that tweeted at least 15 times within the month of April 2014. These accounts were then manually checked and labelled, of which 589 were found to be bots. As many more human users are expected in the Twitter population, the remaining accounts were randomly sampled and manually checked. With this, 1,024 human accounts were identified. In total, this results in 1,613 labelled accounts.

    Related Publication: R. J. Oentaryo, A. Murdopo, P. K. Prasetyo, and E.-P. Lim. (2016). On profiling bots in social media. Proceedings of the International Conference on Social Informatics (SocInfo’16), 92-109. Bellevue, WA. https://doi.org/10.1007/978-3-319-47880-7_6

  6. Instagram: distribution of global audiences 2024, by age group

    • statista.com
    • davegsmith.com
    Updated Jun 17, 2025
    + more versions
    Cite
    Stacy Jo Dixon (2025). Instagram: distribution of global audiences 2024, by age group [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, almost 32 percent of global Instagram audiences were aged between 18 and 24 years, and 30.6 percent of users were aged between 25 and 34 years. Overall, 16 percent of users belonged to the 35 to 44 year age group.

    Instagram users

    With roughly one billion monthly active users, Instagram is among the most popular social networks worldwide. The photo-sharing app is especially popular in India and the United States, which have 362.9 million and 169.7 million Instagram users, respectively.

    Instagram features

    One of the most popular features of Instagram is Stories. Users can post photos and videos to their Stories stream and the content is live for others to view for 24 hours before it disappears. In January 2019, the company reported that there were 500 million daily active Instagram Stories users. Instagram Stories directly competes with Snapchat, another photo-sharing app that initially became famous due to its “vanishing photos” feature. As of the second quarter of 2021, Snapchat had 293 million daily active users.
    
  7. Monthly Samples of German Tweets

    • zenodo.org
    zip
    Updated Mar 8, 2023
    + more versions
    Cite
    Nane Kratzke (2023). Monthly Samples of German Tweets [Dataset]. http://doi.org/10.5281/zenodo.3383622
    Explore at:
    zip (available download formats)
    Dataset updated
    Mar 8, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nane Kratzke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains German tweets and Twitter accounts recorded from the public Twitter Streaming API using the following filters:

    • terms: 'a', 'e', 'i', 'o', 'u', and 'n'
    • language: 'de'

    This filter combination should record (almost) all German tweets (in German it is very unlikely that terms do not contain vowels or the frequently used character 'n').

    This dataset might be useful for the following use cases:

    • Natural language processing (focusing on Twitter specifics in German; only a few German datasets exist)
    • Social Network Analysis (Twitter network)
    • Identifying behavioural patterns (retweeting, quoting, replying, hate speech, ...)
    • Sharing political (or other domain-specific) content
    • Bot detection
    • and more ...

    This dataset will be updated monthly. Each sample (starting in April 2019) will follow the following naming pattern:

    • german-tweet-sample--.zip (size: ~ 1GB)

    It will contain several bunches of recorded, gzipped JSON files. Each bunch of records contains approximately 50k recorded tweets/accounts (size: ~ 6MB).
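
    A minimal sketch of how one monthly archive might be read with the Python standard library. It assumes that each archive member is a gzip-compressed file ending in `.gz` and holding one JSON record per line; neither the member naming nor the line-delimited layout is confirmed by the description above, and `iter_records` is a hypothetical helper name.

```python
import gzip
import json
import zipfile


def iter_records(zip_source):
    """Yield JSON records from every gzipped member of a sample archive.

    Assumption (not confirmed by the dataset description): members end in
    '.gz' and hold one JSON object per line.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.endswith(".gz"):
                continue  # skip directories and any non-gzip members
            with archive.open(name) as raw:
                # gzip.open accepts an already opened binary file object
                with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                    for line in lines:
                        line = line.strip()
                        if line:
                            yield json.loads(line)
```

    Because the function is a generator, a ~1GB archive can be scanned record by record without loading it into memory. If the files turn out to hold a single JSON array instead, the inner loop would need `json.load` on the whole member.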

  8. Year- and Month-wise Number of General Queries received about User Accounts...

    • dataful.in
    Updated Apr 4, 2025
    Cite
    Dataful (Factly) (2025). Year- and Month-wise Number of General Queries received about User Accounts by Twitter [Dataset]. https://dataful.in/datasets/18657
    Explore at:
    xlsx, application/x-parquet, csv (available download formats)
    Dataset updated
    Apr 4, 2025
    Dataset authored and provided by
    Dataful (Factly)
    License

    https://dataful.in/terms-and-conditions

    Time period covered
    2021 - 2024
    Area covered
    India
    Variables measured
    General information requests
    Description

    High Frequency Indicator: The dataset contains year- and month-wise compiled data from 2021 to date on the number of general queries received by Twitter about its user accounts.

    The data compiled is based on the monthly transparency reports published by Twitter in accordance with Rule 4(1)(d) of the Information Technology (Intermediary Guidelines and Digital Media Ethics Code) Rules, 2021 (IT Rules, 2021).

  9. World - Twitter Sentiment By Country

    • kaggle.com
    Updated Nov 10, 2020
    Cite
    William Jiang (2020). World - Twitter Sentiment By Country [Dataset]. https://www.kaggle.com/wjia26/twittersentimentbycountry/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    William Jiang
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Area covered
    World
    Description

    Introduction

    Ever wondered what people are saying about certain countries? Whether it's in a positive/negative light? What are the most commonly used phrases/words to describe the country? In this dataset I present tweets where a certain country gets mentioned in the hashtags (e.g. #HongKong, #NewZealand). It contains around 150 countries in the world. I've added an additional field called polarity which has the sentiment computed from the text field. Feel free to explore! Feedback is much appreciated!

    Content

    Each row represents a tweet. Creation dates of tweets range from 12/07/2020 to 25/07/2020. Will update on a monthly cadence.

    • The country can be derived from the file_name field. (This field is very Tableau-friendly when it comes to plotting maps.)
    • The date on which the tweet was created can be got from the created_at field.
    • The search query used to query the Twitter search engine can be got from the search_query field.
    • The full text of the tweet can be got from the text field.
    • The sentiment can be got from the polarity field. (I've used the VADER model from NLTK to compute this.)
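
    As a rough illustration of how the file_name and polarity fields described above could be combined, the sketch below averages polarity per country. It assumes that file_name can be used directly as the country label and that polarity is stored as a number; the function name `mean_polarity_by_country` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_polarity_by_country(rows):
    """Average the polarity field per country.

    `rows` is any iterable of dicts with 'file_name' and 'polarity' keys.
    Using file_name as the country label as-is is an assumption; the real
    values may need cleaning first.
    """
    scores = defaultdict(list)
    for row in rows:
        try:
            country = row["file_name"]
            value = float(row["polarity"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric polarity
        scores[country].append(value)
    return {country: mean(values) for country, values in scores.items()}
```

    The rows can come from `csv.DictReader` over a downloaded file, so no third-party library is needed for a first look at the data.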

    Notes

    There may be slight duplication in tweet IDs before 22/07/2020. I have since fixed this bug.

    Acknowledgements

    Thanks to the tweepy package for making the data extraction via Twitter API so easy.

    Shameless Plug

    Feel free to check out my blog if you want to learn how I built the datalake via AWS or for other data shenanigans.

    Here's an App I built using a live version of this data.

  10. #IndonesiaHumanRightsSOS Twitter Hashtag Tweets Dataset

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Azmi Nawwar (2024). #IndonesiaHumanRightsSOS Twitter Hashtag Tweets Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4362504
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset authored and provided by
    Azmi Nawwar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is the result of scraping Twitter with the twint application for the hashtag #IndonesiaHumanRightsSOS. It covers tweets posted from December 18, 2020, 10:59 to December 19, 2020, 23:18.

    The dataset contains 106,903 rows of data with related information: User ID, Username, Twitter Name, Tweets, etc.

    Examples of analyzed data are also attached: a word cloud, a username cloud, the 100 most used words, and the most active usernames.

  11. Instagram: distribution of global audiences 2024, by gender

    • statista.com
    • davegsmith.com
    Updated Jun 17, 2025
    + more versions
    Cite
    Stacy Jo Dixon (2025). Instagram: distribution of global audiences 2024, by gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of January 2024, Instagram was slightly more popular with men than women, with men accounting for 50.6 percent of the platform’s global users. Additionally, the social media app was most popular amongst younger audiences, with almost 32 percent of users aged between 18 and 24 years.

    Instagram’s Global Audience

    As of January 2024, Instagram was the fourth most popular social media platform globally, reaching two billion monthly active users (MAU). This number is projected to keep growing with no signs of slowing down, which is not a surprise as the global online social penetration rate across all regions is constantly increasing. As of January 2024, the country with the largest Instagram audience was India with 362.9 million users, followed by the United States with 169.7 million users.

    Who is winning over the generations?

    Even though Instagram’s audience is almost twice the size of TikTok’s on a global scale, TikTok has shown itself to be a fierce competitor, particularly amongst younger audiences. TikTok was the most downloaded mobile app globally in 2022, generating 672 million downloads. As of 2022, Generation Z in the United States spent more time on TikTok than on Instagram monthly.
    
  12. Proactive Action of Social Media Companies: Year- and Month-wise Number of...

    • dataful.in
    Updated Apr 4, 2025
    Cite
    Dataful (Factly) (2025). Proactive Action of Social Media Companies: Year- and Month-wise Number of Suspended, Banned and Deleted Accounts, Chatrooms and Livestreams by SSMIs [Dataset]. https://dataful.in/datasets/18652
    Explore at:
    csv, application/x-parquet, xlsx (available download formats)
    Dataset updated
    Apr 4, 2025
    Dataset authored and provided by
    Dataful (Factly)
    License

    https://dataful.in/terms-and-conditions

    Area covered
    India
    Variables measured
    Social Media Intermediaries Ban actions
    Description

    High Frequency Indicator: The dataset contains year- and month-wise compiled data from 2021 to date on the number of social media accounts which have been deleted and suspended and the chatrooms, edit profiles, comments and livestreams banned, together with reasons such as child sexual abuse, terrorism, etc., by significant social media intermediaries (SSMIs) such as Sharechat, Snapchat, Twitter and WhatsApp. The data compiled is based on the monthly transparency reports published by SSMIs in accordance with Rule 4(1)(d) of the Information Technology (Intermediary Guidelines and Digital Media Ethics Code) Rules, 2021 (IT Rules, 2021).

    Notes:

    1. SSMI denotes a significant social media intermediary: a social media intermediary with over 50,00,000 (5 million) registered users in India, which primarily or solely enables online interaction between two or more users and allows them to create, upload, share, disseminate, modify or access information using its services.

    2. ShareChat enforces various bans spread across different time periods such as 360 days, 30 days, 7 days, 3 days, 1 day, hourly bans, together with other bans such as UGC, Edit Profile, Comment, Livestream, Chatroom bans, etc.

  13. Instagram: distribution of global audiences 2024, by age and gender

    • statista.com
    • davegsmith.com
    Updated Jun 17, 2025
    + more versions
    Cite
    Stacy Jo Dixon (2025). Instagram: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, around 16.5 percent of global active Instagram users were men between the ages of 18 and 24 years. More than half of the global Instagram population worldwide was aged 34 years or younger.

    Teens and social media

    As one of the biggest social networks worldwide, Instagram is especially popular with teenagers. As of fall 2020, the photo-sharing app ranked third in terms of preferred social network among teenagers in the United States, behind Snapchat and TikTok. Instagram was one of the most influential advertising channels among female Gen Z users when making purchasing decisions. Teens report feeling more confident, popular, and better about themselves when using social media, and less lonely, depressed and anxious.

    Social media can also have negative effects on teens, and these are much more pronounced in those with low emotional well-being. It was found that 35 percent of teenagers with low social-emotional well-being reported having experienced cyberbullying when using social media, while in comparison only five percent of teenagers with high social-emotional well-being stated the same. As such, social media can have a big impact on already fragile states of mind.
    
  14. Facebook: distribution of global audiences 2024, by age and gender

    • statista.com
    • davegsmith.com
    Updated Jun 17, 2025
    + more versions
    Cite
    Stacy Jo Dixon (2025). Facebook: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, men between the ages of 25 and 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Additionally, Facebook's second largest audience base could be found with men aged 18 to 24 years.

    Facebook connects the world

    Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (together known by the acronym GAFA). Facebook is the most popular social network worldwide and the company also owns three other billion-user properties: mobile messaging apps WhatsApp and Facebook Messenger, as well as photo-sharing app Instagram.

    Facebook users

    The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with 378 million users. The United States, Brazil, and Indonesia also all have more than 100 million Facebook users each.
    
  15. Twitter vs. Newsletter Impact

    • kaggle.com
    Updated Sep 18, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rachael Tatman (2017). Twitter vs. Newsletter Impact [Dataset]. https://www.kaggle.com/rtatman/twitter-vs-newsletter/metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 18, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rachael Tatman
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context:

    There are lots of really cool datasets getting added to Kaggle every day, and as part of my job I want to help people find them. I’ve been tweeting about datasets on my personal Twitter account @rctatman and also releasing a weekly newsletter of interesting datasets.

    I wanted to know which method was more effective at getting the word out about new datasets: Twitter or the newsletter?

    Content:

    This dataset contains two .csv files. One has information on the impact of tweets with links to datasets, while the other has information on the impact of the newsletter.

    Twitter:

    The Twitter .csv has the following information:

    • month: The month of the tweet (1-12)
    • day: The day of the tweet (1-31)
    • hour: The hour of the tweet (1-24)
    • impressions: The number of impressions the tweet got
    • engagement: The number of total engagements
    • clicks: The number of URL clicks

    Fridata Newsletter:

    The Fridata .csv has the following information:

    • date: The Date the newsletter was sent out
    • month: The Month the newsletter was sent out (1-12)
    • day: The day the newsletter was sent out (1-31)
    • # of dataset links: How many links were in the newsletter
    • recipients: How many people received the email with the newsletter
    • total opens: How many times the newsletter was opened
    • unique opens: How many individuals opened the newsletter
    • total clicks: The total number of clicks on the newsletter
    • unique clicks: (unsure; provided by Tinyletter)
    • notes: notes on the newsletter

    Acknowledgements:

    This dataset was collected by the uploader, Rachael Tatman. It is released here under a CC-BY-SA license.

    Inspiration:

    • Which format receives more views?
    • Which format receives more clicks?
    • Which receives more clicks/view?
    • What’s the best time of day to send a tweet?
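
    One possible way to approach the time-of-day question is to aggregate clicks per impression by hour. The sketch below uses only the hour, impressions and clicks columns listed above; treating "best" as the highest aggregate click rate is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict


def click_rate_by_hour(rows):
    """Return {hour: clicks / impressions} aggregated over all tweets.

    `rows` is an iterable of dicts keyed by the column names listed above
    ('hour', 'impressions', 'clicks'), e.g. from csv.DictReader.
    """
    clicks = defaultdict(float)
    impressions = defaultdict(float)
    for row in rows:
        hour = int(float(row["hour"]))
        clicks[hour] += float(row["clicks"])
        impressions[hour] += float(row["impressions"])
    # Hours with zero impressions are left out to avoid dividing by zero.
    return {h: clicks[h] / impressions[h] for h in impressions if impressions[h]}


def best_hour(rows):
    """Hour with the highest aggregate click rate, or None if there is no data."""
    rates = click_rate_by_hour(rows)
    return max(rates, key=rates.get) if rates else None
```

    Aggregating before dividing weights each hour by its total impressions, so a single low-reach tweet with a lucky click does not dominate the result.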
  16. CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback Synthesis

    • zenodo.org
    Updated May 11, 2025
    Cite
    Puneet Kumar; Sarthak Malik; Balasubramanian Raman; Xiaobai Li (2025). CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback Synthesis [Dataset]. http://doi.org/10.5281/zenodo.11409612
    Explore at:
    Dataset updated
    May 11, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Puneet Kumar; Sarthak Malik; Balasubramanian Raman; Xiaobai Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 1, 2024
    Description

    Overview
    The Controllable Multimodal Feedback Synthesis (CMFeed) Dataset is designed to enable the generation of sentiment-controlled feedback from multimodal inputs, including text and images. This dataset can be used to train feedback synthesis models in both uncontrolled and sentiment-controlled manners. Serving a crucial role in advancing research, the CMFeed dataset supports the development of human-like feedback synthesis, a novel task defined by the dataset's authors. Additionally, the corresponding feedback synthesis models and benchmark results are presented in the associated code and research publication.

    Task Uniqueness: The task of controllable multimodal feedback synthesis is unique, distinct from LLMs and tasks like VisDial, and not addressed by multi-modal LLMs. LLMs often exhibit errors and hallucinations, as evidenced by their auto-regressive and black-box nature, which can obscure the influence of different modalities on the generated responses [Ref1; Ref2]. Our approach includes an interpretability mechanism, as detailed in the supplementary material of the corresponding research publication, demonstrating how metadata and multimodal features shape responses and learn sentiments. This controllability and interpretability aim to inspire new methodologies in related fields.

    Data Collection and Annotation
    Data was collected by crawling Facebook posts from major news outlets, adhering to ethical and legal standards. The comments were annotated using four sentiment analysis models: FLAIR, SentimentR, RoBERTa, and DistilBERT. Facebook was chosen for dataset construction because of the following factors:
    • Facebook uniquely provides metadata such as news article link, post shares, post reaction, comment like, comment rank, comment reaction rank, and relevance scores, not available on other platforms.
    • Facebook is the most used social media platform, with 3.07 billion monthly users, compared to 550 million Twitter and 500 million Reddit users. [Ref]
    • Facebook is popular across all age groups (18-29, 30-49, 50-64, 65+), with at least 58% usage, compared to 6% for Twitter and 3% for Reddit. [Ref]. Trends are similar for gender, race, ethnicity, income, education, community, and political affiliation [Ref]
    • The male-to-female user ratio on Facebook is 56.3% to 43.7%; on Twitter, it's 66.72% to 23.28%; Reddit does not report this data. [Ref]

    Filtering Process: To ensure high-quality and reliable data, the dataset underwent two levels of filtering:
    a) Model Agreement Filtering: Retained only comments where at least three out of the four models agreed on the sentiment.
    b) Probability Range Safety Margin: Comments with a sentiment probability between 0.49 and 0.51, indicating low confidence in sentiment classification, were excluded.
    After filtering, 4,512 samples were marked as XX. Though these samples have been released for the reader's understanding, they were not used in training the feedback synthesis model proposed in the corresponding research paper.
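
    The two filtering levels described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `keep_comment`, the use of positive-class probabilities as input, the 0.5 decision threshold, and the choice of the mean as the aggregated probability are all assumptions, since the description does not say which probability the 0.49 to 0.51 margin is applied to. The sample values are made up.

```python
from statistics import mean

def keep_comment(probs, low=0.49, high=0.51, min_agree=3):
    """Decide whether a comment survives both filters.

    probs: positive-class probabilities from the four sentiment models
    (hypothetical input format, assumed for this sketch).
    """
    votes_pos = sum(p >= 0.5 for p in probs)
    votes_neg = len(probs) - votes_pos
    # (a) Model agreement: at least `min_agree` models must share a label.
    if max(votes_pos, votes_neg) < min_agree:
        return False
    # (b) Safety margin: drop comments whose aggregated probability
    # (assumed here to be the mean) falls in the low-confidence band.
    if low <= mean(probs) <= high:
        return False
    return True

# Made-up examples, for illustration only.
samples = {
    "clear positive": [0.92, 0.88, 0.75, 0.40],
    "split vote": [0.90, 0.80, 0.20, 0.10],
    "borderline": [0.52, 0.51, 0.50, 0.47],
}
kept = {name: keep_comment(p) for name, p in samples.items()}
```

    Only the first made-up sample passes both filters: the second fails the three-of-four agreement rule, and the third falls inside the probability margin.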

    Dataset Description
    • Total Samples: 61,734
    • Total Samples Annotated: 57,222 after filtering.
    • Total Posts: 3,646
    • Average Likes per Post: 65.1
    • Average Likes per Comment: 10.5
    • Average Length of News Text: 655 words
    • Average Number of Images per Post: 3.7

    Components of the Dataset
    The dataset comprises two main components:
    CMFeed.csv File: Contains metadata, comment, and reaction details related to each post.
    Images Folder: Contains folders with images corresponding to each post.

    Data Format and Fields of the CSV File
    The dataset is structured as the CMFeed.csv file, with the corresponding images stored in related folders. The CSV file includes the following fields:
    Id: Unique identifier
    Post: The heading of the news article.
    News_text: The text of the news article.
    News_link: URL link to the original news article.
    News_Images: A path to the folder containing images related to the post.
    Post_shares: Number of times the post has been shared.
    Post_reaction: A JSON object capturing reactions (like, love, etc.) to the post and their counts.
    Comment: Text of the user comment.
    Comment_like: Number of likes on the comment.
    Comment_reaction_rank: A JSON object detailing the type and count of reactions the comment received.
    Comment_link: URL link to the original comment on Facebook.
    Comment_rank: Rank of the comment based on engagement and relevance.
    Score: Sentiment score computed based on the consensus of sentiment analysis models.
    Agreement: Indicates the consensus level among the sentiment models, ranging from -4 (all negative) to 4 (all positive). For example, three negative votes and one positive vote give -2, and three positive votes and one negative vote give +2.
    Sentiment_class: Categorizes the sentiment of the comment into 1 (positive) or 0 (negative).
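
    The Agreement values quoted above are consistent with counting positive votes minus negative votes across the four models. A minimal sketch under that assumption follows; the function names and the 0/1 label encoding per model are hypothetical, and the source does not say how a 2-2 tie is handled, so the sketch leaves it undefined.

```python
def agreement(labels):
    """Positive votes minus negative votes.

    labels: one 0/1 sentiment label per model (1 = positive, 0 = negative);
    this input format is an assumption for illustration.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    return positives - negatives

def sentiment_class(labels):
    """Majority label: 1 (positive) or 0 (negative); None on a tie (assumed)."""
    score = agreement(labels)
    if score == 0:
        return None
    return 1 if score > 0 else 0

three_negative = agreement([0, 0, 0, 1])   # 3 negative, 1 positive
three_positive = agreement([1, 1, 1, 0])   # 3 positive, 1 negative
```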

    More Considerations During Dataset Construction
    We thoroughly considered issues such as the choice of social media platform for data collection, bias and generalizability of the data, selection of news handles/websites, ethical protocols, privacy and potential misuse before beginning data collection. While achieving completely unbiased and fair data is unattainable, we endeavored to minimize biases and ensure as much generalizability as possible. Building on these considerations, we made the following decisions about data sources and handling to ensure the integrity and utility of the dataset:

    • Why not merge data from different social media platforms?
    We chose not to merge data from platforms such as Reddit and Twitter with Facebook due to the lack of comprehensive metadata, clear ethical guidelines, and control mechanisms—such as who can comment and whether users' anonymity is maintained—on these platforms other than Facebook. These factors are critical for our analysis. Our focus on Facebook alone was crucial to ensure consistency in data quality and format.

    • Choice of four news handles: We selected four news handles (BBC News, Sky News, Fox News, and NY Daily News) to ensure diversity and comprehensive regional coverage. These news outlets were chosen for their distinct regional focuses and editorial perspectives: BBC News is known for its global coverage with a centrist view, Sky News offers geographically targeted and politically varied content leaning center/right in the UK/EU/US, Fox News is recognized for its right-leaning content in the US, and NY Daily News provides left-leaning coverage in New York. Many other news handles, such as NDTV, The Hindu, Xinhua, and SCMP, are also large-scale but may contain content in regional languages (for example, Indian languages and Chinese); hence, they were not selected. This selection ensures a broad spectrum of political discourse and audience engagement.

    • Dataset Generalizability and Bias: With 3.07 billion of the total 5 billion social media users, the extensive user base of Facebook, reflective of broader social media engagement patterns, ensures that the insights gained are applicable across various platforms, reducing bias and strengthening the generalizability of our findings. Additionally, the geographic and political diversity of these news sources, ranging from local (NY Daily News) to international (BBC News), and spanning political spectra from left (NY Daily News) to right (Fox News), ensures a balanced representation of global and political viewpoints in our dataset. This approach not only mitigates regional and ideological biases but also enriches the dataset with a wide array of perspectives, further solidifying the robustness and applicability of our research.

    • Dataset size and diversity: Facebook prohibits the automatic scraping of its users' personal data. In compliance with this policy, we manually scraped publicly available data. This labor-intensive process, requiring around 800 hours of manual effort, limited our data volume but allowed for precise selection. We followed ethical protocols for scraping Facebook data, selecting 1000 posts from each of the four news handles to enhance diversity and reduce bias. Initially, 4000 posts were collected; after preprocessing (detailed in Section 3.1), 3646 posts remained. We then processed all associated comments, resulting in a total of 61734 comments. This manual method ensures adherence to Facebook’s policies and the integrity of our dataset.

    Ethical considerations, data privacy and misuse prevention
    The data collection adheres to Facebook’s ethical guidelines [<a href="https://developers.facebook.com/terms/"

  17. Instagram: most popular posts as of 2024

    • statista.com
    • davegsmith.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram: most popular posts as of 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    Instagram’s most popular post

    As of April 2024, the most popular post on Instagram was a photo of Lionel Messi and his teammates after winning the 2022 FIFA World Cup with Argentina, posted by the account @leomessi. Messi's post, which racked up over 61 million likes within a day, knocked off the reigning post, which was 'Photo of an Egg'. Originally posted in January 2019, 'Photo of an Egg' surpassed the world’s most popular Instagram post at that time, which was a photo of Kylie Jenner’s daughter totaling 18 million likes.
    After several cryptic posts published by the account, World Record Egg revealed itself to be a part of a mental health campaign aimed at the pressures of social media use.

    Instagram’s most popular accounts

    As of April 2024, the official Instagram account @instagram had the most followers of any account on the platform, with 672 million followers. Portuguese footballer Cristiano Ronaldo (@cristiano) was the most followed individual with 628 million followers, while Selena Gomez (@selenagomez) was the most followed woman on the platform with 429 million. Additionally, Inter Miami CF striker Lionel Messi (@leomessi) had a total of 502 million. Celebrities such as The Rock, Kylie Jenner, and Ariana Grande all had over 380 million followers each.

    Instagram influencers

    In the United States, the leading content category of Instagram influencers was lifestyle, with 15.25 percent of influencers creating lifestyle content in 2021. Music ranked in second place with 10.96 percent, followed by family with 8.24 percent. Having a large audience can be very lucrative: Instagram influencers in the United States, Canada and the United Kingdom with over 90,000 followers made around 1,221 US dollars per post.

    Instagram around the globe

    Instagram’s worldwide popularity continues to grow, and India is the leading country in terms of number of users, with over 362.9 million users as of January 2024. The United States had 169.65 million Instagram users and Brazil had 134.6 million users. The social media platform was also very popular in Indonesia and Turkey, with 100.9 million and 57.1 million users, respectively. As of January 2024, Instagram was the fourth most popular social network in the world, behind Facebook, YouTube and WhatsApp.
    
  18. Impact of social media on suicide rates: produced results

    • data.niaid.nih.gov
    Updated Apr 20, 2021
    Cite
    Martin Winkler (2021). Impact of social media on suicide rates: produced results [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4701392
    Explore at:
    Dataset updated
    Apr 20, 2021
    Dataset authored and provided by
    Martin Winkler
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Data produced in Impact-of-social-media-on-suicide-rates.

    Acknowledgements:

    The WHO data export (https://apps.who.int/gho/athena/api/GHO/SDGSUICIDE.csv) was used, bound by the following policy: https://www.who.int/about/who-we-are/publishing-policies/data-policy

    A Kaggle dataset on social media usage was also used (https://www.kaggle.com/margarethamartinez/socialmedia2021); it requires the following additional acknowledgements of its original sources:

    Acknowledgements:

    Facebook: https://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/

    Twitter: https://investor.twitterinc.com/home/default.aspx

    Instagram: https://investor.fb.com/home/default.aspx

  19. Social Media Engagement Sentiment

    • opendatabay.com
    Updated Jul 5, 2025
    Cite
    Datasimple (2025). Social Media Engagement Sentiment [Dataset]. https://www.opendatabay.com/data/ai-ml/840edf8a-202c-42ce-815a-45c7cbc1c364
    Explore at:
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Social Media and Networking
    Description

    This dataset captures a vibrant tapestry of emotions, trends, and interactions across various social media platforms. It provides a snapshot of user-generated content, encompassing text, timestamps, hashtags, countries, likes, and retweets. Each entry unveils unique stories—moments of surprise, excitement, admiration, thrill, and contentment—shared by individuals worldwide. It is designed to offer insights into social media dynamics and user sentiments.

    Columns

    • Unnamed: 0: An index column, typically for record identification within the dataset.
    • Text: User-generated content, presenting the original social media post from which sentiments are derived.
    • Sentiment: The categorised emotion or sentiment expressed in the 'Text' column, such as Positive, Negative, or Neutral.
    • Timestamp: The exact date and time when the social media content was posted.
    • User: A unique identifier for the individual user who contributed the content.
    • Platform: The specific social media platform where the content originated (e.g., Twitter, Instagram, Facebook).
    • Hashtags: Keywords or phrases prefixed with '#' that identify trending topics and themes within the content.
    • Retweets: A numerical value reflecting the popularity of the content, indicating how many times it was shared or re-posted.
    • Likes: A numerical value quantifying user engagement, representing the number of 'likes' the post received.
    • Country: The geographical origin of the social media post.
    • Year: The year the social media post was published.
    • Month: The month the social media post was published.
    • Day: The day of the month the social media post was published.
    • Hour: The hour of the day the social media post was published.
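
    The Year, Month, Day, and Hour columns appear to be derivable from the Timestamp column. A minimal sketch of that derivation follows, assuming timestamps look like "YYYY-MM-DD HH:MM:SS"; the actual format is not stated in the description, and the helper name `split_timestamp` and the sample row are made up for illustration.

```python
from datetime import datetime

def split_timestamp(ts, fmt="%Y-%m-%d %H:%M:%S"):
    """Split a timestamp string into the Year/Month/Day/Hour columns.

    The default `fmt` is an assumption; adjust it to the real file.
    """
    dt = datetime.strptime(ts, fmt)
    return {"Year": dt.year, "Month": dt.month, "Day": dt.day, "Hour": dt.hour}

# Made-up row using the column names listed above.
row = {"Timestamp": "2023-01-15 12:30:00", "Platform": "Twitter"}
row.update(split_timestamp(row["Timestamp"]))
```

    Recomputing these fields is also a quick consistency check on the provided columns before any temporal analysis.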

    Distribution

    The dataset is typically provided as a data file, most often in CSV format. A sample file will be updated separately to the platform. The structure is tabular, organised into the columns described above. Specific numbers for rows or records are not available, but it represents a daily snapshot of social media activity.

    Usage

    This dataset is a rich source of information that can be leveraged for various analytical purposes:
    • Sentiment Analysis: Explore the emotional landscape by conducting sentiment analysis on the 'Text' column. Classify user-generated content into categories such as surprise, excitement, admiration, thrill, and contentment.
    • Temporal Analysis: Investigate trends over time using the 'Timestamp' column. Identify patterns, fluctuations, or recurring themes in social media content.
    • User Behaviour Insights: Analyse user engagement through the 'Likes' and 'Retweets' columns to discover popular content and user preferences.
    • Platform-Specific Analysis: Examine variations in content across different social media platforms using the 'Platform' column. Understand how sentiments vary across platforms.
    • Hashtag Trends: Identify trending topics and themes by analysing the 'Hashtags' column. Uncover popular or recurring hashtags.
    • Geographical Analysis: Explore content distribution based on the 'Country' column. Understand regional variations in sentiment and topic preferences.
    • User Identification: Utilise the 'User' column to track specific users and their contributions. Analyse the impact of influential users on sentiment trends.
    • Cross-Analysis: Combine multiple features for in-depth insights. For example, analyse sentiment trends over time or across different platforms and countries.

    Coverage

    The dataset offers global geographical coverage, capturing posts from individuals worldwide. Examples in the sample data include content originating from the USA, Canada, the UK, Australia, and India. The data represents a snapshot of user-generated content, with the provided sample covering a few days in January 2023. The demographic scope is tied to general social media users, with no specific demographic breakdowns noted.

    License

    CC0

    Who Can Use It

    This dataset is ideal for:
    • Data scientists and machine learning engineers looking to train and validate models for sentiment analysis, natural language processing, and social media analytics.
    • Researchers and academics studying social trends, public opinion, digital communication, and user engagement patterns.
    • Marketing and brand analysts seeking to understand consumer sentiment, track brand mentions, and evaluate the reception of campaigns across different social platforms.
    • Anyone interested in gaining insights into the emotional landscape and dynamic interactions occurring within social media environments.

    Dataset Name Suggestions

    • Social Media Sentiments Analysis Dataset
    • Global Social Sentiment Tracker
    • User-Generated Content Emotion Data
    • Digital Conversation Sentiment Snapshot
    • Social Media Engagement Sentiment

    Attributes

    Original

  20. Average daily time spent on social media worldwide 2012-2025

    • statista.com
    Updated Jun 19, 2025
    + more versions
    Cite
    Statista (2025). Average daily time spent on social media worldwide 2012-2025 [Dataset]. https://www.statista.com/statistics/433871/daily-social-media-usage-worldwide/
    Explore at:
    Dataset updated
    Jun 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    How much time do people spend on social media? As of 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of 3 hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just 2 hours and 16 minutes.

    Global social media usage

    Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.

    Global impact of social media

    Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics and heightened everyday distractions.
