29 datasets found
  1. Tweets Dataset - Top 20 most followed users in Twitter social platform

    • dataverse.harvard.edu
    Updated Aug 18, 2017
    Cite
    Raad Bin Tareaf (2017). Tweets Dataset - Top 20 most followed users in Twitter social platform [Dataset]. http://doi.org/10.7910/DVN/JBXKFD
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 18, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Raad Bin Tareaf
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset was gathered by crawling Twitter's REST API using the Python library tweepy 3. It contains the tweets of the 20 most popular Twitter users (those with the most followers); retweets are excluded. These accounts belong to public figures, such as Katy Perry and Barack Obama; platforms, such as YouTube and Instagram; and television channels and shows, e.g., CNN Breaking News and The Ellen Show.

    Consequently, the dataset contains a mix of relatively structured tweets, tweets written in a formal and informative manner, and completely unstructured tweets written in a colloquial style. Unfortunately, geocoordinates were not available for these tweets.

    This dataset has been used to produce the research paper titled "Machine Learning Techniques for Anomalies Detection in Post Arrays".

    Crawled attributes are: Author (Twitter User), Content (Tweet), Date_Time, id (Twitter User ID), language (Tweet Language), Number_of_Likes, Number_of_Shares.

    Overall: 52543 tweets of top 20 users in Twitter

    Screen_Name      #Tweets   Time span (in days)
    TheEllenShow     3,147     662
    jimmyfallon      3,123     1231
    ArianaGrande     3,104     613
    YouTube          3,077     411
    KimKardashian    2,939     603
    katyperry        2,924     1,598
    selenagomez      2,913     2,266
    rihanna          2,877     1,557
    BarackObama      2,863     849
    britneyspears    2,776     1,548
    instagram        2,577     456
    shakira          2,530     1,850
    Cristiano        2,507     2,407
    jtimberlake      2,478     2,491
    ladygaga         2,329     894
    Twitter          2,290     2,593
    ddlovato         2,217     741
    taylorswift13    2,029     2,091
    justinbieber     2,000     664
    cnnbrk           1,842     183

  2. Top 100 social media profiles

    • kaggle.com
    Updated Aug 7, 2022
    Cite
    Medaxone (2022). Top 100 social media profiles [Dataset]. https://www.kaggle.com/medaxone/top-100-social-media-profiles/discussion
    Dataset updated
    Aug 7, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Medaxone
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    A list of the most popular (top 100 by followers) Instagram, Twitter, YouTube, Twitch, and TikTok users. NB! For YouTube the followers are subscribers and the posts are videos.

  3. Instagram: most popular posts as of 2024

    • statista.com
    • davegsmith.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram: most popular posts as of 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    Instagram’s most popular post

    As of April 2024, the most popular post on Instagram was Lionel Messi and his teammates after winning the 2022 FIFA World Cup with Argentina, posted by the account @leomessi. Messi's post, which racked up over 61 million likes within a day, knocked off the reigning post, which was 'Photo of an Egg'. Originally posted in January 2019, 'Photo of an Egg' surpassed the world’s most popular Instagram post at that time, which was a photo of Kylie Jenner’s daughter totaling 18 million likes.
    After several cryptic posts published by the account, World Record Egg revealed itself to be a part of a mental health campaign aimed at the pressures of social media use.

    Instagram’s most popular accounts

    As of April 2024, the official Instagram account @instagram had the most followers of any account on the platform, with 672 million followers. Portuguese footballer Cristiano Ronaldo (@cristiano) was the most followed individual with 628 million followers, while Selena Gomez (@selenagomez) was the most followed woman on the platform with 429 million. Additionally, Inter Miami CF striker Lionel Messi (@leomessi) had a total of 502 million. Celebrities such as The Rock, Kylie Jenner, and Ariana Grande all had over 380 million followers each.

    Instagram influencers

    In the United States, the leading content category of Instagram influencers was lifestyle, with 15.25 percent of influencers creating lifestyle content in 2021. Music ranked in second place with 10.96 percent, followed by family with 8.24 percent. Having a large audience can be very lucrative: Instagram influencers in the United States, Canada and the United Kingdom with over 90,000 followers made around 1,221 US dollars per post.

    Instagram around the globe

    Instagram’s worldwide popularity continues to grow, and India is the leading country in terms of number of users, with over 362.9 million users as of January 2024. The United States had 169.65 million Instagram users and Brazil had 134.6 million users. The social media platform was also very popular in Indonesia and Turkey, with 100.9 million and 57.1 million users, respectively. As of January 2024, Instagram was the fourth most popular social network in the world, behind Facebook, YouTube and WhatsApp.
    
  4. Youtube title datasets: drawn from Brexit related tweets from pro and anti...

    • dataverse.harvard.edu
    Updated Aug 14, 2022
    Cite
    Gareth Lynch (2022). Youtube title datasets: drawn from Brexit related tweets from pro and anti Brexit accounts January - March 2022 (Brexit leaning based on their Twitter bios) [Dataset]. http://doi.org/10.7910/DVN/ID6ZOX
    Dataset updated
    Aug 14, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Gareth Lynch
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    YouTube
    Description

    These datasets were collated as part of a Master's dissertation project. Each dataset includes the video titles from YouTube links shared by pro- and anti-Brexit Twitter users (as discerned using Twitter bio keywords). The datasets that these links are drawn from are also available, and are linked to this dataset.

  5. Tweets_Dataset

    • huggingface.co
    Updated Jan 12, 2017
    Cite
    Hay.Bnz (2017). Tweets_Dataset [Dataset]. https://huggingface.co/datasets/haydenbanz/Tweets_Dataset
    Dataset updated
    Jan 12, 2017
    Authors
    Hay.Bnz
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Twitter User Dataset

    This dataset was obtained by crawling Twitter's REST API using the Python library Tweepy 3. The dataset comprises tweets from the 20 most popular Twitter users based on the number of followers, with retweets excluded. These accounts include public figures such as Katy Perry and Barack Obama, platforms like YouTube and Instagram, and television channels such as CNN Breaking News and The Ellen Show. The dataset presents a diverse collection of tweets, ranging from… See the full description on the dataset page: https://huggingface.co/datasets/haydenbanz/Tweets_Dataset.

  6. SNA Is is conspiracy or truth.xlsx

    • figshare.com
    • figshare.manchester.ac.uk
    xlsx
    Updated Jan 28, 2022
    Cite
    Beatriz Buarque (2022). SNA Is is conspiracy or truth.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.14115515.v1
    Available download formats: xlsx
    Dataset updated
    Jan 28, 2022
    Dataset provided by
    University of Manchester
    Authors
    Beatriz Buarque
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set has:
    - Comments manually collected from a YouTube video containing the 5G conspiracy theory articulated as legitimate truth
    - Number of followers and followed Twitter users found on posts that shared the aforementioned video
    - Number of posts identified on Facebook sharing the same video and their respective number of followers

  7. ABOME: A Multi-platform Data Repository of Artificially Boosted Online Media...

    • zenodo.org
    Updated Jan 15, 2021
    Cite
    Hridoy Sankar Dutta; Udit Arora; Tanmoy Chakraborty (2021). ABOME: A Multi-platform Data Repository of Artificially Boosted Online Media Entities [Dataset]. http://doi.org/10.5281/zenodo.3609250
    Dataset updated
    Jan 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hridoy Sankar Dutta; Udit Arora; Tanmoy Chakraborty
    Description

    Motivation

    The rise of online media has enabled users to choose various unethical and artificial ways of gaining social growth to boost their credibility (number of followers/retweets/views/likes/subscriptions) within a short time period. In this work, we present ABOME, a novel data repository consisting of datasets collected from multiple platforms for the analysis of blackmarket-driven collusive activities, which are prevalent but often unnoticed in online media. ABOME contains data related to tweets and users on Twitter, YouTube videos, and YouTube channels. We believe ABOME is a unique data repository that one can leverage to identify and analyze blackmarket-based temporal fraudulent activities in online media as well as the network dynamics.

    License

    Creative Commons License.

    Description of the dataset

    - Historical Data

    We collected the metadata of each entity present in the historical data

    Twitter:

    We collected the following fields for retweets and followers on Twitter:

    user_details: A JSON object representing a Twitter user.

    tweet_details: A JSON object representing a tweet.

    tweet_retweets: A JSON list of tweet objects representing the most recent 100 retweets of a given tweet.

    1. https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/user-object

    2. https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object

    YouTube:

    We collected the following fields for YouTube likes and comments:

    is_family_friendly: Whether the video is marked as family friendly or not.

    genre: Genre of the video.

    duration: Duration of the video in ISO 8601 format (duration type). This format is generally used when the duration denotes the amount of intervening time in a time interval.

    description: Description of the video.

    upload_date: Date that the video was uploaded.

    is_paid: Whether the video is paid or not.

    is_unlisted: The privacy status of the video, i.e., whether the video is unlisted or not. Here, the flag unlisted indicates that the video can only be accessed by people who have a direct link to it.

    statistics: A JSON object containing the number of dislikes, views and likes for the video.

    comments: A list of comments for the video. Each element in the list is a JSON object of the text (the comment text) and time (the time when the comment was posted).
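
    The duration field above is given in the ISO 8601 duration type. As a minimal sketch, assuming values look like PT4M13S or P1DT2H (the sample strings and the helper name parse_iso8601_duration are illustrative and not taken from the dataset), such a value could be converted to a Python timedelta like this:

```python
import re
from datetime import timedelta

# Matches the day/time subset of ISO 8601 durations, e.g. "PT4M13S" or "P1DT2H".
# Year and month designators are deliberately not supported, because their
# length in seconds is ambiguous.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_iso8601_duration(value: str) -> timedelta:
    """Convert a duration such as 'PT4M13S' into a timedelta (hypothetical helper)."""
    match = _DURATION_RE.match(value)
    if match is None or value in ("P", "PT"):
        raise ValueError(f"unsupported ISO 8601 duration: {value!r}")
    # Keep only the components that are present in the string.
    parts = {name: int(num) for name, num in match.groupdict().items() if num}
    return timedelta(**parts)
```

    Calling total_seconds() on the result gives a plain number that is easier to aggregate or compare across videos.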

    We collected the following fields for YouTube channels:

    channel_description: Description of the channel.

    hidden_subscriber_count: Total number of hidden subscribers of the channel.

    published_at: Time when the channel was created. The time is specified in ISO 8601 format (YYYY-MM-DDThh:mm:ss.sZ).

    video_count: Total number of videos uploaded to the channel.

    subscriber_count: Total number of subscribers of the channel.

    view_count: The number of times the channel has been viewed.

    kind: The API resource type (e.g., youtube#channel for YouTube channels).

    country: The country the channel is associated with.

    comment_count: Total number of comments the channel has received.

    etag: The ETag of the channel which is an HTTP header used for web browser cache validation.

    The historical data is stored in five directories named according to the type of data inside it. Each directory contains json files corresponding to the data described above.
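
    The published_at field above is described as an ISO 8601 timestamp of the form YYYY-MM-DDThh:mm:ss.sZ. A minimal sketch of reading such a value with the Python standard library follows; the function name parse_published_at and the sample value in the usage note are assumptions for illustration:

```python
from datetime import datetime, timezone


def parse_published_at(value: str) -> datetime:
    """Parse a 'YYYY-MM-DDThh:mm:ss.sZ' timestamp into an aware UTC datetime."""
    # The trailing "Z" marks UTC; %f accepts one to six fractional digits.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)
```

    For example, parse_published_at("2015-03-01T08:30:15.0Z") yields a timezone-aware datetime that can be compared directly with other UTC timestamps.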

    - Time-series Data

    We collect the following time-series data for retweets and followers on Twitter:

    user_timeline: This is a JSON list of tweet objects in the user’s timeline, which consists of the tweets posted, retweeted and quoted by the user. The file created at each time interval contains the new tweets posted by the user during each time interval.

    user_followers: This is a JSON file containing the user ids of all the followers of a user that were added or removed from the follower list during each time interval.

    user_followees: This is a JSON file consisting of the user ids of all the users followed by a user, i.e., the followees of a user, that were added or removed from the followee list during each time interval.

    tweet_details: This is a JSON object representing a given tweet, collected after every time interval.

    tweet_retweets: This is a JSON list of tweet objects representing the most recent 100 retweets of a given tweet, collected after every time interval.

    The time-series data is stored in directories named according to the timestamp of the collection time. Each directory contains sub-directories corresponding to the data described above.
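
    The user_followers and user_followees files record the IDs that were added or removed during each time interval. The sketch below shows one way such a delta could be derived from two consecutive snapshots of a follower list; it is an illustration under that assumption, not the collection code used for ABOME, and the function name and sample IDs are hypothetical:

```python
def follower_changes(previous_ids, current_ids):
    """Return the IDs added to and removed from a follower list between two snapshots."""
    previous, current = set(previous_ids), set(current_ids)
    return {
        "added": sorted(current - previous),    # present now, absent before
        "removed": sorted(previous - current),  # present before, absent now
    }


# Hypothetical snapshots taken at two consecutive collection times.
changes = follower_changes([101, 102, 103], [102, 103, 104])
```

    Storing only these deltas per interval keeps each file small while still allowing the full follower list at any collection time to be reconstructed.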

    Data Anonymization

    The data is anonymized by removing all Personally Identifiable Information (PII) and generating pseudo-IDs corresponding to the original IDs. A consistent mapping between the original IDs and the pseudo-IDs is kept to preserve the integrity of the data.
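
    The description does not say how the pseudo-IDs are generated. As one possible sketch of a consistent mapping, the hypothetical class below assigns each original ID a random token on first sight and returns the same token on every later lookup, so references between files stay aligned:

```python
import secrets


class PseudoIdMapper:
    """Assigns a stable random pseudo-ID to each original ID (illustrative only)."""

    def __init__(self):
        self._mapping = {}  # original ID -> pseudo-ID
        self._used = set()  # pseudo-IDs already handed out

    def pseudonymize(self, original_id: str) -> str:
        if original_id not in self._mapping:
            token = secrets.token_hex(8)
            # Regenerate in the unlikely event of a collision.
            while token in self._used:
                token = secrets.token_hex(8)
            self._used.add(token)
            self._mapping[original_id] = token
        return self._mapping[original_id]
```

    Because the tokens are random rather than derived from the original IDs, the mapping table itself must be kept private for the anonymization to hold.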

  8. Instagram accounts with the most followers worldwide 2024

    • statista.com
    • davegsmith.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram accounts with the most followers worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    Cristiano Ronaldo has one of the most popular Instagram accounts as of April 2024.

    The Portuguese footballer is the most-followed person on the photo-sharing platform with 628 million followers. Instagram's own account was ranked first with roughly 672 million followers.

    How popular is Instagram?

    Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States and experts project this figure to surpass 127 million users in 2023.

    Who uses Instagram?

    Instagram audiences are predominantly young – recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media platforms for teens and one of the social networks with the biggest reach among teens in the United States.

    Celebrity influencers on Instagram

    Many celebrities and athletes are brand spokespeople and generate additional income with social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first again, as the average media value of one of his Instagram posts was 985,441 U.S. dollars.
    
  9. Data from: What Tweets and YouTube comments have in common? Sentiment and...

    • zenodo.org
    bin, csv, doc
    Updated Apr 15, 2021
    Cite
    Shevtsov Alexander, Maria Oikonomidou, Despoina Antonakaki, Polyvios Pratikakis, Sotiris Ioannidis (2021). What Tweets and YouTube comments have in common? Sentiment and Graph analysis on data related to US Elections 2020. [Dataset]. http://doi.org/10.5281/zenodo.4618233
    Available download formats: bin, doc, csv
    Dataset updated
    Apr 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shevtsov Alexander, Maria Oikonomidou, Despoina Antonakaki, Polyvios Pratikakis, Sotiris Ioannidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States, YouTube
    Description

    The presidential elections in the United States on November 3rd 2020 caused extensive discussions on social media. A part of the content on US elections is organic, coming from users discussing their opinions on the candidates, political positions, or relevant content presented on television. Another significant part originates from organized campaigns, both official, including communication campaigns and dissemination, or unofficial, including astroturfing and targeting manipulation of the electorate.

    In this study, we obtain approximately 19.8M tweets from 4.5M users, based on prevalent hashtags related to the 2020 US election. From these, we mined 28,343 tweeted YouTube links and obtained the likes, dislikes and comments of these videos. In this paper, we study the connection between the two social networks. We employ an array of techniques, including volume analysis, exploring the retweet graph, sentiment and graph analysis on the communities formed in YouTube and Twitter. Furthermore, we propose a method to combine the results of community detection on the two social networks and measure the differences between them.

    Particularly, we study the daily traffic per prevalent hashtags, plot the retweet graph from July to November 2020, highlight the two main entities (‘Biden’ and ‘Trump’) and show how the discussion around those entities grows in the period closer to the elections. Additionally, we perform a sentiment analysis of both the Twitter corpus and the YouTube comments in tweeted videos. We found that 35.2% of the users contained in our Twitter dataset express positive sentiment towards Trump and 28% express positive sentiment towards Biden, while 18% of the users in our YouTube dataset express positive sentiment towards Trump and 12% express positive sentiment towards Biden. Finally, we link the Twitter Retweet graph with the YouTube comment graph using tweeted video links. We measure their similarity and differences and show the interactions and the correlation between the largest communities on YouTube and Twitter.

  10. Social Media Profile Links by Name

    • openwebninja.com
    json
    Updated Feb 2, 2025
    Cite
    OpenWeb Ninja (2025). Social Media Profile Links by Name [Dataset]. https://www.openwebninja.com/api/social-links-search
    Available download formats: json
    Dataset updated
    Feb 2, 2025
    Dataset authored and provided by
    OpenWeb Ninja
    Area covered
    Worldwide
    Description

    This dataset provides comprehensive social media profile links discovered through real-time web search. It includes profiles from major social networks like Facebook, TikTok, Instagram, Twitter, LinkedIn, Youtube, Pinterest, Github and more. The data is gathered through intelligent search algorithms and pattern matching. Users can leverage this dataset for social media research, influencer discovery, social presence analysis, and social media marketing. The API enables efficient discovery of social profiles across multiple platforms. The dataset is delivered in a JSON format via REST API.

  11. MASH: A Multiplatform Annotated Dataset for Societal Impact of Hurricane

    • zenodo.org
    Updated May 24, 2025
    Cite
    Anonymous (2025). MASH: A Multiplatform Annotated Dataset for Societal Impact of Hurricane [Dataset]. http://doi.org/10.5281/zenodo.15401479
    Dataset updated
    May 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MASH: A Multiplatform Annotated Dataset for Societal Impact of Hurricane

    We present a Multiplatform Annotated Dataset for Societal Impact of Hurricane (MASH) that includes 98,662 relevant social media posts from Reddit, X, TikTok, and YouTube.
    In addition, all relevant posts are annotated on three dimensions: Humanitarian Classes, Bias Classes, and Information Integrity Classes in a multi-modal approach that considers both textual and visual content (text, images, and videos), providing a rich labeled dataset for in-depth analysis.
    The dataset is also complemented by an Online Analytics Platform (https://hurricane.web.illinois.edu/) that not only allows users to view hurricane-related posts and articles, but also explores high-frequency keywords, user sentiment, and the locations where posts were made.
    To the best of our knowledge, MASH is the first large-scale, multi-platform, multimodal, and multi-dimensionally annotated hurricane dataset. We envision that MASH can contribute to the study of hurricanes' impact on society, such as disaster severity classification, event detection, public sentiment analysis, and bias identification.

    Usage Notice

    This dataset includes four annotation files:
    • reddit_anno_publish.csv
    • tiktok_anno_publish.csv
    • twitter_anno_publish.csv
    • youtube_anno_publish.csv
    Each file contains post IDs and corresponding annotations on three dimensions: Humanitarian Classes, Bias Classes, and Information Integrity Classes.
    To protect user privacy, only post IDs are released. We recommend retrieving the full post content via the official APIs of each platform, in accordance with their respective terms of service.

    Humanitarian Classes

    Each post is annotated with seven binary humanitarian classes. For each class, the label is either:
    • True – the post contains this humanitarian information
    • False – the post does not contain this information
    These seven humanitarian classes include:
    • Casualty: The post reports people or animals who are killed, injured, or missing during the hurricane.
    • Evacuation: The post describes the evacuation, relocation, rescue, or displacement of individuals or animals due to the hurricane.
    • Damage: The post reports damage to infrastructure or public utilities caused by the hurricane.
    • Advice: The post provides advice, guidance, or suggestions related to hurricanes, including how to stay safe, protect property, or prepare for the disaster.
    • Request: The post requests help, support, or resources due to the hurricane.
    • Assistance: The post describes assistance, including both physical aid and emotional or psychological support provided by individuals, communities, or organizations.
    • Recovery: The post describes efforts or activities related to the recovery and rebuilding process after the hurricane.
    Note: A single post may be labeled as True for multiple humanitarian categories.

    Bias Classes

    Each post is annotated with five binary bias classes. For each class, the label is either:
    • True – the post contains this bias information
    • False – the post does not contain this information
    These five bias classes include:
    • Linguistic Bias: The post contains biased, inappropriate, or offensive language, with a focus on word choice, tone, or expression.
    • Political Bias: The post expresses political ideology, showing favor or disapproval toward specific political actors, parties, or policies.
    • Gender Bias: The post contains biased, stereotypical, or discriminatory language or viewpoints related to gender.
    • Hate Speech: The post contains language that expresses hatred, hostility, or dehumanization toward a specific group or individual, especially those belonging to minority or marginalized communities.
    • Racial Bias: The post contains biased, discriminatory, or stereotypical statements directed toward one or more racial or ethnic groups.
    Note: A single post may be labeled as True for multiple bias categories.

    Information Integrity Classes

    Each post is also annotated with a single information integrity class, represented by an integer:
    • -1 → False information (i.e., misinformation or disinformation)
    • 0 → Unverifiable information (unclear or lacking sufficient evidence)
    • 1 → True information (verifiable and accurate)
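
    The label scheme above can be decoded with a few lines of Python. This is a sketch only: the column names (post_id, casualty, damage, information_integrity) and the two sample rows are hypothetical, since the description lists the classes but not the exact CSV headers, and the real *_anno_publish.csv files may differ.

```python
import csv
import io

# Integer codes for the information integrity class, as listed above.
INTEGRITY_LABELS = {-1: "false", 0: "unverifiable", 1: "true"}

# Hypothetical sample standing in for one of the annotation files.
SAMPLE_CSV = """post_id,casualty,damage,information_integrity
abc123,True,False,1
def456,False,True,-1
"""


def decode_row(row):
    """Turn the string cells of one annotation row into typed values."""
    return {
        "post_id": row["post_id"],
        "casualty": row["casualty"] == "True",
        "damage": row["damage"] == "True",
        "information_integrity": INTEGRITY_LABELS[int(row["information_integrity"])],
    }


rows = [decode_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

    Since a single post may be True for several humanitarian or bias classes, each class is kept as its own boolean rather than being collapsed into one categorical label.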

    Key Notes

    1. This dataset is also available at https://huggingface.co/datasets/YRC10/MASH.
    2. Version 1 is no longer available.
  12. Instagram: distribution of global audiences 2024, by gender

    • statista.com
    • davegsmith.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram: distribution of global audiences 2024, by gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of January 2024, Instagram was slightly more popular with men than women, with men accounting for 50.6 percent of the platform’s global users. Additionally, the social media app was most popular amongst younger audiences, with almost 32 percent of users aged between 18 and 24 years.

    Instagram’s Global Audience

    As of January 2024, Instagram was the fourth most popular social media platform globally, reaching two billion monthly active users (MAU). This number is projected to keep growing with no signs of slowing down, which is not a surprise as the global online social penetration rate across all regions is constantly increasing.
    As of January 2024, the country with the largest Instagram audience was India with 362.9 million users, followed by the United States with 169.7 million users.

    Who is winning over the generations?

    Even though Instagram’s audience is almost twice the size of TikTok’s on a global scale, TikTok has shown itself to be a fierce competitor, particularly amongst younger audiences. TikTok was the most downloaded mobile app globally in 2022, generating 672 million downloads. As of 2022, Generation Z in the United States spent more time on TikTok than on Instagram monthly.
    
  13. The 10 most popular hashtags in our dataset.

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Cite
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis (2023). The 10 most popular hashtags in our dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0270542.t001
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The 10 most popular hashtags in our dataset.

  14. OTT consumption profile - Unicauca dataset

    • kaggle.com
    Updated Apr 15, 2019
    Cite
    Juan Sebastián Rojas (2019). OTT consumption profile - Unicauca dataset [Dataset]. https://www.kaggle.com/jsrojas/ott-consumption-profile-dataset
    Dataset updated
    Apr 15, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Juan Sebastián Rojas
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    Context

    Network monitoring and the analysis of consumption behavior are important for network operators: they provide vital information about consumption trends, which helps operators offer new data plans aimed at specific users and obtain an adequate perspective of the network. Over-the-top (OTT) media and communications services and applications are shifting Internet consumption by increasing the traffic generated over the different available networks. OTT refers to applications that deliver audio, video, and other media over the Internet by leveraging the infrastructure deployed by network operators, but without the operators' involvement in the control or distribution of the content; these applications are known for their large consumption of network resources.

    Content

    This dataset contains 1581 instances and 131 attributes in a single file. Each instance represents a user’s consumption profile, which holds summarized information about the user's consumption behavior related to the 29 OTT applications identified in the different IP flows captured in order to create the dataset.

    The OTT applications that the users interacted with during the capture experiment and were stored on the dataset are: Amazon, Apple store, Apple Icloud, Apple Itunes, Deezer, Dropbox, EasyTaxi, Ebay, Facebook, Gmail, Google suite, Google Maps, Browsing (HTTP, HTTP_Connect, HTTP_Download, HTTP_Proxy), Instagram, LastFM, Microsoft One Drive (MS_One_Drive), Facebook Messenger (MSN), Netflix, Skype, Spotify, Teamspeak, Teamviewer, Twitch, Twitter, Waze, Whatsapp, Wikipedia, Yahoo and Youtube.

    Each application has 4 types of attributes (quantity of generated flows, mean duration of the flows, average size of the packets exchanged in the flows, and mean bytes per second in the flows). These attributes summarize the interaction that the user had with the respective OTT application in terms of consumption. Furthermore, the dataset contains the user’s IP address in network and decimal format, which are used as user identifiers. Finally, the User Group attribute represents the objective class (high consumption, medium consumption, or low consumption) into which a user is classified according to his/her OTT consumption behavior. All of this information gives a total of 131 attributes.
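The total of 131 attributes can be checked with a short calculation. The sketch below is one way to arrive at that number, and it rests on an assumption the description does not state outright: that the four Browsing labels (HTTP, HTTP_Connect, HTTP_Download, HTTP_Proxy) each carry their own group of 4 attributes. The label spellings in the list are illustrative and may not match the real column prefixes.

```python
# Application labels taken from the description above, with Browsing left out.
# Spellings are illustrative; the real column prefixes may differ.
apps = [
    "Amazon", "Apple store", "Apple Icloud", "Apple Itunes", "Deezer",
    "Dropbox", "EasyTaxi", "Ebay", "Facebook", "Gmail", "Google suite",
    "Google Maps", "Instagram", "LastFM", "MS_One_Drive", "MSN", "Netflix",
    "Skype", "Spotify", "Teamspeak", "Teamviewer", "Twitch", "Twitter",
    "Waze", "Whatsapp", "Wikipedia", "Yahoo", "Youtube",
]

# ASSUMPTION: each Browsing sub-label has its own group of 4 attributes.
browsing = ["HTTP", "HTTP_Connect", "HTTP_Download", "HTTP_Proxy"]

ATTRIBUTES_PER_APP = 4   # flows, mean duration, average packet size, bytes/sec
IDENTIFIER_COLUMNS = 2   # Source.Decimal and Source.IP
CLASS_COLUMNS = 1        # User.Group

# Counting Browsing as a single application gives the 29 named applications.
listed_count = len(apps) + 1

# Counting each Browsing sub-label separately gives the attribute groups.
group_count = len(apps) + len(browsing)

total_attributes = (
    group_count * ATTRIBUTES_PER_APP + IDENTIFIER_COLUMNS + CLASS_COLUMNS
)

print(listed_count, group_count, total_attributes)
```

Under that assumption, the 29 listed applications expand to 32 attribute groups, and 32 × 4 + 2 + 1 matches the stated total.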

    For further information, please read and cite the following papers:

    Research Gate: https://www.researchgate.net/publication/326150046_Personalized_Service_Degradation_Policies_on_OTT_Applications_Based_on_the_Consumption_Behavior_of_Users

    Springer: https://link.springer.com/chapter/10.1007/978-3-319-95168-3_37

    Research Gate: https://www.researchgate.net/publication/335954240_Consumption_Behavior_Analysis_of_Over_The_Top_Services_Incremental_Learning_or_Traditional_Methods

    IEEExplore: https://ieeexplore.ieee.org/document/8845576

    Attribute Description

    The attributes and their definitions are presented below:

    • Source.Decimal: This attribute holds the user’s IP address in decimal format and is mainly used as a user identifier.

    • Source.IP: This attribute holds the user’s IP address in network format (e.g., 192.168.14.35) and, as in the previous case, its main function is to serve as a user identifier.

    • Application-Name.Flows: Attributes of this type hold the quantity of IP flows that a user generated toward an OTT application. As mentioned before, each application has a group of 4 attributes that describe the interaction of the user with that OTT application (examples for this case are Netflix.Flows and Facebook.Flows).

    • Application-Name.Flow.Duration.Mean: Attributes of this type hold the mean duration (time) of the flows generated by the user toward a specific OTT application, measured in microseconds. Examples of how these attributes are stored in the dataset are Amazon.Flow.Duration.Mean and Instagram.Flow.Duration.Mean.

    • Application-Name.AVG.Packet.Size: Attributes of this type hold the average size of the IP packets exchanged in all the flows generated by the user toward a specific OTT application, measured in bytes. Note that this size refers to the packet’s header only. Examples of how these attributes are presented in the dataset are Google_Maps.AVG.Packet.Size and Spotify.AVG.Packet.Size.

    • Application-Name.Flow.Bytes.Per.Sec: Attributes of this type hold the mean number of bytes per second exchanged in the flows generated by the user toward a specific OTT application. Examples of this kind of attribute in the dataset are Deezer.Flow.Bytes.Per.Sec and Skype.Flow.Bytes.Per.Sec.

    • User.Group: This attribute represents the objective class of the dataset, i.e., the groups into which users are classified according to their OTT consumption behavior...
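The attribute naming scheme and the two IP identifier formats can be illustrated with a minimal sketch. It assumes that Source.Decimal is the usual 32-bit integer form of the IPv4 address, which the description does not confirm, and the helper names (`ip_to_decimal`, `decimal_to_ip`, `columns_for`) are hypothetical rather than part of the dataset or its tooling.

```python
import ipaddress

# The four per-application suffixes named in the attribute list above.
SUFFIXES = (
    "Flows",
    "Flow.Duration.Mean",
    "AVG.Packet.Size",
    "Flow.Bytes.Per.Sec",
)


def ip_to_decimal(dotted: str) -> int:
    """Convert a dotted IPv4 address to its 32-bit integer form.

    ASSUMPTION: Source.Decimal uses this standard conversion.
    """
    return int(ipaddress.IPv4Address(dotted))


def decimal_to_ip(value: int) -> str:
    """Convert a 32-bit integer back to dotted IPv4 notation."""
    return str(ipaddress.IPv4Address(value))


def columns_for(app: str) -> list[str]:
    """Build the four attribute names for one application prefix."""
    return [f"{app}.{suffix}" for suffix in SUFFIXES]


if __name__ == "__main__":
    print(ip_to_decimal("192.168.14.35"))
    print(columns_for("Netflix"))
```

If the assumption holds, the two identifier columns carry the same information, so either one can serve as the key when grouping or joining rows.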

  15. Twitter users in Africa 2020, by country

    • statista.com
    Updated Jan 10, 2024
    Cite
    Statista Research Department (2024). Twitter users in Africa 2020, by country [Dataset]. https://www.statista.com/topics/9922/social-media-in-africa/
    Explore at:
    Dataset updated
    Jan 10, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Area covered
    Africa
    Description

    This statistic shows a ranking of the estimated number of Twitter users in 2020 in Africa, differentiated by country. The user numbers have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in more than 150 countries and regions worldwide. All input data are sourced from international institutions, national statistical offices, and trade associations. All data have been processed to generate comparable datasets (see supplementary notes under details for more information).

  16. D

    Dataset for "Short-Form Videos Degrade Our Capacity to Retain Intentions:...

    • darus.uni-stuttgart.de
    Updated Sep 16, 2024
    Cite
    Francesco Chiossi; Luke Haliburton; Changkun Ou; Andreas Butz; Albrecht Schmidt (2024). Dataset for "Short-Form Videos Degrade Our Capacity to Retain Intentions: Effect of Context Switching On Prospective Memory" [Dataset]. http://doi.org/10.18419/DARUS-3327
    Explore at:
    Dataset updated
    Sep 16, 2024
    Dataset provided by
    DaRUS
    Authors
    Francesco Chiossi; Luke Haliburton; Changkun Ou; Andreas Butz; Albrecht Schmidt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    DFG
    Description

    Social media platforms use short, highly engaging videos to catch users’ attention. While the short-form video feeds popularized by TikTok are rapidly spreading to other platforms, we do not yet understand their impact on cognitive functions. We conducted a between-subjects experiment (𝑁 = 60) investigating the impact of engaging with TikTok, Twitter, and YouTube while performing a Prospective Memory task (i.e., executing a previously planned action). The study required participants to remember intentions over interruptions. We found that the TikTok condition significantly degraded the users’ performance in this task. As none of the other conditions (Twitter, YouTube, no activity) had a similar effect, our results indicate that the combination of short videos and rapid context-switching impairs intention recall and execution. We contribute a quantified understanding of the effect of social media feed format on Prospective Memory and outline implications for media technology designers so that they do not harm users’ memory and wellbeing.

    Description of the Dataset

    Data frame: ./data/rt.csv provides the data frame of reaction times. ./data/acc.csv provides the data frame of reaction accuracy scores. ./data/q.csv provides the data frame collected from questionnaires. ./data/ddm.csv contains the DDM features learned with ./appendix2_ddm_fitting.ipynb, which are then used in ./3.ddm_anova.ipynb.

    Figures: All figures that appear in the paper are placed in ./figures and can be reproduced using the *_vis.ipynb files.
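As a first step before opening the notebooks, the four CSV files named in the dataset description can be inspected with a small script. This is only a sketch: the column names inside each file are not documented here, so it reads just the header row, and the `read_headers` helper and the `data` directory argument are hypothetical.

```python
import csv
from pathlib import Path

# File names come from the dataset description; their columns are not
# documented there, so nothing is assumed about the contents.
DATA_FILES = {
    "rt.csv": "reaction times",
    "acc.csv": "reaction accuracy scores",
    "q.csv": "questionnaire data",
    "ddm.csv": "learned DDM features",
}


def read_headers(data_dir):
    """Return {file name: header row} for each known file that exists."""
    headers = {}
    for name in DATA_FILES:
        path = Path(data_dir) / name
        if not path.exists():
            continue  # skip files missing from a partial download
        with path.open(newline="", encoding="utf-8") as handle:
            # An empty file yields an empty header list.
            headers[name] = next(csv.reader(handle), [])
    return headers


if __name__ == "__main__":
    for name, columns in read_headers("data").items():
        print(f"{name} ({DATA_FILES[name]}): {columns}")
```

Run from the root of the downloaded dataset, this prints one line per file found, which helps confirm the download is complete.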

  17. m

    UI/UX user interaction dataset across popular digital platforms

    • data.mendeley.com
    Updated Nov 19, 2024
    Cite
    Md Atikur Rahman (2024). UI/UX user interaction dataset across popular digital platforms [Dataset]. http://doi.org/10.17632/dxthxmnkhx.6
    Explore at:
    Dataset updated
    Nov 19, 2024
    Authors
    Md Atikur Rahman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises 2,271 entries and provides insights into user interface (UI) and user experience (UX) preferences across various digital platforms. Key information includes user demographics (Name, Age, Gender) and platform preferences (e.g., Twitter, YouTube, Facebook, Website). It captures user experiences and satisfaction levels with various UI/UX elements such as color schemes, visual hierarchy, typography, multimedia usage, and layout design. The dataset also includes evaluations of mobile responsiveness, call-to-action buttons, form usability, feedback/error messages, loading speed, personalization, accessibility, and interactions (like scrolling behavior and gestures). Each UI/UX component is rated on a scale, allowing for quantitative analysis of user preferences and experiences, making this dataset valuable for research in user-centered design and usability optimization.

  18. Facebook: distribution of global audiences 2024, by age and gender

    • statista.com
    • davegsmith.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Facebook: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, men between the ages of 25 and 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Facebook's second largest audience base was men aged 18 to 24 years.

    Facebook connects the world

    Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world, with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (together known under the acronym GAFA). Facebook is the most popular social network worldwide, and the company also owns three other billion-user properties: mobile messaging apps WhatsApp and Facebook Messenger, as well as photo-sharing app Instagram.

    Facebook users

    The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with 378 million users. The United States, Brazil, and Indonesia also all have more than 100 million Facebook users each.
    
  19. 🇵🇭 Wish 107.5 Official YT Channel Comments

    • kaggle.com
    zip
    Updated Apr 7, 2024
    Cite
    BwandoWando (2024). 🇵🇭 Wish 107.5 Official YT Channel Comments [Dataset]. https://www.kaggle.com/datasets/bwandowando/wish-107-5-official-yt-channel-comments/data
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Apr 7, 2024
    Authors
    BwandoWando
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    About

    From https://wish1075.com/

    Wish 107.5 is an all-hits FM radio station in the Philippines. When it first hit the airwaves in August 2014, it promised to grant your fervent wish of making your radio more than a typical music-box-on-air.

    Wish 107.5 unveiled the first and the only Mobile Radio Booth in the Philippines, now known as the WISH 107.5 Bus. Equipped with state-of-the-art broadcast facilities, it took the traditional radio experience beyond the four-walled booth as it brought music right where most of the listening public are -- streets, roads, and parks.

    With the capabilities it offers, the Wish 107.5 Bus is on the right track in leaving an indelible mark in the music scene. The desire to bring this concept to more audience fuels the station to continue embarking on a journey that would forever change the course of music and radio broadcast history of the Philippines and the World, transforming itself from being a local FM station to becoming a sought-after WISHclusive gateway to the world.

    Links

    Photo

    From https://wish1075.com/

  20. Fake Turing Test

    • kaggle.com
    Updated Jun 4, 2018
    Cite
    Ashton Six (2018). Fake Turing Test [Dataset]. https://www.kaggle.com/ashtonsix/fake-turing-test/code
    Explore at:
    Dataset updated
    Jun 4, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ashton Six
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    I invited users to participate in a Real-Life Turing Test [link]. Users were connected to strangers and asked to predict whether they were a human or robot. However, everyone was a human and the scores were randomised, tricking users into believing they were talking with an advanced AI. This dataset includes 2,678 chats from the experiment.

    See the live experiment, code repository, my YouTube & Twitter.

Cite
Raad Bin Tareaf (2017). Tweets Dataset - Top 20 most followed users in Twitter social platform [Dataset]. http://doi.org/10.7910/DVN/JBXKFD

Tweets Dataset - Top 20 most followed users in Twitter social platform

Explore at:
6 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Aug 18, 2017
Dataset provided by
Harvard Dataverse
Authors
Raad Bin Tareaf
License

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

- This dataset was gathered by crawling Twitter's REST API using the Python library tweepy 3. It contains the tweets of the 20 most popular Twitter users (those with the most followers); retweets are excluded. These accounts belong to public figures, such as Katy Perry and Barack Obama; platforms, such as YouTube and Instagram; and television channels and shows, e.g., CNN Breaking News and The Ellen Show.
- Consequently, the dataset contains a mix of relatively structured tweets, tweets written in a formal and informative manner, and completely unstructured tweets written in a colloquial style. Unfortunately, geocoordinates were not available for these tweets.
- This dataset has been used to produce a research paper titled "Machine Learning Techniques for Anomalies Detection in Post Arrays".
- Crawled attributes are: Author (Twitter User), Content (Tweet), Date_Time, id (Twitter User ID), language (Tweet Language), Number_of_Likes, Number_of_Shares.

Overall: 52543 tweets of top 20 users in Twitter

Screen_Name      #Tweets   Time span (in days)
TheEllenShow     3,147     662
jimmyfallon      3,123     1,231
ArianaGrande     3,104     613
YouTube          3,077     411
KimKardashian    2,939     603
katyperry        2,924     1,598
selenagomez      2,913     2,266
rihanna          2,877     1,557
BarackObama      2,863     849
britneyspears    2,776     1,548
instagram        2,577     456
shakira          2,530     1,850
Cristiano        2,507     2,407
jtimberlake      2,478     2,491
ladygaga         2,329     894
Twitter          2,290     2,593
ddlovato         2,217     741
taylorswift13    2,029     2,091
justinbieber     2,000     664
cnnbrk           1,842     183
