CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset was gathered by crawling Twitter's REST API using the Python library Tweepy 3. It contains the tweets of the 20 most popular Twitter users (those with the most followers); retweets are excluded. The accounts belong to public figures such as Katy Perry and Barack Obama, platforms such as YouTube and Instagram, and television channels and shows such as CNN Breaking News and The Ellen Show.
Consequently, the dataset contains a mix of relatively structured tweets, tweets written in a formal and informative manner, and completely unstructured tweets written in a colloquial style. Unfortunately, geocoordinates were not available for these tweets.
This dataset was used for the research paper titled "Machine Learning Techniques for Anomalies Detection in Post Arrays".
Crawled attributes: Author (Twitter user), Content (tweet), Date_Time, id (Twitter user ID), language (tweet language), Number_of_Likes, Number_of_Shares.
Overall: 52,543 tweets from the top 20 Twitter users.
Screen_Name      #Tweets   Time span (in days)
TheEllenShow     3,147     662
jimmyfallon      3,123     1,231
ArianaGrande     3,104     613
YouTube          3,077     411
KimKardashian    2,939     603
katyperry        2,924     1,598
selenagomez      2,913     2,266
rihanna          2,877     1,557
BarackObama      2,863     849
britneyspears    2,776     1,548
instagram        2,577     456
shakira          2,530     1,850
Cristiano        2,507     2,407
jtimberlake      2,478     2,491
ladygaga         2,329     894
Twitter          2,290     2,593
ddlovato         2,217     741
taylorswift13    2,029     2,091
justinbieber     2,000     664
cnnbrk           1,842     183
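As a minimal sketch of how the per-account tweet counts and time spans could be recomputed from the crawled attributes, the following Python snippet groups rows by Author and measures the span between the earliest and latest Date_Time. The sample rows, the CSV layout, and the date format are assumptions made for illustration; the actual file may be structured differently.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: the real file layout and date format may differ.
SAMPLE = """Author,Content,Date_Time,id,language,Number_of_Likes,Number_of_Shares
cnnbrk,Breaking story,2017-01-01 10:00:00,1,en,100,50
cnnbrk,Follow-up,2017-01-04 09:00:00,2,en,80,20
katyperry,hello,2016-05-01 12:00:00,3,en,500,90
"""


def summarize(csv_text, date_format="%Y-%m-%d %H:%M:%S"):
    """Return {author: (tweet_count, span_in_whole_days)}."""
    dates = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        dates[row["Author"]].append(
            datetime.strptime(row["Date_Time"], date_format)
        )
    # Span is the whole number of days between first and last tweet.
    return {
        author: (len(stamps), (max(stamps) - min(stamps)).days)
        for author, stamps in dates.items()
    }


summary = summarize(SAMPLE)
for author, (count, span) in sorted(summary.items()):
    print(f"{author:15s} {count:6d} {span:6d}")
```

To run this against a real export, the text of the file would be passed to `summarize` in place of `SAMPLE`, with `date_format` adjusted to match the stored timestamps.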
https://creativecommons.org/publicdomain/zero/1.0/
A list of the most popular (top 100 by followers) Instagram, Twitter, YouTube, Twitch, and TikTok users. NB! For YouTube the followers are subscribers and the posts are videos.
Instagram’s most popular post
As of April 2024, the most popular post on Instagram was a photo of Lionel Messi and his teammates after winning the 2022 FIFA World Cup with Argentina, posted by the account @leomessi. Messi's post, which racked up over 61 million likes within a day, displaced the previous record holder, 'Photo of an Egg'. Originally posted in January 2019, 'Photo of an Egg' had itself surpassed the most-liked Instagram post at that time, a photo of Kylie Jenner’s daughter with 18 million likes.
After several cryptic posts published by the account, World Record Egg revealed itself to be a part of a mental health campaign aimed at the pressures of social media use.
Instagram’s most popular accounts
As of April 2024, the official Instagram account @instagram had the most followers of any account on the platform, with 672 million followers. Portuguese footballer Cristiano Ronaldo (@cristiano) was the most followed individual with 628 million followers, while Selena Gomez (@selenagomez) was the most followed woman on the platform with 429 million. Additionally, Inter Miami CF striker Lionel Messi (@leomessi) had a total of 502 million followers. Celebrities such as The Rock, Kylie Jenner, and Ariana Grande each had over 380 million followers.
Instagram influencers
In the United States, the leading content category of Instagram influencers was lifestyle, with 15.25 percent of influencers creating lifestyle content in 2021. Music ranked in second place with 10.96 percent, followed by family with 8.24 percent. Having a large audience can be very lucrative: Instagram influencers in the United States, Canada and the United Kingdom with over 90,000 followers made around 1,221 US dollars per post.
Instagram around the globe
Instagram’s worldwide popularity continues to grow, and India is the leading country in terms of number of users, with over 362.9 million users as of January 2024. The United States had 169.65 million Instagram users and Brazil had 134.6 million users. The social media platform was also very popular in Indonesia and Turkey, with 100.9 million and 57.1 million users, respectively. As of January 2024, Instagram was the fourth most popular social network in the world, behind Facebook, YouTube and WhatsApp.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These datasets were collated as part of a Master's dissertation project. Each dataset includes the video titles from YouTube links shared by pro- and anti-Brexit Twitter users (as discerned using Twitter bio keywords). The datasets that these links are drawn from are also available, and are linked to this dataset.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Twitter User Dataset
This dataset was obtained by crawling Twitter's REST API using the Python library Tweepy 3. The dataset comprises tweets from the 20 most popular Twitter users based on the number of followers, with retweets excluded. These accounts include public figures such as Katy Perry and Barack Obama, platforms like YouTube and Instagram, and television channels such as CNN Breaking News and The Ellen Show. The dataset presents a diverse collection of tweets, ranging from… See the full description on the dataset page: https://huggingface.co/datasets/haydenbanz/Tweets_Dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set has:
- Comments manually collected from a YouTube video in which the 5G conspiracy theory is articulated as legitimate truth
- Number of followers and followed Twitter users found on posts that shared the aforementioned video
- Number of posts identified on Facebook sharing the same video and their respective number of followers
Motivation
The rise of online media has enabled users to choose various unethical and artificial ways of gaining social growth to boost their credibility (number of followers/retweets/views/likes/subscriptions) within a short time period. In this work, we present ABOME, a novel data repository consisting of datasets collected from multiple platforms for the analysis of blackmarket-driven collusive activities, which are prevalent but often unnoticed in online media. ABOME contains data related to tweets and users on Twitter, and to YouTube videos and YouTube channels. We believe ABOME is a unique data repository that one can leverage to identify and analyze blackmarket-based temporal fraudulent activities in online media as well as the network dynamics.
License
Creative Commons License.
Description of the dataset
- Historical Data
We collected the metadata of each entity present in the historical data.
Twitter:
We collected the following fields for retweets and followers on Twitter:
user_details:
A JSON object representing a Twitter user.
tweet_details:
A JSON object representing a tweet.
tweet_retweets:
A JSON list of tweet objects representing the most recent 100 retweets of a given tweet.
https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/user-object
https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object
YouTube:
We collected the following fields for YouTube likes and comments:
is_family_friendly:
Whether the video is marked as family friendly or not.
genre:
Genre of the video.
duration:
Duration of the video in ISO 8601 format (duration type). This format is generally used when the duration denotes the amount of intervening time in a time interval.
description:
Description of the video.
upload_date:
Date that the video was uploaded.
is_paid:
Whether the video is paid or not.
is_unlisted:
The privacy status of the video, i.e., whether the video is unlisted or not. Here, the flag unlisted indicates that the video can only be accessed by people who have a direct link to it.
statistics:
A JSON object containing the number of dislikes, views and likes for the video.
comments:
A list of comments for the video. Each element in the list is a JSON object of the text (the comment text) and time (the time when the comment was posted).
We collected the following fields for YouTube channels:
channel_description:
Description of the channel.
hidden_subscriber_count:
Total number of hidden subscribers of the channel.
published_at:
Time when the channel was created. The time is specified in ISO 8601 format (YYYY-MM-DDThh:mm:ss.sZ).
video_count:
Total number of videos uploaded to the channel.
subscriber_count:
Total number of subscribers of the channel.
view_count:
The number of times the channel has been viewed.
kind:
The API resource type (e.g., youtube#channel for YouTube channels).
country:
The country the channel is associated with.
comment_count:
Total number of comments the channel has received.
etag:
The ETag of the channel which is an HTTP header used for web browser cache validation.
The historical data is stored in five directories, each named according to the type of data it contains. Each directory contains JSON files corresponding to the data described above.
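The duration and published_at fields above use ISO 8601 notation, which needs a small amount of parsing before it can be compared or summed. The sketch below is one possible way to do that with the Python standard library only. The function names are illustrative, the duration parser handles only the day/hour/minute/second designators, and the example values are made up rather than taken from the dataset.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches durations such as "PT4M13S", "PT1H", or "P1DT2H" (no weeks/months/years).
_DURATION = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)


def parse_duration(value):
    """Convert an ISO 8601 duration string into a timedelta."""
    match = _DURATION.match(value)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported ISO 8601 duration: {value!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def parse_published_at(value):
    """Parse a UTC timestamp with or without fractional seconds."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unsupported timestamp: {value!r}")


print(parse_duration("PT4M13S").total_seconds())
print(parse_published_at("2014-08-10T09:30:00.0Z").isoformat())
```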
- Time-series Data
We collect the following time-series data for retweets and followers on Twitter:
user_timeline:
A JSON list of tweet objects in the user’s timeline, which consists of the tweets posted, retweeted and quoted by the user. The file created at each time interval contains the new tweets posted by the user during that interval.
user_followers:
A JSON file containing the user ids of all the followers of a user that were added to or removed from the follower list during each time interval.
user_followees:
A JSON file containing the user ids of all the users followed by a user, i.e., the followees of a user, that were added to or removed from the followee list during each time interval.
tweet_details:
A JSON object representing a given tweet, collected after every time interval.
tweet_retweets:
A JSON list of tweet objects representing the most recent 100 retweets of a given tweet, collected after every time interval.
The time-series data is stored in directories named according to the timestamp of the collection time. Each directory contains sub-directories corresponding to the data described above.
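Because the follower and followee files record only the ids added or removed in each interval, the full list at any point has to be rebuilt by replaying those changes in order. The following sketch illustrates the idea. The key names "added" and "removed" and the in-memory list of interval records are hypothetical; the description does not specify the exact JSON layout.

```python
def replay_followers(initial_ids, interval_deltas):
    """Rebuild the follower set by applying per-interval changes in order.

    initial_ids: ids known before the first interval (may be empty).
    interval_deltas: iterable of dicts with hypothetical keys
        "added" and "removed", one dict per collection interval.
    Returns the final set and the follower count after each interval.
    """
    current = set(initial_ids)
    counts = []
    for delta in interval_deltas:
        current |= set(delta.get("added", []))
        current -= set(delta.get("removed", []))
        counts.append(len(current))
    return current, counts


# Made-up pseudo-IDs for illustration only.
deltas = [
    {"added": ["u3"], "removed": []},
    {"added": [], "removed": ["u1", "u2"]},
]
final_ids, counts = replay_followers({"u1", "u2"}, deltas)
print(sorted(final_ids), counts)
```

A sharp rise followed by a sharp drop in the resulting counts is the kind of temporal pattern the time-series layout makes visible.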
Data Anonymization
The data is anonymized by removing all Personally Identifiable Information (PII) and generating pseudo-IDs corresponding to the original IDs. A consistent mapping between the original IDs and the pseudo-IDs is kept to preserve the integrity of the data.
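A consistent mapping means the same original ID always receives the same pseudo-ID, so that links between tweets, users and followers survive anonymization. The sketch below shows one simple way to obtain that property; it is an illustration only, not the procedure used by the dataset authors, and the class name and the use of random hexadecimal tokens are assumptions.

```python
import secrets


class Pseudonymizer:
    """Assigns each original ID a random pseudo-ID and reuses it on repeat lookups."""

    def __init__(self):
        # original ID (as string) -> pseudo-ID; kept private and never published
        self._mapping = {}

    def pseudo_id(self, original_id):
        key = str(original_id)
        if key not in self._mapping:
            # A random token reveals nothing about the original ID.
            self._mapping[key] = secrets.token_hex(8)
        return self._mapping[key]


anonymizer = Pseudonymizer()
first = anonymizer.pseudo_id(12345)
again = anonymizer.pseudo_id(12345)
other = anonymizer.pseudo_id(67890)
print(first == again, first != other)
```

Random tokens are used here instead of a hash of the ID because unsalted hashes of numeric IDs can be reversed by enumeration.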
Cristiano Ronaldo has one of the most popular Instagram accounts as of April 2024.
The Portuguese footballer is the most-followed person on the photo-sharing platform with 628 million followers. Instagram's own account was ranked first with roughly 672 million followers.
How popular is Instagram?
Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States and experts project this figure to surpass 127 million users in 2023.
Who uses Instagram?
Instagram audiences are predominantly young – recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media for teens and one of the social networks with the biggest reach among teens in the United States.
Celebrity influencers on Instagram
Many celebrities and athletes are brand spokespeople and generate additional income with social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first again, as the average media value of one of his Instagram posts was 985,441 U.S. dollars.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The presidential elections in the United States on November 3rd 2020 caused extensive discussions on social media. A part of the content on US elections is organic, coming from users discussing their opinions on the candidates, political positions, or relevant content presented on television. Another significant part originates from organized campaigns, both official, including communication campaigns and dissemination, and unofficial, including astroturfing and targeted manipulation of the electorate.
In this study, we obtain approximately 19.8M tweets from 4.5M users, based on prevalent hashtags related to the 2020 US election. From these, we mined 28,343 tweeted YouTube links and obtained the likes, dislikes and comments of these videos. In this paper, we study the connection between the two social networks. We employ an array of techniques, including volume analysis, exploring the retweet graph, sentiment and graph analysis on the communities formed in YouTube and Twitter. Furthermore, we propose a method to combine the results of community detection on the two social networks and measure the differences between them.
Particularly, we study the daily traffic per prevalent hashtag, plot the retweet graph from July to November 2020, highlight the two main entities (‘Biden’ and ‘Trump’) and show how the discussion around those entities grows in the period closer to the elections. Additionally, we perform a sentiment analysis of both the Twitter corpus and the YouTube comments in tweeted videos. We found that 35.2% of the users contained in our Twitter dataset express positive sentiment towards Trump and 28% express positive sentiment towards Biden, while 18% of the users in our YouTube dataset express positive sentiment towards Trump and 12% express positive sentiment towards Biden. Finally, we link the Twitter retweet graph with the YouTube comment graph using tweeted video links. We measure their similarity and differences and show the interactions and the correlation between the largest communities on YouTube and Twitter.
This dataset provides comprehensive social media profile links discovered through real-time web search. It includes profiles from major social networks like Facebook, TikTok, Instagram, Twitter, LinkedIn, Youtube, Pinterest, Github and more. The data is gathered through intelligent search algorithms and pattern matching. Users can leverage this dataset for social media research, influencer discovery, social presence analysis, and social media marketing. The API enables efficient discovery of social profiles across multiple platforms. The dataset is delivered in a JSON format via REST API.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As of January 2024, Instagram was slightly more popular with men than women, with men accounting for 50.6 percent of the platform’s global users. Additionally, the social media app was most popular amongst younger audiences, with almost 32 percent of users aged between 18 and 24 years.
Instagram’s Global Audience
As of January 2024, Instagram was the fourth most popular social media platform globally, reaching two billion monthly active users (MAU). This number is projected to keep growing with no signs of slowing down, which is not a surprise as the global online social penetration rate across all regions is constantly increasing.
As of January 2024, the country with the largest Instagram audience was India with 362.9 million users, followed by the United States with 169.7 million users.
Who is winning over the generations?
Even though Instagram’s audience is almost twice the size of TikTok’s on a global scale, TikTok has shown itself to be a fierce competitor, particularly amongst younger audiences. TikTok was the most downloaded mobile app globally in 2022, generating 672 million downloads. As of 2022, Generation Z in the United States spent more time on TikTok than on Instagram monthly.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The 10 most popular hashtags in our dataset.
Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
Network monitoring and analysis of consumption behavior are important for network operators: they provide vital information about consumption trends, which operators can use to offer new data plans aimed at specific users and to obtain an adequate perspective of the network. Over-the-top (OTT) media and communications services and applications are shifting Internet consumption by increasing the traffic generated over the different available networks. OTT refers to applications that deliver audio, video, and other media over the Internet by leveraging the infrastructure deployed by network operators, but without the operators' involvement in the control or distribution of the content. These applications are known for their large consumption of network resources.
This dataset contains 1581 instances and 131 attributes in a single file. Each instance represents a user’s consumption profile, which holds summarized information about the user's consumption behavior for the 29 OTT applications identified in the different IP flows captured to create the dataset.
The OTT applications that the users interacted with during the capture experiment and were stored on the dataset are: Amazon, Apple store, Apple Icloud, Apple Itunes, Deezer, Dropbox, EasyTaxi, Ebay, Facebook, Gmail, Google suite, Google Maps, Browsing (HTTP, HTTP_Connect, HTTP_Download, HTTP_Proxy), Instagram, LastFM, Microsoft One Drive (MS_One_Drive), Facebook Messenger (MSN), Netflix, Skype, Spotify, Teamspeak, Teamviewer, Twitch, Twitter, Waze, Whatsapp, Wikipedia, Yahoo and Youtube.
Each application has 4 different types of attributes (quantity of generated flows, mean duration of the flows, average size of the packets exchanged on the flows and the mean bytes per second on the flows). These attributes summarize the interaction that the user had with the respective OTT application in terms of consumption. Furthermore, the dataset contains the user’s IP address in network and decimal format, which are used as user identifiers. Finally, the User Group attribute represents the objective class (high consumption, medium consumption and low consumption) in which a user is classified considering his/her OTT consumption behavior. All of this information gives a total of 131 attributes.
For further information you can read and please cite the following papers:
Springer: https://link.springer.com/chapter/10.1007/978-3-319-95168-3_37
IEEExplore: https://ieeexplore.ieee.org/document/8845576
The structure of the attributes and their definitions are presented below:
Source.Decimal: This attribute holds the user’s IP address in decimal format and it is mainly used as a user identifier.
Source.IP: This attribute holds the user’s IP address in network format (e.g., 192.168.14.35) and as in the previous case its main function is to work as a user identifier.
Application-Name.Flows: This type of attribute holds the quantity of IP flows that a user generated toward an OTT application. As mentioned before, each application has a group of 4 attributes that describe the interaction of the user with a specific OTT application (examples for this case would be Netflix.Flows or Facebook.Flows).
Application-Name.Flow.Duration.Mean: This type of attribute holds the mean duration (time) of the flows generated by the user towards a specific OTT application, measured in microseconds. Examples of how these attributes are stored in the dataset are: Amazon.Flow.Duration.Mean or Instagram.Flow.Duration.Mean.
Application-Name.AVG.Packet.Size: This type of attribute holds the average size of the IP packets that were exchanged in all the flows generated by the user towards a specific OTT application, measured in bytes. It is important to notice that this size is focused on the packet’s header only. Examples of how these attributes are presented in the dataset are: Google_Maps.AVG.Packet.Size or Spotify.AVG.Packet.Size.
Application-Name.Flow.Bytes.Per.Sec: This type of attribute holds the mean number of bytes per second that were exchanged in the flows generated by the user towards a specific OTT application. Examples of this kind of attribute in the dataset are: Deezer.Flow.Bytes.Per.Sec or Skype.Flow.Bytes.Per.Sec.
User.Group: This type of attribute represents the objective class of the dataset i.e., the different groups that the users are classified in according to their OTT consumption behavior...
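The relationship between Source.IP and Source.Decimal, and the naming pattern of the per-application attributes, can be sketched in a few lines of Python. The helper names below are illustrative and not part of the dataset; the four suffixes are taken from the attribute definitions above. If each of the four browsing entries (HTTP, HTTP_Connect, HTTP_Download, HTTP_Proxy) carries its own four attributes, the count works out to 32 x 4 + 3 = 131, which matches the stated total, but that reading is an inference rather than something the description states.

```python
import ipaddress

# Attribute suffixes as listed in the attribute definitions.
SUFFIXES = ("Flows", "Flow.Duration.Mean", "AVG.Packet.Size", "Flow.Bytes.Per.Sec")


def ip_to_decimal(ip):
    """Convert dotted IPv4 notation to its 32-bit integer form."""
    return int(ipaddress.IPv4Address(ip))


def decimal_to_ip(value):
    """Convert a 32-bit integer back to dotted IPv4 notation."""
    return str(ipaddress.IPv4Address(value))


def app_columns(app_name):
    """Build the four attribute names for one application prefix."""
    return [f"{app_name}.{suffix}" for suffix in SUFFIXES]


print(ip_to_decimal("192.168.14.35"))
print(decimal_to_ip(3232239139))
print(app_columns("Netflix"))
```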
This statistic shows a ranking of the estimated number of Twitter users in 2020 in Africa, differentiated by country. The user numbers have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in more than 150 countries and regions worldwide. All input data are sourced from international institutions, national statistical offices, and trade associations. All data have been processed to generate comparable datasets (see supplementary notes under details for more information).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Social media platforms use short, highly engaging videos to catch users’ attention. While the short-form video feeds popularized by TikTok are rapidly spreading to other platforms, we do not yet understand their impact on cognitive functions. We conducted a between-subjects experiment (N = 60) investigating the impact of engaging with TikTok, Twitter, and YouTube while performing a Prospective Memory task (i.e., executing a previously planned action). The study required participants to remember intentions over interruptions. We found that the TikTok condition significantly degraded the users’ performance in this task. As none of the other conditions (Twitter, YouTube, no activity) had a similar effect, our results indicate that the combination of short videos and rapid context-switching impairs intention recall and execution. We contribute a quantified understanding of the effect of social media feed format on Prospective Memory and outline consequences for media technology designers so that they do not harm users’ memory and wellbeing.
Description of the Dataset
Data frames:
- ./data/rt.csv provides the data frame of reaction times.
- ./data/acc.csv provides the data frame of reaction accuracy scores.
- ./data/q.csv provides the data frame collected from questionnaires.
- ./data/ddm.csv contains the DDM features learned using ./appendix2_ddm_fitting.ipynb, which are then used in ./3.ddm_anova.ipynb.
Figures: All figures that appear in the paper are placed in ./figures and can be reproduced using the *_vis.ipynb files.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises 2,271 entries and provides insights into user interface (UI) and user experience (UX) preferences across various digital platforms. Key information includes user demographics (Name, Age, Gender) and platform preferences (e.g., Twitter, YouTube, Facebook, Website). It captures user experiences and satisfaction levels with various UI/UX elements such as color schemes, visual hierarchy, typography, multimedia usage, and layout design. The dataset also includes evaluations of mobile responsiveness, call-to-action buttons, form usability, feedback/error messages, loading speed, personalization, accessibility, and interactions (like scrolling behavior and gestures). Each UI/UX component is rated on a scale, allowing for quantitative analysis of user preferences and experiences, making this dataset valuable for research in user-centered design and usability optimization.
As of April 2024, men between the ages of 25 and 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Additionally, Facebook's second largest audience base was men aged 18 to 24 years.
Facebook connects the world
Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world, with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (together known under the acronym GAFA). Facebook is the most popular social network worldwide, and the company also owns three other billion-user properties: the mobile messaging apps WhatsApp and Facebook Messenger, as well as the photo-sharing app Instagram.
Facebook users
The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with 378 million users. The United States, Brazil, and Indonesia also each have more than 100 million Facebook users.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Wish 107.5 is an all-hits FM radio station in the Philippines. When it first hit the airwaves in August 2014, it promised to grant your fervent wish of making your radio more than a typical music-box-on-air.
Wish 107.5 unveiled the first and the only Mobile Radio Booth in the Philippines, now known as the WISH 107.5 Bus. Equipped with state-of-the-art broadcast facilities, it took the traditional radio experience beyond the four-walled booth as it brought music right where most of the listening public are -- streets, roads, and parks.
With the capabilities it offers, the Wish 107.5 Bus is on the right track in leaving an indelible mark in the music scene. The desire to bring this concept to more audience fuels the station to continue embarking on a journey that would forever change the course of music and radio broadcast history of the Philippines and the World, transforming itself from being a local FM station to becoming a sought-after WISHclusive gateway to the world.
https://creativecommons.org/publicdomain/zero/1.0/
I invited users to participate in a Real-Life Turing Test [link]. Users were connected to strangers and asked to predict whether they were a human or robot. However, everyone was a human and the scores were randomised, tricking users into believing they were talking with an advanced AI. This dataset includes 2,678 chats from the experiment.