Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
About Dataset
This dataset accompanies the paper RoadSocial: A Diverse Dataset and Benchmark for Road Event Understanding from Social Video Narratives. RoadSocial is a large-scale, diverse VideoQA dataset designed for generic road event understanding from social media narratives. It can help enhance the road event comprehension capabilities of general-purpose Video LLMs and improve their performance in traffic scene understanding, planning, and other autonomous vehicle (AV) related… See the full description on the dataset page: https://huggingface.co/datasets/chiragp26/RoadSocial.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This package contains replication data for: "Doctors’ and Nurses’ Social Media Ads Reduced Holiday Travel and COVID-19 infections: A cluster randomized controlled trial in 13 States". It also contains the IRB protocol documents. It contains 9 datasets:
- clean_cases.csv: zip-level COVID-19 cases
- county_covariates.dta: county-level covariates
- county_pop2019.dta: county population in 2019
- Election2020.dta: election data from 2020
- fb_movement_data.dta: Facebook mobility data
- randomized_sample_thanksgiving.xlsx: zip and county treatment status during the Thanksgiving campaign
- randomized_sample_christmas.xlsx: zip and county treatment status during the Christmas campaign
- us-counties.csv: county-level COVID-19 data
- randomized_zip.csv: treatment randomizations generated by zip_randomization.R for randomization inference
The code, written in R, covers both cleaning and analysis. For further details on the data or how to run the code, please see the readme file.
The abstract of the paper is as follows: During the COVID-19 epidemic, many health professionals started using mass communication on social media to relay critical information and persuade individuals to adopt preventative health behaviors. Our group of clinicians and nurses developed and recorded short video messages to encourage viewers to stay home for the Thanksgiving and Christmas Holidays. We then conducted a two-stage clustered randomized controlled trial in 820 counties (covering 13 States) in the United States of a large-scale Facebook ad campaign disseminating these messages. In the first level of randomization, we randomly divided the counties into two groups: high intensity and low intensity. In the second level, we randomly assigned zip codes to either treatment or control such that 75% of zip codes in high intensity counties received the treatment, while 25% of zip codes in low intensity counties received the treatment.
In each treated zip code, we sent the ad to as many Facebook subscribers as possible (11,954,109 users received at least one ad at Thanksgiving and 23,302,290 users received at least one ad at Christmas). The first primary outcome was aggregate holiday travel, measured using mobile phone location data, available at the county level: we find that average distance travelled in high-intensity counties decreased by -0.993 percentage points (95% CI -1.616, -0.371, p-value 0.002) the three days before each holiday. The second primary outcome was COVID-19 infection at the zip-code level: COVID-19 infections recorded in the two-week period starting five days post-holiday declined by 3.5 percent (adjusted 95% CI [-6.2 percent, -0.7 percent], p-value 0.013) in intervention zip codes compared to control zip codes.
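The two-level assignment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure (their randomization lives in zip_randomization.R and is not reproduced here). The function name `two_stage_assignment`, the equal split of counties between the two intensity groups, and the rounding of the treated share are all assumptions made for illustration.

```python
import random


def two_stage_assignment(zips_by_county, seed=0):
    """Hypothetical two-stage cluster randomization.

    Stage 1: split counties into "high" and "low" intensity groups
             (an equal split is assumed; the source does not state the ratio).
    Stage 2: within each county, treat 75% of zip codes if the county is
             high intensity and 25% if it is low intensity.
    """
    rng = random.Random(seed)

    # Stage 1: shuffle counties and assign the first half to "high".
    counties = sorted(zips_by_county)
    rng.shuffle(counties)
    half = len(counties) // 2
    intensity = {c: ("high" if i < half else "low")
                 for i, c in enumerate(counties)}

    # Stage 2: shuffle zip codes within each county and treat the first share.
    treated = {}
    for county in counties:
        share = 0.75 if intensity[county] == "high" else 0.25
        zips = list(zips_by_county[county])
        rng.shuffle(zips)
        n_treated = round(share * len(zips))  # rounding rule is an assumption
        for i, z in enumerate(zips):
            treated[z] = i < n_treated
    return intensity, treated


# Toy input: 4 made-up counties with 4 made-up zip codes each.
toy = {f"county{i}": [f"zip{i}{j}" for j in range(4)] for i in range(4)}
intensity, treated = two_stage_assignment(toy, seed=1)
for county, zips in sorted(toy.items()):
    print(county, intensity[county], sum(treated[z] for z in zips), "of", len(zips))
```

Fixing the seed makes the assignment reproducible, which matters when the same draw has to be regenerated later for randomization inference.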
https://choosealicense.com/licenses/other/
TikTok-10M Dataset
Dataset Description
TikTok-10M is a large-scale dataset containing 10 million short-form posts from TikTok, designed for video understanding, multimodal learning, and social media content analysis. The dataset was curated to bridge the gap between academic video datasets and actual user-generated content, providing researchers with authentic patterns and characteristics of modern short-form video content that dominates social media platforms.… See the full description on the dataset page: https://huggingface.co/datasets/The-data-company/TikTok-10M.
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Studying how YouTube videos go viral or, more generally, how their views, likes and subscriptions evolve is a topic of interest in many disciplines. With this dataset you can study such phenomena using statistics about 1 million YouTube videos. The information was collected in 2013, when YouTube exposed the data publicly; the functionality was removed in later years, and such statistics are now available only to the owner of the video. This makes the dataset unique.
This dataset was generated with YOUStatAnalyzer, a tool I (Mattia Zeni) developed while working for CREATE-NET (www.create-net.org) within the framework of the CONGAS FP7 project (http://www.congas-project.eu). For the project we needed to collect and analyse the dynamics of YouTube video popularity. The dataset contains statistics of more than 1 million YouTube videos, chosen according to random keywords extracted from the WordNet library (http://wordnet.princeton.edu).
The motivation behind the development of the YOUStatAnalyzer data collection tool and the creation of this dataset is that an active research community works on the interplay among individual user preferences, social dynamics and advertising mechanisms, and a common problem is the lack of open large-scale datasets. No such tool existed at the time. YouTube has since removed the ability to view these data on each video's page, which makes this dataset unique.
When using our dataset for research purposes, please cite it as:
@INPROCEEDINGS{YOUStatAnalyzer,
author={Mattia Zeni and Daniele Miorandi and Francesco {De Pellegrini}},
title = {{YOUStatAnalyzer}: a Tool for Analysing the Dynamics of {YouTube} Content Popularity},
booktitle = {Proc. 7th International Conference on Performance Evaluation Methodologies and Tools
(Valuetools, Torino, Italy, December 2013)},
address = {Torino, Italy},
year = {2013}
}
The dataset contains statistics and metadata of 1 million YouTube videos, collected in 2013. The videos were chosen according to random keywords extracted from the WordNet library (http://wordnet.princeton.edu).
The structure of a dataset entry is as follows:
{
u'_id': u'9eToPjUnwmU',
u'title': u'Traitor Compilation # 1 (Trouble ...',
u'description': u'A traitor compilation by one are ...',
u'category': u'Games',
u'commentsNumber': u'6',
u'publishedDate': u'2012-10-09T23:42:12.000Z',
u'author': u'ServilityGaming',
u'duration': u'208',
u'type': u'video/3gpp',
u'relatedVideos': [u'acjHy7oPmls', u'EhW2LbCjm7c', u'UUKigFAQLMA', ...],
u'accessControl': {
u'comment': {u'permission': u'allowed'},
u'list': {u'permission': u'allowed'},
u'videoRespond': {u'permission': u'moderated'},
u'rate': {u'permission': u'allowed'},
u'syndicate': {u'permission': u'allowed'},
u'embed': {u'permission': u'allowed'},
u'commentVote': {u'permission': u'allowed'},
u'autoPlay': {u'permission': u'allowed'}
},
u'views': {
u'cumulative': {
u'data': [15.0, 25.0, 26.0, 26.0, ...]
},
u'daily': {
u'data': [15.0, 10.0, 1.0, 0.0, ...]
}
},
u'shares': {
u'cumulative': {
u'data': [0.0, 0.0, 0.0, 0.0, ...]
},
u'daily': {
u'data': [0.0, 0.0, 0.0, 0.0, ...]
}
},
u'watchtime': {
u'cumulative': {
u'data': [22.5666666667, 36.5166666667, 36.7, 36.7, ...]
},
u'daily': {
u'data': [22.5666666667, 13.95, 0.166666666667, 0.0, ...]
}
},
u'subscribers': {
u'cumulative': {
u'data': [0.0, 0.0, 0.0, 0.0, ...]
},
u'daily': {
u'data': [-1.0, 0.0, 0.0, 0.0, ...]
}
},
u'day': {
u'data': [1349740800000.0, 1349827200000.0, 1349913600000.0, 1350000000000.0, ...]
}
}
The structure above shows which fields an entry in the dataset has. They can be divided into 2 sections:
1) Video Information.
_id -> Corresponding to the video ID and to the unique identifier of an entry in the database.
title -> The video's title.
description -> The video's description.
category -> The YouTube category the video is inserted in.
commentsNumber -> The number of comments posted by users.
publishedDate -> The date the video was published.
author -> The author of the video.
duration -> The video duration in seconds.
type -> The encoding type of the video.
relatedVideos -> A list of related videos.
accessControl -> A list of access policies for different aspects related to the video.
2) Video Statistics.
Each video can have 4 different statistics variables: views, shares, subscribers and watchtime. Recent videos have all of them, while older videos may have only the 'views' variable. Each variable has 2 dimensions, daily and cumulative.
`views -> number of views collected by the vi...
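As a minimal sketch of how an entry with the structure shown above might be handled in Python, the snippet below converts the string-typed fields to integers and the `day` values to calendar dates. It assumes the entry is already loaded as a Python dict and that the `day` values are Unix timestamps in milliseconds (UTC). Neither point is stated in the description, and the helper names `parse_entry` and `running_total` are hypothetical.

```python
from datetime import datetime, timezone


def parse_entry(entry):
    """Convert string-typed fields and millisecond timestamps to native types."""
    # Assumption: 'day' values are Unix timestamps in milliseconds (UTC).
    days = [datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date()
            for ms in entry["day"]["data"]]
    return {
        "id": entry["_id"],
        "duration_s": int(entry["duration"]),      # stored as a string
        "comments": int(entry["commentsNumber"]),  # stored as a string
        "days": days,
        "daily_views": entry["views"]["daily"]["data"],
        "cumulative_views": entry["views"]["cumulative"]["data"],
    }


def running_total(daily):
    """Return the running sum of a daily series."""
    total, out = 0.0, []
    for value in daily:
        total += value
        out.append(total)
    return out


# Values copied from the sample entry above, truncated to the four points shown.
sample = {
    "_id": "9eToPjUnwmU",
    "duration": "208",
    "commentsNumber": "6",
    "views": {
        "cumulative": {"data": [15.0, 25.0, 26.0, 26.0]},
        "daily": {"data": [15.0, 10.0, 1.0, 0.0]},
    },
    "day": {"data": [1349740800000.0, 1349827200000.0,
                     1349913600000.0, 1350000000000.0]},
}

parsed = parse_entry(sample)
print(parsed["days"][0], parsed["duration_s"], parsed["comments"])
print(running_total(parsed["daily_views"]) == parsed["cumulative_views"])
```

For the views series in the sample, the cumulative values equal the running sum of the daily values. The subscribers series in the same sample does not follow that pattern (a daily value of -1.0 next to a cumulative value of 0.0), so the relationship should be checked per variable rather than assumed.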
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
No-reference (NR) perceptual video quality assessment (VQA) is a complex, unsolved, and important problem to social and streaming media applications. Efficient and accurate video quality predictors are needed to monitor and guide the processing of billions of shared, often imperfect, user-generated content (UGC). Unfortunately, current NR models are limited in their prediction capabilities on real-world, "in-the-wild" UGC video data. To advance progress on this problem, we created the largest (by far) subjective video quality dataset, containing 39,000 real-world distorted videos and 117,000 space-time localized video patches ("v-patches"), and 5.5M human perceptual quality annotations. Using this, we created two unique NR-VQA models: (a) a local-to-global region-based NR VQA architecture (called PVQ) that learns to predict global video quality and achieves state-of-the-art performance on 3 UGC datasets, and (b) a first-of-a-kind space-time video quality mapping engine (called PVQ Mapper) that helps localize and visualize perceptual distortions in space and time. We will make the new database and prediction models available immediately following the review process.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Location-Based Services (LBS) have prospered owing to technological advancements in smart devices. Analyzing location-based user-generated data is a helpful way to understand human mobility patterns, further fueling applications such as recommender systems and urban computing. In this data descriptor, we introduce a dataset collected by LBSLab, a smartphone-based system implemented as a mini-program in the WeChat app, designed for large-scale data collection from the smartphones of the participants with their informed consent. We provide activity data of multiple types, including logins, profile viewing, weather checking, and check-ins with location information (latitude and longitude), POI, and indicated mood, collected from 467 users over a duration of 11 days. We present some basic data analysis and expect that reuse of the data will allow researchers to better understand user behaviors of LBSs, human mobility, and the temporal and spatial characteristics of people’s mood.
For further information about the LBSLab system, see our position paper: https://user.informatik.uni-goettingen.de/~ychen/papers/LBSLab-UbiComp18.pdf
A YouTube video is also available: https://www.youtube.com/watch?v=m8r-1jqvYWc
As of early 2025, Saudi Arabia had the highest social media penetration rate globally out of selected countries and territories, at 102 percent. UAE and South Korea followed, with 96 percent and 94 percent of active usage reach, respectively. Kenya, Ghana, and Nigeria had some of the lowest social network penetration rates in the world, with less than 26 percent of the population accessing social media in each country.
How many people use social media?
Although the top three countries with the highest social media penetration rates globally were in Eastern and Southwestern Asia in 2023, the region with the greatest social media reach was Northern Europe with 83.6 percent, followed by Western Europe with 83.3 percent and Southern Europe with 76.7 percent. In 2022, more than 4.59 billion people reported using social media, and this number is projected to reach almost six billion by 2027.
Facebook: the most popular social network
Meta’s Facebook, the social media giant and the first platform to reach this kind of scale, was the leading social network as of October 2023 with more than three billion global monthly active users (MAU). Additionally, Meta owns four of the biggest social media platforms, all with more than one billion MAU each: Facebook, WhatsApp, Instagram, and Messenger. As of January 2023, India was home to Facebook’s largest audience with more than 300 million MAU, followed by the United States with 175 million MAU.
Which country has the most Facebook users?
There are more than 378 million Facebook users in India alone, making it the leading country in terms of Facebook audience size. To put this into context, if India’s Facebook audience were a country then it would be ranked third in terms of largest population worldwide. Apart from India, there are several other markets with more than 100 million Facebook users each: The United States, Indonesia, and Brazil with 193.8 million, 119.05 million, and 112.55 million Facebook users respectively.
Facebook – the most used social media
Meta, the company that was previously called Facebook, owns four of the most popular social media platforms worldwide: WhatsApp, Facebook Messenger, Facebook, and Instagram. As of the third quarter of 2021, there were around 3.5 billion cumulative monthly users of the company’s products worldwide. With around 2.9 billion monthly active users, Facebook is the most popular social media worldwide. With an audience of this scale, it is no surprise that the vast majority of Facebook’s revenue is generated through advertising.
Facebook usage by device
As of July 2021, it was found that 98.5 percent of active users accessed their Facebook account from mobile devices. In fact, almost 81.8 percent of Facebook audiences worldwide access the platform only via mobile phone. Facebook is not only available through mobile browser as the company has published several mobile apps for users to access their products and services. As of the third quarter 2021, the four core Meta products were leading the ranking of most downloaded mobile apps worldwide, with WhatsApp amassing approximately six billion downloads.
https://egobody.ethz.ch/
EgoBody is a novel large-scale dataset for social interactions in complex 3D scenes.
According to our latest research, the global content analytics market size reached USD 7.2 billion in 2024, demonstrating robust momentum driven by the rapid digitization of content across industries. The market is projected to expand at a CAGR of 18.4% from 2025 to 2033, with the market size anticipated to reach USD 35.8 billion by 2033. This impressive growth trajectory is primarily fueled by the increasing demand for actionable insights from unstructured data, advancements in artificial intelligence and machine learning, and the proliferation of digital channels that generate massive volumes of content. As organizations strive to harness the power of data-driven decision-making, content analytics solutions have become indispensable across sectors.
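The relationship between a base value, a compound annual growth rate (CAGR) and a projected value can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the number of compounding years (9, counting from the 2024 base to 2033) is an assumption, since the report does not say which base year its CAGR uses.

```python
def project(start_value, rate, years):
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1.0 + rate) ** years


def implied_cagr(start_value, end_value, years):
    """Annual growth rate that takes start_value to end_value in `years` years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the paragraph above (USD billion); 9 years is an assumption.
base_2024, target_2033, quoted_rate, years = 7.2, 35.8, 0.184, 9

print(f"projected 2033 value at quoted rate: {project(base_2024, quoted_rate, years):.2f}")
print(f"rate implied by the two endpoints:   {implied_cagr(base_2024, target_2033, years):.3%}")
```

Changing `years` shows how sensitive the projection is to the choice of base year, which is worth keeping in mind when comparing figures across market reports.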
One of the principal growth factors propelling the content analytics market is the exponential surge in digital content creation and consumption. With enterprises and consumers generating vast amounts of data through emails, social media, websites, and multimedia platforms, the need to analyze and extract meaningful patterns from this content has never been greater. Content analytics tools enable organizations to derive valuable business intelligence, optimize marketing strategies, enhance customer experiences, and ensure regulatory compliance. This trend is further amplified by the integration of advanced technologies such as natural language processing (NLP), sentiment analysis, and machine learning, which facilitate deeper and more nuanced understanding of text, audio, and video content. As a result, businesses are increasingly investing in content analytics to gain a competitive edge, streamline operations, and foster innovation.
Another significant factor driving market growth is the rising adoption of cloud-based content analytics solutions. Cloud deployment offers unparalleled scalability, flexibility, and cost-efficiency, making it an attractive choice for organizations of all sizes. The cloud model enables seamless integration with existing IT infrastructure, real-time access to analytics, and the ability to handle large-scale data processing without the need for significant upfront investments in hardware. Additionally, the shift towards remote and hybrid work models has accelerated the demand for cloud-based analytics tools that facilitate collaboration and decision-making across geographically dispersed teams. This transition is particularly pronounced among small and medium enterprises (SMEs), which benefit from the lower total cost of ownership and faster deployment cycles offered by cloud solutions.
The growing emphasis on customer-centric strategies and personalized experiences is also shaping the content analytics market landscape. Organizations across sectors such as retail, BFSI, healthcare, and media are leveraging content analytics to gain deeper insights into customer preferences, behaviors, and feedback. By analyzing data from multiple touchpoints—including social media, customer reviews, and call center transcripts—companies can tailor their offerings, improve engagement, and drive customer loyalty. Furthermore, regulatory requirements around data privacy and security are prompting enterprises to adopt robust analytics solutions that ensure compliance while maximizing the value of their content assets. The convergence of these factors is expected to sustain the strong growth trajectory of the content analytics market in the coming years.
From a regional perspective, North America continues to dominate the global content analytics market, accounting for the largest share in 2024. The region's leadership is attributed to the presence of major technology players, high digital adoption rates, and a mature analytics ecosystem. However, the Asia Pacific region is emerging as the fastest-growing market, driven by rapid digital transformation, expanding internet penetration, and increasing investments in big data and analytics technologies. Europe, Latin America, and the Middle East & Africa are also witnessing steady growth, fueled by rising awareness of the benefits of content analytics and the need to enhance business agility in a dynamic digital landscape.
As of January 2024, Instagram was slightly more popular with men than women, with men accounting for 50.6 percent of the platform’s global users. Additionally, the social media app was most popular amongst younger audiences, with almost 32 percent of users aged between 18 and 24 years.
Instagram’s Global Audience
As of January 2024, Instagram was the fourth most popular social media platform globally, reaching two billion monthly active users (MAU). This number is projected to keep growing with no signs of slowing down, which is not a surprise as the global online social penetration rate across all regions is constantly increasing.
As of January 2024, the country with the largest Instagram audience was India with 362.9 million users, followed by the United States with 169.7 million users.
Who is winning over the generations?
Even though Instagram’s audience is almost twice the size of TikTok’s on a global scale, TikTok has shown itself to be a fierce competitor, particularly amongst younger audiences. TikTok was the most downloaded mobile app globally in 2022, generating 672 million downloads. As of 2022, Generation Z in the United States spent more time on TikTok than on Instagram monthly.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global AI-Generated Short-Form Video Script market size reached USD 1.23 billion in 2024, driven by the explosive adoption of AI-powered content creation tools across digital platforms. The market is expected to grow at a robust CAGR of 28.7% from 2025 to 2033, reaching a forecasted value of USD 10.89 billion by 2033. This remarkable growth is fueled by the increasing demand for scalable, personalized, and rapid video content generation solutions, especially in marketing, entertainment, and social media sectors. As per our latest research, the convergence of artificial intelligence and digital content strategies is fundamentally transforming how brands, creators, and enterprises engage with their audiences through short-form video storytelling.
The primary growth driver for the AI-Generated Short-Form Video Script market is the surging need for high-volume, engaging video content in the digital age. With platforms like TikTok, Instagram Reels, and YouTube Shorts witnessing unprecedented user engagement, brands and content creators are under constant pressure to produce fresh, relevant, and captivating short-form videos. AI-driven script generation tools enable rapid ideation, scriptwriting, and content adaptation, significantly reducing turnaround times and operational costs. This automation empowers even small teams or individual creators to maintain a consistent publishing cadence, ensuring they remain competitive in an attention-driven digital landscape. Moreover, AI models trained on vast datasets can analyze trending topics, audience preferences, and platform algorithms, generating scripts that are optimized for virality and audience retention.
Another significant growth factor is the integration of AI-generated scripts into broader marketing, advertising, and e-commerce strategies. Businesses across industries are leveraging AI-powered video scripts to personalize messaging at scale, segmenting audiences with tailored narratives that drive conversion and engagement. For instance, e-commerce brands use AI-generated scripts to create product showcases, explainer videos, and customer testimonials that resonate with specific demographics or shopping behaviors. In the education sector, AI tools facilitate the production of micro-learning content, enabling institutions and ed-tech companies to deliver bite-sized, engaging lessons that cater to diverse learning styles. The entertainment industry, too, is capitalizing on AI-generated scripts to fuel creativity and experiment with new storytelling formats, reducing the reliance on human writers for repetitive or formulaic content.
Technological advancements in natural language processing (NLP), generative AI models, and cloud computing further accelerate market expansion. The evolution of large language models, such as GPT-4 and beyond, has significantly enhanced the quality, coherence, and creativity of AI-generated scripts, making them virtually indistinguishable from human-written content. Cloud-based deployment models ensure that these sophisticated tools are accessible to users worldwide, regardless of their technical infrastructure. Additionally, API integrations and user-friendly interfaces have democratized access to AI-driven scriptwriting, allowing enterprises, content creators, and even educational institutions to seamlessly incorporate AI-generated scripts into their workflows. As AI models continue to evolve, we anticipate further improvements in contextual understanding, multilingual support, and creative diversity, broadening the market’s appeal across global regions and industry verticals.
From a regional perspective, North America currently leads the AI-Generated Short-Form Video Script market, accounting for the largest revenue share in 2024. The region’s dominance is attributed to the presence of major technology companies, a vibrant digital content ecosystem, and early adoption of AI-driven marketing solutions. Asia Pacific is emerging as the fastest-growing region, propelled by the rapid proliferation of mobile internet, a massive base of content creators, and the meteoric rise of short-form video platforms in countries like China and India. Europe follows closely, driven by robust investments in creative industries and digital transformation initiatives. Latin America and the Middle East & Africa are also witnessing steady growth, supported by increasing digitalization and the expansion of social media user bases. The regional landscape is expected to evol
Which country has the most Facebook users?
There are more than 383 million Facebook users in India alone, making it the leading country in terms of Facebook audience size. To put this into context, if India’s Facebook audience were a country, then it would be ranked third in terms of largest population worldwide. Apart from India, there are several other markets with more than 100 million Facebook users each: The United States, Indonesia, and Brazil with 196.9 million, 122.3 million, and 111.65 million Facebook users respectively.
Facebook – the most used social media
Meta, the company that was previously called Facebook, owns four of the most popular social media platforms worldwide: WhatsApp, Facebook Messenger, Facebook, and Instagram. As of the third quarter of 2021, there were around 3.5 billion cumulative monthly users of the company’s products worldwide. With around 2.9 billion monthly active users, Facebook is the most popular social media worldwide. With an audience of this scale, it is no surprise that the vast majority of Facebook’s revenue is generated through advertising.
Facebook usage by device
As of July 2021, it was found that 98.5 percent of active users accessed their Facebook account from mobile devices. In fact, almost 81.8 percent of Facebook audiences worldwide access the platform only via mobile phone. Facebook is not only available through mobile browser as the company has published several mobile apps for users to access their products and services. As of the third quarter 2021, the four core Meta products were leading the ranking of most downloaded mobile apps worldwide, with WhatsApp amassing approximately six billion downloads.