By VISHWANATH SESHAGIRI [source]
This dataset contains YouTube video and channel metadata for analyzing the statistical relationships between videos and forming a topic tree. It offers 9 direct features and 13 indirect features, including total views per unit time, channel views, likes/subscribers ratio, comments/views ratio, and dislikes/subscribers ratio. The data makes it possible to study topics such as subscriber count trends over time or the impact of trends on subscriber engagement. Models built on it can show how different types of content drive viewership and identify the most popular styles or topics within YouTube's catalogue. The data also offers a look into consumer behaviour: ratios such as likes per subscriber and dislikes per view help explore what drives people to watch specific videos at certain times or to favour certain channels over others. Finally, the dataset is open source and comes with a GitHub repo, making it a useful resource for anyone who wants to understand how their audience interacts with their content and how they might improve it in the future.
How to Use This Dataset
In general, it is important to understand each parameter in the dataset before proceeding with analysis. The parameters include totalviews/channelelapsedtime, channelViewCount, likes/subscriber, views/subscribers, subscriberCount, dislikes/views, comments/subscriber, channelCommentCount, likes/dislikes, comments/views, and dislikes/subscriber, as well as further ratios built from total views, total subscribers, and elapsed time.
To use this dataset for your own analysis: 1) review each parameter's meaning and purpose in the dataset; 2) get familiar with basic descriptive statistics such as mean, median, mode, and range; 3) create visualizations or tables based on subsets of the data; 4) explore correlations between different sets of variables or parameters; 5) draw conclusions about specific channels or topics based on organized graph hierarchies or tables; 6) analyze trends over time for individual parameters as well as the aggregate reaction from all users when videos are released.
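As a minimal sketch of the descriptive-statistics step, the snippet below summarises one numeric column using only the Python standard library. The column names follow this listing, but the sample rows are made up for illustration, and `describe` is a hypothetical helper, not part of the dataset's repository.

```python
import csv
import io
import statistics

# Made-up sample rows; column names follow the listing, values are illustrative only.
SAMPLE_CSV = """channelViewCount,subscriberCount,likes/subscriber
1000,50,0.20
4000,200,0.10
2500,100,0.40
4000,400,0.05
"""


def describe(rows, column):
    """Return mean, median, mode, and range for one numeric column."""
    values = [float(row[column]) for row in rows]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
    }


# In practice the rows would come from open("YouTubeDataset_withChannelElapsed.csv").
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = describe(rows, "channelViewCount")
print(summary)
```

Reading with `csv.DictReader` keeps the ratio-style column names (which contain `/`) usable as plain dictionary keys.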
Predicting the Relative Popularity of Videos: This dataset can be used to build a statistical model that can predict the relative popularity of videos based on various factors such as total views, channel viewers, likes/dislikes ratio, and comments/views ratio. This model could then be used to make recommendations and predict which videos are likely to become popular or go viral.
Creating Topic Trees: The dataset can also be used to create topic trees or taxonomies by analyzing the content of videos and looking at what topics they cover. For example, one could analyze the most popular YouTube channels in a specific subject area, group together those that discuss similar topics, and then build an organized tree structure around those topics in order to better understand viewer interests in that area.
Viewer Engagement Analysis: This dataset could also be used for viewer engagement analysis by examining factors such as subscriber count, elapsed time, and comments per view to gain insights into how engaged viewers are with specific content or channels on YouTube. This information could then be used to adjust content strategy in order to improve overall engagement rates across various types of video content and channel types.
If you use this dataset in your research, please credit the original authors.
License
Unknown License - Please check the dataset description for more information.
File: YouTubeDataset_withChannelElapsed.csv

| Column name | Description |
|:----------------------------------|:-------------------------------------------------------|
| totalviews/channelelapsedtime | Ratio of total views to channel elapsed time. (Ratio) |
| channelViewCount | Total number of views for the channel. (Integer) |
| likes/subscriber ...
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
YouTube was launched in 2005. It was founded by three PayPal employees: Chad Hurley, Steve Chen, and Jawed Karim, who ran the company from an office above a small restaurant in San Mateo. The first...
The global number of YouTube users was forecast to increase continuously between 2024 and 2029 by a total of ***** million users (+***** percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach *** billion users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years. User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in regions like Africa and South America.
By VISHWANATH SESHAGIRI [source]
The YouTube Video and Channel Metadata dataset is a comprehensive collection of data related to YouTube videos and channels. It consists of various features and statistics that provide insights into the performance and engagement of videos, as well as the overall popularity and success of channels.
The dataset includes both direct features, such as total views, channel elapsed time, channel ID, video category ID, channel view count, likes per subscriber, dislikes per subscriber, comments per subscriber, and more. Additionally, there are indirect features derived from YouTube's API that provide additional metrics for analysis.
One important aspect covered in this dataset is the ratio between certain metrics. For example:
- The totalviews/channelelapsedtime ratio represents the average number of views a video has received relative to the elapsed time since the channel was created.
- The likes/dislikes ratio indicates the proportion of likes on a video compared to dislikes.
- The views/subscribers ratio showcases how engaged subscribers are by measuring the number of views relative to the number of subscribers.
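The ratios described above can be sketched as simple divisions over raw counts. The input keys (`likes`, `dislikes`, `views`, `comments`, `subscribers`) and the choice to return `None` for a zero denominator are assumptions made for illustration; the listing does not say how the original dataset handles such cases.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def ratio_features(video):
    """Derive ratio-style features from raw counts (input keys are hypothetical)."""
    return {
        "likes/dislikes": safe_ratio(video["likes"], video["dislikes"]),
        "views/subscribers": safe_ratio(video["views"], video["subscribers"]),
        "comments/views": safe_ratio(video["comments"], video["views"]),
        "dislikes/views": safe_ratio(video["dislikes"], video["views"]),
    }


# Illustrative numbers only, not taken from the dataset.
example = {"likes": 900, "dislikes": 100, "views": 20000,
           "comments": 400, "subscribers": 5000}
features = ratio_features(example)
print(features)
```

Guarding the division matters in practice because videos with zero dislikes or channels with hidden subscriber counts would otherwise raise `ZeroDivisionError`.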
Other metrics explored in this dataset include comments/views ratio (representing viewer engagement), dislikes/views ratio (measuring viewer sentiment), comments/subscriber ratio (indicating community participation), likes/subscriber ratio (reflecting audience loyalty), dislikes/subscriber ratio (highlighting dissatisfaction levels), total number of subscribers for a channel (subscriberCount), total views on a channel (channelViewCount), total number of comments on a channel (channelCommentCount), among others.
By analyzing these features and statistics within this dataset, researchers or data analysts can gain valuable insights into various aspects related to YouTube videos and channels. Furthermore, it may be possible to build statistical relationships between videos based on their performance characteristics or even develop topic trees based on similarities between different content categories. This dataset serves as an excellent resource for studying YouTube's ecosystem comprehensively.
For accessing additional resources related to this dataset or exploring code repositories associated with it, users can refer to the provided GitHub repository
Introduction:
Step 1: Understanding the Dataset
Start by familiarizing yourself with the columns in the dataset. Here are some key features to pay attention to:
- totalviews/channelelapsedtime: The ratio of total views of a video to the elapsed time of the channel.
- channelViewCount: The total number of views on the channel.
- likes/subscriber: The ratio of likes on a video to the number of subscribers of the channel.
- views/subscribers: The ratio of views on a video to the number of subscribers of the channel.
- subscriberCount: The total number of subscribers for a channel.
- dislikes/views: The ratio of dislikes on a video to its total views.
- comments/subscriber: The ratio of comments on a video to the number of subscribers of the channel.
Step 2: Determining Data Analysis Objectives
Define your objectives or research questions before diving into data analysis using this dataset. For example, you may want to explore relationships between viewership, engagement metrics, and various attributes such as category ID or elapsed time.
Step 3: Analyzing Relationships between Variables
Use statistical techniques like correlation analysis or visualization tools like scatter plots, bar graphs, or heatmaps to understand relationships between variables in this dataset.
For example:
- Plotting totalviews/channelelapsedtime against channelViewCount can help identify patterns between overall video popularity and channels' view count growth over time.
- Comparing likes/dislikes with comments/views can give insights into viewer engagement levels across different videos.
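For the correlation analysis mentioned in Step 3, a Pearson coefficient between two columns is a reasonable starting point. The sketch below implements it with the standard library only; the two value lists stand in for the likes/dislikes and comments/views columns and are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values standing in for two dataset columns.
likes_per_dislike = [2.0, 4.0, 6.0, 8.0]
comments_per_view = [0.01, 0.02, 0.03, 0.04]
r = pearson(likes_per_dislike, comments_per_view)
print(round(r, 6))
```

A coefficient near 1 or -1 suggests a strong linear relationship; it says nothing about causation, and ratio columns with heavy outliers may need a rank-based measure instead.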
Step 4: Building Machine Learning Models (Optional)
If your objective includes predictive analysis or building machine learning models, select relevant features as predictors and the target variable (e.g., totalviews/channelelapsedtime) for training and evaluation.
You can use various algorithms such as linear regression, decision trees, or neural networks to predict video performance or channel growth based on available attributes.
Step 5: Evaluating Model Performance
Assess the predictive model's performance using appropriate evaluation metrics like mean square...
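Steps 4 and 5 can be sketched with a one-variable least-squares fit scored by mean squared error on held-out rows. This is an illustration only: the predictor (views/subscribers), the target (totalviews/channelelapsedtime), and all numbers are assumptions chosen to keep the example small, and a real analysis would likely use several predictors and a dedicated library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up training rows: predictor = views/subscribers,
# target = totalviews/channelelapsedtime.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
# Made-up held-out rows used only for evaluation.
test_x = [5.0, 6.0]
test_y = [11.0, 14.0]

slope, intercept = fit_line(train_x, train_y)
predictions = [slope * x + intercept for x in test_x]
mse = mean_squared_error(test_y, predictions)
print(slope, intercept, mse)
```

Scoring on rows that were not used for fitting gives a more honest picture of how the model would behave on new videos.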
In 2021, YouTube's user base in the United Kingdom amounted to approximately ***** million users. The number of YouTube users in the United Kingdom is projected to reach ***** million users by 2025. User figures have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Name: 2023 YouTube Most Viewed Top600
Description: This dataset, titled "2023YouTubeMostViewed_Top600", comprises a curated selection of the top 600 YouTube videos based on view count, specifically from the year 2023. Each entry in the dataset represents a unique video, encompassing several key metrics:
It's important to note that while these videos are among the most viewed as of the data retrieval date, the landscape of YouTube is dynamic. View counts are continually changing, and what constitutes the 'most viewed' can fluctuate. Thus, the dataset should be seen as a snapshot of popularity and viewer engagement during a specific period in 2023, rather than an absolute ranking. This dataset is invaluable for analysis of trending content, viewer preferences, and video engagement metrics on YouTube for the year 2023.
Note: Ethically mined data from YouTube
In 2021, YouTube's user base in the United States amounted to approximately ****** million users. The number of YouTube users in the United States is projected to reach ****** million users by 2025. User figures have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
https://creativecommons.org/publicdomain/zero/1.0/
YouTube is an American online video sharing and social media platform headquartered in San Bruno, California. It was launched on February 14, 2005, by Steve Chen, Chad Hurley, and Jawed Karim. It is owned by Google, and is the second most visited website, after Google Search. YouTube has more than 2.5 billion monthly users who collectively watch more than one billion hours of videos each day. As of May 2019, videos were being uploaded at a rate of more than 500 hours of content per minute.
YouTube is widely used to influence and educate its users and followers on specific issues (for me it also serves as a free university), which can affect the established order in some ways.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is web-scraped from popular short video platforms like YouTube Shorts, TikTok, and Instagram Reels. It captures user interaction data, including views, likes, comments, shares, and watch duration, along with multimodal features from video content like text (titles, descriptions), image (visual characteristics), and audio (sound properties). The data has been processed and flattened into a structured CSV format with 17,654 rows.
As of February 2025, India was the country with the largest YouTube audience by far, with approximately 491 million users engaging with the popular social video platform. The United States followed, with around 253 million YouTube viewers. Brazil came in third, with 144 million users watching content on YouTube. The United Kingdom saw around 54.8 million internet users engaging with the platform in the examined period.
What country has the highest percentage of YouTube users?
In July 2024, the United Arab Emirates was the country with the highest YouTube penetration worldwide, as around 94 percent of the country's digital population engaged with the service. In 2024, YouTube counted around 100 million paid subscribers for its YouTube Music and YouTube Premium services.
YouTube mobile markets
In 2024, YouTube was among the most popular social media platforms worldwide. In terms of revenues, the YouTube app generated approximately 28 million U.S. dollars in revenues in the United States in January 2024, as well as 19 million U.S. dollars in Japan.
This dataset encompasses mobile app based media consumption, collected from over 150,000 first-party US Daily Active Users on Android devices. Use it for measurement, journey understanding or to trigger surveys about sentiment. Platforms covered include Netflix, YouTube, Disney+ and Amazon Prime Video.
Fields include pre-roll ads played, viewing duration, channel, category and more. All data tied to demographics, all consumers can be surveyed about viewership (or other topics), and consumer journey understanding can be gleaned combining this dataset with other MFour OmniTraffic® products.
https://creativecommons.org/publicdomain/zero/1.0/
By Jonathan A. [source]
This dataset provides valuable insights into crisis actor videos and their corresponding recommendations on YouTube. It consists of a total of 8823 videos, accounting for an astounding 3,956,454,363 views. These videos were retrieved from YouTube's API and cover various categories and topics.
Specifically, this dataset focuses on crisis actor videos related to mass shootings, false flags, and other conspiracy theories that comprise around 20% of the collection. The remaining 80% explores conspiracies revolving around history, government institutions, and religions.
The dataset includes essential information such as the name and channel of the video uploader. Additionally, it provides details about viewer engagement through likes and dislikes counts. Furthermore, each video is assigned a category or topic to facilitate analysis.
It is important to note that approximately 100 music videos were excluded from the initial data set to maintain relevance to crisis actors.
Overall, this project aims to shed light on the prevalence of crisis actor content on YouTube by providing researchers with a comprehensive dataset for further exploration and analysis. The dataset serves as a resource for investigating trends within crisis actor content while contributing towards raising public awareness of this topic.
- Understanding the Dataset:
The dataset comprises several columns that provide specific information about each video and its corresponding recommendations. Here's a brief overview of the key columns:
- name: The title or name of the YouTube video.
- channel: The name of the YouTube channel that uploaded the video.
- category: The category or topic of the video.
- views: The number of views the video has received.
- likes: The number of likes received by each video.
- dislikes: The number of dislikes received by each video.
- Exploring Categories:
One way to analyze this dataset is by examining different categories mentioned in each video entry. This could involve identifying patterns within categories or comparing engagement metrics (views, likes, dislikes) across various topics.
For example, you might want to investigate how crisis actor videos are categorized compared to other conspiracy-related videos present in this dataset.
- Analyzing Engagement Metrics:
To gain insights into users' response towards different videos related to crisis actors or conspiracy theories, it is recommended that you examine engagement metrics such as views, likes, and dislikes.
You can compare these metrics between individual videos within specific categories or observe trends across all entries.
- Investigating Popularity:
Understanding which channels have maximum viewership within this particular subject area can offer valuable information for further analysis.
Examining which channels have consistently high views or engagement metrics (likes/dislikes) can help identify influential content creators related to crisis actors or conspiracy theories.
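One way to approach the channel-popularity question is to total views per channel and rank the result. The sketch below assumes rows shaped like the columns listed earlier (name, channel, category, views, likes, dislikes); the rows themselves are invented placeholders, and `views_by_channel` is a hypothetical helper.

```python
from collections import defaultdict

# Placeholder rows using the listed column names; values are invented.
videos = [
    {"name": "Video 1", "channel": "Channel A", "category": "News",
     "views": 1000, "likes": 50, "dislikes": 10},
    {"name": "Video 2", "channel": "Channel B", "category": "Education",
     "views": 3000, "likes": 120, "dislikes": 30},
    {"name": "Video 3", "channel": "Channel B", "category": "News",
     "views": 2000, "likes": 80, "dislikes": 5},
]


def views_by_channel(rows):
    """Sum views per channel and return (channel, total) pairs, largest first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["channel"]] += row["views"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = views_by_channel(videos)
print(ranking)
```

The same grouping pattern works for likes or dislikes by swapping the summed key, or for categories by swapping the grouping key.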
- Identifying Recommendations:
The dataset also provides information about the recommendations associated with each video entry. By analyzing these recommendations, you can gain insights into the video content YouTube suggests to users who view crisis actor videos.
You could focus on specific keywords within recommendation titles or explore patterns in terms of topic relevance or common recommendations across multiple entries.
- Cross-Referencing External Information:
As this dataset does not provide detailed descriptions or context for each video, it is advisable to cross-reference external sources to gather additional information if needed.
By using the provided video titles and channel names, you can search for more details about specific videos
- Analyzing the correlation between likes, dislikes, and views: This dataset can be used to analyze the relationship between the number of likes and dislikes a video receives and its overall views. By examining this relationship, one could gain insights into factors that contribute to increased engagement or disinterest in crisis actor videos.
- Identifying popular YouTube channels in the crisis actor category: By analyzing the dataset, one can identify which YouTube channels have uploaded the most crisis actor videos and have gained high viewership. Th...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
News dissemination plays a vital role in supporting people to incorporate beneficial actions during public health emergencies, thereby significantly reducing the adverse influences of events. Based on big data from YouTube, this research study takes the declaration of COVID-19 National Public Health Emergency (PHE) as the event impact and employs a DiD model to investigate the effect of PHE on the news dissemination strength of relevant videos. The study findings indicate that the views, comments, and likes on relevant videos significantly increased during the COVID-19 public health emergency. Moreover, the public’s response to PHE has been rapid, with the highest growth in comments and views on videos observed within the first week of the public health emergency, followed by a gradual decline and returning to normal levels within four weeks. In addition, during the COVID-19 public health emergency, in the context of different types of media, lifestyle bloggers, local media, and institutional media demonstrated higher growth in the news dissemination strength of relevant videos as compared to news & political bloggers, foreign media, and personal media, respectively. Further, the audience attracted by related news tends to display a certain level of stickiness, therefore this audience may subscribe to these channels during public health emergencies, which confirms the incentive mechanisms of social media platforms to foster relevant news dissemination during public health emergencies. The proposed findings provide essential insights into effective news dissemination in potential future public health events.
The number of YouTube users in India was forecast to increase continuously between 2024 and 2029 by a total of ***** million users (+***** percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach ****** million users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years. User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in countries like Sri Lanka and Nepal.
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global TV analytics market size is USD 3815.2 million in 2024 and will expand at a compound annual growth rate (CAGR) of 18.20% from 2024 to 2031.
North America held the largest share, more than 40% of global revenue, with a market size of USD 1526.08 million in 2024, and will grow at a compound annual growth rate (CAGR) of 16.4% from 2024 to 2031.
Europe accounted for a share of over 30% of the global market, with a market size of USD 1144.56 million.
Asia Pacific held around 23% of global revenue, with a market size of USD 877.50 million in 2024, and will grow at a compound annual growth rate (CAGR) of 20.2% from 2024 to 2031.
Latin America accounted for more than 5% of global revenue, with a market size of USD 190.76 million in 2024, and will grow at a compound annual growth rate (CAGR) of 17.6% from 2024 to 2031.
Middle East and Africa held around 2% of global revenue, with a market size of USD 76.30 million in 2024, and will grow at a compound annual growth rate (CAGR) of 17.9% from 2024 to 2031.
The on-premise segment is set to rise as on-premise solutions for OTT platforms are reasonably cost-effective regarding equipment composition and cabling infrastructure. Additionally, under this model, viewers are authorized to determine the type of content, which results in more control.
The TV analytics market is driven by the growing consumer demand for digital original series, and the growing trend of subscription video on demand (SVoD) platforms has further fuelled industry expansion. Significant demand for the numerous genres and plays available on over-the-top (OTT) platforms such as Netflix and Amazon is contributing to market development.
Integration of Advanced Technologies to Provide Viable Market Output
The TV analytics market is rapidly evolving with the integration of advanced technologies. Innovations such as AI-driven content recognition, real-time data processing, and machine learning algorithms are transforming how broadcasters and advertisers analyze audience behavior and content performance. These technologies enable precise targeting, personalized recommendations, and deeper audience insights, changing advertising strategies and content creation. As the industry adopts these advancements, it fosters more efficient decision-making processes and enhances the overall viewer experience, driving the evolution of television analytics.
For instance, in July 2022, MiQ launched its analytics and measurement capability for cross-channel YouTube and TV campaigns in the UK. The solution bridges the gap between the two channels. By connecting these often-disparate datasets, brands can reach almost 100% of their target viewers on YouTube and calculate reach deterministically across these channels.
Increasing Digitalization and Shifting Viewer Preference to Propel Market Growth
The TV analytics market is experiencing significant growth due to increasing digitalization and shifting viewer preferences. As more viewers consume content across various digital platforms, there is a heightened need for data-driven insights into audience behavior and content performance. With the expansion of streaming services and on-demand viewing, traditional TV networks and advertisers are investing in analytics tools to understand viewer engagement, demographics, and content consumption patterns. This trend underscores the critical role of analytics in optimizing content strategies and advertising campaigns amidst evolving viewer dynamics.
For instance, in December 2022, TV analytics firm TVSquared launched its cross-platform measurement and attribution platform for all types of TV, ADvantage XP, in the UK and Germany. The scalable solution brings continuous and impression-based measurement of ad exposure and outcomes to TV campaigns across linear, streaming, and addressable TV.
Complexity of Measuring Viewership across Multiple Platforms to Restrict Market Growth
The TV analytics market faces challenges in measuring viewership across multiple platforms due to the proliferation of streaming services, DVR, an...
https://creativecommons.org/publicdomain/zero/1.0/
All the images of faces here are generated using https://thispersondoesnotexist.com/
Under US copyright law, these images are technically not subject to copyright protection. Only "original works of authorship" are considered. "To qualify as a work of 'authorship' a work must be created by a human being," according to a US Copyright Office's report [PDF].
https://www.theregister.com/2022/08/14/ai_digital_artwork_copyright/
I manually tagged all images as best as I could and separated them between the two classes below
Some may pass either female or male, but I will leave it to you to do the reviewing. I included toddlers and babies under Male/ Female
Each of the faces is entirely fake, created using a class of algorithms called generative adversarial networks (GANs).
A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in June 2014. Two neural networks contest with each other in a game (in the form of a zero-sum game, where one agent's gain is another agent's loss).
Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning.
Just a simple Jupyter notebook that looped and invoked the website https://thispersondoesnotexist.com/ , saving all images locally
https://spdx.org/licenses/CC0-1.0.html
Animal-related content on social media is hugely popular but is not always appropriate in terms of how animals are portrayed or how they are treated. This has potential implications beyond the individual animals involved, for viewers, for wild animal populations, and for societies and their interactions with animals. Whilst social media platforms usually publish guidelines for permitted content, enforcement relies at least in part on viewers reporting inappropriate posts. Currently, there is no external regulation of social media platforms. Based on a set of 241 "fake animal rescue" videos that exhibited clear signs of animal cruelty and strong evidence of being deliberately staged (i.e. fake), we found little evidence that viewers disliked the videos and an overall mixed response in terms of awareness of the fake nature of the videos, and their attitudes towards the welfare of the animals involved. Our findings suggest, firstly, that despite the narrowly defined nature of the videos used in this case study, exposure rates can be extremely high (one of the videos had been viewed over 100 million times), and, secondly, that many YouTube viewers cannot identify (or are not concerned by) animal welfare or conservation issues within a social media context. In terms of the current policy approach of social media platforms, our findings raise questions regarding the value of their current reliance on consumers as watch dogs.
Methods
Data collection
The dataset pertains to 241 YouTube videos identified using the search function in YouTube and the search terms "primitive man saves" and "primitive boy saves" between May and July 2021, supplemented with additional similar videos held in a database collated by Asia for Animals (www.asiaforanimals.com). Video metrics were extracted automatically between 24.06.21 and 02.08.21 using the "tuber" package in R (Sood 2020, https://cran.r-project.org/web/packages/tuber/tuber.pdf ). Additional information (e.g. on animal taxa) was obtained manually by screening the videos. For five of the videos that received > 1,000 comments, comment text was also extracted using the tuber package. Only publicly available videos were accessed.
Data processing
Users (video posters and commenters) have been de-identified. For each video for which comment text was analysed, the text was converted into a list of the most frequently used words and emojis. Please refer to the manuscript for further details on the methods and approach used to identify and define the most frequently used words/emojis, and to assign sentiment scores.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Update, December 7, 2014. Evidence-based medicine (EBM) is not working, for many reasons. For example:
1. Its foundations are incorrect (a paradox): the hierarchical levels of evidence are supported by opinions (i.e., the lowest strength of evidence according to EBM) instead of real data collected from different types of study designs (i.e., evidence). http://dx.doi.org/10.6084/m9.figshare.1122534
2. The effect of criminal practices by pharmaceutical companies is only possible because of the complicity of others: healthcare systems, professional associations, and governmental and academic institutions. Pharmaceutical companies also corrupt at the personal level: politicians and political parties are on their payroll, and medical professionals are seduced by different types of gifts in exchange for prescriptions (i.e., bribery), which very likely results in patients not receiving the proper treatment for their disease. Many times there is no disease at all: healthy persons who do not need pharmacological treatment of any kind are constantly misdiagnosed and treated with unnecessary drugs. Some medical professionals are turned into K.O.L.s, who are only puppets appearing on stage to spread lies to their peers; a person supposedly trained to improve the well-being of others now deceives on behalf of pharmaceutical companies. Probably the saddest thing is that many honest doctors are being misled by these lies, which are created by the rules of pharmaceutical marketing instead of scientific, medical, and ethical principles. The interpretation of EBM in this context was not anticipated by its creators.
“The main reason we take so many drugs is that drug companies don’t sell drugs, they sell lies about drugs.” (Peter C. Gøtzsche)
“doctors and their organisations should recognise that it is unethical to receive money that has been earned in part through crimes that have harmed those people whose interests doctors are expected to take care of. Many crimes would be impossible to carry out if doctors weren’t willing to participate in them.” (Peter C Gøtzsche, The BMJ, 2012, Big pharma often commits corporate crime, and this must be stopped.)
Pending (Colombia): Health Promoter Entities (in Spanish: EPS, Empresas Promotoras de Salud).
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
While focusing on "made for kids" channels is a useful starting point for analysing ad patterns on kids' videos, it is also important to consider the wider landscape of child-oriented content on the platform, much of which remains unlabelled. To build a representative dataset of such videos, we use seed search words reflecting popular child interests, some of which include "toys", "kids cartoon", and "Barbie." The results are then parsed to find popular channels with unlabelled content, with a minimum threshold of 400,000 views.
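The channel-selection step can be sketched as a simple filter. This is an illustrative Python sketch, not the authors' script: the record keys (`name`, `views`, `made_for_kids`), the function name, and the choice to apply the 400,000-view threshold per channel and inclusively are assumptions.

```python
MIN_VIEWS = 400_000  # threshold stated in the description


def popular_unlabelled(channels, min_views=MIN_VIEWS):
    """Keep channels that lack the "made for kids" label and meet the view threshold.

    `channels` is a list of dicts with hypothetical keys:
    "name" (str), "views" (int), "made_for_kids" (bool).
    """
    return [
        c["name"]
        for c in channels
        if not c["made_for_kids"] and c["views"] >= min_views
    ]


if __name__ == "__main__":
    results = [
        {"name": "ToyTime", "views": 950_000, "made_for_kids": False},
        {"name": "TinyChannel", "views": 12_000, "made_for_kids": False},
        {"name": "LabelledKids", "views": 2_000_000, "made_for_kids": True},
    ]
    print(popular_unlabelled(results))
```

In practice the input records would come from parsing search results for the seed words; here they are hard-coded placeholders.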
Next, we scrape ad data across all videos for further analysis, covering all major ad formats on the platform including (i) skippable and (ii) unskippable video ads, (iii) sidebar ads, (iv) in-feed ads, and (v) banner ads. We use a Selenium Webdriver script launched in a new logged-out Chrome window, with no previous history, cookies, or user data. We then scrape each ad’s unique YouTube-assigned video ID, and any embedded external link as the video plays.
Next, we use the YouTube Data API to obtain additional metadata, such as the video title, duration, and "made for kids" label, for each video ad; the results are recorded in the dataset. The videos are played from different VPN locations to explore how the experience varies with geographical location.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). The dataset contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound). Note, there are no song files for Actor_18.
The RAVDESS was developed by Dr Steven R. Livingstone, who now leads the Affective Data Science Lab, and Dr Frank A. Russo who leads the SMART Lab.
Citing the RAVDESS
The RAVDESS is released under a Creative Commons Attribution license, so please cite the RAVDESS if it is used in your work in any form. Published academic papers should use the academic paper citation for our PLoS1 paper. Personal works, such as machine learning projects/blog posts, should provide a URL to this Zenodo page, though a reference to our PLoS1 paper would also be appreciated.
Academic paper citation
Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.
Personal use citation
Include a link to this Zenodo page - https://zenodo.org/record/1188976
Commercial Licenses
Commercial licenses for the RAVDESS can be purchased. For more information, please visit our license page of fees, or contact us at ravdess@gmail.com.
Contact Information
If you would like further information about the RAVDESS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.
Example Videos
Watch a sample of the RAVDESS speech and song videos.
Emotion Classification Users
If you're interested in using machine learning to classify emotional expressions with the RAVDESS, please see our new RAVDESS Facial Landmark Tracking data set [Zenodo project page].
Construction and Validation
Full details on the construction and perceptual validation of the RAVDESS are described in our PLoS ONE paper - https://doi.org/10.1371/journal.pone.0196391.
The RAVDESS contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from PLoS ONE.
Contents
Audio-only files
Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):
Audio-Visual and Video-only files
Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:
File Summary
In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).
File naming convention
Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:
Filename identifiers
Filename example: 02-01-06-01-02-01-12.mp4
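As an illustration of the 7-part identifier, the following Python sketch splits a filename into its numeric parts. The list that defines each position is not reproduced in this description, so the field names below are an assumption based on the commonly cited RAVDESS documentation and should be checked against the Zenodo page; the function name is hypothetical.

```python
from pathlib import PurePath

# Assumed field order; verify against the official RAVDESS documentation.
FIELDS = (
    "modality",
    "vocal_channel",
    "emotion",
    "intensity",
    "statement",
    "repetition",
    "actor",
)


def parse_ravdess_name(filename):
    """Split a RAVDESS filename into its seven numeric identifiers."""
    parts = PurePath(filename).stem.split("-")
    if len(parts) != len(FIELDS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a 7-part numeric identifier: {filename!r}")
    return dict(zip(FIELDS, (int(p) for p in parts)))


if __name__ == "__main__":
    print(parse_ravdess_name("02-01-06-01-02-01-12.mp4"))
```

The parser only checks the shape of the name (seven numeric parts); it does not validate that each code falls in the range the dataset actually uses.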
License information
The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0
Commercial licenses for the RAVDESS can also be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.
Related Data sets
By VISHWANATH SESHAGIRI [source]
This dataset contains YouTube video and channel metadata for analyzing the statistical relationships between videos and forming a topic tree. With 9 direct features and 13 indirect features, it supports a detailed analysis of how videos are related, including total views per unit time, channel views, likes/subscribers ratio, comments/views ratio, dislikes/subscribers ratio, and more. The data can be used to study topics such as subscriber count trends over time or the impact of trends on subscriber engagement. Models built on it can show how different types of content drive viewership and identify the most popular styles or topics within YouTube's catalogue. The data also offers a look into consumer behaviour: ratios such as likes per subscriber and dislikes per view can help explore what drives people to watch specific videos at certain times or to appreciate certain channels more than others. Finally, the dataset is open source, with an easy-to-understand GitHub repo, making it a useful resource for anyone who wants to understand how their audience interacts with their content and how they might improve it in the future.
How to Use This Dataset
In general, it is important to understand each parameter in the dataset before proceeding with analysis. The parameters included are totalviews/channelelapsedtime, channelViewCount, likes/subscriber, views/subscribers, subscriberCounts, dislikes/views, comments/subscriber, channelCommentCounts, likes/dislikes, comments/views, dislikes/subscribers, totviews/totsubs, and views/elapsedtime.
To use this dataset for your own analysis: 1) Review each parameter's meaning and purpose in the dataset; 2) Get familiar with basic descriptive statistics such as mean, median, mode, and range; 3) Create visualizations or tables based on subsets of the data; 4) Examine correlations between different sets of variables or parameters; 5) Draw conclusions about specific channels or topics based on organized graph hierarchies or tables; 6) Analyze trends over time for individual parameters, as well as the aggregate reaction from all users when videos are released.
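Steps 2 and 4 can be sketched with the Python standard library alone. The sample values below are invented placeholders, not rows from the dataset, and the helper names `describe` and `pearson` are hypothetical.

```python
import math
import statistics


def describe(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length (at least 2 values)")
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Placeholder values standing in for two ratio columns.
    likes_per_subscriber = [0.01, 0.02, 0.02, 0.05]
    comments_per_view = [0.001, 0.002, 0.003, 0.004]
    print(describe(likes_per_subscriber))
    print(pearson(likes_per_subscriber, comments_per_view))
```

For the full CSV, a dataframe library would be more convenient, but the calculations are the same.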
Predicting the Relative Popularity of Videos: This dataset can be used to build a statistical model that predicts the relative popularity of videos based on factors such as total views, channel views, likes/dislikes ratio, and comments/views ratio. Such a model could then be used to make recommendations and to predict which videos are likely to become popular or go viral.
Creating Topic Trees: The dataset can also be used to create topic trees or taxonomies by analyzing the content of videos and looking at what topics they cover. For example, one could analyze the most popular YouTube channels in a specific subject area, group together those that discuss similar topics, and then build an organized tree structure around those topics in order to better understand viewer interests in that area.
Viewer Engagement Analysis: This dataset could also be used for viewer engagement analysis, by examining factors such as subscriber count, average time spent watching a video per user (elapsed time), and comments made per view, to gain insights into how engaged viewers are with specific content or channels on YouTube. From this information it would be possible to adjust content strategy in order to improve overall engagement rates across various types of video content and channel types.
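The ratio features used for engagement analysis can be derived from raw counts, as in the following hedged Python sketch. The raw keys (`likes`, `dislikes`, `comments`, `views`, `subscribers`) and the helper names are assumptions for illustration; returning `None` for a zero denominator is a design choice of this sketch, not something the dataset documents.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def engagement_ratios(video):
    """Compute a few of the ratio features from raw counts (hypothetical keys)."""
    return {
        "likes/subscriber": safe_ratio(video["likes"], video["subscribers"]),
        "comments/views": safe_ratio(video["comments"], video["views"]),
        "dislikes/views": safe_ratio(video["dislikes"], video["views"]),
        "likes/dislikes": safe_ratio(video["likes"], video["dislikes"]),
    }


if __name__ == "__main__":
    example = {"likes": 50, "dislikes": 0, "comments": 10,
               "views": 1000, "subscribers": 200}
    print(engagement_ratios(example))
```

Guarding the division matters because videos with zero dislikes or channels with zero subscribers would otherwise raise an error.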
If you use this dataset in your research, please credit the original authors.
License
Unknown License - Please check the dataset description for more information.
File: YouTubeDataset_withChannelElapsed.csv
| Column name | Description |
|:----------------------------------|:-------------------------------------------------------|
| totalviews/channelelapsedtime | Ratio of total views to channel elapsed time. (Ratio) |
| channelViewCount | Total number of views for the channel. (Integer) |
| likes/subscriber ...