30 datasets found

Spotify and Youtube
data.niaid.nih.gov
Updated Dec 4, 2023
Cite
Guarisco, Marco (2023). Spotify and Youtube [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10253414
Explore at:
Dataset updated
Dec 4, 2023
Dataset provided by
Guarisco, Marco
Rastelli, Salvatore
Sallustio, Marco
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
YouTube
Description
This dataset contains the statistics and attributes of the top 10 songs of various Spotify artists and their YouTube videos. The creators listed above generated the data and uploaded it to Kaggle on February 6-7, 2023. The license is "CC0: Public Domain", which allows the data to be copied, modified, distributed, and worked on without asking permission. The data is provided as an attached CSV file with numerical and textual fields. As described by the creators, it includes 26 variables for each of the songs collected from Spotify. These variables are briefly described next:

Track: name of the song, as visible on the Spotify platform.
Artist: name of the artist.
Url_spotify: the URL of the artist.
Album: the album in which the song is contained on Spotify.
Album_type: indicates whether the song is released on Spotify as a single or contained in an album.
Uri: a Spotify link used to find the song through the API.
Danceability: describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
Energy: a measure from 0.0 to 1.0 that represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
Key: the key the track is in. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
Loudness: the overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
Speechiness: detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
Acousticness: a confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence that the track is acoustic.
Instrumentalness: predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater the likelihood that the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
Liveness: detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides a strong likelihood that the track is live.
Valence: a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
Tempo: the overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
Duration_ms: the duration of the track in milliseconds.
Stream: number of streams of the song on Spotify.
Url_youtube: URL of the video linked to the song on YouTube, if it has one.
Title: title of the video clip on YouTube.
Channel: name of the channel that published the video.
Views: number of views.
Likes: number of likes.
Comments: number of comments.
Description: description of the video on YouTube.
Licensed: indicates whether the video represents licensed content, which means that the content was uploaded to a channel linked to a YouTube content partner and then claimed by that partner.
official_video: boolean value that indicates whether the video found is the official video of the song.

The data was last updated on February 7, 2023.
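The Key and Speechiness definitions above can be turned into small lookup helpers. The following is a minimal sketch, not part of the dataset: the function names are hypothetical, and the handling of values exactly equal to 0.33 or 0.66 is an assumption, since the description only says "above", "between", and "below".

```python
# Pitch Class notation as given in the Key description: 0 = C, 1 = C♯/D♭, 2 = D, ...
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def key_name(key: int) -> str:
    """Translate the integer Key value into a pitch name (-1 means no key detected)."""
    if key == -1:
        return "no key detected"
    if 0 <= key < len(PITCH_CLASSES):
        return PITCH_CLASSES[key]
    raise ValueError(f"unexpected key value: {key}")


def speechiness_band(value: float) -> str:
    """Bucket a Speechiness value using the 0.33 and 0.66 thresholds.

    Assumption: the boundary values themselves fall into the middle band.
    """
    if value > 0.66:
        return "probably entirely spoken words"
    if value >= 0.33:
        return "may contain both music and speech"
    return "most likely music or other non-speech audio"


if __name__ == "__main__":
    print(key_name(1), "|", speechiness_band(0.05))
```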
Youtube Advertising Value
data.mendeley.com
Updated Jul 9, 2021
Cite
Thi-Phuong-Linh Nguyen (2021). Youtube Advertising Value [Dataset]. http://doi.org/10.17632/fsmzz6dvmb.1
Explore at:
Unique identifier
https://doi.org/10.17632/fsmzz6dvmb.1
Dataset updated
Jul 9, 2021
Authors
Thi-Phuong-Linh Nguyen
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
YouTube
Description
A dataset of Youtube advertising value and purchase intentions of young Vietnamese customers
YouTube RPM by Niche (2025)
learningrevolution.net
html
Updated Jun 23, 2025
Cite
Jawad Khan (2025). YouTube RPM by Niche (2025) [Dataset]. https://www.learningrevolution.net/how-much-money-does-youtube-pay-for-1-million-views/
Explore at:
Available download formats: html
Dataset updated
Jun 23, 2025
Dataset provided by
Learning Revolution
Authors
Jawad Khan
Area covered
YouTube
Variables measured
Gaming, Travel, Finance, Education, Technology, Memes/Vlogs
Description
This dataset provides estimated YouTube RPM (Revenue Per Mille) ranges for different niches in 2025, based on ad revenue earned per 1,000 monetized views.
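Since RPM is defined as revenue per 1,000 monetized views, an earnings estimate is a single multiplication. The sketch below illustrates that arithmetic only; the function names and the RPM figures in the usage example are hypothetical placeholders, not values taken from the dataset.

```python
def estimate_revenue(monetized_views: int, rpm_usd: float) -> float:
    """Revenue = (monetized views / 1,000) * RPM."""
    return monetized_views / 1000 * rpm_usd


def estimate_revenue_range(monetized_views: int, rpm_low: float, rpm_high: float):
    """Apply the same formula to both ends of an RPM range."""
    return (
        estimate_revenue(monetized_views, rpm_low),
        estimate_revenue(monetized_views, rpm_high),
    )


if __name__ == "__main__":
    # Hypothetical RPM range of 2.0 to 5.0 USD, for illustration only.
    low, high = estimate_revenue_range(1_000_000, 2.0, 5.0)
    print(f"1M monetized views: {low:.2f} to {high:.2f} USD")
```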
Social media revenue of selected companies 2023
statista.com
es.statista.com
Cite
Stacy Jo Dixon, Social media revenue of selected companies 2023 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statista (http://statista.com/)
Authors
Stacy Jo Dixon
Description
In 2023, Meta Platforms had a total annual revenue of over 134 billion U.S. dollars, up from 116 billion in 2022. LinkedIn reported its highest annual revenue to date, generating over 15 billion USD, whilst Snapchat reported an annual revenue of 4.6 billion USD.
Quilt-1M: One Million Image-Text Pairs for Histopathology
data.niaid.nih.gov
zenodo.org
Updated Aug 16, 2023
Cite
Linda G. Shapiro (2023). Quilt-1M: One Million Image-Text Pairs for Histopathology [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8239941
Explore at:
Dataset updated
Aug 16, 2023
Dataset provided by
Mehmet S. Seyfioglu
Wisdom Oluchi Ikezogwo
Fatemeh Ghezloo
Ranjay Krishna
Fatwir S. Mohammed
Linda G. Shapiro
Dylan Geva
Pavan K. Anand
Description
Recent accelerations in multi-modal applications have been made possible with the plethora of image and text data available online. However, the scarcity of similar data in the medical field, specifically in histopathology, has slowed similar progress. To enable similar representation learning for histopathology, we turn to YouTube, an untapped resource of videos, offering 1,087 hours of valuable educational histopathology videos from expert clinicians. From YouTube, we curate Quilt: a large-scale vision-language dataset consisting of 802,148 image and text pairs. Quilt was automatically curated using a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition. In comparison, the most comprehensive datasets curated for histopathology amass only around 200K samples. We combine Quilt with datasets from other sources, including Twitter, research papers, and the internet in general, to create an even larger dataset: Quilt-1M, with 1M paired image-text samples, marking it as the largest vision-language histopathology dataset to date. We demonstrate the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model outperforms state-of-the-art models on both zero-shot and linear probing tasks for classifying new pathology images across 13 diverse patch-level datasets of 8 different sub-pathologies and cross-modal retrieval tasks.

Instagram accounts with the most followers worldwide 2024

statista.com
es.statista.com

Cite

Stacy Jo Dixon, Instagram accounts with the most followers worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

Cristiano Ronaldo has one of the most popular Instagram accounts as of April 2024.

The Portuguese footballer is the most-followed person on the photo sharing app platform with 628 million followers. Instagram's own account was ranked first with roughly 672 million followers.

How popular is Instagram?

Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States and experts project this figure to surpass 127 million users in 2023.

Who uses Instagram?

Instagram audiences are predominantly young – recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media for teens and one of the social networks with the biggest reach among teens in the United States.

Celebrity influencers on Instagram
Many celebrities and athletes are brand spokespeople and generate additional income with social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first again, as the average media value of one of his Instagram posts was 985,441 U.S. dollars.

Materials in Vessels Dataset, Annotated images of materials in transparent...
zenodo.org
zip
Updated Dec 9, 2021
Cite
Sagi Eppel; Sagi Eppel (2021). Materials in Vessels Dataset, Annotated images of materials in transparent vessels for semantic segmentation [Dataset]. http://doi.org/10.5281/zenodo.5769354
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5769354
Dataset updated
Dec 9, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sagi Eppel; Sagi Eppel
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Data set of materials in vessels
The handling of materials in glassware vessels is the main task in chemistry laboratory research as well as a large number of other activities. Visual recognition of the physical phase of the materials is essential for many methods ranging from a simple task such as fill-level evaluation to the identification of more complex properties such as solvation, precipitation, crystallization and phase separation. To help train neural nets for this task, a new data set was created. The data set contains a thousand images of materials, in different phases and involved in different chemical processes, in a laboratory setting. Each pixel in each image is labeled according to several layers of classification, as given below:

a. Vessel/Background: Each pixel is assigned a value of one if it is part of the vessel and zero otherwise. This annotation was used as the ROI map for the valve filter method.

b. Filled/Empty: This is similar to the above, but also distinguishes between the filled and empty regions of the vessel. For each pixel, one of the following three values is assigned: 0 (background); 1 (empty vessel); or 2 (filled vessel).

c. Phase type: This is similar to the above but distinguishes between liquid and solid regions of the filled vessel. For each pixel, one of the following four values is assigned: 0 (background); 1 (empty vessel); 2 (liquid); or 3 (solid).

d. Fine-grained physical phase type: This is similar to the above but distinguishes between specific classes of physical phase. For each pixel, one of 15 values is assigned: 1 (background); 2 (empty vessel); 3 (liquid); 4 (liquid phase two, in the case where more than one phase of the liquid appears in the vessel); 5 (suspension); 6 (emulsion); 7 (foam); 8 (solid); 9 (gel); 10 (powder); 11 (granular); 12 (bulk); 13 (solid-liquid mixture); 14 (solid phase two, in the case where more than one phase of solid exists in the vessel); and 15 (vapor).

The annotations are given as images of the size of the original image, where the pixel value is the class number. The annotation of the vessel region (a) is used in the ROI input for the valve filter net.
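The class numbers listed for layers (a) to (d) can be captured as lookup tables. The sketch below is illustrative only: it represents an annotation as a nested list of integers rather than an image file, the names are hypothetical, and it assumes (based on the layer descriptions, not on the dataset files) that a coarser layer can be derived from a finer one by merging classes.

```python
# Layer (d): fine-grained physical phase classes, numbered as in the description.
FINE_GRAINED_CLASSES = {
    1: "background", 2: "empty vessel", 3: "liquid", 4: "liquid phase two",
    5: "suspension", 6: "emulsion", 7: "foam", 8: "solid", 9: "gel",
    10: "powder", 11: "granular", 12: "bulk", 13: "solid-liquid mixture",
    14: "solid phase two", 15: "vapor",
}

# Layer (c) -> layer (b): liquid (2) and solid (3) both count as filled vessel (2).
PHASE_TO_FILLED = {0: 0, 1: 1, 2: 2, 3: 2}

# Layer (b) -> layer (a): empty (1) and filled (2) vessel both count as vessel (1).
FILLED_TO_VESSEL = {0: 0, 1: 1, 2: 1}


def remap(annotation, table):
    """Return a new annotation with every pixel value replaced via the table."""
    return [[table[value] for value in row] for row in annotation]


if __name__ == "__main__":
    # Hypothetical 2x2 phase-type annotation (layer c).
    phase = [[0, 1], [2, 3]]
    filled = remap(phase, PHASE_TO_FILLED)
    vessel = remap(filled, FILLED_TO_VESSEL)
    print(filled, vessel)
```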

4.1. Validation/testing set
The data set is divided into training and testing sets. The testing set is itself divided into two subsets; one contains images extracted from the same YouTube channels as the training set, and therefore was taken under similar conditions as the training images. The second subset contains images extracted from YouTube channels not included in the training set, and hence contains images taken under different conditions from those used to train the net.

4.2. Creating the data set
The creation of a large number of images with a variety of chemical processes and settings could have been a daunting task. Luckily, several YouTube channels dedicated to chemical experiments exist which offer high-quality footage of chemistry experiments. Thanks to these channels, including NurdRage, NileRed, ChemPlayer, it was possible to collect a large number of high-quality images in a short time. Pixel-wise annotation of these images was another challenging task, and was performed by Alexandra Emanuel and Mor Bismuth.

For more details see: Setting attention region for convolutional neural networks using region selective features, for recognition of materials within glass vessels

This dataset was first published in 2017.8

For newer and bigger datasets, see:

https://zenodo.org/record/4736111#.YbG-RrtyZH4

https://zenodo.org/record/3697452#.YbG-TLtyZH4
The Visual-Inertial Canoe Dataset
databank.illinois.edu
aws-databank-alb.library.illinois.edu
Updated Nov 14, 2017
Cite
Martin Miller; Soon-Jo Chung; Seth Hutchinson (2017). The Visual-Inertial Canoe Dataset [Dataset]. http://doi.org/10.13012/B2IDB-9342111_V1
Explore at:
Unique identifier
https://doi.org/10.13012/B2IDB-9342111_V1
Dataset updated
Nov 14, 2017
Authors
Martin Miller; Soon-Jo Chung; Seth Hutchinson
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Dataset funded by
Office of Naval Research
Description
If you use this dataset, please cite the IJRR data paper (bibtex is below). We present a dataset collected from a canoe along the Sangamon River in Illinois. The canoe was equipped with a stereo camera, an IMU, and a GPS device, which provide visual data suitable for stereo or monocular applications, inertial measurements, and position data for ground truth. We recorded a canoe trip up and down the river for 44 minutes covering 2.7 km round trip. The dataset adds to those previously recorded in unstructured environments and is unique in that it is recorded on a river, which provides its own set of challenges and constraints that are described in this paper. The data is divided into subsets, which can be downloaded individually. Video previews are available on Youtube: https://www.youtube.com/channel/UCOU9e7xxqmL_s4QX6jsGZSw

The information below can also be found in the README files provided in the 527 dataset and each of its subsets. The purpose of this document is to assist researchers in using this dataset.

Images
======

Raw
---
The raw images are stored in the cam0 and cam1 directories in bmp format. They are bayered images that need to be debayered and undistorted before they are used. The camera parameters for these images can be found in camchain-imucam.yaml. Note that the camera intrinsics describe a 1600x1200 resolution image, so the focal length and center pixel coordinates must be scaled by 0.5 before they are used. The distortion coefficients remain the same even for the scaled images. The camera to IMU transformation matrix is also in this file. cam0/ refers to the left camera, and cam1/ refers to the right camera.

Rectified
---------
Stereo rectified, undistorted, row-aligned, debayered images are stored in the rectified/ directory in the same way as the raw images except that they are in png format. The params.yaml file contains the projection and rotation matrices necessary to use these images.
The resolution of these parameters does not need to be scaled as is necessary for the raw images.

params.yml
----------
The stereo rectification parameters. R0, R1, P0, P1, and Q correspond to the outputs of the OpenCV stereoRectify function except that 1s and 2s are replaced by 0s and 1s, respectively.
R0: The rectifying rotation matrix of the left camera.
R1: The rectifying rotation matrix of the right camera.
P0: The projection matrix of the left camera.
P1: The projection matrix of the right camera.
Q: Disparity to depth mapping matrix.
T_cam_imu: Transformation matrix for a point in the IMU frame to the left camera frame.

camchain-imucam.yaml
--------------------
The camera intrinsic and extrinsic parameters and the camera to IMU transformation usable with the raw images.
T_cam_imu: Transformation matrix for a point in the IMU frame to the camera frame.
distortion_coeffs: lens distortion coefficients using the radial tangential model.
intrinsics: focal length x, focal length y, principal point x, principal point y.
resolution: resolution of calibration. Scale the intrinsics for use with the raw 800x600 images. The distortion coefficients do not change when the image is scaled.
T_cn_cnm1: Transformation matrix from the right camera to the left camera.
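The scaling step for the raw images (calibration at 1600x1200, raw images at 800x600, hence a factor of 0.5) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the intrinsics are taken in the order listed above (focal length x, focal length y, principal point x, principal point y), and the numeric values are made up rather than read from camchain-imucam.yaml.

```python
def scale_intrinsics(intrinsics, scale):
    """Scale [fx, fy, cx, cy] by a common factor.

    The distortion coefficients are not touched, since the README states
    they stay the same for the scaled images.
    """
    fx, fy, cx, cy = intrinsics
    return [fx * scale, fy * scale, cx * scale, cy * scale]


if __name__ == "__main__":
    # Hypothetical calibration values for a 1600x1200 image.
    calibrated = [1000.0, 1000.0, 800.0, 600.0]
    # Raw images are 800x600, so the factor is 800 / 1600 = 0.5.
    raw = scale_intrinsics(calibrated, 800 / 1600)
    print(raw)
```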
Sensors
-------
Here, each message in name.csv is described

###rawimus###
time # GPS time in seconds
message name # rawimus
acceleration_z # m/s^2 IMU uses right-forward-up coordinates
-acceleration_y # m/s^2
acceleration_x # m/s^2
angular_rate_z # rad/s IMU uses right-forward-up coordinates
-angular_rate_y # rad/s
angular_rate_x # rad/s

###IMG###
time # GPS time in seconds
message name # IMG
left image filename
right image filename

###inspvas###
time # GPS time in seconds
message name # inspvas
latitude
longitude
altitude # ellipsoidal height WGS84 in meters
north velocity # m/s
east velocity # m/s
up velocity # m/s
roll # right hand rotation about y axis in degrees
pitch # right hand rotation about x axis in degrees
azimuth # left hand rotation about z axis in degrees clockwise from north

###inscovs###
time # GPS time in seconds
message name # inscovs
position covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz m^2
attitude covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz deg^2
velocity covariance # 9 values xx,xy,xz,yx,yy,yz,zx,zy,zz (m/s)^2

###bestutm###
time # GPS time in seconds
message name # bestutm
utm zone # numerical zone
utm character # alphabetical zone
northing # m
easting # m
height # m above mean sea level

Camera logs
-----------
The files name.cam0 and name.cam1 are text files that correspond to cameras 0 and 1, respectively. The columns are defined by:
unused: The first column is all 1s and can be ignored.
software frame number: This number increments at the end of every iteration of the software loop.
camera frame number: This number is generated by the camera and increments each time the shutter is triggered. The software and camera frame numbers do not have to start at the same value, but if the difference between the initial and final values is not the same, it suggests that frames may have been dropped.
camera timestamp: This is the camera's internal timestamp of the frame capture in units of 100 milliseconds.
PC timestamp: This is the PC time of arrival of the image.

name.kml
--------
The kml file is a mapping file that can be read by software such as Google Earth. It contains the recorded GPS trajectory.

name.unicsv
-----------
This is a csv file of the GPS trajectory in UTM coordinates that can be read by gpsbabel, software for manipulating GPS paths.

@article{doi:10.1177/0278364917751842,
  author = {Martin Miller and Soon-Jo Chung and Seth Hutchinson},
  title = {The Visual–Inertial Canoe Dataset},
  journal = {The International Journal of Robotics Research},
  volume = {37},
  number = {1},
  pages = {13-20},
  year = {2018},
  doi = {10.1177/0278364917751842},
  URL = {https://doi.org/10.1177/0278364917751842},
  eprint = {https://doi.org/10.1177/0278364917751842}
}
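The dropped-frame check described for the camera logs (compare how far the software frame number and the camera frame number advance between the first and last rows) can be sketched as below. This is an illustration under assumptions: the function name is hypothetical, the columns are assumed to be whitespace-separated in the order listed in the README, and the sample rows are invented rather than taken from a real name.cam0 file.

```python
def frames_possibly_dropped(log_text: str) -> bool:
    """Return True if the two frame counters advanced by different amounts.

    Assumed column order: unused, software frame number, camera frame number,
    camera timestamp, PC timestamp (whitespace-separated).
    """
    rows = [line.split() for line in log_text.splitlines() if line.strip()]
    software = [int(row[1]) for row in rows]
    camera = [int(row[2]) for row in rows]
    software_span = software[-1] - software[0]
    camera_span = camera[-1] - camera[0]
    return software_span != camera_span


if __name__ == "__main__":
    # Invented sample: the camera counter skips from 101 to 103.
    sample = "\n".join([
        "1 10 100 5000 1.0",
        "1 11 101 5001 1.1",
        "1 12 103 5003 1.3",
    ])
    print(frames_possibly_dropped(sample))
```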
QGIS Training Tutorials: Using Spatial Data in Geographic Information...
open.canada.ca
datasets.ai
html
Updated Oct 5, 2021
Cite
Statistics Canada (2021). QGIS Training Tutorials: Using Spatial Data in Geographic Information Systems [Dataset]. https://open.canada.ca/data/en/dataset/89be0c73-6f1f-40b7-b034-323cb40b8eff
Explore at:
Available download formats: html
Dataset updated
Oct 5, 2021
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Description
Have you ever wanted to create your own maps, or integrate and visualize spatial datasets to examine changes in trends between locations and over time? Follow along with these training tutorials on QGIS, an open source geographic information system (GIS) and learn key concepts, procedures and skills for performing common GIS tasks – such as creating maps, as well as joining, overlaying and visualizing spatial datasets. These tutorials are geared towards new GIS users. We’ll start with foundational concepts, and build towards more advanced topics throughout – demonstrating how with a few relatively easy steps you can get quite a lot out of GIS. You can then extend these skills to datasets of thematic relevance to you in addressing tasks faced in your day-to-day work.
Sabrina Carpenter Discography
kaggle.com
Updated Sep 23, 2024
Cite
Delfina Oliva (2024). Sabrina Carpenter Discography [Dataset]. https://www.kaggle.com/datasets/delfinaoliva/sabrina-carpenter-discography/suggestions?status=pending
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 23, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Delfina Oliva
Description
Want to know more about the new pop phenomenon? This dataset brings you a lot of information about Sabrina Carpenter and how her streams and sales increased over the years.

The columns in this dataset are:

ID - unique identifier for each track

track_name - the name of the track

track_type - the type of track, which could indicate if it’s a single, remix, or b-side

track_musical_genre - the genre classification of the track

duration_ms - the duration of the track in milliseconds

spotify_streams - the number of streams the track has on Spotify

spotify_global_peak - the highest global ranking the track achieved on Spotify charts

spotify_usa_peak - the highest USA ranking the track achieved on Spotify charts

track_videoclip - indicates if the track has an official videoclip

videoclip_views - the number of views of the track's videoclip on YouTube

album - the name of the album

album_physical_sales - the physical sales of the album

track_number - the position of the song on her album

album_musical_genre - the genre of the album

release_date - the release date of the track

uri - the Spotify URI for the track

acousticness - a measure of how acoustic the track is (0 to 1)

danceability - a measure of how suitable the track is for dancing (0 to 1) based on a combination of musical elements including rhythm stability, tempo and beat

energy - the intensity and activity level of the track (0 to 1). Typically, energetic tracks feel fast, loud, and noisy

instrumentalness - a measure of how much of the track is instrumental (0 to 1). The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content

liveness - a measure of how "live" the track feels, indicating audience presence (0 to 1). Tracks with higher liveness values are more likely to have been performed in a live setting

loudness - the overall loudness of the track in decibels (dB). Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB

speechiness - a measure of the amount of spoken words in the track (0 to 1). The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value

tempo - the tempo or speed of the track in beats per minute (BPM). Tempo is the speed or pace of a given piece and derives directly from the average beat duration

valence - a measure of how positive or negative the track sounds (0 to 1). Tracks with high valence sound more positive
Instagram: most used hashtags 2024
statista.com
es.statista.com
Cite
Statista Research Department, Instagram: most used hashtags 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Description
As of January 2024, #love was the most used hashtag on Instagram, being included in over two billion posts on the social media platform. #Instagood and #instagram were used over one billion times as of early 2024.
Cross-Correlation Function (CCF) between Q and R time-series with respect to...
plos.figshare.com
figshare.com
xls
Updated Jan 15, 2025
Cite
Emanuele Brugnoli; Marco Delmastro (2025). Cross-Correlation Function (CCF) between Q and R time-series with respect to the three periods analyzed. [Dataset]. http://doi.org/10.1371/journal.pone.0316258.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0316258.t001
Dataset updated
Jan 15, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Emanuele Brugnoli; Marco Delmastro
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Cross-Correlation Function (CCF) between Q and R time-series with respect to the three periods analyzed.
Data from: A comprehensive dataset of the Spanish research output and its...
data.niaid.nih.gov
zenodo.org
Updated Dec 2, 2022
Cite
Arroyo-Machado, Wenceslao (2022). A comprehensive dataset of the Spanish research output and its associated social media and altmetric mentions (2016-2020) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6184380
Explore at:
Dataset updated
Dec 2, 2022
Dataset provided by
Arroyo-Machado, Wenceslao
Torres-Salinas, Daniel
Robinson-Garcia, Nicolas
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Data on research publications authored by Spanish institutions between 2016 and 2020 with their associated social media and altmetric mentions, and on researchers affiliated to Spanish institutions whose work is highly mentioned in social media and non-academic outlets.

Variables of the publications dataset:

id - Unique publication identifier

title - Full title of the publication

year - Year of publication

type - Document type

journal - Name of the journal

esi - ESI category of the publication

influratio - AAS value on March 3, 2021

news - Number of mentions in news media

blogs - Number of mentions in blogs

policy - Number of mentions in policy reports

patent - Number of mentions in patents

twitter - Number of mentions in Twitter

post_peer - Number of mentions in PubPeer and Publons

weibo - Number of mentions in Weibo

facebook - Number of mentions in Facebook

wikipedia - Number of mentions in Wikipedia

google - Number of mentions in Google+

linkedin - Number of mentions in LinkedIn

reddit - Number of mentions in Reddit

pinterest - Number of mentions in Pinterest

f1000 - Number of mentions in F1000

stack_overflow - Number of mentions in Stack Overflow

youtube - Number of mentions in YouTube

syllabus - Number of mentions in Open Syllabus Project

Variables of the top authors dataset:

name - Full name of the researcher

orcid - ORCID record

organization - Name of the institution of affiliation

publications - List of publication identifiers (id) connecting with the publications dataset
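Because the publications field of the top authors dataset holds identifiers that point into the publications dataset, per-author totals can be computed with a simple lookup. The sketch below is illustrative only: the records are invented, the function name is hypothetical, and it assumes the identifier list has already been parsed into a Python list (the on-disk format of that field is not described here).

```python
# Invented records keyed by the publication "id" variable.
publications = {
    "p1": {"title": "Example paper A", "twitter": 12, "youtube": 1},
    "p2": {"title": "Example paper B", "twitter": 3, "youtube": 0},
    "p3": {"title": "Example paper C", "twitter": 7, "youtube": 2},
}

# Invented top-author records; "publications" lists publication ids.
authors = [
    {"name": "Researcher X", "orcid": "", "organization": "Institution 1",
     "publications": ["p1", "p2"]},
    {"name": "Researcher Y", "orcid": "", "organization": "Institution 2",
     "publications": ["p3", "p9"]},  # "p9" has no matching publication record
]


def total_mentions(author, publications_by_id, source):
    """Sum one mention count (e.g. "twitter") over an author's publications.

    Identifiers without a matching publication record are skipped.
    """
    return sum(
        publications_by_id[pub_id][source]
        for pub_id in author["publications"]
        if pub_id in publications_by_id
    )


if __name__ == "__main__":
    for author in authors:
        print(author["name"], total_mentions(author, publications, "twitter"))
```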
Social Power NBA
kaggle.com
Updated Aug 1, 2017
Cite
Noah Gift (2017). Social Power NBA [Dataset]. https://www.kaggle.com/datasets/noahgift/social-power-nba/suggestions
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 1, 2017
Dataset provided by
Kaggle
Authors
Noah Gift
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
Context

This data set contains combined on-court performance data for NBA players in the 2016-2017 season, alongside salary, Twitter engagement, and Wikipedia traffic data.

Further information can be found in a series of articles for IBM Developerworks: "Explore valuation and attendance using data science and machine learning" and "Exploring the individual NBA players".

Slides from a talk about this dataset (Strata, March 2018):

https://www.slideshare.net/noahgift/social-power-andinfluenceinthenba-89807740?qid=3f9f835a-f3d7-4174-8a8c-c97f9c82e614&v=&b=&from_search=1

Further reading on this dataset is available in Chapter 6 of the book Pragmatic AI: An introduction to Cloud-based Machine Learning, and in lesson 9 of Essential Machine Learning and AI with Python and Jupyter Notebook.

Followup Items

You can watch a breakdown of using cluster analysis on the Pragmatic AI YouTube channel

Learn to deploy a Kaggle project into a production Machine Learning sklearn + flask + container by reading Python for Devops: Learn Ruthlessly Effective Automation, Chapter 14: MLOps and Machine learning engineering

Use social media to predict a winning season with this notebook: https://github.com/noahgift/core-stats-datascience/blob/master/Lesson2_7_Trends_Supervized_Learning.ipynb

Learn to use the cloud for data analysis.

Acknowledgement

Data sources include ESPN, Basketball-Reference, Twitter, FiveThirtyEight, and Wikipedia. The source code for this dataset (in Python and R) can be found on GitHub. Links to more writing can be found at noahgift.com.

Inspiration

Do NBA fans know more about who the best players are, or do owners?

What is the true worth of the social media presence of athletes in the NBA?
Advanced: Saudi Arabian Aramco Stocks Dataset 🐪
kaggle.com
Updated May 3, 2024
Cite
Azhar Saleem (2024). Advanced: Saudi Arabian Aramco Stocks Dataset 🐪 [Dataset]. https://www.kaggle.com/datasets/azharsaleem/advanced-saudi-arabian-aramco-stocks-dataset/data
Explore at:
Dataset updated
May 3, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Azhar Saleem
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
Saudi Arabia
Description
Saudi Arabian Oil Company Aramco, Stocks

👨‍💻 Author: Azhar Saleem

Dataset Description

Welcome to the Enhanced Saudi Arabian Oil Company (Aramco) Stock Dataset! This dataset has been meticulously prepared from Yahoo Finance and further enriched with several engineered features to elevate your data analysis, machine learning, and financial forecasting projects. It captures the daily trading figures of Aramco stocks, presented in Saudi Riyal (SAR), providing a robust foundation for comprehensive market analysis.

Columns in the Dataset

Date: The trading day for the data recorded (ISO 8601 format).

Open: The price at which the stock first traded upon the opening of an exchange on a given trading day.

High: The highest price at which the stock traded during the trading day.

Low: The lowest price at which the stock traded during the trading day.

Close: The price at which the stock last traded upon the close of an exchange on a given trading day.

Volume: The total number of shares traded during the trading day.

Dividends: The dividend value paid out per share on the trading day.

Stock Splits: The number of stock splits occurring on the trading day.

Lag Features (Lag_Close, Lag_High, Lag_Low): Previous day's closing, highest, and lowest prices.

Rolling Window Statistics (e.g., Rolling_Mean_7, Rolling_Std_7): 7-day and 30-day moving averages and standard deviations of the Close price.

Technical Indicators (RSI, MACD, Bollinger Bands): Key metrics used in trading to analyze short-term price movements.

Change Features (Change_Close, Change_Volume): Day-over-day changes in Close price and trading volume.

Date-Time Features (Weekday, Month, Year, Quarter): Extracted components of the trading day.

Volume_Normalized: The standardized trading volume using z-score normalization to adjust for scale differences.
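
The engineered columns above can be approximated from raw Close and Volume series. The following is a minimal sketch in plain Python, not the author's actual pipeline: the function name `engineer_features`, the use of population standard deviation, and the sample prices are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def engineer_features(closes, volumes, window=7):
    """Build lag, change, rolling, and z-score features for each trading day.

    Assumption: population standard deviation (pstdev) is used throughout;
    the dataset's own choice of estimator is not documented here.
    """
    vol_mean = mean(volumes)
    vol_std = pstdev(volumes)
    rows = []
    for i, close in enumerate(closes):
        # Lag feature: the previous day's close (undefined on the first day).
        lag_close = closes[i - 1] if i > 0 else None
        # Rolling statistics are only defined once a full window is available.
        if i + 1 >= window:
            recent = closes[i + 1 - window : i + 1]
            rolling_mean, rolling_std = mean(recent), pstdev(recent)
        else:
            rolling_mean = rolling_std = None
        rows.append({
            "Close": close,
            "Lag_Close": lag_close,
            "Change_Close": close - lag_close if lag_close is not None else None,
            f"Rolling_Mean_{window}": rolling_mean,
            f"Rolling_Std_{window}": rolling_std,
            # z-score: distance from the mean volume in standard deviations.
            "Volume_Normalized": (volumes[i] - vol_mean) / vol_std if vol_std else 0.0,
        })
    return rows


# Hypothetical prices and volumes, for illustration only.
for row in engineer_features([10.0, 11.0, 12.0], [100, 200, 300], window=2):
    print(row)
```

Early rows carry `None` for lag and rolling values because there is not yet enough history; a model-building workflow would typically drop or impute those rows.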

Potential Uses

This dataset is tailored for a wide array of applications:

Financial Analysis: Explore historical performance, volatility, and market trends.

Forecasting Models: Utilize features like lagged prices and rolling statistics to predict future stock prices.

Machine Learning: Develop regression models or classification frameworks to predict market movements.

Deep Learning: Leverage LSTM networks for more sophisticated time-series forecasting.

Time-Series Analysis: Dive deep into trend analysis, seasonality, and cyclical behavior of stock prices.

Whether you are a data scientist, a financial analyst, or a hobbyist interested in the stock market, this dataset provides a rich playground for analysis and model building. Its comprehensive feature set allows for the development of robust predictive models and offers unique insights into one of the world’s most significant oil companies. Unlock the potential of financial data with this carefully crafted dataset.
400+ YouTube Channel Name Ideas by Niche (2025)
learningrevolution.net
Updated Mar 14, 2025
Cite
Jawad Khan (2025). 400+ YouTube Channel Name Ideas by Niche (2025) [Dataset]. https://www.learningrevolution.net/youtube-channel-name-ideas/
Explore at:
Dataset updated
Mar 14, 2025
Dataset provided by
Learning Revolution
Authors
Jawad Khan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Description
A curated dataset of over 400 unique and creative YouTube channel name ideas organized by popular niches such as gaming, travel, tech, beauty, vlogging, pets, DIY, education, and more. Includes a free YouTube channel name generator to help creators find inspiration for their brand.
IPL Player Auction Dataset - From Start to Now
kaggle.com
Updated Apr 12, 2025
Cite
Kalilur Rahman (2025). IPL Player Auction Dataset - From Start to Now [Dataset]. https://www.kaggle.com/datasets/kalilurrahman/ipl-player-auction-dataset-from-start-to-now/versions/39
Explore at:
Dataset updated
Apr 12, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Kalilur Rahman
License
https://creativecommons.org/publicdomain/zero/1.0/
Description

IPL

The Indian Premier League (IPL) is a professional men's Twenty20 cricket league, contested by ten teams based out of ten Indian cities. The league was founded by the Board of Control for Cricket in India (BCCI) in 2007. It is usually held between March and May of every year and has an exclusive window in the ICC Future Tours Programme.

The IPL is the most-attended cricket league in the world and in 2014 was ranked sixth by average attendance among all sports leagues. In 2010, the IPL became the first sporting event in the world to be broadcast live on YouTube. The brand value of the IPL in 2019 was ₹47,500 crore (US$6.3 billion), according to Duff & Phelps. According to BCCI, the 2015 IPL season contributed ₹1,150 crore (US$150 million) to the GDP of the Indian economy. The 2020 IPL season set a massive viewership record with 31.57 million average impressions and with an overall consumption increase of 23 per cent from the 2019 season.

There have been fourteen seasons of the IPL tournament. The current IPL title holders are the Chennai Super Kings, winning the 2021 season. The venue for the 2020 season was moved due to the COVID-19 pandemic and games were played in the United Arab Emirates.

IPL Auction

A team can acquire players through any of the three ways: the annual player auction, trading players with other teams during the trading windows, and signing replacements for unavailable players. Players sign up for the auction and also set their base price, and are bought by the franchise that bids the highest for them. Unsold players at the auction are eligible to be signed up as replacement signings. In the trading windows, a player can only be traded with his consent, with the franchise paying the difference if any between the old and new contracts. If the new contract is worth more than the older one, the difference is shared between the player and the franchise selling the player. There are generally three trading windows—two before the auction and one after the auction but before the start of the tournament. Players cannot be traded outside the trading windows or during the tournament, whereas replacements can be signed before or during the tournament.

Some of the team composition rules (as of 2020 season) are as follows:
- The squad strength must be between 18 and 25 players, with a maximum of 8 overseas players.
- Salary cap of the entire squad must not exceed ₹850 million (US$11 million).
- Under-19 players cannot be picked unless they have previously played first-class or List A cricket.
- A team can play a maximum of 4 overseas players in their playing eleven.
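
The composition rules above translate directly into a checker. This is a minimal sketch only: the function names and the player record keys (`overseas`, `salary`, `under_19`, `played_first_class_or_list_a`) are hypothetical and do not correspond to columns in the dataset.

```python
def squad_violations(squad, salary_cap=850_000_000):
    """Return a list of rule violations for a squad (empty list if compliant).

    Each player is a dict with hypothetical keys: 'overseas' (bool),
    'salary' (rupees), 'under_19' (bool), 'played_first_class_or_list_a' (bool).
    """
    problems = []
    if not 18 <= len(squad) <= 25:
        problems.append("squad size must be between 18 and 25 players")
    if sum(p["overseas"] for p in squad) > 8:
        problems.append("more than 8 overseas players")
    if sum(p["salary"] for p in squad) > salary_cap:
        problems.append("salary cap exceeded")
    if any(p["under_19"] and not p["played_first_class_or_list_a"] for p in squad):
        problems.append("under-19 player without first-class or List A experience")
    return problems


def playing_eleven_ok(eleven):
    """A playing eleven has exactly 11 players, at most 4 of them overseas."""
    return len(eleven) == 11 and sum(p["overseas"] for p in eleven) <= 4
```

Returning a list of messages rather than a single boolean makes it easy to report every broken rule at once.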
Breakdown of the dataset.
plos.figshare.com
xls
Updated Jan 15, 2025
Cite
Emanuele Brugnoli; Marco Delmastro (2025). Breakdown of the dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0316258.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0316258.t003
Dataset updated
Jan 15, 2025
Dataset provided by
PLOS ONE
Authors
Emanuele Brugnoli; Marco Delmastro
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Covid-19 pandemic has sparked renewed attention to the risks of online misinformation, emphasizing its impact on individuals’ quality of life through the spread of health-related myths and misconceptions. In this study, we analyze 6 years (2016–2021) of Italian vaccine debate across diverse social media platforms (Facebook, Instagram, Twitter, YouTube), encompassing all major news sources, both questionable and reliable. We first use the symbolic transfer entropy analysis of news production time-series to dynamically determine which category of sources, questionable or reliable, causally drives the agenda on vaccines. Then, leveraging deep learning models capable of accurately classifying vaccine-related content by conveyed stance and by discussed topic, we evaluate the focus on various topics by news sources promoting opposing views and compare the resulting user engagement. Our study uncovers misinformation not as a parasite of the news ecosystem that merely opposes the perspectives offered by mainstream media, but as an autonomous force capable of even overwhelming the production of vaccine-related content from the latter. While the pervasiveness of misinformation is evident in the significantly higher engagement of questionable sources compared to reliable ones (up to 11 times higher in median value), our findings underscore the need for consistent and thorough pro-vax coverage to counter this imbalance. This is especially important for sensitive topics, where the risk of misinformation spreading and potentially exacerbating negative attitudes toward vaccines is higher. While reliable sources have successfully promoted vaccine efficacy, reducing anti-vax impact, gaps in pro-vax coverage on vaccine safety led to the highest engagement with anti-vax content.

Instagram: distribution of global audiences 2024, by age group

statista.com
es.statista.com

Cite

Stacy Jo Dixon, Instagram: distribution of global audiences 2024, by age group [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

As of April 2024, almost 32 percent of global Instagram audiences were aged between 18 and 24 years, and 30.6 percent of users were aged between 25 and 34 years. Overall, 16 percent of users belonged to the 35 to 44 year age group.

Instagram users

With roughly one billion monthly active users, Instagram is among the most popular social networks worldwide. The social photo sharing app is especially popular in India and in the United States, which have 362.9 million and 169.7 million Instagram users, respectively.

Instagram features

One of the most popular features of Instagram is Stories. Users can post photos and videos to their Stories stream and the content is live for others to view for 24 hours before it disappears. In January 2019, the company reported that there were 500 million daily active Instagram Stories users. Instagram Stories directly competes with Snapchat, another photo sharing app that initially became famous due to its “vanishing photos” feature. As of the second quarter of 2021, Snapchat had 293 million daily active users.

Data from: Indian Premier League Dataset
kaggle.com
zip
Updated Feb 16, 2021
Cite
Saad Bin Manjur Adit (2021). Indian Premier League Dataset [Dataset]. https://www.kaggle.com/saadbinmanjuradit/indian-premier-league-dataset
Explore at:
Available download formats: zip (1261343 bytes)
Dataset updated
Feb 16, 2021
Authors
Saad Bin Manjur Adit
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Area covered
India
Description
Context

The Indian Premier League (IPL) is a professional Twenty20 cricket league in India, usually contested between March and May of every year by eight teams representing eight different cities or states in India. The league was founded by the Board of Control for Cricket in India (BCCI) in 2007. The IPL has an exclusive window in the ICC Future Tours Programme.

The IPL is the most-attended cricket league in the world and in 2014 was ranked sixth by average attendance among all sports leagues. In 2010, the IPL became the first sporting event in the world to be broadcast live on YouTube. The brand value of the IPL in 2019 was ₹475 billion (US$6.7 billion), according to Duff & Phelps. According to BCCI, the 2015 IPL season contributed ₹11.5 billion (US$160 million) to the GDP of the Indian economy.

Content

The dataset consists of data about IPL matches played from 2008 to 2019. IPL is a professional Twenty20 cricket league founded by the Board of Control for Cricket in India (BCCI) in 2008. The league has 8 teams representing 8 different Indian cities or states. It enjoys tremendous popularity and the brand value of the IPL in 2019 was estimated to be ₹475 billion (US$6.7 billion). So let’s analyze IPL through stats.

The dataset has 18 columns. Let’s get acquainted with the columns.
- id: The IPL match id.
- season: The IPL season.
- city: The city where the IPL match was held.
- date: The date on which the match was held.
- team1: One of the teams of the IPL match.
- team2: The other team of the IPL match.
- toss_winner: The team that won the toss.
- toss_decision: The decision taken by the team that won the toss to ‘bat’ or ‘field’.
- result: The result (‘normal’, ‘tie’, ‘no result’) of the match.
- dl_applied: (1 or 0) Indicates whether the Duckworth-Lewis rule was applied or not.
- winner: The winner of the match.
- win_by_runs: The number of runs by which the team batting first won.
- win_by_wickets: The number of wickets by which the team batting second won.
- player_of_match: The outstanding player of the match.
- venue: The venue where the match was hosted.
- umpire1: One of the two on-field umpires who officiate the match.
- umpire2: The other on-field umpire who officiates the match.
- umpire3: The off-field umpire who officiates the match.
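
As one way of using these columns, the share of decided matches won by the toss winner can be computed with the standard `csv` module. This is a minimal sketch: the `SAMPLE` rows and team names are invented for illustration and are not taken from the dataset, and the function name is an assumption.

```python
import csv
import io

# Hypothetical rows using a subset of the columns described above.
SAMPLE = """id,season,team1,team2,toss_winner,toss_decision,result,winner
1,2008,Team A,Team B,Team A,bat,normal,Team A
2,2008,Team A,Team C,Team C,field,normal,Team A
3,2008,Team B,Team C,Team B,field,normal,Team B
4,2008,Team A,Team B,Team B,bat,no result,
"""


def toss_winner_win_rate(csv_text):
    """Fraction of decided matches in which the toss winner also won the match."""
    decided = won = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip matches with no result or no recorded winner.
        if row["result"] == "no result" or not row["winner"]:
            continue
        decided += 1
        won += row["toss_winner"] == row["winner"]
    return won / decided if decided else None


print(toss_winner_win_rate(SAMPLE))
```

To run it against the real file, the text of the downloaded CSV would be passed in place of `SAMPLE`.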

Acknowledgements

Data source from 2008-2017 - CricSheet.org and Manas - Kaggle

Indian Premier League 2008-2019 Navaneesh Kumar - Kaggle

Data source for 2018-2019 - IPL T20 - Official website

Cite

Guarisco, Marco (2023). Spotify and Youtube [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10253414

Spotify and Youtube

Explore at:

Dataset updated

Dec 4, 2023

Dataset provided by

Guarisco, Marco
Rastelli, Salvatore
Sallustio, Marco

License

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Area covered

YouTube

Description

These are the statistics for the top 10 songs of various Spotify artists and their YouTube videos. The creators above generated the data and uploaded it to Kaggle on February 6-7, 2023. The license to use this data is "CC0: Public Domain", allowing the data to be copied, modified, distributed, and worked on without having to ask permission. The data is in numerical and textual CSV format as attached. This dataset contains the statistics and attributes of the top 10 songs of various artists in the world. As described by the creators above, it includes 26 variables for each of the songs collected from Spotify. These variables are briefly described next:

Track: name of the song, as visible on the Spotify platform.
Artist: name of the artist.
Url_spotify: the URL of the artist.
Album: the album in which the song is contained on Spotify.
Album_type: indicates if the song is released on Spotify as a single or contained in an album.
Uri: a Spotify link used to find the song through the API.
Danceability: describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
Energy: a measure from 0.0 to 1.0 that represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
Key: the key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
Loudness: the overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
Speechiness: detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
Acousticness: a confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
Instrumentalness: predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
Liveness: detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
Valence: a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
Tempo: the overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
Duration_ms: the duration of the track in milliseconds.
Stream: number of streams of the song on Spotify.
Url_youtube: URL of the video linked to the song on YouTube, if it has any.
Title: title of the video clip on YouTube.
Channel: name of the channel that published the video.
Views: number of views.
Likes: number of likes.
Comments: number of comments.
Description: description of the video on YouTube.
Licensed: indicates whether the video represents licensed content, which means that the content was uploaded to a channel linked to a YouTube content partner and then claimed by that partner.
official_video: boolean value that indicates if the video found is the official video of the song.

The data was last updated on February 7, 2023.
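
Two of the variables above, Key and Speechiness, are encoded values that usually need decoding before analysis. The sketch below shows one possible decoding in Python. The function names and band labels are assumptions, and since the description does not say how values exactly equal to 0.33 or 0.66 are classified, this sketch places both in the middle band.

```python
# Pitch Class notation: index 0 is C, 1 is C♯/D♭, and so on up to 11 (B).
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def key_name(key):
    """Map the integer Key value to a pitch name; -1 means no key was detected."""
    if key == -1:
        return None
    if not 0 <= key <= 11:
        raise ValueError(f"key must be -1 or in 0..11, got {key}")
    return PITCH_CLASSES[key]


def speechiness_band(value):
    """Bucket a Speechiness score using the 0.33 and 0.66 thresholds."""
    if value > 0.66:
        return "probably all spoken word"
    if value >= 0.33:
        return "music and speech"
    return "mostly music"


print(key_name(1), "|", speechiness_band(0.45))
```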
