15 datasets found

Top Youtube Artist
kaggle.com
Updated Jan 12, 2023
Cite
Mrityunjay Pathak (2023). Top Youtube Artist [Dataset]. https://www.kaggle.com/datasets/themrityunjaypathak/top-youtube-artist
Dataset updated
Jan 12, 2023
Dataset provided by
Kaggle
Authors
Mrityunjay Pathak
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
YouTube
Description
YouTube was created in 2005, with the first video, "Me at the Zoo", uploaded on 23 April 2005. Since then, 1.3 billion people have set up YouTube accounts. In 2018, people watched nearly 5 billion videos each day. People upload 300 hours of video to the site every minute.

According to 2016 research undertaken by Pexeso, music accounts for only 4.3% of YouTube’s content, yet it draws 11% of the views. Clearly, an awful lot of people watch a comparatively small number of music videos. It should be no surprise, therefore, that the most watched videos of all time on YouTube are predominantly music videos.

On August 13, BTS became the most-viewed artist in YouTube history, accumulating over 26.7 billion views across all their official channels. This count includes all music videos and dance practice videos.

Justin Bieber and Ed Sheeran now hold the records for second and third-highest views, with over 26 billion views each.

Currently, BTS’s most viewed videos are their music videos for “**Boy With Luv**,” “**Dynamite**,” and “**DNA**,” which all have over 1.4 billion views.

Headers of the Dataset

Total = Total views (in millions) across all official channels

Avg = Current daily average of all videos combined

100M = Number of videos with more than 100 million views

NYC Open Data
kaggle.com
zip
Updated Mar 20, 2019
Cite
NYC Open Data (2019). NYC Open Data [Dataset]. https://www.kaggle.com/nycopendata/new-york
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset authored and provided by
NYC Open Data
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/

Content

Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:

Over 8 million 311 service requests from 2012-2016

More than 1 million motor vehicle collisions 2012-present

Citi Bike stations and 30 million Citi Bike trips 2013-present

Over 1 billion Yellow and Green Taxi rides from 2009-present

Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015

This dataset is deprecated and not being updated.

Fork this kernel to get started with this dataset.

Acknowledgements

https://opendata.cityofnewyork.us/

https://cloud.google.com/blog/big-data/2017/01/new-york-city-public-datasets-now-available-on-google-bigquery

This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.

The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.

Banner Photo by @bicadmedia from Unsplash.

Inspiration

On which New York City streets are you most likely to find a loud party?

Can you find the Virginia Pines in New York City?

Where was the only collision caused by an animal that injured a cyclist?

What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?

https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png

OMOP dataset: Hospital COVID patients: severity, acuity, therapies, outcomes...
healthdatagateway.org
unknown
Cite
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158), OMOP dataset: Hospital COVID patients: severity, acuity, therapies, outcomes [Dataset]. https://healthdatagateway.org/dataset/139
Available download formats: unknown
Dataset authored and provided by
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
License
https://www.pioneerdatahub.co.uk/data/data-request-process/
Description
OMOP dataset: Hospital COVID patients: severity, acuity, therapies, outcomes Dataset number 2.0

Coronavirus disease 2019 (COVID-19) was identified in January 2020. Currently, there have been more than 6 million cases & more than 1.5 million deaths worldwide. Some individuals experience severe manifestations of infection, including viral pneumonia, adult respiratory distress syndrome (ARDS) & death. There is a pressing need for tools to stratify patients, to identify those at greatest risk. Acuity scores are composite scores which help identify patients who are more unwell to support & prioritise clinical care. There are no validated acuity scores for COVID-19 & it is unclear whether standard tools are accurate enough to provide this support. This secondary care COVID OMOP dataset contains granular demographic, morbidity, serial acuity and outcome data to inform risk prediction tools in COVID-19.

PIONEER geography The West Midlands (WM) has a population of 5.9 million & includes a diverse ethnic & socio-economic mix. There is a higher than average percentage of minority ethnic groups. WM has a large number of elderly residents but is the youngest population in the UK. Each day >100,000 people are treated in hospital, see their GP or are cared for by the NHS. The West Midlands was one of the hardest hit regions for COVID admissions in both wave 1 & 2.

EHR. University Hospitals Birmingham NHS Foundation Trust (UHB) is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & 100 ITU beds. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”. UHB has cared for >5000 COVID admissions to date. This is a subset of data in OMOP format.

Scope: All COVID swab confirmed hospitalised patients to UHB from January – August 2020. The dataset includes highly granular patient demographics & co-morbidities taken from ICD-10 & SNOMED-CT codes. Serial, structured data pertaining to care process (timings, staff grades, specialty review, wards), presenting complaint, acuity, all physiology readings (pulse, blood pressure, respiratory rate, oxygen saturations), all blood results, microbiology, all prescribed & administered treatments (fluids, antibiotics, inotropes, vasopressors, organ support), all outcomes.

Available supplementary data: Health data preceding & following admission event. Matched “non-COVID” controls; ambulance, 111, 999 data, synthetic data. Further OMOP data available as an additional service.

Available supplementary support: Analytics, Model build, validation & refinement; A.I.; Data partner support for ETL (extract, transform & load) process, Clinical expertise, Patient & end-user access, Purchaser access, Regulatory requirements, Data-driven trials, “fast screen” services.
US Economic Data
kaggle.com
Updated Apr 17, 2024
Cite
Kevin Trivino (2024). US Economic Data [Dataset]. https://www.kaggle.com/datasets/xkevnx/us-economic-data/data
Dataset updated
Apr 17, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Kevin Trivino
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Area covered
United States
Description
Data was collected from the FRED website.

Contains economic indicators often associated with recessions, along with recession status data. Each indicator was collected at its smallest available time unit and from its earliest available date, which results in many nulls but gives users of this dataset more flexibility.

recession: "1" recessionary period, "0" non-recessionary period (Monthly)

cpi: CPI (1982-1984=INDEX 100) (Monthly)

gdp: Real GDP Billions of Chained 2017 Dollars (Quarterly)

unemployment: Unemployment Rate (Monthly)

m2: M2 Billions of Dollars (Monthly)

fed_funds: Federal Funds Rate (Monthly)

ten_two: 10-Year Treasury Constant Maturity Minus 2-Year Treasury Constant Maturity (Monthly)

residential: Real Residential Property Price Rate (Quarterly)

Comprehensive description of each variable can be found at https://fred.stlouisfed.org/
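
Because monthly and quarterly indicators share one table, the quarterly columns (gdp, residential) are null in most monthly rows. One common way to handle this, shown here only as a minimal sketch, is to carry the last observed value forward. The function name and the sample values are hypothetical and are not taken from the dataset.

```python
from typing import Optional


def forward_fill(values: list[Optional[float]]) -> list[Optional[float]]:
    """Carry the last observed value forward over nulls (None).

    Leading nulls stay None because there is nothing to carry yet.
    """
    filled: list[Optional[float]] = []
    last: Optional[float] = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical monthly rows for the quarterly "gdp" column:
# only one month per quarter carries a value, the rest are null.
gdp_monthly = [None, 100.0, None, None, 101.5, None]
gdp_filled = forward_fill(gdp_monthly)
```

Whether forward filling is appropriate depends on the analysis; interpolating, or aggregating the monthly series to quarters instead, are equally reasonable choices.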
Synthetic Data for an Imaginary Country, Sample, 2023 - World
microdata.worldbank.org
Updated Jul 7, 2023
Cite
Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
Dataset updated
Jul 7, 2023
Dataset authored and provided by
Development Data Group, Data Analytics Unit
Time period covered
2023
Area covered
World
Description
Abstract

The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

The full-population dataset (with about 10 million individuals) is also distributed as open data.

Geographic coverage

The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

Analysis unit

Household, Individual

Universe

The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

Kind of data

ssd

Sampling procedure

The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
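
The two-stage design above can be sketched in a few lines. This is an illustration, not the R script mentioned in the source: the stratum names and sizes are hypothetical, and the largest-remainder rounding is an assumption, since the description only says the allocation is proportional to stratum size.

```python
import random


def allocate_enumeration_areas(
    stratum_sizes: dict[str, int],
    sample_households: int = 8000,
    households_per_ea: int = 25,
) -> dict[str, int]:
    """Stage 1: split the enumeration areas (EAs) across strata by size."""
    total_eas = sample_households // households_per_ea  # 8,000 / 25 = 320
    total_size = sum(stratum_sizes.values())
    exact = {s: total_eas * n / total_size for s, n in stratum_sizes.items()}
    allocation = {s: int(x) for s, x in exact.items()}
    # Assumed rounding rule: hand leftover EAs to the largest remainders.
    leftover = total_eas - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - allocation[s], reverse=True)
    for stratum in by_remainder[:leftover]:
        allocation[stratum] += 1
    return allocation


def select_households(household_ids: list[int], k: int = 25, seed: int = 0) -> list[int]:
    """Stage 2: draw k households at random within one enumeration area."""
    return random.Random(seed).sample(household_ids, k)


# Hypothetical strata (geo_1 crossed with urban/rural) and their sizes.
strata = {
    "prov1_urban": 500_000,
    "prov1_rural": 300_000,
    "prov2_urban": 150_000,
    "prov2_rural": 50_000,
}
allocation = allocate_enumeration_areas(strata)
chosen = select_households(list(range(1, 201)))
```

With these made-up sizes the 320 enumeration areas divide exactly, so the rounding step does nothing; it only matters when the proportional shares are fractional.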

Mode of data collection

other

Research instrument

The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

Cleaning operations

The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Some post-processing was also applied to the data to produce the distributed data files.

Response rate

This is a synthetic dataset; the "response rate" is 100%.
Cifar 100 Dataset
universe.roboflow.com
opendatalab.com
zip
Updated Aug 11, 2022
Cite
Popular Benchmarks (2022). Cifar 100 Dataset [Dataset]. https://universe.roboflow.com/popular-benchmarks/cifar100
Available download formats: zip
Dataset updated
Aug 11, 2022
Dataset authored and provided by
Popular Benchmarks
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Animals People CommonObjects
Description
CIFAR-100

The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

* More info on CIFAR-100: https://www.cs.toronto.edu/~kriz/cifar.html
* TensorFlow listing of the dataset: https://www.tensorflow.org/datasets/catalog/cifar100
* GitHub repo for converting CIFAR-100 tarball files to png format: https://github.com/knjcode/cifar2png

All images were sized 32x32 in the original dataset

The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images [in the original dataset].

This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). However, this project does not contain the superclasses.

* Superclasses version: https://universe.roboflow.com/popular-benchmarks/cifar100-with-superclasses/

More background on the dataset (CIFAR-100 dataset classes and superclasses): https://i.imgur.com/5w8A0Vm.png

Version 1 (original-images_Original-CIFAR100-Splits):

Original images, with the original splits for CIFAR-100: train (83.33% of images - 50,000 images) set and test (16.67% of images - 10,000 images) set only.

This version was not trained

Version 2 (original-images_trainSetSplitBy80_20):

Original, raw images, with the train set split to provide 80% of its images to the training set (approximately 40,000 images) and 20% of its images to the validation set (approximately 10,000 images)

Trained from Roboflow Classification Model's ImageNet training checkpoint

Train/Valid/Test split rebalancing: https://blog.roboflow.com/train-test-split/ https://i.imgur.com/kSPeKGn.png
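
The split figures quoted for Version 1 and Version 2 follow from the class counts given earlier. The short check below reproduces that arithmetic; the variable and function names are illustrative only and are not part of the Roboflow project.

```python
# Counts stated in the description: 100 classes x 600 images,
# of which 500 per class are training images and 100 are test images.
TOTAL_IMAGES = 100 * 600   # 60,000
TRAIN_IMAGES = 100 * 500   # 50,000
TEST_IMAGES = 100 * 100    # 10,000


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# Version 1: original train/test split.
v1_train_pct = percent(TRAIN_IMAGES, TOTAL_IMAGES)
v1_test_pct = percent(TEST_IMAGES, TOTAL_IMAGES)

# Version 2: the original train set split 80/20 into train and validation.
v2_train = TRAIN_IMAGES * 80 // 100
v2_valid = TRAIN_IMAGES - v2_train
```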

Citation:

@TECHREPORT{Krizhevsky09learningmultiple,
  author = {Alex Krizhevsky},
  title = {Learning multiple layers of features from tiny images},
  institution = {},
  year = {2009}
}
A database of 100 years (1915-2014) of coastal flooding in the UK
edmed.seadatanet.org
bodc.ac.uk
nc
Updated Nov 21, 2024
Cite
University of Southampton School of Ocean and Earth Science (2024). A database of 100 years (1915-2014) of coastal flooding in the UK [Dataset]. https://edmed.seadatanet.org/report/6120/
Available download formats: nc
Dataset updated
Nov 21, 2024
Dataset authored and provided by
University of Southampton School of Ocean and Earth Science
License
https://vocab.nerc.ac.uk/collection/L08/current/UN/
Time period covered
Jan 1, 1915 - Dec 31, 2014
Area covered

Description
This database, and the accompanying website called ‘SurgeWatch’ (http://surgewatch.stg.rlp.io), provides a systematic UK-wide record of high sea level and coastal flood events over the last 100 years (1915-2014). Derived using records from the National Tide Gauge Network, a dataset of exceedance probabilities from the Environment Agency and meteorological fields from the 20th Century Reanalysis, the database captures information on 96 storm events that generated the highest sea levels around the UK since 1915. For each event, the database contains information about: (1) the storm that generated that event; (2) the sea levels recorded around the UK during the event; and (3) the occurrence and severity of coastal flooding as a consequence of the event. The data are presented to be easily accessible and understandable to a wide range of interested parties.

The database contains 100 files: four CSV files and 96 PDF files. Two CSV files contain the meteorological and sea level data for each of the 96 events. A third file contains the list of the top 20 largest skew surges at each of the 40 study tide gauge sites. In the file containing the sea level and skew surge data, the tide gauge sites are numbered 1 to 40. A fourth accompanying CSV file lists, for reference, the site name and location (longitude and latitude). A description of the parameters in each of the four CSV files is given in the table below.

There are also 96 separate PDF files containing the event commentaries. For each event these contain a concise narrative of the meteorological and sea level conditions experienced during the event, and a succinct description of the evidence available in support of coastal flooding, with a brief account of the recorded consequences to people and property. In addition, these contain a graphical representation of the storm track and mean sea level pressure and wind fields at the time of maximum high water, the return period and skew surge magnitudes at sites around the UK, and a table of the date and time, offset return period, water level, predicted tide and skew surge for each site where the 1 in 5 year threshold was reached or exceeded for each event. A detailed description of how the database was created is given in Haigh et al. (2015).

Coastal flooding caused by extreme sea levels can be devastating, with long-lasting and diverse consequences. The UK has a long history of severe coastal flooding. The recent 2013-14 winter, in particular, produced a sequence of some of the worst coastal flooding the UK has experienced in the last 100 years. At present 2.5 million properties and £150 billion of assets are potentially exposed to coastal flooding. Yet despite these concerns, there is no formal, national framework in the UK to record flood severity and consequences and thus benefit an understanding of coastal flooding mechanisms and consequences. Without a systematic record of flood events, assessment of coastal flooding around the UK coast is limited.

The database was created at the School of Ocean and Earth Science, National Oceanography Centre, University of Southampton with help from the Faculty of Engineering and the Environment, University of Southampton, the National Oceanography Centre and the British Oceanographic Data Centre. Collation of the database and the development of the website was funded through a Natural Environment Research Council (NERC) impact acceleration grant. The database contributes to the objectives of UK Engineering and Physical Sciences Research Council (EPSRC) consortium project FLOOD Memory (EP/K013513/1).
HausaVG
huggingface.co
Updated Jul 3, 2023
Cite
HausaNLP (2023). HausaVG [Dataset]. https://huggingface.co/datasets/HausaNLP/HausaVG
Dataset updated
Jul 3, 2023
Dataset authored and provided by
HausaNLP
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Multi-modal Machine Translation (MMT) enables the use of visual information to enhance the quality of translations, especially where the full context needed for unambiguous translation is not available in standard machine translation. Despite the increasing popularity of such techniques, the field lacks sufficient high-quality datasets to realize its full potential.

Hausa, a Chadic language, is a member of the Afro-Asiatic language family. It is estimated that about 100 to 150 million people speak the language, with more than 80 million indigenous speakers. This is more than any of the other Chadic languages. Despite the large number of speakers, Hausa is considered a low-resource language in natural language processing (NLP), because there are not enough resources to implement most NLP tasks. While some datasets exist, they are either scarce, machine-generated or in the religious domain. Therefore, there is a need to create training and evaluation data for implementing machine learning tasks and bridging the research gap in the language.

This work presents the Hausa Visual Genome (HaVG), a dataset that contains the description of an image or a section within the image in Hausa and its equivalent in English. The dataset was prepared by automatically translating the English description of the images in the Hindi Visual Genome (HVG). The synthetic Hausa data was then carefully post-edited, taking the respective images into account. The data is made of 32,923 images and their descriptions, divided into training, development, test, and challenge test sets. The Hausa Visual Genome is the first dataset of its kind and can be used for Hausa-English machine translation, multi-modal research, and image description, among various other natural language processing and generation tasks.
Adult Datasets
kaggle.com
Updated Jan 22, 2019
Cite
Brijesh B. Mehta (2019). Adult Datasets [Dataset]. https://www.kaggle.com/datasets/brijeshbmehta/adult-datasets
Dataset updated
Jan 22, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Brijesh B. Mehta
Description
Context

I am working in the area of Privacy Preserving Big Data Publishing. The state-of-the-art approaches were tested on the Adult dataset. I found that the Adult dataset is available at the UCI repository, but a synthetic version wasn't available anywhere. As I am working with big data, I need a large amount of data to justify my contribution. Therefore, I created my own synthetic datasets with 100 thousand, 1 million, 10 million and 100 million records. Here I am sharing the original Adult dataset, with approximately 33 thousand records, and the synthetic versions Adult100k, Adult 1m, Adult10m and Adult100m.

Content

Adult dataset contains census information.

Acknowledgements

I would like to thank the UCI repository for providing the base dataset, without which I would not have been able to synthesize the large datasets.

Inspiration

The datasets might be helpful to all those who want to work on Big Data Privacy.
WikiText-103 Dataset
paperswithcode.com
opendatalab.com
Updated Oct 2, 2016
Cite
Stephen Merity; Caiming Xiong; James Bradbury; Richard Socher (2016). WikiText-103 Dataset [Dataset]. https://paperswithcode.com/dataset/wikitext-103
Dataset updated
Oct 2, 2016
Authors
Stephen Merity; Caiming Xiong; James Bradbury; Richard Socher
Description
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.

Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger vocabulary and retains the original case, punctuation and numbers - all of which are removed in PTB. As it is composed of full articles, the dataset is well suited for models that can take advantage of long term dependencies.
Top 100 Largest Banks
kaggle.com
Updated Jan 14, 2025
Cite
Ramzan Shaheen (2025). Top 100 Largest Banks [Dataset]. https://www.kaggle.com/datasets/iamramzanai/top-100-largest-banks/data
Dataset updated
Jan 14, 2025
Dataset provided by
Kaggle
Authors
Ramzan Shaheen
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Summary

This dataset contains information about the largest banks globally, including their rank, name, and total assets (in US$ billion as of 2023). The data was scraped from Wikipedia's List of Largest Banks. It can be used for financial analysis, market research, and educational purposes.

Dataset Structure

Columns

Rank: The rank of the bank based on total assets.

Bank Name: The name of the bank.

Total Assets (2023, US$ billion): The total assets of the bank in billions of US dollars as of 2023.

Example

Rank	Bank Name	Total Assets (2023, US$ billion)
1	Industrial & Commercial Bank of China (ICBC)	5,000
2	China Construction Bank	4,500
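
The asset figures in the example use thousands separators ("5,000"), so they need cleaning before any numeric analysis. The sketch below parses the two example rows shown above; it assumes tab-separated input, which may not match the layout of the actual file on Kaggle, and the function name and output keys are hypothetical.

```python
import csv
import io

# The two example rows from the description, written as tab-separated text.
SAMPLE = (
    "Rank\tBank Name\tTotal Assets (2023, US$ billion)\n"
    "1\tIndustrial & Commercial Bank of China (ICBC)\t5,000\n"
    "2\tChina Construction Bank\t4,500\n"
)

ASSETS_COLUMN = "Total Assets (2023, US$ billion)"


def parse_banks(text: str) -> list[dict]:
    """Parse tab-separated rows and convert the assets column to a number."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "rank": int(row["Rank"]),
            "name": row["Bank Name"],
            # Drop the thousands separator before converting.
            "total_assets": float(row[ASSETS_COLUMN].replace(",", "")),
        }
        for row in reader
    ]


banks = parse_banks(SAMPLE)
```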

Source

The data was scraped from Wikipedia's List of Largest Banks using Python and Scrapy.

Usage

This dataset can be used for:

- Financial market research.
- Trend analysis in global banking.
- Educational purposes and data visualization.

Licensing

The data is publicly available under Wikipedia's Terms of Use.

Limitations

The data may not reflect real-time changes as it was scraped from a static page.

Possible inaccuracies due to updates or inconsistencies on the source page.

Acknowledgements

Thanks to Wikipedia and the contributors of the "List of Largest Banks" page.

Citation

If you use this dataset, please cite it as:

@misc{largestbanks2023,
  author = {Your Name or Organization},
  title = {Largest Banks Dataset},
  year = {2023},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/your-dataset-name}
}

Facebook: distribution of global audiences 2024, by age and gender

statista.com
davegsmith.com

Updated Jun 17, 2025

Cite

Stacy Jo Dixon (2025). Facebook: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/

Dataset updated

Jun 17, 2025

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

As of April 2024, men between the ages of 25 and 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Facebook's second-largest audience base was men aged 18 to 24 years.

Facebook connects the world

Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world, with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (together known under the acronym GAFA). Facebook is the most popular social network worldwide, and the company also owns three other billion-user properties: the mobile messaging apps WhatsApp and Facebook Messenger, as well as the photo-sharing app Instagram.

Facebook users

The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with 378 million users. The United States, Brazil, and Indonesia also all have more than 100 million Facebook users each.

Federal Net Outlays as Percent of GDP
kaggle.com
Updated Dec 12, 2019
Cite
St. Louis Fed (2019). Federal Net Outlays as Percent of GDP [Dataset]. https://www.kaggle.com/datasets/stlouisfed/federal-net-outlays-as-percent-of-gdp
Dataset updated
Dec 12, 2019
Dataset provided by
Kaggle
Authors
St. Louis Fed
Description
Content

Federal Net Outlays as Percent of Gross Domestic Product (FYONGDA188S) was first constructed by the Federal Reserve Bank of St. Louis in January 2013. It is calculated using Federal Net Outlays (FYONET) and Gross Domestic Product (GDPA):

FYONGDA188S = ((FYONET/1000)/GDPA)*100

FYONET/1000 transforms FYONET from millions of dollars to billions of dollars.
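
The calculation above can be written as a small function. This is a sketch of the stated formula only; the function name and the input values are hypothetical and are not real FRED observations.

```python
def outlays_percent_of_gdp(fyonet_millions: float, gdpa_billions: float) -> float:
    """Apply FYONGDA188S = ((FYONET / 1000) / GDPA) * 100.

    FYONET is in millions of dollars and GDPA in billions, so FYONET is
    divided by 1000 to put both series in billions before taking the ratio.
    """
    fyonet_billions = fyonet_millions / 1000
    return (fyonet_billions / gdpa_billions) * 100


# Hypothetical inputs: 4,000,000 million dollars of outlays
# against a GDP of 20,000 billion dollars.
example = outlays_percent_of_gdp(4_000_000, 20_000)
```

With these made-up inputs, outlays of 4,000 billion against a GDP of 20,000 billion come to about 20 percent.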

Context

This is a dataset from the Federal Reserve Bank of St. Louis hosted by the Federal Reserve Economic Database (FRED). FRED has a data platform found here and they update their information according to the frequency that the data updates. Explore the Federal Reserve Bank of St. Louis using Kaggle and all of the data sources available through the St. Louis Fed organization page!

Update Frequency: This dataset is updated daily.

Observation Start: 1929-01-01

Observation End: 2018-01-01

Acknowledgements

This dataset is maintained using FRED's API and Kaggle's API.

Cover photo by Luis Mézquita on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Amount of data created, consumed, and stored 2010-2023, with forecasts to...
statista.com
Updated Jun 30, 2025
Cite
Statista (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
May 2024
Area covered
Worldwide
Description
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

Storage capacity also growing

Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
Alesco Phone ID Database - Phone Data with over 860 Million Phone Number...
datarade.ai
.csv, .xls, .txt
Updated Jul 5, 2018
+ more versions
Cite
Alesco Data (2018). Alesco Phone ID Database - Phone Data with over 860 Million Phone Number with Carrier Name, covers 94% of the US population - available for licensing! [Dataset]. https://datarade.ai/data-products/alesco-phone-id-database-the-industry-s-largest-and-most-ac-alesco-data
Explore at:
Available download formats: .csv, .xls, .txt
Dataset updated
Jul 5, 2018
Dataset authored and provided by
Alesco Data
Area covered
United States
Description
The Alesco Phone ID Database data ties together a consumer's true identity, and with linkage to the Alesco Power Identity Graph, we are perfectly positioned to help customers solve today's most challenging marketing, analytics, and identity resolution problems.

Our proprietary Phone ID database combines public and private sources and validates phone numbers against current and historical data 24 hours a day, 365 days a year.

With over 650 million unique phone numbers, device and service information, our one-of-a-kind solutions are now available for your marketing and identity resolution challenges in both B2C and B2B applications!

• Alesco Phone ID provides more than 860 million phone numbers monthly linked to a consumer or business name and includes landline, mobile phone number, VoIP, private and business phone numbers — all permissibly obtained and privacy-compliant and linked to other Alesco data sets

• How we do it: Alesco Phone ID is multi-sourced with daily information and delivered monthly or quarterly to clients. Our proprietary machine learning and advanced analytics processes ensure quality levels far above industry standards. Alesco processes over 100 million phone signals per day, compiling, normalizing, and standardizing phone information from 37 input sources.

• Accuracy: Each of Alesco’s phone data sources is vetted to ensure it is authoritative, giving you confidence in the accuracy of the information. Every record is validated, verified and processed to ensure the widest, most reliable coverage combined with stunning precision.

• Ease of use: Alesco’s Phone ID Database is available as an on-premise phone database license, giving you full control to host and access this powerful resource on-site. Ongoing updates are provided monthly to ensure your data is up to date.