18 datasets found

World Marriage Dataset
kaggle.com
Updated Jul 21, 2024
Cite
Ibrar Hussain (2024). World Marriage Dataset [Dataset]. https://www.kaggle.com/datasets/dataanalyst001/world-marriage-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 21, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ibrar Hussain
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The World Marriage Dataset provides comparable, up-to-date data on the marital status of the population by age and sex for 232 countries or regions of the world from 1970 to 2019. The dataset has 271605 rows and 9 columns. Each row represents a specific age group and sex with a marital status of divorced, married, or single. The columns are:

Sr. No.: A serial number identifying each entry.
Country: The country of focus.
Age Group: The age range of the surveyed individuals.
Sex: The gender of the surveyed individuals.
Marital Status: The marital status of the individuals, categorized as "Divorced", "Married", or "Single".
Data Process: The method used to collect the data.
Data Collection (Start Year): The year when data collection began.
Data Collection (End Year): The year when data collection ended.
Data Source: The source of the data.

This dataset helps in understanding the marital status distribution among different age groups of men and women around the world from 1970 to 2019.
Geonames - All Cities with a population > 1000
public.opendatasoft.com
data.smartidf.services
+2more
csv, excel, geojson +1
Updated Mar 10, 2024
Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
Explore at:
csv, json, geojson, excel (available download formats)
Dataset updated
Mar 10, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000).

Sources and Contributions
Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

Enrichment: add country name
Overcrowding rate by age group - population without single-person households...
service.tib.eu
db.nomics.world
+1more
Updated Jan 8, 2025
Cite
(2025). Overcrowding rate by age group - population without single-person households - EU-SILC survey [Dataset]. https://service.tib.eu/ldmservice/dataset/eurostat_ub4yxjwglcni8bf3erimq
Explore at:
Dataset updated
Jan 8, 2025
Description
This indicator is defined as the percentage of the population living in an overcrowded household (excluding single-person households). A person is considered to be living in an overcrowded household if the household does not have at its disposal a minimum number of rooms equal to:
- one room for the household;
- one room per couple in the household;
- one room for each single person aged 18 or more;
- one room per pair of single people of the same sex between 12 and 17 years of age;
- one room for each single person between 12 and 17 years of age not included in the previous category;
- one room per pair of children under 12 years of age.
The indicator is presented by age group.
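The room rule in this definition amounts to a simple count, which the following minimal Python sketch illustrates. The function and parameter names are hypothetical, and giving an unpaired child under 12 a room of their own is an assumption (the definition states the leftover rule explicitly only for ages 12 to 17).

```python
def min_rooms(couples, single_adults, boys_12_17, girls_12_17, children_under_12):
    """Minimum number of rooms a household needs under the stated rule.

    All arguments are non-negative head counts (couples counts couples,
    not persons). Names are illustrative, not taken from the dataset.
    """
    rooms = 1                           # one room for the household
    rooms += couples                    # one room per couple
    rooms += single_adults              # one room per single person aged 18+
    # Same-sex pairs aged 12-17 share a room; an unpaired teenager
    # falls under the "not included in the previous category" rule.
    rooms += (boys_12_17 + 1) // 2
    rooms += (girls_12_17 + 1) // 2
    # One room per pair of children under 12 (assumption: an unpaired
    # child also gets one room).
    rooms += (children_under_12 + 1) // 2
    return rooms


def is_overcrowded(rooms_available, **household):
    """True if the household has fewer rooms than the minimum."""
    return rooms_available < min_rooms(**household)
```

For example, a couple with one boy and one girl aged 12 to 17 and two children under 12 would need five rooms under these assumptions, so a four-room dwelling would count as overcrowded.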
World Religions Across Regions
kaggle.com
Updated Dec 6, 2022
Cite
The Devastator (2022). World Religions Across Regions [Dataset]. https://www.kaggle.com/datasets/thedevastator/a-global-perspective-on-world-religions-1945-201
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 6, 2022
Dataset provided by
Kaggle
Authors
The Devastator
Area covered
World
Description
World Religions Across Regions

Analyzing Adherence Across Regions, States and the Global System

By Correlates of War Project [source]

About this dataset

The World Religion Project (WRP) is an ambitious endeavor to conduct a comprehensive analysis of religious adherence throughout the world from 1945 to 2010. This cutting-edge project offers unparalleled insight into the religious behavior of people in different countries, regions, and continents during this time period. Its datasets provide important information about the numbers and percentages of adherents across a multitude of different religions, religion families, and non-religious affiliations.

The WRP consists of three distinct datasets: the national religion dataset, the regional religion dataset, and the global religion dataset. Each targets a different level of analysis, from individual states to the global system. The national dataset provides the number of adherents by state, as well as the percentage of the population practicing a given religion, in five-year increments, showing how these figures evolve from nation to nation over time. The regional dataset likewise provides data at five-year intervals by region, with one modification: Pacific Ocean states (country code number 900 or above) have been reclassified into their own Oceania category. Finally, the global dataset aggregates all states to give a snapshot, at any five-year interval between 1945 and 2010, of the relationships between religions or religion families within one location or transnationally.

This project was developed in three stages: first, forming a religion tree (a systematic classification); second, collecting data according to that classification; and third, cleaning the data so that discrepancies are reconciled, with gaps flagged where unknown values were encountered during collection. For more detail on these datasets and their possible applications, please contact Zeev Maoz (University of California, Davis) and Errol A. Henderson (Pennsylvania State University).

How to use the dataset

The World Religion Project (WRP) dataset offers a comprehensive look at religious adherence around the world within a single dataset. With this dataset, you can track global religious trends over a period of 65 years and explore how they’ve changed during that time. By exploring the WRP dataset, you’ll gain insight into cross-regional and cross-time patterns in religious affiliation around the world.

Research Ideas

Analyzing historical patterns of religious growth and decline across different regions

Creating visualizations to compare religious adherence in various states, countries, or globally

Studying the impact of governmental policies on religious participation over time

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: Dataset copyright by authors
You are free to:
- Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt: remix, transform, and build upon the material for any purpose, even commercially.
You must:
- Give appropriate credit: provide a link to the license, and indicate if changes were made.
- ShareAlike: distribute your contributions under the same license as the original.
- Keep intact: all notices that refer to this license, including copyright notices.

Columns

File: WRP regional data.csv

| Column name | Description |
|:------------|:------------|
| Year | Reference year for data collection. (Integer) |
| Region | World region according to Correlates Of War (COW) Regional Systemizations with one modification (Oceania category for COW country code ... |
65 to 74 years poverty in On Top of the World, Florida (2022)
welfareinfo.org
Updated Sep 12, 2024
Cite
WelfareInfo.org (2024). 65 to 74 years poverty in On Top of the World, Florida (2022) [Dataset]. https://www.welfareinfo.org/poverty-rate/florida/on-top-of-the-world/stat-single-people-65-74-years-old/
Explore at:
Dataset updated
Sep 12, 2024
Dataset provided by
WelfareInfo.org
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Florida, On Top of the World
Description
65 to 74 years Poverty Rate Statistics for 2022. This is part of a larger dataset covering poverty in On Top of the World, Florida by age, education, race, gender, work experience and more.
Death Profiles by County
data.chhs.ca.gov
data.ca.gov
+3more
csv, zip
Updated May 28, 2025
Cite
California Department of Public Health (2025). Death Profiles by County [Dataset]. https://data.chhs.ca.gov/dataset/death-profiles-by-county
Explore at:
csv(24235858), csv(21575405), csv(11738570), csv(15127221), csv(60676655), csv(1128641), csv(60023260), csv(28125832), csv(75015194), csv(74043128), csv(74351424), csv(74497014), csv(60201673), csv(74689382), csv(73906266), csv(60517511), csv(52019564), zip, csv(5095) (available download formats)
Dataset updated
May 28, 2025
Dataset authored and provided by
California Department of Public Health
Description
This dataset contains counts of deaths for California counties based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.

The final data tables include both deaths that occurred in each California county regardless of the place of residence (by occurrence) and deaths to residents of each California county (by residence), whereas the provisional data table only includes deaths that occurred in each county regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
COVID-19 Vaccine Progress Dashboard Data
data.chhs.ca.gov
data.ca.gov
+3more
csv, xlsx, zip
Updated Jun 24, 2025
Cite
California Department of Public Health (2025). COVID-19 Vaccine Progress Dashboard Data [Dataset]. https://data.chhs.ca.gov/dataset/vaccine-progress-dashboard
Explore at:
csv(188895), csv(26828), csv(111682), csv(82754), csv(54906), csv(638738), csv(110928434), csv(303068812), xlsx(11731), csv(503270), csv(2447143), csv(2641927), zip, xlsx(11534), csv(6772350), xlsx(11249), csv(148732), csv(675610), csv(724860) (available download formats)
Dataset updated
Jun 24, 2025
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.

On 6/16/2023 CDPH replaced the booster measures with a new “Up to Date” measure based on CDC’s new recommendations, replacing the primary series, boosted, and bivalent booster metrics. The definition of “primary series complete” has not changed and is based on previous recommendations that CDC has since simplified. A person cannot complete their primary series with a single dose of an updated vaccine. Whereas the booster measures were calculated using the eligible population as the denominator, the new up to date measure uses the total estimated population. Please note that the rates for some groups may change since the up to date measure is calculated differently than the previous booster and bivalent measures.

This data is from the same source as the Vaccine Progress Dashboard at https://covid19.ca.gov/vaccination-progress-data/ which summarizes vaccination data at the county level by county of residence. Where county of residence was not reported in a vaccination record, the county of provider that vaccinated the resident is included. This applies to less than 1% of vaccination records. The sum of county-level vaccinations does not equal statewide total vaccinations due to out-of-state residents vaccinated in California.

These data do not include doses administered by the following federal agencies who received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.

Totals for the Vaccine Progress Dashboard and this dataset may not match, as the Dashboard totals doses by Report Date and this dataset totals doses by Administration Date. Dose numbers may also change for a particular Administration Date as data is updated.

Previous updates:

On March 3, 2023, with the release of HPI 3.0 in 2022, the previous equity scores have been updated to reflect more recent community survey information. This change represents an improvement to the way CDPH monitors health equity by using the latest and most accurate community data available. The HPI uses a collection of data sources and indicators to calculate a measure of community conditions ranging from the most to the least healthy based on economic, housing, and environmental measures.

Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 16+ and age 5+ denominators have been uploaded as archived tables.
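The denominator changes listed above can be expressed as a date lookup, which may help when comparing coverage rates across periods. This is a minimal sketch: the function name is hypothetical, and treating each change date as the first day the new denominator applies is an assumption.

```python
from datetime import date


def coverage_denominator(d):
    """Return the age group used as the coverage denominator on date d.

    Cut-over dates are taken from the description above; inclusive
    boundaries are an assumption.
    """
    if d >= date(2022, 7, 13):
        return "all ages"
    if d >= date(2021, 11, 10):
        return "5+"
    if d >= date(2021, 5, 18):
        return "12+"
    return "16+"
```

Rates computed before and after a cut-over date use different population bases, so they are not directly comparable without recomputing one of them.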

Starting on May 29, 2021 the methodology for calculating on-hand inventory in the shipped/delivered/on-hand dataset has changed. Please see the accompanying data dictionary for details. In addition, this dataset is now down to the ZIP code level.
Household Demographic Surveillance System, Cause-Specific Mortality...
catalog.ihsn.org
datacatalog.ihsn.org
Updated Mar 29, 2019
Cite
Wasif A. Khan (2019). Household Demographic Surveillance System, Cause-Specific Mortality 1992-2012 - World [Dataset]. https://catalog.ihsn.org/catalog/5541
Explore at:
Dataset updated
Mar 29, 2019
Dataset provided by
Shashi Kant
Berhe Weldearegawi
Stephen M. Tollman
Abdramane Soura
Wasif A. Khan
Margaret Gyapong
Siswanto Wilopo
Abraham J. Herbst
P. Kim Streatfield
Ali Sie
Frank O. Odhiambo
Amelia Crampin
Nurul Alam
Peter Byass
Bassirou Bonfoh
Valérie Delaunay
Abraham Oduro
Marcel Tanner
Thomas N. Williams
Osman A. Sankoh
Momodou Jasseh
Nguyen T.K. Chuc
Alex Ezeh
Abba Bhuiya
Sanjay Juvekar
Time period covered
1992 - 2012
Area covered
World
Description
Abstract

Cause of death data based on VA interviews were contributed by fourteen INDEPTH HDSS sites in sub-Saharan Africa and eight sites in Asia. The principles of the Network and its constituent population surveillance sites have been described elsewhere [1]. Each HDSS site is committed to long-term longitudinal surveillance of circumscribed populations, typically each covering around 50,000 to 100,000 people. Households are registered and visited regularly by lay field-workers, with a frequency varying from once per year to several times per year. All vital events are registered at each such visit, and any deaths recorded are followed up with verbal autopsy interviews, usually undertaken by specially trained lay interviewers. A few sites were already operational in the 1990s, but in this dataset 95% of the person-time observed related to the period from 2000 onwards, with 58% from 2007 onwards. Two sites, in Nairobi and Ouagadougou, followed urban populations, while the remainder covered areas that were generally more rural in character, although some included local urban centres. Sites covered entire populations, although the Karonga, Malawi, site only contributed VAs for deaths of people aged 12 years and older. Because the sites were not located or designed in a systematic way to be representative of national or regional populations, it is not meaningful to aggregate results over sites.

All cause of death assignments in this dataset were made using the InterVA-4 model version 4.02 [2]. InterVA-4 uses probabilistic modelling to arrive at likely cause(s) of death for each VA case, the workings of the model being based on a combination of expert medical opinion and relevant available data. InterVA-4 is the only model currently available that processes VA data according to the WHO 2012 standard and categorises causes of death according to ICD-10. Since the VA data reported here were collected before the WHO 2012 standard was formulated, they were all retrospectively transformed into the WHO 2012 and InterVA-4 input format for processing.

The InterVA-4 model was applied to the data from each site, yielding, for each case, up to three possible causes of death or an indeterminate result. Each cause for a case is a single record in the dataset. In a minority of cases, for example where symptoms were vague, contradictory or mutually inconsistent, it was impossible for InterVA-4 to determine a cause of death, and these deaths were attributed as entirely indeterminate. For the remaining cases, one to three likely causes and their likelihoods were assigned by InterVA-4, and if the sum of their likelihoods was less than one, the residual component was then assigned as being indeterminate. This was an important process for capturing uncertainty in cause of death outcome(s) from the model at the individual level, thus avoiding over-interpretation of specific causes. As a consequence there were three sources of unattributed cause of death: deaths registered for which VAs were not successfully completed; VAs completed but where the cause was entirely indeterminate; and residual components of deaths attributed as indeterminate.

In this dataset each case has between one and four records, each with its own cause and likelihood. Cases for which VAs were not successfully completed have a single record with the cause of death recorded as “VA not completed” and a likelihood of one. Thus the overall sum of the likelihoods equated to the total number of deaths. Each record also contains a population weighting factor reflecting the ratio of the population fraction for its site, age group, sex and year to the corresponding age group and sex fraction in the standard population (see section on weighting).
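The record structure described above can be illustrated with a short Python sketch. It is not the InterVA-4 implementation: the function name, the "Indeterminate" label, and the tuple layout are hypothetical and serve only to show how the likelihoods for one case add up to one.

```python
def case_records(causes, va_completed=True):
    """Build the (cause, likelihood) records for a single death.

    causes: list of (cause, likelihood) pairs, at most three, as
    assigned by the model; an empty list means no cause was determined.
    """
    if not va_completed:
        # A death without a completed VA gets exactly one record.
        return [("VA not completed", 1.0)]
    records = list(causes[:3])
    residual = 1.0 - sum(likelihood for _, likelihood in records)
    if residual > 1e-9:
        # The unexplained share is attributed as indeterminate
        # (the whole death, if no cause was assigned).
        records.append(("Indeterminate", residual))
    return records
```

Because every case sums to one, adding up the likelihoods over all records gives the total number of deaths, as the description states.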

In this context, all of these data are secondary datasets derived from primary data collected separately by each participating site. In all cases the primary data collection was covered by site-level ethical approvals relating to on-going demographic surveillance in those specific locations. No individual identity or household location data are included in this secondary data.

[1] Sankoh O, Byass P. The INDEPTH Network: filling vital gaps in global epidemiology. International Journal of Epidemiology 2012; 41:579-588.

[2] Byass P, Chandramohan D, Clark SJ, D’Ambruoso L, Fottrell E, Graham WJ, et al. Strengthening standardised interpretation of verbal autopsy data: the new InterVA-4 tool. Global Health Action 2012; 5:19281.

Geographic coverage

Demographic surveillance areas (countries from Africa, Asia and Oceania) of the following HDSSs:
Code   Country        INDEPTH Centre
BD011  Bangladesh     ICDDR-B : Matlab
BD012  Bangladesh     ICDDR-B : Bandarban
BD013  Bangladesh     ICDDR-B : Chakaria
BD014  Bangladesh     ICDDR-B : AMK
BF031  Burkina Faso   Nouna
BF041  Burkina Faso   Ouagadougou
CI011  Côte d'Ivoire  Taabo
ET031  Ethiopia       Kilite Awlaelo
GH011  Ghana          Navrongo
GH031  Ghana          Dodowa
GM011  The Gambia     Farafenni
ID011  Indonesia      Purworejo
IN011  India          Ballabgarh
IN021  India          Vadu
KE011  Kenya          Kilifi
KE021  Kenya          Kisumu
KE031  Kenya          Nairobi
MW011  Malawi         Karonga
SN011  Senegal        IRD : Bandafassi
VN012  Vietnam        Hanoi Medical University : Filabavi
ZA011  South Africa   Agincourt
ZA031  South Africa   Africa Centre

Analysis unit

Death Cause

Universe

Surveillance population
Deceased individuals
Cause of death

Kind of data

Verbal autopsy-based cause of death data

Frequency of data collection

The number of rounds per year varies between sites, from one to three.

Sampling procedure

No sampling, covers total population in demographic surveillance area

Mode of data collection

Face-to-face [f2f]

Research instrument

The Verbal Autopsy Questionnaires used by the various sites differed, but in most cases they were a derivation from the original WHO Verbal Autopsy questionnaire.

http://www.who.int/healthinfo/statistics/verbalautopsystandards/en/index1.html

Cleaning operations

One cause of death record was inserted for every death where a verbal autopsy was not conducted. The cause of death assigned in these cases is "XX VA not completed".
Overcrowding rate by sex - EU-SILC survey
service.tib.eu
opendata.marche.camcom.it
+1more
Updated Jan 8, 2025
Cite
(2025). Overcrowding rate by sex - EU-SILC survey [Dataset]. https://service.tib.eu/ldmservice/dataset/eurostat_73gaskdkzvpaxxuj7oqg
Explore at:
Dataset updated
Jan 8, 2025
Description
This indicator is defined as the percentage of the population living in an overcrowded household. A person is considered to be living in an overcrowded household if the household does not have at its disposal a minimum number of rooms equal to:
- one room for the household;
- one room per couple in the household;
- one room for each single person aged 18 or more;
- one room per pair of single people of the same sex between 12 and 17 years of age;
- one room for each single person between 12 and 17 years of age not included in the previous category;
- one room per pair of children under 12 years of age.
The indicator is presented by sex.
The GDELT Project
kaggle.com
zip
Updated Feb 12, 2019
Cite
The GDELT Project (2019). The GDELT Project [Dataset]. https://www.kaggle.com/datasets/gdelt/gdelt
Explore at:
zip(0 bytes) (available download formats)
Dataset updated
Feb 12, 2019
Dataset authored and provided by
The GDELT Project
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. Just the 2015 data alone records nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references, while its total archives span more than 215 years, making it one of the largest open-access spatio-temporal datasets in existence and pushing the boundaries of "big data" study of global human society. Its Global Knowledge Graph connects the world's people, organizations, locations, themes, counts, images and emotions into a single holistic network over the entire planet. How can you query, explore, model, visualize, interact, and even forecast this vast archive of human society?

Content

GDELT 2.0 has a wealth of features in the event database which includes events reported in articles published in 65 live translated languages, measurements of 2,300 emotions and themes, high resolution views of the non-Western world, relevant imagery, videos, and social media embeds, quotes, names, amounts, and more.

You may find these code books helpful:
GDELT Global Knowledge Graph Codebook V2.1 (PDF)
GDELT Event Codebook V2.0 (PDF)

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.github_repos.[TABLENAME]. Fork this kernel to get started and learn how to safely manage analyzing large BigQuery datasets.

Acknowledgements

You may redistribute, rehost, republish, and mirror any of the GDELT datasets in any form. However, any use or redistribution of the data must include a citation to the GDELT Project and a link to the website (https://www.gdeltproject.org/).
Audio Visual Speech Dataset: American English
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Audio Visual Speech Dataset: American English [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/american-english-visual-speech-dataset
Explore at:
wav (available download formats)
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the US English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.
Dataset Content
This visual speech dataset contains 1000 videos in US English language each paired with a corresponding high-fidelity audio track. Each participant is answering a specific question in a video in an unscripted and spontaneous nature.
•Participant Diversity:
•Speakers: The dataset includes visual speech data from more than 200 participants from different states/provinces of the United States of America.
•Regions: Ensures a balanced representation of Skip 3 accents, dialects, and demographics.
•Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

Video Data
While recording each video, extensive guidelines are kept in mind to maintain quality and diversity.
•Recording Details:
•File Duration: Average duration of 30 seconds to 3 minutes per video.
•Formats: Videos are available in MP4 or MOV format.
•Resolution: Videos are recorded in ultra-high-definition resolution with 30 fps or above.
•Device: Both the latest Android and iOS devices are used in this collection.
•Recording Conditions: Videos were recorded under various conditions to ensure diversity and reduce bias:
•Indoor and Outdoor Settings: Includes both indoor and outdoor recordings.
•Lighting Variations: Captures videos in daytime, nighttime, and varying lighting conditions.
•Camera Positions: Includes handheld and fixed camera positions, as well as portrait and landscape orientations.
•Face Orientation: Contains straight face and tilted face angles.
•Participant Positions: Records participants in both standing and seated positions.
•Motion Variations: Features both stationary and moving videos, where participants pass through different lighting conditions.
•Occlusions: Includes videos where the participant's face is partially occluded by hand movements, microphones, hair, glasses, and facial hair.
•Focus: In each video, the participant's face remains in focus throughout the video duration, ensuring the face stays within the video frame.
•Video Content: In each video, the participant answers a specific question in an unscripted manner. These questions are designed to capture various emotions of participants. The dataset contains videos expressing the following human emotions:
•Happy
•Sad
•Excited
•Angry
•Annoyed
•Normal
•Question Diversity: For each human emotion, the participant answered a specific question expressing that particular emotion.

Metadata
The dataset provides comprehensive metadata for each video recording and participant:
Data from: LandScan Global, 30 Arc-second Annual Global Gridded Population...
springernature.figshare.com
zip
Updated Mar 25, 2025
Cite
Viswadeep Lebakula; Kelly Sims; Andrew Reith; Amy Rose; Jacob McKee; Phil Coleman; Jason C. Kaufman; Marie Urban; warren christopher jochem; Carrie Whitlock; Mitchell Ogden; Joe Pyle; Darrell Roddy; Justin Epting; Edward A. Bright (2025). LandScan Global, 30 Arc-second Annual Global Gridded Population Datasets from 2000 to 2022 [Dataset]. http://doi.org/10.6084/m9.figshare.28439699.v1
Explore at:
zip (available download formats)
Unique identifier
https://doi.org/10.6084/m9.figshare.28439699.v1
Dataset updated
Mar 25, 2025
Dataset provided by
figshare
Authors
Viswadeep Lebakula; Kelly Sims; Andrew Reith; Amy Rose; Jacob McKee; Phil Coleman; Jason C. Kaufman; Marie Urban; warren christopher jochem; Carrie Whitlock; Mitchell Ogden; Joe Pyle; Darrell Roddy; Justin Epting; Edward A. Bright
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Using an innovative approach that combines geospatial science, remote sensing technology, and machine learning algorithms, LandScan Global is a global population distribution dataset at 30 arc-second resolution (roughly 1 km at the equator), representing an ambient (24-hour average) population. The LandScan Global algorithm, an R&D 100 Award Winner, uses spatial data, high-resolution imagery exploitation, and a multi-variable dasymetric modeling approach to disaggregate census counts within an administrative boundary. Since no single population distribution model can account for the differences in spatial data availability, quality, scale, and accuracy as well as the differences in cultural settlement practices, LandScan population distribution models are tailored to match the data conditions and geographical nature of each individual country and region. By modeling an ambient population, LandScan Global captures the full potential activity space of people throughout the course of the day and night rather than just a residential location.
‘List of Top Data Breaches (2004 - 2021)’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 14, 2022
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘List of Top Data Breaches (2004 - 2021)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-list-of-top-data-breaches-2004-2021-e7ac/746cf4e2/?iid=002-610&v=presentation
Explore at:
Dataset updated
Feb 14, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘List of Top Data Breaches (2004 - 2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hishaamarmghan/list-of-top-data-breaches-2004-2021 on 14 February 2022.

--- Dataset description provided by original source is as follows ---

This is a dataset containing all the major data breaches in the world from 2004 to 2021

As we know, there is a big issue related to the privacy of our data. Many major companies in the world still to this day face this issue every single day. Even with a great team of people working on their security, many still suffer. In order to tackle this situation, it is only right that we must study this issue in great depth and therefore I pulled this data from Wikipedia to conduct data analysis. I would encourage others to take a look at this as well and find as many insights as possible.

This data contains 5 columns:
1. Entity: The name of the company, organization or institute
2. Year: The year in which the data breach took place
3. Records: How many records were compromised (can include information like email, passwords etc.)
4. Organization type: Which sector the organization belongs to
5. Method: Was it hacked? Were the files lost? Was it an inside job?

Here is the source for the dataset: https://en.wikipedia.org/wiki/List_of_data_breaches

Here is the GitHub link for a guide on how it was scraped: https://github.com/hishaamarmghan/Data-Breaches-Scraping-Cleaning

--- Original source retains full ownership of the source dataset ---
Data from: ddRAD-seq generated genomic SNP dataset of Central and Southeast...
zenodo.org
data.niaid.nih.gov
Updated Feb 1, 2024
Botond Lados; Klára Cseke; Attila Benke; Zoltán Attila Köbölkuti; Csilla Éva Molnár; László Nagy; Norbert Móricz; Tamás Márton Németh; Attila Borovics; Ilona Mészáros; Endre Gy. Tóth (2024). ddRAD-seq generated genomic SNP dataset of Central and Southeast European Turkey oak (Quercus cerris L.) populations [Dataset]. http://doi.org/10.5281/zenodo.7568727
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7568727
Dataset updated
Feb 1, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Botond Lados; Klára Cseke; Attila Benke; Zoltán Attila Köbölkuti; Csilla Éva Molnár; László Nagy; Norbert Móricz; Tamás Márton Németh; Attila Borovics; Ilona Mészáros; Endre Gy. Tóth
Description
Turkey oak (Quercus cerris L.) is one of the ecologically and economically most important deciduous tree species in the Central and Southeast European regions. The species' distribution range covers hundreds of thousands of hectares across the Apennine and Balkan Peninsulas and the Carpathian Basin, extending to Asia Minor. Turkey oak has long been known to exhibit high levels of genetic and phenotypic variation. Recent predictions on climate responses of this species suggest a significant extension of its distribution in Europe under climate change. Since Turkey oak is relatively drought-tolerant, it is regarded as a potential alternative for other forest tree species during forestry climate adaptation efforts, not only in its native regions but in Western Europe as well. For this reason, the survey of existing genetic variability, genetic resources and adaptability of this species has great importance. Next-generation sequencing approaches, such as ddRAD-seq (double digest restriction-site associated DNA sequencing), allow for obtaining high-resolution genome-wide single nucleotide polymorphisms (SNPs). Based on thousands of SNP markers, the genetic structure of populations and the genetic background of adaptation processes can be studied in far more depth than ever before. In this study, we provide highly variable genome-wide SNP data belonging to Turkey oak for the first time. This dataset comprises the SNP data of 88 individuals of eight populations: two from Bulgaria, one from Kosovo and five from Hungary. The high-resolution genome-wide markers are suitable to infer genetic diversity, differentiation, population structure and to investigate selection and local adaptation. The dataset is accessible at: https://doi.org/10.5281/zenodo.7568727
CDC COVID-19 Vaccine Tracker
kaggle.com
Updated Dec 4, 2023
The Devastator (2023). CDC COVID-19 Vaccine Tracker [Dataset]. https://www.kaggle.com/datasets/thedevastator/cdc-covid-19-vaccine-tracker
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 4, 2023
Dataset provided by
Kaggle
Authors
The Devastator
Description
CDC COVID-19 Vaccine Tracker

Cumulative and Daily Counts of COVID-19 Vaccine Doses in the United States

By Nicky Forster [source]

About this dataset

The dataset contains data points such as the cumulative count of people who have received at least one dose of the vaccine, new doses administered on a specific date, cumulative count of doses distributed in the country, percentage of population that has completed the full vaccine series, cumulative count of Pfizer and Moderna vaccine doses administered in each state, seven-day rolling averages for new doses administered and distributed, among others.

It also provides insights into the vaccination status at both national and state levels. The dataset includes information on the percentage of population that has received at least one dose of the vaccine, percentage of population that has completed the full vaccine series, cumulative counts per 100k population for both distributed and administered doses.

Additionally, it presents data specific to each state, including their abbreviation and name. It outlines details such as cumulative counts per 100k population for both distributed and administered doses in each state. Furthermore, it indicates if there were instances where corrections resulted in single-day negative counts.

The dataset is compiled from daily snapshots obtained from CDC's COVID Data Tracker. Please note that there may be reporting delays by healthcare providers up to 72 hours after administering a dose.

This comprehensive dataset serves various purposes, including tracking vaccination progress over time across different locations within the United States. It can be used by researchers, policymakers or anyone interested in analyzing trends related to COVID-19 vaccination efforts at both national and state levels.

How to use the dataset

Familiarize Yourself with the Columns: Take a look at the available columns in this dataset to understand what information is included. These columns provide details such as state abbreviations, state names, dates of data snapshots, cumulative counts of doses distributed and administered, people who have received at least one dose or completed the vaccine series, percentages of population coverage, manufacturer-specific data, and seven-day rolling averages.

Explore Cumulative Counts: The dataset includes cumulative counts that show the total number of doses distributed or administered over time. You can analyze these numbers to track trends in vaccination progress in different states or regions.

Analyze Daily Counts: The dataset also provides daily counts of new vaccine doses distributed and administered on specific dates. By examining these numbers, you can gain insights into vaccination rates on a day-to-day basis.

Study Population Coverage Metrics: Metrics such as pct_population_received_at_least_one_dose and pct_population_series_complete give you an understanding of how much of each state's population has received at least one dose or completed their vaccine series respectively.

Utilize Manufacturer Data: The columns related to Pfizer and Moderna provide information about the number of doses administered for each manufacturer separately. By analyzing this data, you can compare vaccination rates between different vaccines.

Consider Rolling Averages: The seven-day rolling average columns allow you to smooth out fluctuations in daily counts by calculating an average over a week's time window. This can help identify long-term trends more accurately.

Compare States: You can compare vaccination progress between different states by filtering the dataset based on state names or abbreviations. This way, you can observe variations in distribution and administration rates among different regions.

Visualize the Data: Creating charts and graphs will help you visualize the data more effectively. Plotting trends over time or comparing different metrics for various states can provide powerful visual representations of vaccination progress.

Stay Informed: Keep in mind that this dataset is continuously updated as new data becomes available. Make sure to check for any updates or refreshed datasets to obtain the most recent information on COVID-19 vaccine distributions and administrations.
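The rolling-average step described above can be sketched in a few lines. This is a minimal illustration that assumes a trailing seven-day window; the dataset's own window alignment is not stated here, and the function name `rolling_mean` and the sample counts are hypothetical. The sample includes one negative day, since the description notes that corrections can produce single-day negative counts.

```python
def rolling_mean(values, window=7):
    """Trailing rolling mean; positions with fewer than `window` values are None."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


# Hypothetical daily new-dose counts, including a negative correction day.
daily = [70, 140, 0, 210, 70, 140, 70, -35, 140]
smoothed = rolling_mean(daily)
```

The first six positions are left empty instead of being averaged over a shorter window, so every smoothed value is comparable to the others.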

Research Ideas

Vaccination Analysis: This dataset can be used to analyze the progress of COVID-19 vaccinations in the United States. By examining the cumulative counts of doses distributed and administered, as well as the number of people who have received at least one dose or completed the vaccine series, researchers and policymakers can assess how effectively vaccines are being rolled out and monitor...
COVID Vaccination in World (updated daily)
kaggle.com
zip
Updated Jun 21, 2021
Rishav Sharma (2021). COVID Vaccination in World (updated daily) [Dataset]. https://www.kaggle.com/rsrishav/covid-vaccination-dataset
Explore at:
Available download formats: zip (544681 bytes)
Dataset updated
Jun 21, 2021
Authors
Rishav Sharma
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
World
Description
Context

The data is collected from the OWID (Our World in Data) GitHub repository, which is updated on a daily basis.

Content

This dataset contains only one file, vaccinations.csv, which contains the records of vaccination doses received by people from all the countries.
* location: name of the country (or region within a country).
* iso_code: ISO 3166-1 alpha-3 – three-letter country codes.
* date: date of the observation.
* total_vaccinations: total number of doses administered. This is counted as a single dose, and may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people receive multiple doses). If a person receives one dose of the vaccine, this metric goes up by 1. If they receive a second dose, it goes up by 1 again.
* total_vaccinations_per_hundred: total_vaccinations per 100 people in the total population of the country.
* daily_vaccinations_raw: daily change in the total number of doses administered. It is only calculated for consecutive days. This is a raw measure provided for data checks and transparency, but we strongly recommend that any analysis on daily vaccination rates be conducted using daily_vaccinations instead.
* daily_vaccinations: new doses administered per day (7-day smoothed). For countries that don't report data on a daily basis, we assume that doses changed equally on a daily basis over any periods in which no data was reported. This produces a complete series of daily figures, which is then averaged over a rolling 7-day window. An example of how we perform this calculation can be found here.
* daily_vaccinations_per_million: daily_vaccinations per 1,000,000 people in the total population of the country.
* people_vaccinated: total number of people who received at least one vaccine dose. If a person receives the first dose of a 2-dose vaccine, this metric goes up by 1. If they receive the second dose, the metric stays the same.
* people_vaccinated_per_hundred: people_vaccinated per 100 people in the total population of the country.
* people_fully_vaccinated: total number of people who received all doses prescribed by the vaccination protocol. If a person receives the first dose of a 2-dose vaccine, this metric stays the same. If they receive the second dose, the metric goes up by 1.
* people_fully_vaccinated_per_hundred: people_fully_vaccinated per 100 people in the total population of the country.
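The daily_vaccinations definition above combines two steps: filling gaps between reports by assuming an equal change per day, then averaging over a rolling 7-day window. The sketch below is one possible reading of that description, not Our World in Data's actual code; the function names, the trailing-window alignment, and the sample totals are all assumptions made for illustration.

```python
def fill_totals(reported):
    """Linearly interpolate cumulative totals between reported days.

    reported maps a day index to cumulative doses; gaps are allowed.
    Returns one total per day from the first report to the last.
    """
    days = sorted(reported)
    filled = []
    for start, end in zip(days, days[1:]):
        # Equal daily change over the whole unreported stretch.
        step = (reported[end] - reported[start]) / (end - start)
        for d in range(start, end):
            filled.append(reported[start] + step * (d - start))
    filled.append(reported[days[-1]])
    return filled


def smoothed_daily(totals, window=7):
    """Daily change in totals, averaged over a trailing window (assumed)."""
    daily = [b - a for a, b in zip(totals, totals[1:])]
    return [
        sum(daily[i + 1 - window : i + 1]) / window
        for i in range(window - 1, len(daily))
    ]


# Hypothetical reports: cumulative totals known on days 0, 4, and 8 only.
totals = fill_totals({0: 0, 4: 400, 8: 1200})
series = smoothed_daily(totals)
```

With these inputs the filled totals rise by 100 per day over the first gap and by 200 per day over the second, so the smoothed series moves gradually between the two rates instead of jumping on the reporting days.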

Note: for people_vaccinated and people_fully_vaccinated we are dependent on the necessary data being made available, so we may not be able to make these metrics available for some countries.

Acknowledgements

This data is collected by Our World in Data, which updates it daily on their GitHub.

Inspiration

Possible uses for this dataset could include:
- Sentiment analysis in a variety of forms
- Statistical analysis over time.
London Heathrow precipitations 2010-2019
kaggle.com
Updated Feb 25, 2020
Emanuele Fumagalli (2020). London Heathrow precipitations 2010-2019 [Dataset]. https://www.kaggle.com/datasets/emafuma/ncei-heathrow-2010-2019/versions/1
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 25, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Emanuele Fumagalli
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

The idea is to have a very simple time series dataset to be used for experiments with easy but effective visualizations on actual data. It is amazing how much information a single graph can communicate concisely.

Content

The dataset was downloaded from the National Centers for Environmental Information (NCEI); the data is in the public domain and can be used freely. If you are interested in generating a similar dataset from another station, you can start from the Search Tool: select Daily Summaries and the time range of interest, search for Cities, and enter the city you're looking for as the Search Term. Once it is selected, you need to add it to the Cart like an order, but there is no charge for ordering data from Climate Data Online, as explained in their FAQs.

Acknowledgements

Thanks to the National Centers for Environmental Information for collecting and making freely available meteorological data from many stations all over the world. If you use the same dataset or generate a new one from NCEI, you need to cite the origin.

Inspiration

Mostly to see how many different effective visualizations can be generated from a very simple dataset.
India Census: Population: Age: 18
ceicdata.com
Updated Nov 15, 2019
CEICdata.com (2019). India Census: Population: Age: 18 [Dataset]. https://www.ceicdata.com/en/india/census-population-by-single-age/census-population-age-18
Explore at:
Dataset updated
Nov 15, 2019
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Mar 1, 1991 - Mar 1, 2011
Area covered
India
Variables measured
Population
Description
India Census: Population: Age: 18 data was reported at 27,958,147 persons in 2011, an increase from the previous figure of 27,686,902 persons in 2001. India Census: Population: Age: 18 data is updated yearly, with a median of 27,686,902 persons across 3 observations from Mar 1991 to 2011. The data reached an all-time high of 27,958,147 persons in 2011 and a record low of 23,656,856 persons in 1991. India Census: Population: Age: 18 data remains in active status in CEIC and is reported by Census of India. The data is categorized under India Premium Database’s Demographic – Table IN.GAD002: Census: Population: by Single Age.
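The summary figures quoted above can be rechecked from the three observations the description itself gives. The snippet below is only a small consistency check on those quoted values, not a query against CEIC; the variable names are illustrative.

```python
from statistics import median

# Figures quoted in the description above (persons aged 18, by census year).
observations = {1991: 23_656_856, 2001: 27_686_902, 2011: 27_958_147}

values = list(observations.values())
increase = observations[2011] - observations[2001]  # change between the last two censuses
middle = median(values)                              # the reported median
peak_year = max(observations, key=observations.get)  # year of the all-time high
low_year = min(observations, key=observations.get)   # year of the record low
```

With only three observations, the median is simply the middle value, which is why it coincides with the 2001 figure.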