18 datasets found
  1. World Marriage Dataset

    • kaggle.com
    Updated Jul 21, 2024
    Cite
    Ibrar Hussain (2024). World Marriage Dataset [Dataset]. https://www.kaggle.com/datasets/dataanalyst001/world-marriage-dataset
    Metadata format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 21, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ibrar Hussain
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This World Marriage Dataset provides comparable, up-to-date data on the marital status of the population by age and sex for 232 countries and regions of the world from 1970 to 2019. It has 271,605 rows and 9 columns. Each row represents a specific age group of one sex with a given marital status (divorced, married, or single). The columns are:

    • Sr. No.: A serial number identifying each entry.
    • Country: The country of focus.
    • Age Group: The age range of the surveyed individuals.
    • Sex: The sex of the surveyed individuals.
    • Marital Status: The marital status of the individuals, categorized as "Divorced", "Married", or "Single".
    • Data Process: The method used to collect the data.
    • Data Collection (Start Year): The year when data collection began.
    • Data Collection (End Year): The year when data collection ended.
    • Data Source: The source of the data.

    This dataset helps to show the distribution of marital status across age groups of men and women around the world from 1970 to 2019.
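    Because each record carries a collection start and end year, a common first step is to select the records whose collection window covers a year of interest. The sketch below uses only Python's standard `csv` module and is illustrative: it assumes the CSV header row uses exactly the column names listed above and that the year columns hold integers; the sample rows and the helper `rows_covering_year` are hypothetical, not taken from the dataset.

```python
import csv
import io

# Hypothetical sample rows; the header assumes the column names listed above.
SAMPLE_CSV = """Sr. No.,Country,Age Group,Sex,Marital Status,Data Process,Data Collection (Start Year),Data Collection (End Year),Data Source
1,Kenya,25-29,Female,Married,Census,2009,2009,Hypothetical source A
2,Kenya,25-29,Male,Single,Census,2009,2009,Hypothetical source A
3,Peru,30-34,Male,Divorced,Survey,2015,2016,Hypothetical source B
"""


def rows_covering_year(csv_text, year, country=None):
    """Return rows whose data-collection window includes `year`.

    Hypothetical helper: if `country` is given, only rows for that country are kept.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    selected = []
    for row in reader:
        start = int(row["Data Collection (Start Year)"])
        end = int(row["Data Collection (End Year)"])
        if start <= year <= end and (country is None or row["Country"] == country):
            selected.append(row)
    return selected


kenya_2009 = rows_covering_year(SAMPLE_CSV, 2009, country="Kenya")
print([(r["Sex"], r["Marital Status"]) for r in kenya_2009])
# [('Female', 'Married'), ('Male', 'Single')]
```

    Reading the real file would replace `io.StringIO(csv_text)` with an opened file handle; if the year columns turn out to be stored as floats or text, the `int(...)` conversions would need adjusting.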

  2. Geonames - All Cities with a population > 1000

    • public.opendatasoft.com
    • data.smartidf.services
    • +2 more
    csv, excel, geojson +1
    Updated Mar 10, 2024
    Cite
    (2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
    Available download formats: csv, json, geojson, excel
    Dataset updated
    Mar 10, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All cities with a population greater than 1,000, or that are seats of administrative divisions (ca. 80,000 entries).

    Sources and contributions:
    • Sources: GeoNames aggregates over a hundred different data sources.
    • Ambassadors: GeoNames Ambassadors help in many countries.
    • Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
    • Donations and sponsoring: The costs of running GeoNames are covered by donations and sponsoring.

    Enrichment: country name added.

  3. Overcrowding rate by age group - population without single-person households...

    • service.tib.eu
    • db.nomics.world
    • +1 more
    Updated Jan 8, 2025
    Cite
    (2025). Overcrowding rate by age group - population without single-person households - EU-SILC survey [Dataset]. https://service.tib.eu/ldmservice/dataset/eurostat_ub4yxjwglcni8bf3erimq
    Dataset updated
    Jan 8, 2025
    Description

    This indicator is defined as the percentage of the population living in an overcrowded household (excluding single-person households). A person is considered to be living in an overcrowded household if the household does not have at its disposal a minimum number of rooms equal to:
    - one room for the household;
    - one room per couple in the household;
    - one room for each single person aged 18 or over;
    - one room per pair of single people of the same sex aged 12 to 17;
    - one room for each single person aged 12 to 17 not included in the previous category;
    - one room per pair of children under 12 years of age.
    The indicator is presented by age group.
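    The room-count rule above can be written as a small function. The sketch below is a minimal, unweighted illustration under stated assumptions: the `Household` structure and its field names are hypothetical, survey sampling weights are ignored, an unpaired child under 12 is assumed to need a room of its own (the rule's "per pair" is rounded up), and a household counts as overcrowded when it has fewer rooms than required. As in this indicator, people in single-person households are excluded from the rate by default.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Household:
    # Hypothetical record layout, for illustration only.
    couples: int                    # number of couples in the household
    singles: list[tuple[int, str]]  # (age, sex) of every member not in a couple
    rooms: int                      # rooms available to the household

    @property
    def size(self) -> int:
        return 2 * self.couples + len(self.singles)


def required_rooms(h: Household) -> int:
    rooms = 1                 # one room for the household
    rooms += h.couples        # one room per couple
    rooms += sum(1 for age, _ in h.singles if age >= 18)  # one per single adult
    teens_by_sex: dict[str, int] = {}
    for age, sex in h.singles:
        if 12 <= age <= 17:
            teens_by_sex[sex] = teens_by_sex.get(sex, 0) + 1
    # One room per same-sex pair aged 12-17, plus one for an unpaired teenager.
    rooms += sum(ceil(n / 2) for n in teens_by_sex.values())
    # One room per pair of children under 12 (assumption: round up an odd child).
    rooms += ceil(sum(1 for age, _ in h.singles if age < 12) / 2)
    return rooms


def overcrowding_rate(households: list[Household], exclude_single_person: bool = True) -> float:
    """Unweighted percentage of people living in overcrowded households."""
    people = overcrowded = 0
    for h in households:
        if exclude_single_person and h.size == 1:
            continue
        people += h.size
        if h.rooms < required_rooms(h):
            overcrowded += h.size
    return 100.0 * overcrowded / people if people else 0.0


households = [
    Household(couples=1, singles=[(5, "F"), (8, "M")], rooms=3),    # needs 3 rooms
    Household(couples=1, singles=[(14, "M"), (15, "F")], rooms=3),  # needs 4 rooms
    Household(couples=0, singles=[(40, "F")], rooms=1),             # single-person, excluded
]
print(overcrowding_rate(households))  # 50.0
```

    A breakdown by age group would tally each person under their own age band rather than summing whole households, but the room requirement itself is computed per household as above.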

  4. World Religions Across Regions

    • kaggle.com
    Updated Dec 6, 2022
    Cite
    The Devastator (2022). World Religions Across Regions [Dataset]. https://www.kaggle.com/datasets/thedevastator/a-global-perspective-on-world-religions-1945-201
    Metadata format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 6, 2022
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Area covered
    World
    Description

    World Religions Across Regions

    Analyzing Adherence Across Regions, States and the Global System

    By Correlates of War Project [source]

    About this dataset

    The World Religion Project (WRP) is an ambitious endeavor to conduct a comprehensive analysis of religious adherence throughout the world from 1945 to 2010. This cutting-edge project offers unparalleled insight into the religious behavior of people in different countries, regions, and continents during this time period. Its datasets provide important information about the numbers and percentages of adherents across a multitude of different religions, religion families, and non-religious affiliations.

    The WRP consists of three distinct datasets: the national religion dataset, the regional religion dataset, and the global religion dataset. Each focuses on a different level of analysis, from individual states to the global system. The national dataset provides the number of adherents by state and the percentage of the population practicing a given faith group, in five-year increments, showing how these figures evolve from nation to nation over time. The regional dataset is likewise provided at five-year intervals and uses the Correlates of War region designations with one modification: Pacific Ocean states (Correlates of War country code 900 or above) have been reclassified into their own Oceania category. Finally, the global dataset aggregates all states, giving a snapshot at any five-year interval between 1945 and 2010 of the relationships between religions or religion families, within one location or transnationally.

    The project was developed in three stages: first, forming a religion tree (a systematic classification of religions); second, collecting data according to that classification structure; and third, cleaning the data, reconciling discrepancies and imputing values where gaps were encountered during collection. Anyone wishing to learn more about these datasets or their applications is encouraged to contact Zeev Maoz (University of California, Davis) and Errol A. Henderson (Pennsylvania State University).


    How to use the dataset

    The World Religion Project (WRP) datasets offer a comprehensive look at religious adherence around the world. With them, you can track global religious trends over a period of 65 years (1945 to 2010), explore how they changed during that time, and gain insight into cross-regional and cross-time patterns in religious affiliation.

    Research Ideas

    • Analyzing historical patterns of religious growth and decline across different regions
    • Creating visualizations to compare religious adherence in various states, countries, or globally
    • Studying the impact of governmental policies on religious participation over time

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: Dataset copyright by authors. You are free to:
    - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
    - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
    You must:
    - Give appropriate credit: provide a link to the license, and indicate if changes were made.
    - ShareAlike: distribute your contributions under the same license as the original.
    - Keep intact: all notices that refer to this license, including copyright notices.

    Columns

    File: WRP regional data.csv

    | Column name | Description |
    |:------------|:------------|
    | Year | Reference year for data collection. (Integer) |
    | Region | World region according to Correlates Of War (COW) Regional Systemizations with one modification (Oceania category for COW country code ... |

  5. 65 to 74 years poverty in On Top of the World, Florida (2022)

    • welfareinfo.org
    Updated Sep 12, 2024
    Cite
    WelfareInfo.org (2024). 65 to 74 years poverty in On Top of the World, Florida (2022) [Dataset]. https://www.welfareinfo.org/poverty-rate/florida/on-top-of-the-world/stat-single-people-65-74-years-old/
    Dataset updated
    Sep 12, 2024
    Dataset provided by
    WelfareInfo.org
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Florida, On Top of the World
    Description

    Poverty rate statistics for people aged 65 to 74 in 2022. This is part of a larger dataset covering poverty in On Top of the World, Florida, by age, education, race, gender, work experience, and more.

  6. Death Profiles by County

    • data.chhs.ca.gov
    • data.ca.gov
    • +3 more
    csv, zip
    Updated May 28, 2025
    Cite
    California Department of Public Health (2025). Death Profiles by County [Dataset]. https://data.chhs.ca.gov/dataset/death-profiles-by-county
    Available download formats (sizes in bytes): csv(24235858), csv(21575405), csv(11738570), csv(15127221), csv(60676655), csv(1128641), csv(60023260), csv(28125832), csv(75015194), csv(74043128), csv(74351424), csv(74497014), csv(60201673), csv(74689382), csv(73906266), csv(60517511), csv(52019564), zip, csv(5095)
    Dataset updated
    May 28, 2025
    Dataset authored and provided by
    California Department of Public Health
    Description

    This dataset contains counts of deaths for California counties based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.

    The final data tables include both deaths that occurred in each California county regardless of the place of residence (by occurrence) and deaths to residents of each California county (by residence), whereas the provisional data table only includes deaths that occurred in each county regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
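    The difference between the by-occurrence and by-residence tables can be illustrated with a toy tally. The sketch below is hypothetical: each death record is reduced to a (county of residence, county of occurrence) pair, `None` stands for a location outside California, and the records are made up purely to show how the same deaths land in different counts.

```python
from collections import Counter
from typing import Optional

# Hypothetical layout: (county_of_residence, county_of_occurrence); None = outside California.
Record = tuple[Optional[str], Optional[str]]


def tally(records: list[Record]) -> tuple[Counter, Counter]:
    """Count deaths by county of residence and by county of occurrence."""
    by_residence = Counter(res for res, _ in records if res is not None)
    by_occurrence = Counter(occ for _, occ in records if occ is not None)
    return by_residence, by_occurrence


records = [
    ("Alameda", "Alameda"),
    ("Alameda", "San Francisco"),        # resident who died in another county
    ("San Francisco", "San Francisco"),
    ("Alameda", None),                   # California resident who died out of state
    (None, "Alameda"),                   # non-resident who died in California
]
by_residence, by_occurrence = tally(records)
print(dict(by_residence))   # {'Alameda': 3, 'San Francisco': 1}
print(dict(by_occurrence))  # {'Alameda': 2, 'San Francisco': 2}
```

    The out-of-state death of an Alameda resident appears only in the residence tally, which mirrors the note that the final (by residence) counts include out-of-state deaths to California residents.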

    The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.

  7. COVID-19 Vaccine Progress Dashboard Data

    • data.chhs.ca.gov
    • data.ca.gov
    • +3 more
    csv, xlsx, zip
    Updated Jun 24, 2025
    Cite
    California Department of Public Health (2025). COVID-19 Vaccine Progress Dashboard Data [Dataset]. https://data.chhs.ca.gov/dataset/vaccine-progress-dashboard
    Available download formats (sizes in bytes): csv(188895), csv(26828), csv(111682), csv(82754), csv(54906), csv(638738), csv(110928434), csv(303068812), xlsx(11731), csv(503270), csv(2447143), csv(2641927), zip, xlsx(11534), csv(6772350), xlsx(11249), csv(148732), csv(675610), csv(724860)
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    Description

    Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.

    On 6/16/2023, CDPH replaced the booster measures with a new “Up to Date” measure based on CDC’s new recommendations, replacing the primary series, boosted, and bivalent booster metrics. The definition of “primary series complete” has not changed and is based on previous recommendations that CDC has since simplified. A person cannot complete their primary series with a single dose of an updated vaccine. Whereas the booster measures were calculated using the eligible population as the denominator, the new up-to-date measure uses the total estimated population. Please note that the rates for some groups may change, since the up-to-date measure is calculated differently from the previous booster and bivalent measures.
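    Switching the denominator from the eligible population to the total estimated population lowers the rate even when the count of vaccinated people is unchanged. The sketch below only shows that arithmetic: `coverage_rate` is an illustrative helper, not part of the published dataset, and the population figures are hypothetical.

```python
def coverage_rate(vaccinated: int, denominator: int) -> float:
    """Percentage of the denominator population that is vaccinated (illustrative helper)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * vaccinated / denominator


# Hypothetical county figures, for illustration only.
up_to_date = 300
eligible_population = 800   # denominator style of the former booster measures
total_population = 1000     # total estimated population, as in the up-to-date measure

print(coverage_rate(up_to_date, eligible_population))  # 37.5
print(coverage_rate(up_to_date, total_population))     # 30.0
```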

    This data is from the same source as the Vaccine Progress Dashboard at https://covid19.ca.gov/vaccination-progress-data/ which summarizes vaccination data at the county level by county of residence. Where county of residence was not reported in a vaccination record, the county of provider that vaccinated the resident is included. This applies to less than 1% of vaccination records. The sum of county-level vaccinations does not equal statewide total vaccinations due to out-of-state residents vaccinated in California.

    These data do not include doses administered by the following federal agencies, which received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.

    Totals for the Vaccine Progress Dashboard and this dataset may not match, as the Dashboard totals doses by Report Date and this dataset totals doses by Administration Date. Dose numbers may also change for a particular Administration Date as data is updated.

    Previous updates:

    • On March 3, 2023, following the release of HPI 3.0 in 2022, the equity scores were updated to reflect more recent community survey information. This change improves the way CDPH monitors health equity by using the latest and most accurate community data available. The HPI uses a collection of data sources and indicators to calculate a measure of community conditions, ranging from the most to the least healthy, based on economic, housing, and environmental measures.

    • Starting on July 13, 2022, the denominator for calculating vaccine coverage was changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously, the denominator was changed from age 16+ to age 12+ on May 18, 2021, and then from age 12+ to age 5+ on November 10, 2021, to reflect earlier changes in vaccine eligibility criteria. The previous datasets based on age 16+ and age 5+ denominators have been uploaded as archived tables.

    • Starting on May 29, 2021, the methodology for calculating on-hand inventory in the shipped/delivered/on-hand dataset changed. Please see the accompanying data dictionary for details. In addition, this dataset is now available at the ZIP code level.

  8. Household Demographic Surveillance System, Cause-Specific Mortality...

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    Updated Mar 29, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wasif A. Khan (2019). Household Demographic Surveillance System, Cause-Specific Mortality 1992-2012 - World [Dataset]. https://catalog.ihsn.org/catalog/5541
    Dataset updated
    Mar 29, 2019
    Dataset provided by
    Shashi Kant
    Berhe Weldearegawi
    Stephen M. Tollman
    Abdramane Soura
    Wasif A. Khan
    Margaret Gyapong
    Siswanto Wilopo
    Abraham J. Herbst
    P. Kim Streatfield
    Ali Sie
    Frank O. Odhiambo
    Amelia Crampin
    Nurul Alam
    Peter Byass
    Bassirou Bonfoh
    Valérie Delaunay
    Abraham Oduro
    Marcel Tanner
    Thomas N. Williams
    Osman A. Sankoh
    Momodou Jasseh
    Nguyen T.K. Chuc
    Alex Ezeh
    Abba Bhuiya
    Sanjay Juvekar
    Time period covered
    1992 - 2012
    Area covered
    World
    Description

    Abstract

    Cause of death data based on verbal autopsy (VA) interviews were contributed by fourteen INDEPTH HDSS sites in sub-Saharan Africa and eight sites in Asia. The principles of the Network and its constituent population surveillance sites have been described elsewhere [1]. Each HDSS site is committed to long-term longitudinal surveillance of circumscribed populations, each typically covering around 50,000 to 100,000 people. Households are registered and visited regularly by lay field-workers, with a frequency varying from once per year to several times per year. All vital events are registered at each such visit, and any deaths recorded are followed up with verbal autopsy interviews, usually undertaken by specially trained lay interviewers. A few sites were already operational in the 1990s, but in this dataset 95% of the person-time observed relates to the period from 2000 onwards, with 58% from 2007 onwards. Two sites, in Nairobi and Ouagadougou, followed urban populations, while the remainder covered areas that were generally more rural in character, although some included local urban centres. Sites covered entire populations, although the Karonga, Malawi, site only contributed VAs for deaths of people aged 12 years and older. Because the sites were not located or designed in a systematic way to be representative of national or regional populations, it is not meaningful to aggregate results over sites.

    All cause of death assignments in this dataset were made using the InterVA-4 model version 4.02 [2]. InterVA-4 uses probabilistic modelling to arrive at likely cause(s) of death for each VA case, the workings of the model being based on a combination of expert medical opinion and relevant available data. InterVA-4 is the only model currently available that processes VA data according to the WHO 2012 standard and categorises causes of death according to ICD-10. Since the VA data reported here were collected before the WHO 2012 standard was formulated, they were all retrospectively transformed into the WHO 2012 and InterVA-4 input format for processing.

    The InterVA-4 model was applied to the data from each site, yielding, for each case, up to three possible causes of death or an indeterminate result. Each cause for a case is a single record in the dataset. In a minority of cases, for example where symptoms were vague, contradictory or mutually inconsistent, it was impossible for InterVA-4 to determine a cause of death, and these deaths were attributed as entirely indeterminate. For the remaining cases, one to three likely causes and their likelihoods were assigned by InterVA-4, and if the sum of their likelihoods was less than one, the residual component was then assigned as being indeterminate. This was an important process for capturing uncertainty in cause of death outcome(s) from the model at the individual level, thus avoiding over-interpretation of specific causes. As a consequence there were three sources of unattributed cause of death: deaths registered for which VAs were not successfully completed; VAs completed but where the cause was entirely indeterminate; and residual components of deaths attributed as indeterminate.

    In this dataset, each case has between one and four records, each with its own cause and likelihood. Cases for which VAs were not successfully completed have a single record with the cause of death recorded as “VA not completed” and a likelihood of one. Thus the overall sum of the likelihoods equals the total number of deaths. Each record also contains a population weighting factor reflecting the ratio of the population fraction for its site, age group, sex and year to the corresponding age group and sex fraction in the standard population (see section on weighting).
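    The record layout described above, in which each death contributes likelihoods that sum to one, can be sketched as follows. This is a hypothetical illustration, not InterVA-4 itself: it takes an already-computed model result per case and expands it into dataset-style (cause, likelihood) records. The labels "Cause A", "Cause B" and "Indeterminate" are made up; only "VA not completed" comes from the description.

```python
from typing import Optional

INDETERMINATE = "Indeterminate"       # hypothetical label for the residual component
VA_NOT_COMPLETED = "VA not completed"  # label used in the dataset description


def case_records(causes: Optional[list[tuple[str, float]]]) -> list[tuple[str, float]]:
    """Expand one death into (cause, likelihood) records whose likelihoods sum to 1.

    causes is None when no VA was completed, an empty list when the model result
    was entirely indeterminate, or up to three (cause, likelihood) pairs.
    """
    if causes is None:
        return [(VA_NOT_COMPLETED, 1.0)]
    if not causes:
        return [(INDETERMINATE, 1.0)]
    if len(causes) > 3:
        raise ValueError("InterVA-4 reports at most three likely causes")
    records = list(causes)
    residual = 1.0 - sum(likelihood for _, likelihood in causes)
    if residual > 1e-9:  # ignore floating-point noise when likelihoods already sum to 1
        records.append((INDETERMINATE, residual))
    return records


cases = [
    [("Cause A", 0.6), ("Cause B", 0.3)],  # residual 0.1 becomes indeterminate
    [],                                    # entirely indeterminate
    None,                                  # VA not completed
]
records = [r for case in cases for r in case_records(case)]
total = sum(likelihood for _, likelihood in records)
print(len(records), round(total, 9))  # 5 3.0
```

    Summing the likelihood column therefore recovers the number of deaths (three here), while each case contributes between one and four records.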

    In this context, all of these data are secondary datasets derived from primary data collected separately by each participating site. In all cases the primary data collection was covered by site-level ethical approvals relating to on-going demographic surveillance in those specific locations. No individual identity or household location data are included in this secondary data.

    1. Sankoh O, Byass P. The INDEPTH Network: filling vital gaps in global epidemiology. International Journal of Epidemiology 2012; 41:579-588.

    2. Byass P, Chandramohan D, Clark SJ, D’Ambruoso L, Fottrell E, Graham WJ, et al. Strengthening standardised interpretation of verbal autopsy data: the new InterVA-4 tool. Global Health Action 2012; 5:19281.

    Geographic coverage

    Demographic surveillance areas (countries from Africa, Asia and Oceania) of the following HDSSs:

    Code   Country        INDEPTH Centre
    BD011  Bangladesh     ICDDR-B : Matlab
    BD012  Bangladesh     ICDDR-B : Bandarban
    BD013  Bangladesh     ICDDR-B : Chakaria
    BD014  Bangladesh     ICDDR-B : AMK
    BF031  Burkina Faso   Nouna
    BF041  Burkina Faso   Ouagadougou
    CI011  Côte d'Ivoire  Taabo
    ET031  Ethiopia       Kilite Awlaelo
    GH011  Ghana          Navrongo
    GH031  Ghana          Dodowa
    GM011  The Gambia     Farafenni
    ID011  Indonesia      Purworejo
    IN011  India          Ballabgarh
    IN021  India          Vadu
    KE011  Kenya          Kilifi
    KE021  Kenya          Kisumu
    KE031  Kenya          Nairobi
    MW011  Malawi         Karonga
    SN011  Senegal        IRD : Bandafassi
    VN012  Vietnam        Hanoi Medical University : Filabavi
    ZA011  South Africa   Agincourt
    ZA031  South Africa   Africa Centre

    Analysis unit

    Death Cause

    Universe

    Surveillance population; deceased individuals; cause of death

    Kind of data

    Verbal autopsy-based cause of death data

    Frequency of data collection

    Rounds per year varies between sites from once to three times per year

    Sampling procedure

    No sampling, covers total population in demographic surveillance area

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The Verbal Autopsy Questionnaires used by the various sites differed, but in most cases they were a derivation from the original WHO Verbal Autopsy questionnaire.

    http://www.who.int/healthinfo/statistics/verbalautopsystandards/en/index1.html

    Cleaning operations

    One cause of death record was inserted for every death where a verbal autopsy was not conducted. The cause of death assigned in these cases is "XX VA not completed".

  9. Overcrowding rate by sex - EU-SILC survey

    • service.tib.eu
    • opendata.marche.camcom.it
    • +1 more
    Updated Jan 8, 2025
    Cite
    (2025). Overcrowding rate by sex - EU-SILC survey [Dataset]. https://service.tib.eu/ldmservice/dataset/eurostat_73gaskdkzvpaxxuj7oqg
    Dataset updated
    Jan 8, 2025
    Description

    This indicator is defined as the percentage of the population living in an overcrowded household. A person is considered to be living in an overcrowded household if the household does not have at its disposal a minimum number of rooms equal to:
    - one room for the household;
    - one room per couple in the household;
    - one room for each single person aged 18 or over;
    - one room per pair of single people of the same sex aged 12 to 17;
    - one room for each single person aged 12 to 17 not included in the previous category;
    - one room per pair of children under 12 years of age.
    The indicator is presented by sex.
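    Once each person has been classified as living in an overcrowded household or not, the breakdown by sex is a weighted share within each group. The sketch below assumes hypothetical person-level records of the form (group, overcrowded flag, survey weight); survey estimates such as EU-SILC are normally weighted, but the record layout, the helper `rate_by_group`, and the weight values here are illustrative only.

```python
from collections import defaultdict


def rate_by_group(people: list[tuple[str, bool, float]]) -> dict[str, float]:
    """Weighted percentage of people in overcrowded households, per group.

    Hypothetical record layout: (group, lives_in_overcrowded_household, survey_weight).
    """
    total: defaultdict[str, float] = defaultdict(float)
    overcrowded: defaultdict[str, float] = defaultdict(float)
    for group, is_overcrowded, weight in people:
        total[group] += weight
        if is_overcrowded:
            overcrowded[group] += weight
    # Skip groups whose total weight is zero to avoid dividing by zero.
    return {g: 100.0 * overcrowded[g] / total[g] for g in total if total[g] > 0}


people = [
    ("F", True, 1.0),
    ("F", False, 3.0),
    ("M", True, 2.0),
    ("M", False, 2.0),
]
print(rate_by_group(people))  # {'F': 25.0, 'M': 50.0}
```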

  10. The GDELT Project

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The GDELT Project (2019). The GDELT Project [Dataset]. https://www.kaggle.com/datasets/gdelt/gdelt
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset authored and provided by
    The GDELT Project
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. The 2015 data alone record nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references, while its total archives span more than 215 years, making it one of the largest open-access spatio-temporal datasets in existence and pushing the boundaries of "big data" study of global human society. Its Global Knowledge Graph connects the world's people, organizations, locations, themes, counts, images and emotions into a single holistic network over the entire planet. How can you query, explore, model, visualize, interact with, and even forecast this vast archive of human society?

    Content

    GDELT 2.0 offers a wealth of features in its event database, including events reported in articles published in 65 live-translated languages, measurements of 2,300 emotions and themes, high-resolution views of the non-Western world, relevant imagery, videos, social media embeds, quotes, names, amounts, and more.

    You may find these code books helpful:
    GDELT Global Knowledge Graph Codebook V2.1 (PDF)
    GDELT Event Codebook V2.0 (PDF)

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.github_repos.[TABLENAME]. Fork this kernel to get started and to learn how to safely manage analyzing large BigQuery datasets.

    Acknowledgements

    You may redistribute, rehost, republish, and mirror any of the GDELT datasets in any form. However, any use or redistribution of the data must include a citation to the GDELT Project and a link to the website (https://www.gdeltproject.org/).

  11. Audio Visual Speech Dataset: American English

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Audio Visual Speech Dataset: American English [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/american-english-visual-speech-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.

    Dataset Content

    This visual speech dataset contains 1,000 videos in US English, each paired with a corresponding high-fidelity audio track. In each video, a participant answers a specific question in an unscripted, spontaneous manner.

    Participant Diversity:
    Speakers: The dataset includes visual speech data from more than 200 participants from different states of the United States of America.
    Regions: Ensures a balanced representation of accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

    Video Data

    Extensive guidelines were followed while recording each video to maintain quality and diversity.

    Recording Details:
    File Duration: Videos typically last from 30 seconds to 3 minutes.
    Formats: Videos are available in MP4 or MOV format.
    Resolution: Videos are recorded in ultra-high-definition resolution with 30 fps or above.
    Device: Both the latest Android and iOS devices are used in this collection.
    Recording Conditions: Videos were recorded under various conditions to ensure diversity and reduce bias:
    Indoor and Outdoor Settings: Includes both indoor and outdoor recordings.
    Lighting Variations: Captures videos in daytime, nighttime, and varying lighting conditions.
    Camera Positions: Includes handheld and fixed camera positions, as well as portrait and landscape orientations.
    Face Orientation: Contains straight face and tilted face angles.
    Participant Positions: Records participants in both standing and seated positions.
    Motion Variations: Features both stationary and moving videos, where participants pass through different lighting conditions.
    Occlusions: Includes videos where the participant's face is partially occluded by hand movements, microphones, hair, glasses, and facial hair.
    Focus: In each video, the participant's face remains in focus throughout the video duration, ensuring the face stays within the video frame.
    Video Content: In each video, the participant answers a specific question in an unscripted manner. These questions are designed to capture various emotions of participants. The dataset contains videos expressing the following emotions:
    Happy
    Sad
    Excited
    Angry
    Annoyed
    Normal
    Question Diversity: For each emotion, participants answered a specific question designed to elicit that emotion.

    Metadata

    The dataset provides comprehensive metadata for each video recording and participant:

  12. Data from: LandScan Global, 30 Arc-second Annual Global Gridded Population...

    • springernature.figshare.com
    zip
    Updated Mar 25, 2025
    Viswadeep Lebakula; Kelly Sims; Andrew Reith; Amy Rose; Jacob McKee; Phil Coleman; Jason C. Kaufman; Marie Urban; warren christopher jochem; Carrie Whitlock; Mitchell Ogden; Joe Pyle; Darrell Roddy; Justin Epting; Edward A. Bright (2025). LandScan Global, 30 Arc-second Annual Global Gridded Population Datasets from 2000 to 2022 [Dataset]. http://doi.org/10.6084/m9.figshare.28439699.v1
    Available download formats: zip
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    figshare
    Authors
    Viswadeep Lebakula; Kelly Sims; Andrew Reith; Amy Rose; Jacob McKee; Phil Coleman; Jason C. Kaufman; Marie Urban; warren christopher jochem; Carrie Whitlock; Mitchell Ogden; Joe Pyle; Darrell Roddy; Justin Epting; Edward A. Bright
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Built with an innovative approach that combines geospatial science, remote sensing technology, and machine learning algorithms, LandScan Global is a global population distribution dataset at 30 arc-second resolution (roughly 1 km at the equator), representing an ambient (24-hour average) population. The LandScan Global algorithm, an R&D 100 Award winner, uses spatial data, high-resolution imagery exploitation, and a multivariable dasymetric modeling approach to disaggregate census counts within an administrative boundary. Since no single population distribution model can account for differences in spatial data availability, quality, scale, and accuracy, or for differences in cultural settlement practices, LandScan population distribution models are tailored to the data conditions and geographical nature of each country and region. By modeling an ambient population, LandScan Global captures the full potential activity space of people throughout the day and night rather than just their residential location.
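    The "roughly 1 km" figure and the idea of disaggregating a census count can both be illustrated in a few lines. This sketch assumes a spherical Earth (mean radius 6,371 km) and a plain proportional split driven by hypothetical per-cell weights. It shows the general dasymetric idea only and is not the LandScan Global model, which, per the description, is tailored to each country and region and uses many more inputs.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of a spherical Earth (an approximation)
CELL_DEG = 30 / 3600      # 30 arc-seconds expressed in degrees


def cell_size_km(lat_deg: float) -> tuple[float, float]:
    """Approximate (east-west, north-south) size in km of a 30-arc-second cell."""
    ns = math.radians(CELL_DEG) * EARTH_RADIUS_KM
    ew = ns * math.cos(math.radians(lat_deg))  # meridians converge toward the poles
    return ew, ns


def disaggregate(census_total: float, weights: list[float]) -> list[float]:
    """Split one administrative unit's census count across its cells in proportion
    to non-negative (hypothetical) weights, so the cell values sum to the total."""
    if not weights:
        raise ValueError("an administrative unit needs at least one cell")
    total_weight = sum(weights)
    if total_weight <= 0:
        # No information to distribute by: spread the count uniformly.
        return [census_total / len(weights)] * len(weights)
    return [census_total * w / total_weight for w in weights]


print(cell_size_km(0.0))    # about (0.927, 0.927) km at the equator
print(cell_size_km(60.0))   # east-west width is half the north-south size
print(disaggregate(1000, [0.0, 1.0, 3.0]))  # [0.0, 250.0, 750.0]
```

The proportional split preserves the census total for each unit, which is the property that distinguishes dasymetric disaggregation from simply assigning a density to every cell.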

  13. ‘List of Top Data Breaches (2004 - 2021)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘List of Top Data Breaches (2004 - 2021)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-list-of-top-data-breaches-2004-2021-e7ac/746cf4e2/?iid=002-610&v=presentation
    Dataset updated
    Feb 14, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘List of Top Data Breaches (2004 - 2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hishaamarmghan/list-of-top-data-breaches-2004-2021 on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    This dataset contains the major data breaches worldwide from 2004 to 2021.

    Data privacy is a major issue. Many large companies around the world still face breaches every day, and even with strong security teams, many still suffer them. To study the issue in depth, I pulled this data from Wikipedia for analysis. I encourage others to explore it as well and find as many insights as possible.

    This data contains 5 columns:
    1. Entity: The name of the company, organization or institute
    2. Year: The year in which the data breach took place
    3. Records: How many records were compromised (can include information like email, passwords etc.)
    4. Organization type: The sector the organization belongs to
    5. Method: How the breach happened (was it hacked? were the files lost? was it an inside job?)
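    A minimal sketch of a first analysis: total compromised records per year. It assumes a CSV whose headers match the column names above verbatim, and that Records and Year may sometimes hold free text (common in Wikipedia-derived tables), which is skipped rather than guessed. The helper name and sample rows are hypothetical and not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def records_by_year(csv_text: str) -> dict[int, int]:
    """Sum compromised records per year, skipping rows whose Year or Records
    value is not a plain (optionally comma-grouped) integer."""
    totals: dict[int, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = row["Year"].strip()
        records = row["Records"].replace(",", "").strip()
        if not (year.isdigit() and records.isdigit()):
            continue  # e.g. "unknown" or other free-text entries
        totals[int(year)] += int(records)
    return dict(sorted(totals.items()))


# Hypothetical rows for illustration only.
SAMPLE = """Entity,Year,Records,Organization type,Method
Example Corp A,2019,"1,000",retail,hacked
Example Corp B,2019,500,finance,lost device
Example Corp C,2020,unknown,health,inside job
"""

print(records_by_year(SAMPLE))  # {2019: 1500}
```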

    Here is the source for the dataset: https://en.wikipedia.org/wiki/List_of_data_breaches

    Here is the GitHub link for a guide on how it was scraped: https://github.com/hishaamarmghan/Data-Breaches-Scraping-Cleaning

    --- Original source retains full ownership of the source dataset ---

  14. Data from: ddRAD-seq generated genomic SNP dataset of Central and Southeast...

    • zenodo.org
    • data.niaid.nih.gov
    Updated Feb 1, 2024
    Botond Lados; Klára Cseke; Attila Benke; Zoltán Attila Köbölkuti; Csilla Éva Molnár; László Nagy; Norbert Móricz; Tamás Márton Németh; Attila Borovics; Ilona Mészáros; Endre Gy. Tóth (2024). ddRAD-seq generated genomic SNP dataset of Central and Southeast European Turkey oak (Quercus cerris L.) populations [Dataset]. http://doi.org/10.5281/zenodo.7568727
    Dataset updated
    Feb 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Botond Lados; Klára Cseke; Attila Benke; Zoltán Attila Köbölkuti; Csilla Éva Molnár; László Nagy; Norbert Móricz; Tamás Márton Németh; Attila Borovics; Ilona Mészáros; Endre Gy. Tóth
    Description

    Turkey oak (Quercus cerris L.) is one of the most ecologically and economically important deciduous tree species in Central and Southeast Europe. Its distribution range covers hundreds of thousands of hectares, from the Apennine and Balkan Peninsulas and the Carpathian Basin to Asia Minor. Turkey oak has long been known to exhibit high levels of genetic and phenotypic variation. Recent predictions of this species' climate responses suggest a significant extension of its distribution in Europe under climate change. Because Turkey oak is relatively drought-tolerant, it is regarded as a potential alternative to other forest tree species in forestry climate adaptation efforts, not only in its native regions but in Western Europe as well. Surveying the existing genetic variability, genetic resources, and adaptability of this species is therefore of great importance. Next-generation sequencing approaches such as ddRAD-seq (double digest restriction-site associated DNA sequencing) make it possible to obtain high-resolution, genome-wide single nucleotide polymorphisms (SNPs). With thousands of SNP markers, the genetic structure of populations and the genetic background of adaptation processes can be studied in far more depth than ever before. In this study, we provide highly variable genome-wide SNP data for Turkey oak for the first time. The dataset comprises SNP data for 88 individuals from eight populations: two from Bulgaria, one from Kosovo, and five from Hungary. These high-resolution genome-wide markers are suitable for inferring genetic diversity, differentiation, and population structure, and for investigating selection and local adaptation. The dataset is accessible at: https://doi.org/10.5281/zenodo.7568727
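    Genetic diversity is often summarized first by heterozygosity. The sketch below assumes the genotypes have already been parsed (the file format of the Zenodo deposit is not described here) into a matrix coded as 0, 1 or 2 copies of the alternate allele, with None for missing calls. It computes observed heterozygosity and the Hardy-Weinberg expected heterozygosity 2p(1 - p) for biallelic loci, without a small-sample correction. Function names and toy genotypes are hypothetical.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of the alternate allele; None = missing


def locus_stats(genotypes: Sequence[Genotype]) -> tuple[float, float]:
    """Return (observed, expected) heterozygosity for one biallelic SNP."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")
    p = sum(called) / (2 * len(called))          # alternate-allele frequency
    h_obs = sum(1 for g in called if g == 1) / len(called)
    h_exp = 2 * p * (1 - p)                      # equals 1 - p^2 - q^2
    return h_obs, h_exp


def mean_expected_het(matrix: Sequence[Sequence[Genotype]]) -> float:
    """Average expected heterozygosity over loci; matrix[i][j] = individual i, locus j."""
    n_loci = len(matrix[0])
    per_locus = [locus_stats([ind[j] for ind in matrix])[1] for j in range(n_loci)]
    return sum(per_locus) / n_loci


# Hypothetical toy genotypes (3 individuals x 2 loci), not from the dataset.
toy = [[0, 1], [1, 1], [2, None]]
print(locus_stats([0, 1, 2]))   # (0.333..., 0.5)
print(mean_expected_het(toy))   # 0.5
```

Running the same calculation per population (for example, for each of the eight sampled populations) gives a first comparison of diversity before moving on to differentiation or structure analyses.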

  15. CDC COVID-19 Vaccine Tracker

    • kaggle.com
    Updated Dec 4, 2023
    The Devastator (2023). CDC COVID-19 Vaccine Tracker [Dataset]. https://www.kaggle.com/datasets/thedevastator/cdc-covid-19-vaccine-tracker
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    CDC COVID-19 Vaccine Tracker

    Cumulative and Daily Counts of COVID-19 Vaccine Doses in the United States

    By Nicky Forster [source]

    About this dataset

    The dataset contains data points such as the cumulative count of people who have received at least one dose of the vaccine, new doses administered on a specific date, cumulative count of doses distributed in the country, percentage of population that has completed the full vaccine series, cumulative count of Pfizer and Moderna vaccine doses administered in each state, seven-day rolling averages for new doses administered and distributed, among others.

    It also provides insights into vaccination status at both the national and state levels, including the percentage of the population that has received at least one dose, the percentage that has completed the full vaccine series, and cumulative counts per 100k population for distributed and administered doses.
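    The per-100k and percentage metrics are simple rescalings of raw counts by a population denominator. A minimal sketch, assuming you have such a denominator for the state or country (which population estimates CDC used is not stated here); the helper names are hypothetical.

```python
def per_100k(count: float, population: float) -> float:
    """Scale a cumulative count to a rate per 100,000 residents."""
    return count * 100_000 / population


def pct_of_population(people: float, population: float) -> float:
    """Share of the population, as a percentage."""
    return people * 100 / population


print(per_100k(50_000, 1_000_000))     # 5000.0
print(pct_of_population(250, 1_000))   # 25.0
```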

    Additionally, it presents state-level data, including each state's abbreviation and name and its cumulative counts per 100k population for distributed and administered doses. It also flags instances where corrections resulted in negative single-day counts.
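    Negative single-day counts arise naturally when daily figures are derived from cumulative snapshots and a later snapshot is revised downward. A minimal sketch, assuming one cumulative value per day for a single state, sorted by date; the helper names and values are hypothetical.

```python
def daily_from_cumulative(cumulative: list[int]) -> list[int]:
    """Day-over-day differences of a cumulative series (the first day has no prior value)."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def negative_days(dates: list[str], cumulative: list[int]) -> list[str]:
    """Dates whose implied daily count is negative, e.g. after a downward correction."""
    daily = daily_from_cumulative(cumulative)
    return [d for d, n in zip(dates[1:], daily) if n < 0]


dates = ["2021-03-01", "2021-03-02", "2021-03-03", "2021-03-04"]
cum = [100, 150, 140, 200]  # hypothetical values
print(daily_from_cumulative(cum))  # [50, -10, 60]
print(negative_days(dates, cum))   # ['2021-03-03']
```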

    The dataset is compiled from daily snapshots of the CDC's COVID Data Tracker. Note that healthcare providers may take up to 72 hours after administering a dose to report it.

    This comprehensive dataset serves various purposes, including tracking vaccination progress over time across different locations within the United States. It can be used by researchers, policymakers, or anyone interested in analyzing trends in COVID-19 vaccination efforts at both the national and state levels.

    How to use the dataset

    • Familiarize Yourself with the Columns: Take a look at the available columns in this dataset to understand what information is included. These columns provide details such as state abbreviations, state names, dates of data snapshots, cumulative counts of doses distributed and administered, people who have received at least one dose or completed the vaccine series, percentages of population coverage, manufacturer-specific data, and seven-day rolling averages.

    • Explore Cumulative Counts: The dataset includes cumulative counts that show the total number of doses distributed or administered over time. You can analyze these numbers to track trends in vaccination progress in different states or regions.

    • Analyze Daily Counts: The dataset also provides daily counts of new vaccine doses distributed and administered on specific dates. By examining these numbers, you can gain insights into vaccination rates on a day-to-day basis.

    • Study Population Coverage Metrics: Metrics such as pct_population_received_at_least_one_dose and pct_population_series_complete show how much of each state's population has received at least one dose or completed the vaccine series, respectively.

    • Utilize Manufacturer Data: The columns related to Pfizer and Moderna provide information about the number of doses administered for each manufacturer separately. By analyzing this data, you can compare vaccination rates between different vaccines.

    • Consider Rolling Averages: The seven-day rolling average columns allow you to smooth out fluctuations in daily counts by calculating an average over a week's time window. This can help identify long-term trends more accurately.

    • Compare States: You can compare vaccination progress between different states by filtering the dataset based on state names or abbreviations. This way, you can observe variations in distribution and administration rates among different regions.

    • Visualize the Data: Creating charts and graphs will help you visualize the data more effectively. Plotting trends over time or comparing different metrics for various states can provide powerful visual representations of vaccination progress.

    • Stay Informed: Keep in mind that this dataset is continuously updated as new data becomes available. Check for updated or refreshed versions to obtain the most recent information on COVID-19 vaccine distribution and administration.
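    Comparing states, as suggested above, usually means taking each state's latest snapshot and ranking by a coverage metric. A minimal standard-library sketch, assuming a CSV export with columns named `state` and `date` (hypothetical; check the actual header) and ISO `YYYY-MM-DD` dates, which sort correctly as strings. `pct_population_series_complete` is the column named in the description; the sample rows and state codes are made up.

```python
import csv
import io


def latest_by_state(csv_text: str, state_col: str = "state",
                    date_col: str = "date") -> dict[str, dict[str, str]]:
    """Keep the most recent snapshot row for each state."""
    latest: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row[state_col]
        if key not in latest or row[date_col] > latest[key][date_col]:
            latest[key] = row
    return latest


def rank_states(csv_text: str,
                metric: str = "pct_population_series_complete") -> list[tuple[str, float]]:
    """States sorted from highest to lowest `metric` in their latest snapshot."""
    rows = latest_by_state(csv_text)
    ranked = [(state, float(row[metric])) for state, row in rows.items() if row[metric]]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


SAMPLE = """state,date,pct_population_series_complete
AA,2021-05-01,30.5
AA,2021-06-01,41.0
BB,2021-06-01,45.2
"""
print(rank_states(SAMPLE))  # [('BB', 45.2), ('AA', 41.0)]
```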

    Research Ideas

    • Vaccination Analysis: This dataset can be used to analyze the progress of COVID-19 vaccinations in the United States. By examining the cumulative counts of doses distributed and administered, as well as the number of people who have received at least one dose or completed the vaccine series, researchers and policymakers can assess how effectively vaccines are being rolled out and monitor...
  16. COVID Vaccination in World (updated daily)

    • kaggle.com
    zip
    Updated Jun 21, 2021
    Rishav Sharma (2021). COVID Vaccination in World (updated daily) [Dataset]. https://www.kaggle.com/rsrishav/covid-vaccination-dataset
    Available download formats: zip (544,681 bytes)
    Dataset updated
    Jun 21, 2021
    Authors
    Rishav Sharma
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    World
    Description

    Context

    The data is collected from the OWID (Our World in Data) GitHub repository, which is updated daily.

    Content

    This dataset contains only one file, vaccinations.csv, which records the vaccination doses received by people in all countries. Its columns are:

    * location: name of the country (or region within a country).
    * iso_code: ISO 3166-1 alpha-3 three-letter country code.
    * date: date of the observation.
    * total_vaccinations: total number of doses administered. Each dose is counted individually, so this may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people receive multiple doses). If a person receives one dose of the vaccine, this metric goes up by 1. If they receive a second dose, it goes up by 1 again.
    * total_vaccinations_per_hundred: total_vaccinations per 100 people in the total population of the country.
    * daily_vaccinations_raw: daily change in the total number of doses administered. It is only calculated for consecutive days. This is a raw measure provided for data checks and transparency, but we strongly recommend that any analysis of daily vaccination rates be conducted using daily_vaccinations instead.
    * daily_vaccinations: new doses administered per day (7-day smoothed). For countries that don't report data on a daily basis, we assume that doses changed equally on a daily basis over any periods in which no data was reported. This produces a complete series of daily figures, which is then averaged over a rolling 7-day window. An example of how we perform this calculation can be found here.
    * daily_vaccinations_per_million: daily_vaccinations per 1,000,000 people in the total population of the country.
    * people_vaccinated: total number of people who received at least one vaccine dose. If a person receives the first dose of a 2-dose vaccine, this metric goes up by 1. If they receive the second dose, the metric stays the same.
    * people_vaccinated_per_hundred: people_vaccinated per 100 people in the total population of the country.
    * people_fully_vaccinated: total number of people who received all doses prescribed by the vaccination protocol. If a person receives the first dose of a 2-dose vaccine, this metric stays the same. If they receive the second dose, the metric goes up by 1.
    * people_fully_vaccinated_per_hundred: people_fully_vaccinated per 100 people in the total population of the country.
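    The daily_vaccinations smoothing described above can be sketched in two steps: spread each change in total_vaccinations evenly over the days between two reports, then average the resulting daily series over a trailing 7-day window. This is an approximation of OWID's description, not their code; in particular, how the window behaves over the first few days (here, averaging whatever days are available) is an assumption, and the helper names and report values are hypothetical.

```python
from datetime import date, timedelta


def fill_daily(reports: dict[date, float]) -> dict[date, float]:
    """Spread each change in the cumulative total evenly over the days since the
    previous report, giving a daily figure for every day after the first report."""
    days = sorted(reports)
    daily: dict[date, float] = {}
    for start, end in zip(days, days[1:]):
        gap = (end - start).days
        per_day = (reports[end] - reports[start]) / gap  # equal change per day
        for k in range(1, gap + 1):
            daily[start + timedelta(days=k)] = per_day
    return daily


def rolling_mean(daily: dict[date, float], window: int = 7) -> dict[date, float]:
    """Trailing mean over up to `window` most recent days (shorter at the start)."""
    days = sorted(daily)
    out: dict[date, float] = {}
    for i, d in enumerate(days):
        chunk = [daily[x] for x in days[max(0, i - window + 1): i + 1]]
        out[d] = sum(chunk) / len(chunk)
    return out


reports = {date(2021, 1, 1): 0, date(2021, 1, 4): 300, date(2021, 1, 5): 500}
daily = fill_daily(reports)  # Jan 2-4 get 100.0 each; Jan 5 gets 200.0
print(rolling_mean(daily)[date(2021, 1, 5)])  # 125.0
```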

    Note: for people_vaccinated and people_fully_vaccinated we are dependent on the necessary data being made available, so we may not be able to make these metrics available for some countries.

    Acknowledgements

    This data is collected by Our World in Data and updated daily on its GitHub repository.

    Inspiration

    Possible uses for this dataset include:
    - Sentiment analysis in a variety of forms
    - Statistical analysis over time

  17. London Heathrow precipitations 2010-2019

    • kaggle.com
    Updated Feb 25, 2020
    Emanuele Fumagalli (2020). London Heathrow precipitations 2010-2019 [Dataset]. https://www.kaggle.com/datasets/emafuma/ncei-heathrow-2010-2019/versions/1
    Dataset updated
    Feb 25, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Emanuele Fumagalli
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    The idea is to have a very simple time series dataset for experimenting with easy but effective visualizations of actual data. It is amazing how much information a single graph can communicate concisely.

    Content

    The dataset was downloaded from the National Centers for Environmental Information (NCEI); the data is in the public domain and can be used freely. To generate a similar dataset for another station, start from the Search Tool: select Daily Summaries and the time range of interest, search for Cities, and enter the city you are looking for as the Search Term. Once selected, add it to the Cart as you would an order; there is no charge for ordering data from Climate Data Online, as explained in their FAQs.

    Acknowledgements

    Thanks to the National Centers for Environmental Information for collecting meteorological data from many stations all over the world and making it freely available. If you use this dataset or generate a new one from NCEI, you need to cite the source.

    Inspiration

    Mostly to see how many different effective visualizations can be generated from a very simple dataset.
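    As a starting point for such visualizations, daily records can be aggregated into monthly totals. A minimal sketch, assuming a CSV export with `DATE` (YYYY-MM-DD) and `PRCP` columns as in NCEI Daily Summaries downloads; verify the header and the units (which depend on the options chosen when ordering) in the actual file. The helper name and sample values are hypothetical.

```python
import csv
import io
from collections import defaultdict


def monthly_totals(csv_text: str) -> dict[str, float]:
    """Sum daily precipitation into calendar-month totals keyed by 'YYYY-MM'."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row["PRCP"].strip()
        if not value:
            continue  # missing observation: skipped rather than treated as zero
        totals[row["DATE"][:7]] += float(value)
    return dict(sorted(totals.items()))


SAMPLE = """DATE,PRCP
2010-01-01,0.5
2010-01-02,
2010-01-03,1.5
2010-02-01,2.0
"""
print(monthly_totals(SAMPLE))  # {'2010-01': 2.0, '2010-02': 2.0}
```

The resulting month-to-total mapping can be fed directly to any plotting library as x and y values.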

  18. India Census: Population: Age: 18

    • ceicdata.com
    Updated Nov 15, 2019
    CEICdata.com (2019). India Census: Population: Age: 18 [Dataset]. https://www.ceicdata.com/en/india/census-population-by-single-age/census-population-age-18
    Dataset updated
    Nov 15, 2019
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Mar 1, 1991 - Mar 1, 2011
    Area covered
    India
    Variables measured
    Population
    Description

    India Census: Population: Age: 18 data was reported at 27,958,147 persons in 2011, an increase from 27,686,902 persons in 2001. The data is updated yearly, with a median of 27,686,902 persons from Mar 1991 to 2011 across 3 observations. It reached an all-time high of 27,958,147 persons in 2011 and a record low of 23,656,856 persons in 1991. India Census: Population: Age: 18 data remains active in CEIC and is reported by the Census of India. The data is categorized under India Premium Database’s Demographic – Table IN.GAD002: Census: Population: by Single Age.
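    The summary figures can be checked directly from the three reported observations. With three values the median is simply the middle one, and the 2001 to 2011 change works out to 271,245 persons, an increase of roughly 0.98%:

```python
from statistics import median

# Census counts for age 18, as reported in the description above.
AGE_18 = {1991: 23_656_856, 2001: 27_686_902, 2011: 27_958_147}

print(median(AGE_18.values()))                # 27686902
change = AGE_18[2011] - AGE_18[2001]
print(change)                                 # 271245
print(round(100 * change / AGE_18[2001], 2))  # 0.98
```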

