100+ datasets found
  1. World Bank: GHNP Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    World Bank (2019). World Bank: GHNP Data [Dataset]. https://www.kaggle.com/theworldbank/world-bank-health-population
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset provided by
    World Bank Group (http://www.worldbank.org/)
    Authors
    World Bank
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset combines key health statistics from a variety of sources to provide a look at global health and population trends. It includes information on nutrition, reproductive health, education, immunization, and diseases from over 200 countries.

    Update Frequency: Biannual

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://datacatalog.worldbank.org/dataset/health-nutrition-and-population-statistics

    https://cloud.google.com/bigquery/public-data/world-bank-hnp

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Citation: The World Bank: Health Nutrition and Population Statistics

    Banner Photo by @till_indeman from Unsplash.

    Inspiration

    What’s the average age of first marriages for females around the world?

  2. Data from: MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022...

    • data.mendeley.com
    Updated Jul 25, 2022
    + more versions
    Cite
    Nirmalya Thakur (2022). MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak [Dataset]. http://doi.org/10.17632/xmcg82mx9k.3
    Explore at:
    Dataset updated
    Jul 25, 2022
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2

    Abstract: The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization just declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, for information seeking and sharing related to the outbreak, as well as for familiarizing themselves with the guidelines and protocols that are being recommended by various policy-making bodies to reduce the spread of the virus. This is resulting in the generation of tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it in the form of a dataset can serve a wide range of use-cases and applications such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

    Data Description: The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 23rd July 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 6 different .txt files based on the timelines of the associated tweets. The following provides the details of these dataset files.

    • Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the Tweet IDs: May 7, 2022 to May 21, 2022)
    • Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the Tweet IDs: May 21, 2022 to May 27, 2022)
    • Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the Tweet IDs: May 27, 2022 to June 5, 2022)
    • Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the Tweet IDs: June 5, 2022 to June 11, 2022)
    • Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range of the Tweet IDs: June 12, 2022 to June 30, 2022)
    • Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the Tweet IDs: July 1, 2022 to July 23, 2022)

    The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
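
    Before hydrating, it can be useful to confirm that a download is complete. The following is a minimal sketch, not part of the dataset itself: it assumes each .txt file holds one Tweet ID per line (the description does not state the layout), and the helper names `read_tweet_ids` and `check_counts` are hypothetical. The file names and counts are the ones listed above.

```python
from pathlib import Path

# File names and ID counts exactly as listed in the dataset description.
EXPECTED_COUNTS = {
    "TweetIDs_Part1.txt": 13926,
    "TweetIDs_Part2.txt": 17705,
    "TweetIDs_Part3.txt": 17585,
    "TweetIDs_Part4.txt": 19718,
    "TweetIDs_Part5.txt": 47718,
    "TweetIDs_Part6.txt": 138711,
}


def read_tweet_ids(path):
    """Return the non-empty lines of a file (assumption: one Tweet ID per line)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def check_counts(folder):
    """Map each file name to (IDs found, IDs expected); found is None if the file is missing."""
    report = {}
    for name, expected in EXPECTED_COUNTS.items():
        path = Path(folder) / name
        found = len(read_tweet_ids(path)) if path.exists() else None
        report[name] = (found, expected)
    return report


# The per-file counts in the description add up to the stated total of 255,363.
assert sum(EXPECTED_COUNTS.values()) == 255363
```

    Calling `check_counts("path/to/download")` returns one (found, expected) pair per file, so any mismatch or missing part is visible before the IDs are passed to a hydration tool.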

  3. People Data Labs - Person Dataset

    • datarade.ai
    .json, .csv
    Updated Jul 6, 2020
    + more versions
    Cite
    People Data Labs (2020). People Data Labs - Person Dataset [Dataset]. https://datarade.ai/data-products/global-license
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Jul 6, 2020
    Dataset provided by
    People Data Labs Inc.
    Authors
    People Data Labs
    Area covered
    Guinea, Kenya, United States of America, Anguilla, Wallis and Futuna, Tunisia, Antarctica, Bosnia and Herzegovina, Afghanistan, Maldives
    Description

    People Data Labs is an aggregator of B2B person and company data. We source our globally compliant person dataset via our "Data Union".

    The "Data Union" is our proprietary data sharing co-op. Customers opt in to sharing their data and warrant that their data is fully compliant with global data privacy regulations. Some data sources are provided as a one-time dump; others are refreshed every time we do a new data build. Our data sources come from a variety of verticals including HR Tech, Real Estate Tech, Identity/Anti-Fraud, Martech, and others. People Data Labs works with customers on compliance-based topics. If a customer wishes to ensure anonymity, we work with them to anonymize the data.

    Our person data has over 100 fields including resume data (work history, education), contact information (email, phone), demographic info (name, gender, birth date) and social profile information (LinkedIn, GitHub, Twitter, Facebook, etc.).

  4. The ORBIT (Object Recognition for Blind Image Training)-India Dataset

    • zenodo.org
    • data.niaid.nih.gov
    Updated Apr 24, 2025
    Cite
    Gesu India; Martin Grayson; Daniela Massiceti; Cecily Morrison; Simon Robinson; Jennifer Pearson; Matt Jones (2025). The ORBIT (Object Recognition for Blind Image Training)-India Dataset [Dataset]. http://doi.org/10.5281/zenodo.12608444
    Explore at:
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gesu India; Martin Grayson; Daniela Massiceti; Cecily Morrison; Simon Robinson; Jennifer Pearson; Matt Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    The ORBIT (Object Recognition for Blind Image Training)-India Dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, home to 90% of the world’s population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.

    Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin) and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.

    The image dataset is stored in the ‘Dataset’ folder, organized by folders assigned to each data collector (P1, P2, ...P12) who collected them. Each collector's folder includes sub-folders named with the object labels as provided by our data collectors. Within each object folder, there are two subfolders: ‘clean’ for images taken on clean surfaces and ‘clutter’ for images taken in cluttered environments where the objects are typically found.

    The annotations are saved inside an ‘Annotations’ folder containing a JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) that contains keys corresponding to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The ‘object_not_present_issue’ key is true if the object is not present in the image, and the ‘pii_present_issue’ key is true if personally identifiable information (PII) is present in the image. Note that all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized at 1080 × 1920; therefore, an unscaled version of the dataset will follow soon.
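
    The annotation layout described above can be read with the standard `json` module. The following is a minimal sketch rather than an official loader: the helper names `usable_frames` and `parse_frame_name` are hypothetical, and the interpretation of the `--`-separated name parts (collector, object label, clean/clutter) is an assumption inferred from the folder layout. In the sample, the second frame's flag is set to true purely for illustration.

```python
import json


def usable_frames(annotation_text):
    """Return the frame names whose two issue flags are both false."""
    frames = json.loads(annotation_text)
    return [
        name
        for name, flags in frames.items()
        if not flags["object_not_present_issue"] and not flags["pii_present_issue"]
    ]


def parse_frame_name(frame_name):
    """Split a frame name on '--' (assumed order: collector, object, condition, remainder)."""
    collector, label, condition, rest = frame_name.split("--", 3)
    return {"collector": collector, "object": label, "condition": condition, "rest": rest}


# Illustrative annotation content in the structure quoted in the description.
# The 'true' value on the second frame is made up to show the filtering.
sample = json.dumps({
    "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg":
        {"object_not_present_issue": False, "pii_present_issue": False},
    "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg":
        {"object_not_present_issue": True, "pii_present_issue": False},
})

kept = usable_frames(sample)
info = parse_frame_name(kept[0])
print(kept)
print(info["collector"], info["object"], info["condition"])
```

    Filtering on both flags is one possible choice; since the description says PII has already been blurred, a few-shot pipeline might reasonably keep frames where only `pii_present_issue` is true.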

    This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.

    REFERENCES:

    1. Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597

    2. microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset

    3. Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1–6. https://doi.org/10.1145/3613905.3648641

  5. BENEFIT-REALISE Legacy Soil Profile Dataset

    • data.moa.gov.et
    html
    Updated Dec 30, 2023
    Cite
    Ethiopian Institute of Agricultural Research (EIAR) (2023). BENEFIT-REALISE Legacy Soil Profile Dataset [Dataset]. http://doi.org/10.20372/eiar-rdm/HE7KTW
    Explore at:
    Available download formats: html
    Dataset updated
    Dec 30, 2023
    Dataset provided by
    Ethiopian Institute of Agricultural Research
    Description

    Although soil and agronomy data collection in Ethiopia began over 60 years ago, the data are hardly accessible as they are scattered across different organizations, mostly held in the hands of individuals (Ashenafi et al., 2020; Tamene et al., 2022), which makes them vulnerable to permanent loss. Cognizant of the problem, the Coalition of the Willing (CoW) for data sharing and access was created in 2018 with joint support and coordination of the Alliance Bioversity-CIAT and GIZ (https://www.ethioagridata.com/index.html). Mobilizing its members, the CoW has embarked on data rescue operations including data ecosystem mapping, collation, and curation of the legacy data, which was put into the central data repository for its members and the wider data user community according to the guideline developed based on the FAIR data principles and approved by the CoW. So far, the CoW has managed to collate and rescue about 20,000 legacy soil profile data and over 38,000 crop responses to fertilizer data (Tamene et al., 2022).

    The legacy soil profile dataset (consisting of Profiles Site = 1,776 observations with 37 variables; Profiles Layer Field = 1,493 observations with 64 variables; Profiles Layer Lab = 1,386 observations with 76 variables) is extracted, transformed, and uploaded into a harmonized template (adapted from Batjes, 2022; Leenaars et al., 2014) from the following source: the Bilateral Ethiopian-Netherlands Effort for Food, Income and Trade (BENEFIT) Partnership, which is a portfolio of five programs (ISSD, Cascape, ENTAG, SBN, and REALISE) and is funded by the government of the Kingdom of the Netherlands through its embassy in Addis Ababa. The BENEFIT-REALISE program implements its interventions in 60 PSNP weredas in four regions (Tigray, Amhara, Oromia, and SNNPR). Accordingly, in 2019, BENEFIT-REALISE along with the MoA initiated a wereda-wide soil resource characterization and mapping task at 1:50,000 scale in 15 BENEFIT-REALISE intervention weredas: 3 of Tigray, 6 of Amhara, 3 of Oromia, and 3 of SNNPR.

    References:

    Ashenafi, A., Tamene, L., and Erkossa, T. 2020. Identifying, Cataloguing, and Mapping Soil and Agronomic Data in Ethiopia. CIAT Publication No. 506. International Center for Tropical Agriculture (CIAT). Addis Ababa, Ethiopia. 42 p. 10.13140/RG.2.2.31759.41123.

    Ashenafi, A., Erkossa, T., Gudeta, K., Abera, W., Mesfin, E., Mekete, T., Haile, M., Haile, W., Abegaz, A., Tafesse, D. and Belay, G., 2022. Reference Soil Groups Map of Ethiopia Based on Legacy Data and Machine Learning Technique: EthioSoilGrids 1.0. EGUsphere, pp.1-40. https://doi.org/10.5194/egusphere-2022-301

    Tamene L; Erkossa T; Tafesse T; Abera W; Schultz S. 2021. A coalition of the Willing - Powering data-driven solutions for Ethiopian Agriculture. CIAT Publication No. 518. International Center for Tropical Agriculture (CIAT). Addis Ababa, Ethiopia. 34 p. https://www.ethioagridata.com/Resources/Powering%20Data-Driven%20Solutions%20for%20Ethiopian%20Agriculture.pdf.

    The Coalition of the Willing (CoW) website: https://www.ethioagridata.com/index.html.

    Batjes, N.H., 2022. Basic principles for compiling a profile dataset for consideration in WoSIS. CoP report, ISRIC–World Soil Information, Wageningen. Contents Summary, 4(1), p.3.

    Carvalho Ribeiro, E.D. and Batjes, N.H., 2020. World Soil Information Service (WoSIS)-Towards the standardization and harmonization of world soil data: Procedures Manual 2020.

    Elias, E.: Soils of the Ethiopian Highlands: Geomorphology and Properties, CASCAPE Project, ALTERRA, Wageningen UR, the Netherlands, library.wur.nl/WebQuery/isric/2259099, 2016.

    Leenaars, J. G. B., van Oostrum, A.J.M., and Ruiperez, G.M.: Africa Soil Profiles Database, Version 1.2. A compilation of georeferenced and standardised legacy soil profile data for Sub Saharan Africa (with dataset), ISRIC Report 2014/01, Africa Soil Information Service (AfSIS) project and ISRIC – World Soil Information, Wageningen, library.wur.nl/WebQuery/isric/2259472, 2014.

    Leenaars, J. G. B., Eyasu, E., Wösten, H., Ruiperez González, M., Kempen, B., Ashenafi, A., and Brouwer, F.: Major soil-landscape resources of the cascape intervention woredas, Ethiopia: Soil information in support to scaling up of evidence-based best practices in agricultural production (with dataset), CASCAPE working paper series No. OT_CP_2016_1, Cascape. https://edepot.wur.nl/428596, 2016.

    Leenaars, J. G. B., Elias, E., Wösten, J. H. M., Ruiperez-González, M., and Kempen, B.: Mapping the major soil-landscape resources of the Ethiopian Highlands using random forest, Geoderma, 361, https://doi.org/10.1016/j.geoderma.2019.114067, 2020a.

    Leenaars, J. G. B., Ruiperez, M., González, M., Kempen, B., and Mantel, S.: Semi-detailed soil resource survey and mapping of REALISE woredas in Ethiopia, Project report to the BENEFIT-REALISE programme, December, ISRIC-World Soil Information, Wageningen, 2020b.

    TERMS: Access to the data is limited to the CoW members until the national soil and agronomy data-sharing directive of MoA is registered by the Ministry of Justice and released for implementation.

    DISCLAIMER: The dataset populated in the harmonized template consisting of 76 variables is extracted, transformed, and uploaded from the source document by the CoW. Hence, if any irregularities are observed, data users are referred to the source document uploaded along with the dataset. Use of the dataset and any consequences arising from using it are the user’s sole responsibility.

  6. Native American Multi-Year Facial Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Native American Multi-Year Facial Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-historical-native-american
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Native American Multi-Year Facial Image Dataset, thoughtfully curated to support the development of advanced facial recognition systems, biometric identification models, KYC verification tools, and other computer vision applications. This dataset is ideal for training AI models to recognize individuals over time, track facial changes, and enhance age progression capabilities.

    Facial Image Data

    This dataset includes over 5,000 high-quality facial images, organized into individual participant sets, each containing:

    Historical Images: 22 facial images per participant captured across a span of 10 years
    Enrollment Image: One recent high-resolution facial image for reference or ground truth

    Diversity & Representation

    Geographic Coverage: Participants from the USA, Canada, Mexico, and other Native American regions
    Demographics: Individuals aged 18 to 70 years, with a gender distribution of 60% male and 40% female
    File Formats: All images are available in JPEG and HEIC formats

    Image Quality & Capture Conditions

    To ensure model generalization and practical usability, images in this dataset reflect real-world diversity:

    Lighting Conditions: Images captured under various natural and artificial lighting setups
    Backgrounds: A wide range of indoor and outdoor backgrounds
    Device Quality: Captured using modern, high-resolution mobile devices for consistency and clarity

    Metadata

    Each participant’s dataset is accompanied by rich metadata to support advanced model training and analysis, including:

    Unique participant ID
    File name
    Age at the time of image capture
    Gender
    Country of origin
    Demographic profile
    File format

    Use Cases & Applications

    This dataset is highly valuable for a wide range of AI and computer vision applications:

    Facial Recognition Systems: Train models for high-accuracy face matching across time
    KYC & Identity Verification: Improve time-spanning verification for banks, insurance, and government services
    Biometric Security Solutions: Build reliable identity authentication models
    Age Progression & Estimation Models: Train AI to predict aging patterns or estimate age from facial features
    Generative AI: Support creation and validation of synthetic age progression or longitudinal face generation

    Secure & Ethical Collection

    Platform: All data was securely collected and processed through FutureBeeAI’s proprietary systems
    Ethical Compliance: Full participant consent obtained with transparent communication of use cases
    Privacy-Protected: No personally identifiable information is included; all data is anonymized and handled with care

    Dataset Updates & Customization

    To keep pace with evolving AI needs, this dataset is regularly updated and customizable. Custom data collection options include:


  7. Online Learning Global Queries Dataset: A Comprehensive Dataset of What...

    • ieee-dataport.org
    Updated May 11, 2022
    Cite
    Isabella Hall (2022). Online Learning Global Queries Dataset: A Comprehensive Dataset of What People from Different Countries ask Google about Online Learning [Dataset]. https://ieee-dataport.org/documents/online-learning-global-queries-dataset-comprehensive-dataset-what-people-different
    Explore at:
    Dataset updated
    May 11, 2022
    Authors
    Isabella Hall
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Any work using this dataset should cite the following paper:

  8. Replication Data for: Who governs? A new global dataset on members of...

    • dataverse.harvard.edu
    Updated Aug 5, 2020
    Cite
    Jacob Nyrup (2020). Replication Data for: Who governs? A new global dataset on members of cabinets [Dataset]. http://doi.org/10.7910/DVN/YTRCQE
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 5, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Jacob Nyrup
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/YTRCQE

    Description

    Replication Data for: Who governs? A new global dataset on members of cabinets.

  9. Cyberbullying dataset for Kurdish Language

    • data.mendeley.com
    Updated Aug 5, 2025
    + more versions
    Cite
    Soran Badawi (2025). Cyberbullying dataset for Kurdish Language [Dataset]. http://doi.org/10.17632/ck49jyxcbt.4
    Explore at:
    Dataset updated
    Aug 5, 2025
    Authors
    Soran Badawi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cyberbullying has become an increasingly prevalent issue in the digital age, with the rise of social media and online communication. It can take many forms, including verbal attacks, harassment, and discrimination, and it can have serious consequences for victims, including depression, anxiety, and even suicide. While much research has been done on cyberbullying in languages such as English, Spanish, and Chinese, there has been little focus on languages spoken by smaller populations, such as Kurdish.

    Kurdish is a language spoken by millions of people in the Middle East, including Turkey, Iran, Iraq, and Syria. It is an Indo-European language with several dialects, and it is considered an official language in Iraq and an official regional language in Iran. Despite its widespread use, there has been very little research on cyberbullying in Kurdish, and there are currently no datasets available that specifically focus on this issue.

    To address this gap, we have created the first ever cyberbullying dataset for the Kurdish language. This dataset contains three classes: neutral, racism, and sexism. The neutral class includes messages that do not contain any form of cyberbullying, while the racism and sexism classes include messages that contain discriminatory language based on race or gender, respectively.

    The dataset was created using a combination of manual and automated techniques. We collected a large number of messages written in Kurdish through the Twitter API. We then manually labeled these messages based on whether they contained cyberbullying or not, and further categorized them into the three classes. The resulting dataset contains over 30,000 messages, with roughly equal distribution among the three classes.

    It is a valuable resource for researchers and practitioners who are interested in studying cyberbullying in the Kurdish language and developing strategies to combat it. The dataset can be used for a variety of purposes, including training machine learning models to detect cyberbullying in Kurdish, analyzing the language used in cyberbullying messages to identify patterns and trends, and developing interventions to prevent and address cyberbullying in Kurdish-speaking communities.

  10. Global Dataset of Cyber Incidents V.1.2

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 3, 2024
    Cite
    European Repository of Cyber Incidents (EuRepoC) (2024). Global Dataset of Cyber Incidents V.1.2 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7848940
    Explore at:
    Dataset updated
    May 3, 2024
    Dataset authored and provided by
    European Repository of Cyber Incidents (EuRepoC)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains data on 2889 cyber incidents between 01.01.2000 and 02.05.2024 using 60 variables, including the start date, names and categories of receivers along with names and categories of initiators. The database was compiled as part of the European Repository of Cyber Incidents (EuRepoC) project.

    EuRepoC gathers, codes, and analyses publicly available information from over 200 sources and 600 Twitter accounts daily to report on dynamic trends in the global, and particularly the European, cyber threat environment. For more information on the scope and data collection methodology see: https://eurepoc.eu/methodology

    A codebook is also available.

    Information about each file:

    Global Database (csv or xlsx): This file includes all variables coded for each incident, organised such that one row corresponds to one incident - our main unit of investigation. Where multiple codes are present for a single variable for a single incident, these are separated with semi-colons within the same cell.

    Receiver Dataset (csv): In this file, the data of affected entities and individuals (receivers) is restructured to facilitate analysis. Each cell contains only a single code, with the data "unpacked" across multiple rows. Thus, a single incident can span several rows, identifiable through the unique identifier assigned to each incident (incident_id).

    Attribution Dataset (csv): This file follows a similar approach to the receiver dataset. The attribution data is "unpacked" over several rows, allowing each cell to contain only one code. Here too, a single incident may occupy several rows, with the unique identifier enabling easy tracking of each incident (incident_id). In addition, some attributions may have multiple possible codes for one variable; these are also "unpacked" over several rows, with the attribution_id making it possible to track each attribution.

    eurepoc_global_database_1.2 (json): This file contains the whole database in JSON format.
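
    The relationship between the one-row-per-incident file and the "unpacked" files can be illustrated with a short sketch. It is not EuRepoC's own processing code: `unpack_column` is a hypothetical helper, and the column name `receiver_category` and the sample values are made up for illustration. Only `incident_id` and the semi-colon separator come from the description above.

```python
import csv
import io


def unpack_column(rows, id_field, multi_field, sep=";"):
    """Yield one row per code in `multi_field`, repeating the incident identifier."""
    for row in rows:
        for code in row[multi_field].split(sep):
            code = code.strip()
            if code:
                yield {id_field: row[id_field], multi_field: code}


# Hypothetical two-incident extract; the column name and values are illustrative only.
sample_csv = io.StringIO(
    "incident_id,receiver_category\n"
    "1,State institutions; Critical infrastructure\n"
    "2,Media\n"
)

rows = list(csv.DictReader(sample_csv))
unpacked = list(unpack_column(rows, "incident_id", "receiver_category"))
for row in unpacked:
    print(row)
```

    Incident 1 ends up on two rows and incident 2 on one, which is the shape the Receiver and Attribution datasets are described as having; counting by category is then a plain group-by on the unpacked rows.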

  11. Processed Synthetic Real-World Data for binary modelling

    • data.niaid.nih.gov
    • data.europa.eu
    Updated Dec 9, 2022
    Cite
    Pnevmatikakis, Aristodemos (2022). Processed Synthetic Real-World Data for binary modelling [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7410141
    Explore at:
    Dataset updated
    Dec 9, 2022
    Dataset authored and provided by
    Pnevmatikakis, Aristodemos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This model learning dataset is created out of the Raw Synthetic RWD raw dataset, including some of the original attributes. It is distributed in JOBLIB files, where .joblib files contain the vectors and _ids.joblib contain the ID of the person from which each vector is extracted.

    This is useful in case it is needed to map the vectors to metadata about the people that are found in the original raw dataset. Note that corresponds to , or , depending on the dataset. The split is roughly 60% of the people are in the training dataset, and 20% in each of the validation and the testing datasets. The input attributes are the age, the short-term averages and the trends of the current week’s BMI, steps walked, calories burned, sleep quality, mood and water consumption, as well as the previous week’s short-term average and trend of the answer to the health self-assessment question.

    The outcome to be predicted is the binary quantized health self-assessment answer to be given in the current week. The dataset is normalized based on the training set. The means and standard deviations used can be found in the train_statistics.joblib file. Finally, the output_descriptions.joblib file contains descriptions of the outcomes to be predicted (not actually needed, since included here).
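
    Normalizing "based on the training set" means the means and standard deviations are computed from the training vectors only and then reused unchanged for the validation and testing vectors. The sketch below illustrates that idea in plain Python; it does not read the .joblib files, the function names and the two-feature sample values are hypothetical, and the use of the population standard deviation is an assumption (the description does not say which estimator was used).

```python
from statistics import mean, pstdev


def fit_statistics(train_rows):
    """Compute per-feature mean and standard deviation from the training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(column) for column in columns]
    # Guard against constant features, which would otherwise divide by zero.
    stds = [pstdev(column) or 1.0 for column in columns]
    return means, stds


def normalize(rows, means, stds):
    """Apply training-set statistics to any split (training, validation, or testing)."""
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Made-up vectors with two features (for example age and steps walked).
train = [[20.0, 1000.0], [40.0, 3000.0]]
validation = [[30.0, 2000.0]]

means, stds = fit_statistics(train)
train_norm = normalize(train, means, stds)
validation_norm = normalize(validation, means, stds)
print(train_norm, validation_norm)
```

    Reusing the training statistics for the other splits keeps information from the validation and testing people out of the preprocessing step.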

  12. White Earth, ND Non-Hispanic Population Breakdown By Race Dataset:...

    • neilsberg.com
    csv, json
    Updated Jul 7, 2024
    Cite
    Neilsberg Research (2024). White Earth, ND Non-Hispanic Population Breakdown By Race Dataset: Non-Hispanic Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e15b7176-2310-11ef-bd92-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Jul 7, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    White Earth, North Dakota
    Variables measured
    Non-Hispanic Asian Population, Non-Hispanic Black Population, Non-Hispanic White Population, Non-Hispanic Some other race Population, Non-Hispanic Two or more races Population, Non-Hispanic American Indian and Alaska Native Population, Non-Hispanic Native Hawaiian and Other Pacific Islander Population, Non-Hispanic Asian Population as Percent of Total Non-Hispanic Population, Non-Hispanic Black Population as Percent of Total Non-Hispanic Population, Non-Hispanic White Population as Percent of Total Non-Hispanic Population, and 4 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) Non-Hispanic population and (b) population as a percentage of the total Non-Hispanic population, we initially analyzed and categorized the data for each of the racial categories identified by the US Census Bureau. The population estimates used in this dataset pertain exclusively to the identified racial categories and are part of the Non-Hispanic classification. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Non-Hispanic population of White Earth by race. It includes the distribution of the Non-Hispanic population of White Earth across various race categories as identified by the Census Bureau. The dataset can be utilized to understand the Non-Hispanic population distribution of White Earth across relevant racial categories.

    Key observations

    With a zero Hispanic population, White Earth is 100% Non-Hispanic. Among the Non-Hispanic population, the largest racial group is White alone with a population of 76 (100% of the total Non-Hispanic population).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race: This column displays the racial categories (for Non-Hispanic) for White Earth.
    • Population: This column shows the population of the racial category (for Non-Hispanic) in White Earth.
    • % of Total Population: This column displays the percentage distribution of each race as a proportion of White Earth's total Non-Hispanic population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for White Earth Population by Race & Ethnicity. You can refer to the same here

  13. GPWv411: Population Density (Gridded Population of the World Version 4.11)

    • developers.google.com
    Updated Aug 11, 2019
    Cite
    NASA SEDAC at the Center for International Earth Science Information Network (2019). GPWv411: Population Density (Gridded Population of the World Version 4.11) [Dataset]. http://doi.org/10.7927/H49C6VHW
    Explore at:
    Dataset updated
    Aug 11, 2019
    Dataset provided by
    NASA SEDAC at the Center for International Earth Science Information Network
    Time period covered
    Jan 1, 2000 - Jan 1, 2020
    Area covered
    Earth
    Description

    This dataset contains estimates of the number of persons per square kilometer consistent with national censuses and population registers. There is one image for each modeled year.

    General Documentation

    The Gridded Population of the World Version 4 (GPWv4), Revision 11 models the distribution of global human population for the years 2000, 2005, 2010, 2015, and 2020 on 30 arc-second (approximately 1 km) grid cells. Population is distributed to cells using proportional allocation of population from census and administrative units. Population input data are collected at the most detailed spatial resolution available from the results of the 2010 round of censuses, which occurred between 2005 and 2014. The input data are extrapolated to produce population estimates for each modeled year.
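
    To make the "30 arc-second (approximately 1 km)" figure concrete, the sketch below estimates the ground size of one grid cell and converts a density value into an approximate person count. It assumes a spherical Earth with a mean radius of 6371.0088 km, which is a simplification, and the function names are hypothetical rather than part of any GPW tooling.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation (assumption)


def cell_size_km(latitude_deg, arc_seconds=30.0):
    """Return (east-west width, north-south height) in km of one grid cell."""
    angle_rad = math.radians(arc_seconds / 3600.0)
    height = EARTH_RADIUS_KM * angle_rad
    # Meridians converge toward the poles, so the width shrinks with cos(latitude).
    width = height * math.cos(math.radians(latitude_deg))
    return width, height


def persons_in_cell(density_per_km2, latitude_deg):
    """Approximate head count in a cell from a persons-per-km^2 density value."""
    width, height = cell_size_km(latitude_deg)
    return density_per_km2 * width * height
```

    Under these assumptions a cell is roughly 0.93 km on each side at the equator, and about half as wide at 60 degrees latitude, which is why a density value covers a smaller area (and fewer people) at high latitudes.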

  14. World Religion Project - Global Religion Dataset

    • thearda.com
    Cite
    The Association of Religion Data Archives, World Religion Project - Global Religion Dataset [Dataset]. http://doi.org/10.17605/OSF.IO/J7BCM
    Explore at:
    Dataset provided by
    Association of Religion Data Archives
    Dataset funded by
    The University of California, Davis
    The John Templeton Foundation
    Description

    The World Religion Project (WRP) aims to provide detailed information about religious adherence worldwide since 1945. It contains data about the number of adherents by religion in each of the states in the international system. These numbers are given for every half-decade period (1945, 1950, etc., through 2010). Percentages of the states' populations that practice a given religion are also provided. (Note: These percentages are expressed as decimals, ranging from 0 to 1, where 0 indicates that 0 percent of the population practices a given religion and 1 indicates that 100 percent of the population practices that religion.) Some of the religions (as detailed below) are divided into religious families. To the extent data are available, the breakdown of adherents within a given religion into religious families is also provided.

    The project was developed in three stages. The first stage consisted of the formation of a religion tree. A religion tree is a systematic classification of major religions and of religious families within those major religions. To develop the religion tree we prepared a comprehensive literature review, the aim of which was (i) to define a religion, (ii) to find tangible indicators of a given religion of religious families within a major religion, and (iii) to identify existing efforts at classifying world religions. (Please see the original survey instrument to view the structure of the religion tree.) The second stage consisted of the identification of major data sources of religious adherence and the collection of data from these sources according to the religion tree classification. This created a dataset that included multiple records for some states for a given point in time. It also contained multiple missing data for specific states, specific time periods and specific religions. The third stage consisted of cleaning the data, reconciling discrepancies of information from different sources and imputing data for the missing cases.

    The Global Religion Dataset: This dataset uses a religion-by-five-year unit. It aggregates the number of adherents of a given religion and religious group globally by five-year periods.

  15. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    • tokrwards.com
    Updated Jun 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
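
    The compound annual growth rate mentioned above follows the standard formula CAGR = (end / start)^(1 / years) - 1. The sketch below shows the calculation and its inverse; the function names are hypothetical, and because the actual figures are masked in this listing, any inputs a reader supplies are their own.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value reached after growing at a constant annual rate for some years."""
    return start_value * (1.0 + rate) ** years
```

    For example, a quantity that doubles over five years has a CAGR of about 14.9 percent, and projecting the start value forward at that rate for five years recovers the doubled value.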

  16. Dataset of book subjects that contain The unknown Indians : people who...

    • workwithdata.com
    Updated Nov 7, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain The unknown Indians : people who quietly changed our world [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=The+unknown+Indians+%3A+people+who+quietly+changed+our+world&j=1&j0=books
    Explore at:
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects. It has 2 rows and is filtered to the book The unknown Indians : people who quietly changed our world. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.

  17. Crash Data

    • catalog.data.gov
    • data.townofcary.org
    • +1more
    Updated Oct 11, 2025
    Cite
    Cary (2025). Crash Data [Dataset]. https://catalog.data.gov/dataset/crash-data
    Explore at:
    Dataset updated
    Oct 11, 2025
    Dataset provided by
    Cary
    Description

    This dataset contains crash information from the last five years to the current date. The data is based on the National Incident Based Reporting System (NIBRS). The data is dynamic, allowing for additions, deletions and modifications at any time, resulting in more accurate information in the database. Due to ongoing and continuous data entry, the numbers of records in subsequent extractions are subject to change.

    About Crash Data

    The Cary Police Department strives to make crash data as accurate as possible, but there is no avoiding the introduction of errors into this process, which relies on data furnished by many people and that cannot always be verified. As the data is updated on this site there will be instances of adding new incidents and updating existing data with information gathered through the investigative process.

    Not surprisingly, crash data becomes more accurate over time, as new crashes are reported and more information comes to light during investigations.

    This dynamic nature of crash data means that content provided here today will probably differ from content provided a week from now. Likewise, content provided on this site will probably differ somewhat from crime statistics published elsewhere by the Town of Cary, even though they draw from the same database.

    About Crash Locations

    Crash locations reflect the approximate locations of the crash. Certain crashes may not appear on maps if there is insufficient detail to establish a specific, mappable location.

  18. Mass Killings in America, 2006 - present

    • data.world
    csv, zip
    Updated Oct 10, 2025
    Cite
    The Associated Press (2025). Mass Killings in America, 2006 - present [Dataset]. https://data.world/associatedpress/mass-killings-public
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Oct 10, 2025
    Dataset provided by
    data.world, Inc.
    Authors
    The Associated Press
    Time period covered
    Jan 1, 2006 - Sep 28, 2025
    Area covered
    Description

    THIS DATASET WAS LAST UPDATED AT 2:13 PM EASTERN ON OCT. 10

    OVERVIEW

    2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.

    In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.

    A total of 229 people died in mass killings in 2019.

    The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.

    One-third of the offenders died at the scene of the killing or soon after, half from suicides.

    About this Dataset

    The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.

    The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.

    This data will be updated periodically and can be used as an ongoing resource to help cover these events.

    Using this Dataset

    To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:

    Mass killings by year

    Mass shootings by year

    To get these counts just for your state:

    Filter killings by state
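
    The linked queries are not reproduced in this listing. As a rough sketch of the same kind of tabulation on a downloaded CSV, the code below counts incidents per year, with optional filters for incident type and state. The column names (`year`, `state`, `type`) and the four sample rows are placeholders invented for illustration; they are not the dataset's actual schema or records.

```python
import csv
import io
from collections import Counter

# Placeholder rows with hypothetical column names; not real records.
SAMPLE_CSV = """incident_id,year,state,type
1,2018,TX,Shooting
2,2018,OH,Other
3,2019,TX,Shooting
4,2019,CA,Shooting
"""


def count_by_year(rows, only_type=None, only_state=None):
    """Count incidents per year, optionally restricted to a type and/or a state."""
    counts = Counter()
    for row in rows:
        if only_type is not None and row["type"] != only_type:
            continue
        if only_state is not None and row["state"] != only_state:
            continue
        counts[int(row["year"])] += 1
    return dict(counts)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
killings_by_year = count_by_year(rows)
shootings_by_year = count_by_year(rows, only_type="Shooting")
texas_by_year = count_by_year(rows, only_state="TX")
```

    With the real file, the `io.StringIO` source would be replaced by an opened CSV file, and the column names adjusted to match the published data dictionary.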

    Definition of "mass murder"

    Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.

    This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”

    Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.

    Methodology

    Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.

    Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.

    In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.

    Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.

    Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.

    This project started at USA TODAY in 2012.

    Contacts

    Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.

  19. Processed Synthetic Real-World Data for tristate modelling

    • data.niaid.nih.gov
    • data.europa.eu
    Updated Dec 9, 2022
    Cite
    Pnevmatikakis, Aristodemos (2022). Processed Synthetic Real-World Data for tristate modelling [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7410183
    Explore at:
    Dataset updated
    Dec 9, 2022
    Dataset authored and provided by
    Pnevmatikakis, Aristodemos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This model learning dataset is created out of the Raw Synthetic RWD raw dataset, including some of the original attributes. It is distributed in JOBLIB files, where .joblib files contain the vectors and _ids.joblib contain the ID of the person from which each vector is extracted.

    This is useful in case it is needed to map the vectors to metadata about the people that are found in the original raw dataset. Note that corresponds to , or , depending on the dataset.

    Roughly 60% of the people are in the training dataset, and 20% are in each of the validation and testing datasets. The input attributes are the age, the short-term averages and the trends of the current week’s BMI, steps walked, calories burned, sleep quality, mood and water consumption, as well as the previous week’s short-term average and trend of the answer to the health self-assessment question.

    The outcome to be predicted is a tristate quantized version of the health self-assessment answer to be given in the current week. The dataset is normalized based on the training set. The means and standard deviations used can be found in the train_statistics.joblib file. Finally, the output_descriptions.joblib file contains descriptions of the outcomes to be predicted (not actually needed, since included here).
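
    Because the split is described in terms of people rather than individual vectors, all vectors from one person land in the same subset. A minimal sketch of such a person-level 60/20/20 split is shown below; the function name, the seed, and the toy ID list are hypothetical, and the authors' actual procedure is not documented here.

```python
import random


def split_by_person(person_ids, train_frac=0.6, val_frac=0.2, seed=0):
    """Assign each distinct person ID to exactly one of train/validation/test."""
    unique_ids = sorted(set(person_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_train = round(train_frac * len(unique_ids))
    n_val = round(val_frac * len(unique_ids))
    return {
        "train": set(unique_ids[:n_train]),
        "validation": set(unique_ids[n_train:n_train + n_val]),
        "test": set(unique_ids[n_train + n_val:]),
    }


# Toy example: 10 people with 3 vectors each (IDs repeat once per vector).
ids = [pid for pid in range(10) for _ in range(3)]
splits = split_by_person(ids)
```

    Each vector is then routed to the subset that contains its person ID, which is the role the _ids.joblib files would play when mapping vectors back to people.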

  20. Global Financial Inclusion (Global Findex) Database 2011 - Japan

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Apr 15, 2015
    Cite
    Development Research Group, Finance and Private Sector Development Unit (2015). Global Financial Inclusion (Global Findex) Database 2011 - Japan [Dataset]. https://microdata.worldbank.org/index.php/catalog/1189
    Explore at:
    Dataset updated
    Apr 15, 2015
    Dataset authored and provided by
    Development Research Group, Finance and Private Sector Development Unit
    Time period covered
    2011
    Area covered
    Japan
    Description

    Abstract

    Well-functioning financial systems serve a vital purpose, offering savings, credit, payment, and risk management products to people with a wide range of needs. Yet until now little had been known about the global reach of the financial sector - the extent of financial inclusion and the degree to which such groups as the poor, women, and youth are excluded from formal financial systems. Systematic indicators of the use of different financial services had been lacking for most economies.

    The Global Financial Inclusion (Global Findex) database provides such indicators. This database contains the first round of Global Findex indicators, measuring how adults in more than 140 economies save, borrow, make payments, and manage risk. The data set can be used to track the effects of financial inclusion policies globally and develop a deeper and more nuanced understanding of how people around the world manage their day-to-day finances. By making it possible to identify segments of the population excluded from the formal financial sector, the data can help policy makers prioritize reforms and design new policies.

    Geographic coverage

    National Coverage.

    Analysis unit

    Individual

    Universe

    The target population is the civilian, non-institutionalized population 15 years and above. The sample is nationally representative.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The Global Findex indicators are drawn from survey data collected by Gallup, Inc. over the 2011 calendar year, covering more than 150,000 adults in 148 economies and representing about 97 percent of the world's population. Since 2005, Gallup has surveyed adults annually around the world, using a uniform methodology and randomly selected, nationally representative samples. The second round of Global Findex indicators was collected in 2014 and is forthcoming in 2015. The set of indicators will be collected again in 2017.

    Surveys were conducted face-to-face in economies where landline telephone penetration is less than 80 percent, or where face-to-face interviewing is customary. The first stage of sampling is the identification of primary sampling units, consisting of clusters of households. The primary sampling units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households by means of the Kish grid.
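
    The selection rule for primary sampling units described above (probability proportional to population size when population figures exist, simple random sampling otherwise) can be sketched as follows. This is an illustration only, not Gallup's procedure: the function name and unit labels are hypothetical, and the proportional draw is done with replacement for simplicity.

```python
import random


def select_sampling_units(populations, n, seed=None):
    """Pick n primary sampling units.

    populations maps unit name -> population size, or None when unknown.
    With complete population data, units are drawn with probability
    proportional to size (with replacement, a simplification).
    Otherwise, units are drawn by simple random sampling without replacement.
    """
    rng = random.Random(seed)
    names = list(populations)
    sizes = [populations[name] for name in names]
    if all(size is not None for size in sizes) and sum(sizes) > 0:
        return rng.choices(names, weights=sizes, k=n)
    return rng.sample(names, n)


# Hypothetical units: cluster B has no population and can never be drawn.
pps_draws = select_sampling_units({"A": 1000, "B": 0, "C": 3000}, 200, seed=1)
srs_draws = select_sampling_units({"A": None, "B": None, "C": None}, 2, seed=1)
```

    In the proportional case, cluster C is three times as likely as cluster A to be picked on each draw, so larger clusters appear more often in the sample.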

    Surveys were conducted by telephone in economies where landline telephone penetration is over 80 percent. The telephone surveys were conducted using random digit dialing or a nationally representative list of phone numbers. In selected countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to reach a person in each household, spread over different days and times of year.

    The sample size in Japan was 1,000 individuals.

    Mode of data collection

    Landline telephone

    Research instrument

    The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup, Inc. also provided valuable input. The questionnaire was piloted in over 20 countries using focus groups, cognitive interviews, and field testing. The questionnaire is available in 142 languages upon request.

    Questions on insurance, mobile payments, and loan purposes were asked only in developing economies. The indicators on awareness and use of microfinance institutions (MFIs) are not included in the public dataset. However, adults who report saving at an MFI are considered to have an account; this is reflected in the composite account indicator.

    Sampling error estimates

    Estimates of standard errors (which account for sampling error) vary by country and indicator. For country- and indicator-specific standard errors, refer to the Annex and Country Table in Demirguc-Kunt, Asli and L. Klapper. 2012. "Measuring Financial Inclusion: The Global Findex." Policy Research Working Paper 6025, World Bank, Washington, D.C.

Cite
World Bank (2019). World Bank: GHNP Data [Dataset]. https://www.kaggle.com/theworldbank/world-bank-health-population

World Bank: GHNP Data

World Bank: Global Health, Nutrition, and Population Data (BigQuery Dataset)

Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset provided by
World Bank Group (http://www.worldbank.org/)
Authors
World Bank
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

Content

This dataset combines key health statistics from a variety of sources to provide a look at global health and population trends. It includes information on nutrition, reproductive health, education, immunization, and diseases from over 200 countries.

Update Frequency: Biannual

For more information, see the World Bank website.

Fork this kernel to get started with this dataset.

Acknowledgements

https://datacatalog.worldbank.org/dataset/health-nutrition-and-population-statistics

https://cloud.google.com/bigquery/public-data/world-bank-hnp

Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Citation: The World Bank: Health Nutrition and Population Statistics

Banner Photo by @till_indeman from Unsplash.

Inspiration

What’s the average age of first marriages for females around the world?
