https://creativecommons.org/publicdomain/zero/1.0/
The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank
This dataset combines key health statistics from a variety of sources to provide a look at global health and population trends. It includes information on nutrition, reproductive health, education, immunization, and diseases from over 200 countries.
Update Frequency: Biannual
For more information, see the World Bank website.
Fork this kernel to get started with this dataset.
https://datacatalog.worldbank.org/dataset/health-nutrition-and-population-statistics
https://cloud.google.com/bigquery/public-data/world-bank-hnp
Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Citation: The World Bank: Health Nutrition and Population Statistics
Banner Photo by @till_indeman from Unsplash.
What’s the average age of first marriages for females around the world?
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2
Abstract: The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization just declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, for information seeking and sharing related to the outbreak, as well as for familiarizing themselves with the guidelines and protocols that are being recommended by various policy-making bodies to reduce the spread of the virus. This is resulting in the generation of tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it in the form of a dataset can serve a wide range of use-cases and applications such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management.
Data Description: The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 23rd July 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 6 different .txt files based on the timelines of the associated tweets. The following provides the details of these dataset files.
• Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the Tweet IDs: May 7, 2022 to May 21, 2022)
• Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the Tweet IDs: May 21, 2022 to May 27, 2022)
• Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the Tweet IDs: May 27, 2022 to June 5, 2022)
• Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the Tweet IDs: June 5, 2022 to June 11, 2022)
• Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range of the Tweet IDs: June 12, 2022 to June 30, 2022)
• Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the Tweet IDs: July 1, 2022 to July 23, 2022)
The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
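Before hydrating, it can help to confirm that each downloaded file holds the number of Tweet IDs listed in the data description. The following is a minimal sketch, assuming each .txt file stores one Tweet ID per line (the file layout is an assumption, as are the helper names `load_tweet_ids` and `check_counts`); the file names and counts are taken from the description above.

```python
from pathlib import Path

# File names and ID counts as listed in the data description.
EXPECTED_COUNTS = {
    "TweetIDs_Part1.txt": 13926,
    "TweetIDs_Part2.txt": 17705,
    "TweetIDs_Part3.txt": 17585,
    "TweetIDs_Part4.txt": 19718,
    "TweetIDs_Part5.txt": 47718,
    "TweetIDs_Part6.txt": 138711,
}

# Sum of the per-file counts; should equal the stated dataset total.
TOTAL_EXPECTED = sum(EXPECTED_COUNTS.values())


def load_tweet_ids(path):
    """Read Tweet IDs from a text file, assuming one ID per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def check_counts(folder):
    """Return {filename: (found, expected)} for the files present in `folder`."""
    report = {}
    for name, expected in EXPECTED_COUNTS.items():
        path = Path(folder) / name
        if path.exists():
            report[name] = (len(load_tweet_ids(path)), expected)
    return report
```

Keeping the IDs as strings avoids any accidental loss of precision if they are later passed through tools that treat large integers as floating-point numbers.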
People Data Labs is an aggregator of B2B person and company data. We source our globally compliant person dataset via our "Data Union".
The "Data Union" is our proprietary data sharing co-op. Customers opt in to sharing their data and warrant that their data is fully compliant with global data privacy regulations. Some data sources are provided as a one-time dump, others are refreshed every time we do a new data build. Our data sources come from a variety of verticals including HR Tech, Real Estate Tech, Identity/Anti-Fraud, Martech, and others. People Data Labs works with customers on compliance-based topics. If a customer wishes to ensure anonymity, we work with them to anonymize the data.
Our person data has over 100 fields including resume data (work history, education), contact information (email, phone), demographic info (name, gender, birth date) and social profile information (linkedin, github, twitter, facebook, etc.).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ORBIT (Object Recognition for Blind Image Training)-India Dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, home to 90% of the world’s population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.
Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin) and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.
The image dataset is stored in the ‘Dataset’ folder, organized by folders assigned to each data collector (P1, P2, ...P12) who collected them. Each collector's folder includes sub-folders named with the object labels as provided by our data collectors. Within each object folder, there are two subfolders: ‘clean’ for images taken on clean surfaces and ‘clutter’ for images taken in cluttered environments where the objects are typically found. The annotations are saved inside an ‘Annotations’ folder containing a JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) that contains keys corresponding to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The ‘object_not_present_issue’ key is true if the object is not present in the image, and the ‘pii_present_issue’ key is true if personally identifiable information (PII) is present in the image. Note that all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized at 1080 × 1920; therefore, an unscaled version of the dataset will follow soon.
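As an illustration of how the per-video annotation files described above might be read, here is a minimal sketch. It assumes the JSON layout shown in the example (frame name mapped to the two boolean issue keys) and that the parts of a frame name are separated by `--`, as the example file names suggest; the helper names `flagged_frames` and `parse_frame_name` are hypothetical and not part of the dataset.

```python
import json

ISSUE_KEYS = ("object_not_present_issue", "pii_present_issue")


def flagged_frames(annotation_text):
    """Return {frame_name: [issue keys set to true]} for one annotation file."""
    annotations = json.loads(annotation_text)
    flagged = {}
    for frame_name, issues in annotations.items():
        reasons = [key for key in ISSUE_KEYS if issues.get(key)]
        if reasons:
            flagged[frame_name] = reasons
    return flagged


def parse_frame_name(frame_name):
    """Split a frame name on '--' (separator inferred from the example names)."""
    collector, label, setting, video, frame = frame_name.split("--")
    return {
        "collector": collector,
        "object_label": label,
        "setting": setting,  # 'clean' or 'clutter'
        "video": video,
        "frame": frame,
    }
```

Frames returned by `flagged_frames` could then be skipped when building training episodes, for instance to leave out images in which the object is not visible.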
This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.
REFERENCES:
Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597
microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset
Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1–6. https://doi.org/10.1145/3613905.3648641
Although soil and agronomy data collection in Ethiopia began over 60 years ago, the data are hardly accessible because they are scattered across different organizations and mostly held by individuals (Ashenafi et al., 2020; Tamene et al., 2022), which makes them vulnerable to permanent loss. Cognizant of the problem, the Coalition of the Willing (CoW) for data sharing and access was created in 2018 with joint support and coordination of the Alliance Bioversity-CIAT and GIZ (https://www.ethioagridata.com/index.html). Mobilizing its members, the CoW has embarked on data rescue operations including data ecosystem mapping, collation, and curation of the legacy data, which was put into the central data repository for its members and the wider data user community according to the guideline developed based on the FAIR data principles and approved by the CoW. So far, the CoW has collated and rescued about 20,000 legacy soil profile data and over 38,000 crop responses to fertilizer data (Tamene et al., 2022). The legacy soil profile dataset (consisting of Profiles Site = 1,776 observations with 37 variables; Profiles Layer Field = 1,493 observations with 64 variables; Profiles Layer Lab = 1,386 observations with 76 variables) is extracted, transformed, and uploaded into a harmonized template (adapted from Batjes 2022; Leenaars et al., 2014) from the following source: Bilateral Ethiopian-Netherlands Effort for Food, Income and Trade (BENEFIT) Partnership, which is a portfolio of five programs (ISSD, Cascape, ENTAG, SBN, and REALISE) and is funded by the government of the Kingdom of the Netherlands through its embassy in Addis Ababa.
The BENEFIT-REALISE program implements its interventions in 60 PSNP weredas in four regions (Tigray, Amhara, Oromia, and SNNPR). Accordingly, in 2019, BENEFIT-REALISE along with the MoA initiated a wereda-wide soil resource characterization and mapping task at 1:50,000 scale in 15 BENEFIT-REALISE intervention weredas: 3 of Tigray, 6 of Amhara, 3 of Oromia, and 3 of SNNPR.
References:
Ashenafi, A., Tamene, L., and Erkossa, T. 2020. Identifying, Cataloguing, and Mapping Soil and Agronomic Data in Ethiopia. CIAT Publication No. 506. International Center for Tropical Agriculture (CIAT). Addis Ababa, Ethiopia. 42 p. 10.13140/RG.2.2.31759.41123.
Ashenafi, A., Erkossa, T., Gudeta, K., Abera, W., Mesfin, E., Mekete, T., Haile, M., Haile, W., Abegaz, A., Tafesse, D. and Belay, G., 2022. Reference Soil Groups Map of Ethiopia Based on Legacy Data and Machine Learning Technique: EthioSoilGrids 1.0. EGUsphere, pp.1-40. https://doi.org/10.5194/egusphere-2022-301
Tamene L; Erkossa T; Tafesse T; Abera W; Schultz S. 2021. A coalition of the Willing - Powering data-driven solutions for Ethiopian Agriculture. CIAT Publication No. 518. International Center for Tropical Agriculture (CIAT). Addis Ababa, Ethiopia. 34 p. https://www.ethioagridata.com/Resources/Powering%20Data-Driven%20Solutions%20for%20Ethiopian%20Agriculture.pdf.
The Coalition of the Willing (CoW) website: https://www.ethioagridata.com/index.html.
Batjes, N.H., 2022. Basic principles for compiling a profile dataset for consideration in WoSIS. CoP report, ISRIC–World Soil Information, Wageningen. Contents Summary, 4(1), p.3.
Carvalho Ribeiro, E.D. and Batjes, N.H., 2020. World Soil Information Service (WoSIS)-Towards the standardization and harmonization of world soil data: Procedures Manual 2020.
Elias, E.: Soils of the Ethiopian Highlands: Geomorphology and Properties, CASCAPE Project, ALTERRA, Wageningen UR, the Netherlands, library.wur.nl/WebQuery/isric/2259099, 2016.
Leenaars, J. G. B., van Oostrum, A.J.M., and Ruiperez, G.M.: Africa Soil Profiles Database, Version 1.2. A compilation of georeferenced and standardised legacy soil profile data for Sub Saharan Africa (with dataset), ISRIC Report 2014/01, Africa Soil Information Service (AfSIS) project and ISRIC – World Soil Information, Wageningen, library.wur.nl/WebQuery/isric/2259472, 2014.
Leenaars, J. G. B., Eyasu, E., Wösten, H., Ruiperez González, M., Kempen, B., Ashenafi, A., and Brouwer, F.: Major soil-landscape resources of the cascape intervention woredas, Ethiopia: Soil information in support to scaling up of evidence-based best practices in agricultural production (with dataset), CASCAPE working paper series No. OT_CP_2016_1, Cascape. https://edepot.wur.nl/428596, 2016.
Leenaars, J. G. B., Elias, E., Wösten, J. H. M., Ruiperez-González, M., and Kempen, B.: Mapping the major soil-landscape resources of the Ethiopian Highlands using random forest, Geoderma, 361, https://doi.org/10.1016/j.geoderma.2019.114067, 2020a.
Leenaars, J. G. B., Ruiperez, M., González, M., Kempen, B., and Mantel, S.: Semi-detailed soil resource survey and mapping of REALISE woredas in Ethiopia, Project report to the BENEFIT-REALISE programme, December, ISRIC-World Soil Information, Wageningen, 2020b.
TERMS: Access to the data is limited to the CoW members until the national soil and agronomy data-sharing directive of MoA is registered by the Ministry of Justice and released for implementation. DISCLAIMER: The dataset populated in the harmonized template consisting of 76 variables is extracted, transformed, and uploaded from the source document by the CoW. Hence, if any irregularities are observed, data users should refer to the source document uploaded along with the dataset. Use of the dataset and any consequences arising from using it are the user’s sole responsibility.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Native American Multi-Year Facial Image Dataset, thoughtfully curated to support the development of advanced facial recognition systems, biometric identification models, KYC verification tools, and other computer vision applications. This dataset is ideal for training AI models to recognize individuals over time, track facial changes, and enhance age progression capabilities.
This dataset includes over 5,000 high-quality facial images, organized into individual participant sets, each containing:
To ensure model generalization and practical usability, images in this dataset reflect real-world diversity:
Each participant’s dataset is accompanied by rich metadata to support advanced model training and analysis, including:
This dataset is highly valuable for a wide range of AI and computer vision applications:
To keep pace with evolving AI needs, this dataset is regularly updated and customizable. Custom data collection options include:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Any work using this dataset should cite the following paper:
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/YTRCQE
Replication Data for: Who governs? A new global dataset on members of cabinets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cyberbullying has become an increasingly prevalent issue in the digital age, with the rise of social media and online communication. It can take many forms, including verbal attacks, harassment, and discrimination, and it can have serious consequences for victims, including depression, anxiety, and even suicide. While much research has been done on cyberbullying in languages such as English, Spanish, and Chinese, there has been little focus on languages spoken by smaller populations, such as Kurdish. Kurdish is a language spoken by millions of people in the Middle East, including Turkey, Iran, Iraq, and Syria. It is an Indo-European language with several dialects, and it is considered an official language in Iraq and an official regional language in Iran. Despite its widespread use, there has been very little research on cyberbullying in Kurdish, and there are currently no datasets available that specifically focus on this issue. To address this gap, we have created the first-ever cyberbullying dataset for the Kurdish language. This dataset contains three classes: neutral, racism, and sexism. The neutral class includes messages that do not contain any form of cyberbullying, while the racism and sexism classes include messages that contain discriminatory language based on race or gender, respectively. The dataset was created using a combination of manual and automated techniques. We collected a large number of messages written in Kurdish through the Twitter API. We then manually labeled these messages based on whether they contained cyberbullying or not, and further categorized them into the three classes. The resulting dataset contains over 30,000 messages, with roughly equal distribution among the three classes. It is a valuable resource for researchers and practitioners who are interested in studying cyberbullying in the Kurdish language and developing strategies to combat it.
The dataset can be used for a variety of purposes, including training machine learning models to detect cyberbullying in Kurdish, analyzing the language used in cyberbullying messages to identify patterns and trends, and developing interventions to prevent and address cyberbullying in Kurdish-speaking communities.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains data on 2889 cyber incidents between 01.01.2000 and 02.05.2024 using 60 variables, including the start date, names and categories of receivers along with names and categories of initiators. The database was compiled as part of the European Repository of Cyber Incidents (EuRepoC) project.
EuRepoC gathers, codes, and analyses publicly available information from over 200 sources and 600 Twitter accounts daily to report on dynamic trends in the global, and particularly the European, cyber threat environment. For more information on the scope and data collection methodology see: https://eurepoc.eu/methodology
Codebook available here
Information about each file:
Global Database (csv or xlsx): This file includes all variables coded for each incident, organised such that one row corresponds to one incident - our main unit of investigation. Where multiple codes are present for a single variable for a single incident, these are separated with semi-colons within the same cell.
Receiver Dataset (csv): In this file, the data of affected entities and individuals (receivers) is restructured to facilitate analysis. Each cell contains only a single code, with the data "unpacked" across multiple rows. Thus, a single incident can span several rows, identifiable through the unique identifier assigned to each incident (incident_id).
Attribution Dataset (csv): This file follows a similar approach to the receiver dataset. The attribution data is "unpacked" over several rows, allowing each cell to contain only one code. Here too, a single incident may occupy several rows, with the unique identifier enabling easy tracking of each incident (incident_id). In addition, some attributions may have multiple possible codes for one variable; these are also "unpacked" over several rows, with the attribution_id making it possible to track each attribution.
eurepoc_global_database_1.2 (json): This file contains the whole database in JSON format.
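The "unpacking" described for the receiver and attribution files can be sketched as follows. This is an illustrative sketch only, not the project's own code: it assumes rows are available as dictionaries (for example from `csv.DictReader`), and the column name `receiver_category` used in the usage note is hypothetical; only `incident_id` and the semi-colon separator come from the description above.

```python
def unpack_rows(rows, column, sep=";"):
    """Yield one row per code found in `column`, copying all other fields.

    A cell such as "A; B" becomes two rows, one with "A" and one with "B",
    both keeping the same incident_id so the incident can still be tracked.
    """
    for row in rows:
        codes = [code.strip() for code in row[column].split(sep) if code.strip()]
        # Keep rows whose cell is empty as a single row with an empty code.
        for code in codes or [""]:
            unpacked = dict(row)
            unpacked[column] = code
            yield unpacked
```

For example, `list(unpack_rows(rows, "receiver_category"))` would turn one incident with two semi-colon-separated codes into two rows sharing the same `incident_id`.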
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This model learning dataset is created out of the Raw Synthetic RWD raw dataset, including some of the original attributes. It is distributed in JOBLIB files, where .joblib files contain the vectors and _ids.joblib contain the ID of the person from which each vector is extracted.
This is useful in case it is needed to map the vectors to metadata about the people that are found in the original raw dataset. Note that corresponds to , or , depending on the dataset. The split is roughly 60% of the people are in the training dataset, and 20% in each of the validation and the testing datasets. The input attributes are the age, the short-term averages and the trends of the current week’s BMI, steps walked, calories burned, sleep quality, mood and water consumption, as well as the previous week’s short-term average and trend of the answer to the health self-assessment question.
The outcome to be predicted is the binary quantized health self-assessment answer to be given in the current week. The dataset is normalized based on the training set. The means and standard deviations used can be found in the train_statistics.joblib file. Finally, the output_descriptions.joblib file contains descriptions of the outcomes to be predicted (not actually needed, since included here).
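To make "normalized based on the training set" concrete, the sketch below shows the usual z-score approach, in which means and standard deviations are computed on the training vectors only and then applied to every split. It is an assumption that the dataset was normalized exactly this way, and the internal layout of train_statistics.joblib is not described here, so the sketch works on plain Python lists and uses the population standard deviation as a further assumption.

```python
from statistics import mean, pstdev


def fit_statistics(train_rows):
    """Compute per-attribute mean and standard deviation on the training set."""
    columns = list(zip(*train_rows))
    means = [mean(column) for column in columns]
    # Fall back to 1.0 for constant attributes to avoid division by zero.
    stds = [pstdev(column) or 1.0 for column in columns]
    return means, stds


def normalize(rows, means, stds):
    """Apply training-set statistics to any split (train, validation, test)."""
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

Reusing the training statistics for the validation and testing vectors keeps information from those splits out of the preprocessing step.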
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Non-Hispanic population of White Earth by race. It includes the distribution of the Non-Hispanic population of White Earth across various race categories as identified by the Census Bureau. The dataset can be utilized to understand the Non-Hispanic population distribution of White Earth across relevant racial categories.
Key observations
With a zero Hispanic population, White Earth is 100% Non-Hispanic. Among the Non-Hispanic population, the largest racial group is White alone with a population of 76 (100% of the total Non-Hispanic population).
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for White Earth Population by Race & Ethnicity. You can refer to it here
This dataset contains estimates of the number of persons per square kilometer consistent with national censuses and population registers. There is one image for each modeled year.
General Documentation
The Gridded Population of the World, Version 4 (GPWv4), Revision 11 models the distribution of global human population for the years 2000, 2005, 2010, 2015, and 2020 on 30 arc-second (approximately 1 km) grid cells. Population is distributed to cells using proportional allocation of population from census and administrative units. Population input data are collected at the most detailed spatial resolution available from the results of the 2010 round of censuses, which occurred between 2005 and 2014. The input data are extrapolated to produce population estimates for each modeled year.
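Proportional allocation, as mentioned above, can be illustrated with a small sketch: a unit's population is split across grid cells in proportion to the area of the unit that falls in each cell, and density is then persons divided by cell area. This is a simplified illustration under that assumption, not the GPWv4 production workflow; the function names, cell identifiers, and input values are hypothetical.

```python
def allocate_population(unit_population, overlap_areas_km2):
    """Split a unit's population across cells in proportion to overlap area."""
    total_area = sum(overlap_areas_km2.values())
    if total_area <= 0:
        raise ValueError("unit must overlap at least one cell")
    return {
        cell: unit_population * area / total_area
        for cell, area in overlap_areas_km2.items()
    }


def density_per_km2(persons, cell_area_km2):
    """Persons per square kilometer for one grid cell."""
    return persons / cell_area_km2
```

Under this scheme the allocated counts always sum back to the unit's census total, which is what keeps the gridded estimates consistent with the input units.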
The World Religion Project (WRP) aims to provide detailed information about religious adherence worldwide since 1945. It contains data about the number of adherents by religion in each of the states in the international system. These numbers are given for every half-decade period (1945, 1950, etc., through 2010). Percentages of the states' populations that practice a given religion are also provided. (Note: These percentages are expressed as decimals, ranging from 0 to 1, where 0 indicates that 0 percent of the population practices a given religion and 1 indicates that 100 percent of the population practices that religion.) Some of the religions (as detailed below) are divided into religious families. To the extent data are available, the breakdown of adherents within a given religion into religious families is also provided.
The project was developed in three stages. The first stage consisted of the formation of a religion tree. A religion tree is a systematic classification of major religions and of religious families within those major religions. To develop the religion tree we prepared a comprehensive literature review, the aim of which was (i) to define a religion, (ii) to find tangible indicators of a given religion of religious families within a major religion, and (iii) to identify existing efforts at classifying world religions. (Please see the original survey instrument to view the structure of the religion tree.) The second stage consisted of the identification of major data sources of religious adherence and the collection of data from these sources according to the religion tree classification. This created a dataset that included multiple records for some states for a given point in time. It also contained multiple missing data for specific states, specific time periods and specific religions. The third stage consisted of cleaning the data, reconciling discrepancies of information from different sources and imputing data for the missing cases.
The Global Religion Dataset: This dataset uses a religion-by-five-year unit. It aggregates the number of adherents of a given religion and religious group globally by five-year periods.
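The aggregation behind a religion-by-five-year unit can be sketched as summing state-level adherent counts for each year and religion. The record fields `year`, `religion`, and `adherents` below are hypothetical names chosen for illustration, not the dataset's actual variable names; the `share` helper mirrors the decimal convention for percentages noted earlier (0 to 1).

```python
from collections import defaultdict


def global_adherents(state_records):
    """Sum adherents per (year, religion) across all state-level records."""
    totals = defaultdict(int)
    for record in state_records:
        totals[(record["year"], record["religion"])] += record["adherents"]
    return dict(totals)


def share(adherents, population):
    """Express adherence as a decimal between 0 and 1."""
    return adherents / population if population else 0.0
```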
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 2 rows and is filtered to the book The unknown Indians : people who quietly changed our world. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
This dataset contains crash information from the last five years to the current date. The data is based on the National Incident Based Reporting System (NIBRS). The data is dynamic, allowing for additions, deletions and modifications at any time, resulting in more accurate information in the database. Due to ongoing and continuous data entry, the numbers of records in subsequent extractions are subject to change.
About Crash Data
The Cary Police Department strives to make crash data as accurate as possible, but there is no avoiding the introduction of errors into this process, which relies on data furnished by many people and that cannot always be verified. As the data is updated on this site there will be instances of adding new incidents and updating existing data with information gathered through the investigative process. Not surprisingly, crash data becomes more accurate over time, as new crashes are reported and more information comes to light during investigations. This dynamic nature of crash data means that content provided here today will probably differ from content provided a week from now. Likewise, content provided on this site will probably differ somewhat from crime statistics published elsewhere by the Town of Cary, even though they draw from the same database.
About Crash Locations
Crash locations reflect the approximate locations of the crash. Certain crashes may not appear on maps if there is insufficient detail to establish a specific, mappable location.
THIS DATASET WAS LAST UPDATED AT 2:13 PM EASTERN ON OCT. 10
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half from suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
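The inclusion rules above can be sketched as a small check. This is an illustrative sketch only: the Incident record, its field names, and both functions are hypothetical and do not reflect the database's actual schema or the researchers' own tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    # Hypothetical fields, not the database's real column names.
    victims_within_24h: int           # victims killed within a 24-hour period,
                                      # excluding offender(s) and unborn children
    intentional: bool                 # False for negligent homicides (DUI, accidental fire)
    in_us: bool                       # occurred in the 50 states or Washington D.C.
    later_victims_within_7d: int = 0  # further spree victims within seven days


def is_mass_murder(incident: Incident) -> bool:
    """Apply the stated definition: four or more intentional killings in 24 hours."""
    return (
        incident.intentional
        and incident.in_us
        and incident.victims_within_24h >= 4
    )


def reported_victim_count(incident: Incident) -> Optional[int]:
    """Victim count for a qualifying incident, including spree victims within seven days."""
    if not is_mass_murder(incident):
        return None
    return incident.victims_within_24h + incident.later_victims_within_7d
```

Note that the seven-day window only affects the victim count of an incident that already qualifies; it does not make an incident qualify.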
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This model learning dataset is created out of the Raw Synthetic RWD raw dataset, including some of the original attributes. It is distributed in JOBLIB files, where .joblib files contain the vectors and _ids.joblib contain the ID of the person from which each vector is extracted.
This is useful in case it is needed to map the vectors to metadata about the people that are found in the original raw dataset. Note that corresponds to , or , depending on the dataset.
The split is roughly 60% of the people are in the training dataset, and 20% in each of the validation and the testing datasets. The input attributes are the age, the short-term averages and the trends of the current week’s BMI, steps walked, calories burned, sleep quality, mood and water consumption, as well as the previous week’s short-term average and trend of the answer to the health self-assessment question.
The outcome to be predicted is a tristate quantized version of the health self-assessment answer to be given in the current week. The dataset is normalized based on the training set. The means and standard deviations used can be found in the train_statistics.joblib file. Finally, the output_descriptions.joblib file contains descriptions of the outcomes to be predicted (not actually needed, since included here).
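Normalizing "based on the training set" means the means and standard deviations are computed from the training vectors only and then reused for the validation and testing vectors. The sketch below illustrates this with plain Python lists. It does not read the actual .joblib files; the function names and toy values are hypothetical, and the use of the population standard deviation is an assumption, since the description does not say which variant was used.

```python
from statistics import mean, pstdev


def fit_statistics(train_rows):
    """Compute per-attribute mean and standard deviation from the training set only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # Assumption: population standard deviation; fall back to 1.0 for constant columns.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def normalize(rows, means, stds):
    """Apply the training-set statistics to any split (train, validation, test)."""
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Toy vectors with two attributes (for example, age and steps walked).
train = [[20.0, 5000.0], [30.0, 7000.0], [40.0, 9000.0]]
validation = [[30.0, 8000.0]]

means, stds = fit_statistics(train)
train_norm = normalize(train, means, stds)
validation_norm = normalize(validation, means, stds)
```

Reusing the training statistics for the other splits keeps information from the validation and testing people out of the preprocessing step.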
Well-functioning financial systems serve a vital purpose, offering savings, credit, payment, and risk management products to people with a wide range of needs. Yet until now little had been known about the global reach of the financial sector - the extent of financial inclusion and the degree to which such groups as the poor, women, and youth are excluded from formal financial systems. Systematic indicators of the use of different financial services had been lacking for most economies.
The Global Financial Inclusion (Global Findex) database provides such indicators. This database contains the first round of Global Findex indicators, measuring how adults in more than 140 economies save, borrow, make payments, and manage risk. The data set can be used to track the effects of financial inclusion policies globally and develop a deeper and more nuanced understanding of how people around the world manage their day-to-day finances. By making it possible to identify segments of the population excluded from the formal financial sector, the data can help policy makers prioritize reforms and design new policies.
National Coverage.
Individual
The target population is the civilian, non-institutionalized population 15 years and above. The sample is nationally representative.
Sample survey data [ssd]
The Global Findex indicators are drawn from survey data collected by Gallup, Inc. over the 2011 calendar year, covering more than 150,000 adults in 148 economies and representing about 97 percent of the world's population. Since 2005, Gallup has surveyed adults annually around the world, using a uniform methodology and randomly selected, nationally representative samples. The second round of Global Findex indicators was collected in 2014 and is forthcoming in 2015. The set of indicators will be collected again in 2017.
Surveys were conducted face-to-face in economies where landline telephone penetration is less than 80 percent, or where face-to-face interviewing is customary. The first stage of sampling is the identification of primary sampling units, consisting of clusters of households. The primary sampling units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households by means of the Kish grid.
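The choice between probability-proportional-to-size selection and simple random sampling can be illustrated as follows. This is a simplified sketch, not Gallup's procedure: the function name, the unit labels, and the seed are hypothetical, stratification and multi-stage clustering are omitted, and the proportional draw is made with replacement for brevity.

```python
import random


def select_psus(psus, populations=None, k=2, seed=0):
    """Select k primary sampling units.

    With population figures, draw with probability proportional to size
    (with replacement, a simplification). Without them, fall back to
    simple random sampling without replacement.
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    if populations is None:
        return rng.sample(psus, k)
    return rng.choices(psus, weights=populations, k=k)


units = ["A", "B", "C"]

# Population information available: larger units are more likely to be drawn.
pps_draw = select_psus(units, populations=[1000, 3000, 6000], k=2)

# No population information: every unit is equally likely.
srs_draw = select_psus(units, k=2)
```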
Surveys were conducted by telephone in economies where landline telephone penetration is over 80 percent. The telephone surveys were conducted using random digit dialing or a nationally representative list of phone numbers. In selected countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to reach a person in each household, spread over different days and times of year.
The sample size in Japan was 1,000 individuals.
Landline telephone
The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup, Inc. also provided valuable input. The questionnaire was piloted in over 20 countries using focus groups, cognitive interviews, and field testing. The questionnaire is available in 142 languages upon request.
Questions on insurance, mobile payments, and loan purposes were asked only in developing economies. The indicators on awareness and use of microfinance institutions (MFIs) are not included in the public dataset. However, adults who report saving at an MFI are considered to have an account; this is reflected in the composite account indicator.
Estimates of standard errors (which account for sampling error) vary by country and indicator. For country- and indicator-specific standard errors, refer to the Annex and Country Table in Demirguc-Kunt, Asli and L. Klapper. 2012. "Measuring Financial Inclusion: The Global Findex." Policy Research Working Paper 6025, World Bank, Washington, D.C.