Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Provisional counts of the number of deaths registered in England and Wales, by age, sex, region and Index of Multiple Deprivation (IMD), in the latest weeks for which data are available.
The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.
What countries have the longest life expectancy? In this query, 2016 census information is retrieved by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km2. Without the size constraint, Monaco is the top result with an average life expectancy of over 89 years!
SELECT
age.country_name,
age.life_expectancy,
size.country_area
FROM (
SELECT
country_name,
life_expectancy
FROM
bigquery-public-data.census_bureau_international.mortality_life_expectancy
WHERE
year = 2016) age
INNER JOIN (
SELECT
country_name,
country_area
FROM
bigquery-public-data.census_bureau_international.country_names_area
WHERE country_area > 25000) size
ON
age.country_name = size.country_name
ORDER BY
2 DESC
/* Limit removed for Data Studio Visualization */
LIMIT
10
Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25 and greater than 50% of the world’s population is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.
SELECT
age.country_name,
SUM(age.population) AS under_25,
pop.midyear_population AS total,
ROUND((SUM(age.population) / pop.midyear_population) * 100,2) AS pct_under_25
FROM (
SELECT
country_name,
population,
country_code
FROM
bigquery-public-data.census_bureau_international.midyear_population_agespecific
WHERE
year = 2017
AND age < 25) age
INNER JOIN (
SELECT
midyear_population,
country_code
FROM
bigquery-public-data.census_bureau_international.midyear_population
WHERE
year = 2017) pop
ON
age.country_code = pop.country_code
GROUP BY
1,
3
ORDER BY
4 DESC /* Remove limit for visualization*/
LIMIT
10
The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries greater than 500 km2.
SELECT
growth.country_name,
growth.net_migration,
CAST(area.country_area AS INT64) AS country_area
FROM (
SELECT
country_name,
net_migration,
country_code
FROM
bigquery-public-data.census_bureau_international.birth_death_growth_rates
WHERE
year = 2017) growth
INNER JOIN (
SELECT
country_area,
country_code
FROM
bigquery-public-data.census_bureau_international.country_names_area
Historic (none)
United States Census Bureau
Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data
In the pre-vaccination era in Sweden, smallpox was estimated to have been responsible for 2,000 deaths per million people every year; in other words, approximately 0.2 percent of the entire population (or one in every 500 people) would die of smallpox annually. From looking at other data sets, we know that this figure was as high as 7,200 deaths per million in some years, as individual epidemics regularly devastated large portions of the population. When Jenner's findings on vaccination were adopted by the scientific community in Europe, Sweden was one of the first countries to begin vaccinating infants on a large scale. This is reflected in the considerable decline of smallpox deaths in the early nineteenth century, when the rate fell to just 623 annual smallpox deaths per million in the first decade. By the middle of the century, when vaccination was made compulsory by the Swedish government, it dropped even further, to less than ten percent of the pre-vaccination death rate. In the last two decades of the nineteenth century, when Swedish authorities began penalizing parents for not vaccinating their children, the smallpox death rate fell to just two deaths per million people, and Sweden reported its final endemic (naturally occurring) case of smallpox in 1895, making it the second country in the world (after Iceland in 1872) to successfully eradicate the disease.
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
The Intergovernmental Panel on Climate Change (IPCC) produces regular Assessment Reports that provide global warming potentials (GWPs) for greenhouse gases (GHGs) in the context of multiple time horizons, including 20, 100, and 500 years. The GWPs (in kg CO2-equivalent per kg GHG) can be multiplied by the kg of GHGs emitted to estimate the CO2-equivalent (CO2e) impacts of those emissions. In the context of life cycle assessment (LCA), the GWPs can be used as characterization factors in the life cycle impact assessment. This dataset provides 20- (GWP-20), 100- (GWP-100) and 500-year (GWP-500) GWPs from the 4th (AR4) and 6th (AR6) IPCC assessment reports, and 20- (GWP-20) and 100-year (GWP-100) GWPs from the 5th (AR5) report (AR5 provided no 500-year GWPs). Datasets are provided in simple tables in Excel, in the openLCA JSON-LD format compliant with the U.S. Federal LCA Commons standards, and in Apache Parquet format for the most efficient import into applications or scripts using languages like Python and R. The names for GHGs are from the Federal LCA Elementary Flow List (FEDEFL) v1.2, which are the names preferred for these GHGs in the USEPA's Substance Registry Service. These datasets were created using the LCIA Formatter v1.1.1 (https://github.com/USEPA/LCIAformatter). The GWP values are provided in these formats for convenient use; the values have not been altered from the values reported in the Assessment Reports. Python code used to produce the data is available in a GitHub gist under the supporting data links, along with dataset metadata from the LCIA Formatter.
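As a rough illustration of the multiplication described above (GWP in kg CO2e per kg GHG times kg emitted), the sketch below sums CO2e over several gases. The factor values and the flow names in the dictionary are placeholders assumed for illustration, not values read from this dataset; in practice they would be loaded from the provided Excel, JSON-LD or Parquet tables for the chosen report and time horizon.

```python
# Placeholder GWP-100 factors in kg CO2e per kg GHG.
# ASSUMPTION: these numbers and names are illustrative only; load the real
# factors from the dataset for the assessment report you intend to use.
ILLUSTRATIVE_GWP100 = {
    "Carbon dioxide": 1.0,
    "Methane": 28.0,
    "Nitrous oxide": 265.0,
}


def co2_equivalent(emissions_kg, gwp_factors):
    """Return total CO2e in kg: the sum of GWP * emitted mass over all gases."""
    total = 0.0
    for gas, mass_kg in emissions_kg.items():
        if gas not in gwp_factors:
            raise KeyError(f"no GWP factor available for {gas!r}")
        total += gwp_factors[gas] * mass_kg
    return total


# Hypothetical inventory: 100 kg of CO2 and 2 kg of methane.
example_emissions = {"Carbon dioxide": 100.0, "Methane": 2.0}
print(co2_equivalent(example_emissions, ILLUSTRATIVE_GWP100))  # 156.0
```

Keeping the factors in a separate mapping makes it straightforward to swap in the GWP-20 or GWP-500 columns without touching the calculation.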
In 1985 the population and health observatory was established at Mlomp, in the region of Ziguinchor, in southern Senegal (see map). The objective was to complement the two rural population observatories then existing in the country, Bandafassi, in the south-east, and Niakhar, in the centre-west, with a third observatory in a region - the south-west of the country (Casamance) - whose history, ethnic composition and economic situation were quite different from those of the regions where the first two observatories were located. It was expected that measuring the demographic levels and trends on those three sites would provide better coverage of the demographic and epidemiological diversity of the country.
Following a population census in 1984-1985, demographic events and causes of death have been monitored yearly. During the initial census, all women were interviewed concerning the birth and survival of their children. Since 1985, yearly censuses, usually conducted in January-February, have been recording demographic data, including all births, deaths, and migrations. The completeness and accuracy of dates of birth and death are cross-checked against those of registers of the local maternity ward (_95% of all births) and dispensary (all deaths are recorded, including those occurring outside the area), respectively. The study area comprises 11 villages with approximately 8000 inhabitants, mostly Diola. Mlomp is located in the Department of Oussouye, Region of Ziguinchor (Casamance), 500 km south of Dakar.
On 1 January 2000 the Mlomp area included a population of 7,591 residents living in 11 villages. The population density was 108 people per square kilometre. The population belongs to the Diola ethnic group, and the religion is predominantly animist, with a large minority of Christians and a few Muslims. Though low, the educational level - in 2000, 55% of women aged 15-49 had been to school (for at least one year) - is definitely higher than at Bandafassi. The population also benefits from much better health infrastructure and programmes. Since 1961, the area under study has been equipped with a private health centre run by French Catholic nurses and, since 1968, a village maternity centre where most women give birth. The vast majority of the children are totally immunized and involved in a growth-monitoring programme (Pison et al.,1993; Pison et al., 2001).
The Mlomp DSS site, about 500 km from the capital, Dakar, in Senegal, lies between latitudes 12°36' and 12°32'N and longitudes 16°33' and 16°37'W, at an altitude ranging from 0 to 20 m above sea level. It is in the region of Ziguinchor, Département of Oussouye (Casamance), in southwest Senegal. It is located 50 km west of the city of Ziguinchor and 25 km north of the border with Guinea-Bissau. It covers about half the Arrondissement of Loudia-Ouolof. The Mlomp DSS site is about 11 km × 7 km and has an area of 70 km2. Villages are households grouped in a circle with a 3-km diameter and surrounded by lands that are flooded during the rainy season and cultivated for rice. There is still no electricity.
Individual
At the census, a person was considered a member of the compound if the head of the compound declared it to be so. This definition was broad and resulted in a de jure population under study. Thereafter, a criterion was used to decide whether and when a person was to be excluded or included in the population.
A person was considered to exit from the study population through either death or emigration. Part of the population of Mlomp engages in seasonal migration, with seasonal migrants sometimes remaining 1 or 2 years outside the area before returning. A person who is absent for two successive yearly rounds, without returning in between, is regarded as having emigrated and no longer resident in the study population at the date of the second round. This definition results in the inclusion of some vital events that occur outside the study area. Some births, for example, occur to women classified in the study population but physically absent at the time of delivery, and these births are registered and included in the calculation of rates, although information on them is less accurate. Special exit criteria apply to babies born outside the study area: they are considered emigrants on the same date as their mother.
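The two-round emigration rule above can be sketched as follows. The input layout (a chronological list of round dates paired with a present/absent flag) is an assumption made for illustration; the source does not describe how presence is actually stored in the database.

```python
from datetime import date


def emigration_date(rounds):
    """Return the date at which a person counts as having emigrated, or None.

    rounds: chronological list of (round_date, present) tuples
            (ASSUMED layout, one entry per yearly round).
    A person absent at two successive rounds, without returning in between,
    is treated as emigrated at the date of the second of those rounds.
    """
    absent_at_previous_round = False
    for round_date, present in rounds:
        if present:
            absent_at_previous_round = False
        elif absent_at_previous_round:
            return round_date
        else:
            absent_at_previous_round = True
    return None


# Hypothetical seasonal migrant who returns between absences: never excluded.
seasonal = [
    (date(2001, 1, 15), True),
    (date(2002, 1, 20), False),
    (date(2003, 1, 18), True),
    (date(2004, 1, 22), False),
]
print(emigration_date(seasonal))  # None
```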
A new person enters the study population either through birth to a woman of the study population or through immigration. Information on immigrants is collected when the list of compounds of a village is checked ("Are there new compounds or new families who settled since the last visit?") or when the list of members of a compound is checked ("Are there new persons in the compound since the last visit?"). Some immigrants are villagers who left the area several years before and were excluded from the study population. Information is collected to determine in which compound they were previously registered, to match the new and old information.
Information is routinely collected on movements from one compound to another within the study area. Some categories of the population, such as older widows or orphans, frequently move for short periods of time and live in between several compounds, and they may be considered members of these compounds or of none. As a consequence, their movements are not always declared.
Event history data
One round of data collection took place annually, except in 1987 and 2008.
No sampling is done
None
Proxy Respondent [proxy]
List of questionnaires:
- Household book (used to register information needed to define out-migrations)
- Delivery questionnaire (used to register information from the Mlomp dispensary)
- New household questionnaire
- New member questionnaire
- Marriage and divorce questionnaire
- Birth and marital histories questionnaire (for a new member)
- Death questionnaire (used to register the date of death)
On data entry, data consistency and plausibility were checked by 455 data validation rules at database level. If a data validation failure was due to a data collection error, the questionnaire was referred back to the field for revisit and correction. If the error was due to data inconsistencies that could not be directly traced to a data collection error, the record was referred to the data quality team under the supervision of the senior database scientist. The team could request further field-level investigation by a team of trackers or could correct the inconsistency directly at database level.
No imputations were done on the resulting micro data set, except for:
a. If an out-migration (OMG) event is followed by a homestead entry event (ENT) and the gap between the OMG event and the ENT event is greater than 180 days, the ENT event was changed to an in-migration event (IMG).
b. If an out-migration (OMG) event is followed by a homestead entry event (ENT) and the gap between the OMG event and the ENT event is less than 180 days, the OMG event was changed to a homestead exit event (EXT) and the ENT event date changed to the day following the original OMG event.
c. If a homestead exit event (EXT) is followed by an in-migration event (IMG) and the gap between the EXT event and the IMG event is greater than 180 days, the EXT event was changed to an out-migration event (OMG).
d. If a homestead exit event (EXT) is followed by an in-migration event (IMG) and the gap between the EXT event and the IMG event is less than 180 days, the IMG event was changed to a homestead entry event (ENT) with a date equal to the day following the EXT event.
e. If the last recorded event for an individual is a homestead exit (EXT) and this event is more than 180 days prior to the end of the surveillance period, then the EXT event is changed to an out-migration event (OMG).
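Rules a to e above can be sketched as a single pass over one individual's event history. This is only an illustration under stated assumptions: the (code, date) tuple layout and the function name are hypothetical, and because the rules do not say what happens when the gap is exactly 180 days, the sketch treats that case like the "less than 180 days" branch.

```python
from datetime import date, timedelta

GAP = timedelta(days=180)
ONE_DAY = timedelta(days=1)


def apply_residency_rules(events, end_of_surveillance):
    """Apply rules a-e to a chronological list of (code, date) tuples.

    ASSUMPTIONS: the tuple layout is hypothetical, and a gap of exactly
    180 days is handled like a gap of less than 180 days.
    """
    out = [list(event) for event in events]
    for i in range(len(out) - 1):
        code, day = out[i]
        next_code, next_day = out[i + 1]
        long_gap = (next_day - day) > GAP
        if code == "OMG" and next_code == "ENT":
            if long_gap:
                out[i + 1][0] = "IMG"            # rule a
            else:
                out[i][0] = "EXT"                # rule b
                out[i + 1][1] = day + ONE_DAY
        elif code == "EXT" and next_code == "IMG":
            if long_gap:
                out[i][0] = "OMG"                # rule c
            else:
                out[i + 1] = ["ENT", day + ONE_DAY]  # rule d
    # Rule e: a trailing EXT long before the end of surveillance becomes OMG.
    if out and out[-1][0] == "EXT" and (end_of_surveillance - out[-1][1]) > GAP:
        out[-1][0] = "OMG"
    return [tuple(event) for event in out]


# Hypothetical history: out-migration followed 59 days later by a homestead entry.
history = [("OMG", date(2010, 1, 1)), ("ENT", date(2010, 3, 1))]
print(apply_residency_rules(history, date(2012, 1, 1)))
```

In this example rule b applies, so the pair becomes an EXT on 2010-01-01 followed by an ENT on 2010-01-02.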
In the case of the village that was added (enumerated) in 2006, some individuals may have out-migrated from the original surveillance area and settled in the new village prior to the first enumeration. Where the records of such individuals have been linked, an individual can legitimately have an out-migration event (OMG) followed by an enumeration event (ENU). In a few cases, a homestead exit event (EXT) was followed by an enumeration event; in these instances the EXT events were changed to out-migration events (OMG).
On average, the response rate has been about 99% for each round over the years.
Not applicable
CenterId MetricTable QMetric Illegal Legal Total Metric RunDate
SN012 MicroDataCleaned Starts 18756 2017-05-19 00:00
SN012 MicroDataCleaned Transitions 0 45136 45136 0 2017-05-19 00:00
SN012 MicroDataCleaned Ends 18756 2017-05-19 00:00
SN012 MicroDataCleaned SexValues 38 45098 45136 0 2017-05-19 00:00
SN012 MicroDataCleaned DoBValues 204 44932 45136 0 2017-05-19 00:00
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This collection contains the datasets created as part of a master's thesis. The collection consists of two datasets in two forms, as well as the corresponding entity descriptions for each of the datasets.

The experiment_doc_labels_clean documents contain the data used for the experiments. The JSON file consists of a list of JSON objects. The JSON objects contain the following fields:
- id: Document ID.
- ner_tags: List of IOB tags indicating mention boundaries, based on the majority label assigned using crowdsourcing.
- el_tags: List of entity IDs, based on the majority label assigned using crowdsourcing.
- all_ner_tags: List of lists of IOB tags assigned by each of the users.
- all_el_tags: List of lists of entity IDs assigned by each of the users annotating the data.
- tokens: List of tokens from the text.

The experiment_doc_labels_clean-U.tsv file contains the dataset used for the experiments, but in a format similar to the CoNLL-U format. The first line for each document contains the document ID. The documents are separated by a blank line. Each word in a document is on its own line, consisting of the word, the IOB tag and the entity ID, separated by tabs.

While the experiments were being completed, the annotation system was left open until all the documents had been annotated by three users. This resulted in the all_docs_complete_labels_clean.json and all_docs_complete_labels_clean-U.tsv datasets, which take the same form as experiment_doc_labels_clean.json and experiment_doc_labels_clean-U.tsv.

Each of the documents described above contains entity IDs. The IDs match the entities stored in the entity_descriptions CSV files. Each row in these files corresponds to a mention of an entity and takes the form:
{ID}${Mention}${Context}[N]

Three sets of entity descriptions are available:
1. entity_descriptions_experiments.csv: This file contains all the mentions from the subset of the data used for the experiments as described above. However, the data has not been cleaned, so there are multiple entity IDs which actually refer to the same entity.
2. entity_descriptions_experiments_clean.csv: These entities also cover the data used for the experiments; however, duplicate entities have been merged. These entities correspond to the labels for the documents in the experiment_doc_labels_clean files.
3. entity_descriptions_all.csv: The entities in this file correspond to the data in the all_docs_complete_labels_clean files. Please note that the entities have not been cleaned, so there may be duplicate or incorrect entities.
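The CoNLL-U-like layout described above (an ID line, one line per word, blank lines between documents) could be read roughly as in the sketch below. It assumes the three fields on each word line are tab-separated, as the .tsv extension suggests, and that the ID line holds only the document ID. The function name and the sample content, including the "-" placeholder for tokens without an entity, are invented for illustration and are not taken from the dataset.

```python
def parse_conll_like(text):
    """Parse 'ID line, then word<TAB>IOB<TAB>entity_id lines' blocks.

    ASSUMPTIONS: tab-separated fields and a bare document ID on the first
    line of each block; documents are separated by blank lines.
    """
    documents = []
    current = None
    for raw_line in text.splitlines():
        if not raw_line.strip():
            current = None  # a blank line closes the current document
            continue
        if current is None:
            current = {"id": raw_line.strip(), "tokens": [], "ner_tags": [], "el_tags": []}
            documents.append(current)
            continue
        word, iob_tag, entity_id = raw_line.split("\t")
        current["tokens"].append(word)
        current["ner_tags"].append(iob_tag)
        current["el_tags"].append(entity_id)
    return documents


# Invented sample; real IDs, tags and entity identifiers will differ.
SAMPLE = "doc-1\nAnna\tB\tE1\nsaid\tO\t-\n\ndoc-2\nParis\tB\tE2\n"
docs = parse_conll_like(SAMPLE)
print([doc["id"] for doc in docs])  # ['doc-1', 'doc-2']
```

The resulting dictionaries mirror the id, tokens, ner_tags and el_tags fields of the JSON form, which makes it easy to compare the two representations.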
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000).
Sources and Contributions
- Sources: GeoNames aggregates over a hundred different data sources.
- Ambassadors: GeoNames Ambassadors help in many countries.
- Wiki: A wiki allows users to view the data, quickly fix errors and add missing places.
- Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: add country name
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the Netherlands, every year 500,000 people are confronted with the death of a close relative. Many of these people experience little emotional distress. In some, bereavement precipitates severe grief, distress, and dysphoria. A small yet significant minority of bereaved individuals develops persistent and debilitating symptoms of persistent complex bereavement disorder (PCBD) (also termed prolonged grief disorder), posttraumatic stress disorder, and depression. Knowledge about early identification of, and preventive care for complex grief has increased. Moreover, in recent years there has been an increase in treatment options for people for whom loss leads to persistent psychological problems. That said, preventive and curative treatments are effective for some, but not all bereaved individuals experiencing distress and dysfunction following loss. This necessitates further research on the development, course, and treatment of various stages of complex grief, including PCBD. “Complex grief” is an informal term referring to debilitating, non-normative grief. It will likely be named Prolonged Grief Disorder in the forthcoming ICD-11. It is named Persistent Complex Bereavement Disorder in DSM-5. Research on the development, course, and treatment of complex grief is needed. This research should address different stages and manifestations of complex grief.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Tree mortality maps derived from 2020 aerial photos (NAIP) of California for the paper titled "Scattered tree death contributes to substantial forest loss in California". The dataset includes dead tree count per ha (100 m), median dead crown size per ha (100 m), percentage of dead canopy area per ha (100 m), percentage of brown-stage mortality per ha (100 m), eccentricity map (500 m), and percentage of tree mortality (240 m). The spatial coverage is the vegetated area (woody vegetation) in California. The projection system is EPSG:5072.
Publication: Cheng, Y. et al. Scattered tree death contributes to substantial forest loss in California. Nat. Commun. (2024). https://doi.org/10.1038/s41467-024-44991-z
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets are for a cohort of n=1540 anonymised hospitalised COVID-19 patients, and the data provide information on outcomes (i.e. patient death or discharge), demographics and biomarker measurements for two New York hospitals: State University of New York (SUNY) Downstate Health Sciences University and Maimonides Medical Center.
The file "demographics_both_hospitals.csv" contains the ultimate outcomes of hospitalisation (whether a patient was discharged or died), demographic information and known comorbidities for each of the patients.
The file "dynamics_clean_both_hospitals.csv" contains cleaned dynamic biomarker measurements for the n=1233 patients where this information was available and the data passed our various checks (see https://doi.org/10.1101/2021.11.12.21266248 for information of these checks and the cleaning process). Patients can be matched to demographic data via the "id" column.
Study approval and data collection
Study approval was obtained from the State University of New York (SUNY) Downstate Health Sciences University Institutional Review Board (IRB#1595271-1) and Maimonides Medical Center Institutional Review Board/Research Committee (IRB#2020-05-07). A retrospective query was performed among the patients who were admitted to SUNY Downstate Medical Center and Maimonides Medical Center with COVID-19-related symptoms, which was subsequently confirmed by RT PCR, from the beginning of February 2020 until the end of May 2020. Stratified randomization was used to select at least 500 patients who were discharged and 500 patients who died due to the complications of COVID-19. Patient outcome was recorded as a binary choice of “discharged” versus “COVID-19 related mortality”. Patients whose outcome was unknown were excluded. Demographic, clinical history and laboratory data was extracted from the hospital’s electronic health records.
This United States Environmental Protection Agency (US EPA) feature layer represents monitoring site data, updated hourly concentrations and Air Quality Index (AQI) values for the latest hour received from monitoring sites that report to AirNow.
Map and forecast data are collected using federal reference or equivalent monitoring techniques or techniques approved by the state, local or tribal monitoring agencies. To maintain "real-time" maps, the data are displayed after the end of each hour. Although preliminary data quality assessments are performed, the data in AirNow are not fully verified and validated through the quality assurance procedures monitoring organizations used to officially submit and certify data on the EPA Air Quality System (AQS).
This data sharing and centralization creates a one-stop source for real-time and forecast air quality data. The benefits include quality control, national reporting consistency, access to automated mapping methods, and data distribution to the public and other data systems. The U.S. Environmental Protection Agency, National Oceanic and Atmospheric Administration, National Park Service, tribal, state, and local agencies developed the AirNow system to provide the public with easy access to national air quality information. State and local agencies report the Air Quality Index (AQI) for cities across the US and parts of Canada and Mexico. AirNow data are used only to report the AQI, not to formulate or support regulation, guidance or any other EPA decision or position.
About the AQI
The Air Quality Index (AQI) is an index for reporting daily air quality. It tells you how clean or polluted your air is, and what associated health effects might be a concern for you. The AQI focuses on health effects you may experience within a few hours or days after breathing polluted air.
EPA calculates the AQI for five major air pollutants regulated by the Clean Air Act: ground-level ozone, particle pollution (also known as particulate matter), carbon monoxide, sulfur dioxide, and nitrogen dioxide. For each of these pollutants, EPA has established national air quality standards to protect public health. Ground-level ozone and airborne particles (often referred to as "particulate matter") are the two pollutants that pose the greatest threat to human health in this country.
A number of factors influence ozone formation, including emissions from cars, trucks, buses, power plants, and industries, along with weather conditions. Weather is especially favorable for ozone formation when it's hot, dry and sunny, and winds are calm and light. Federal and state regulations, including regulations for power plants, vehicles and fuels, are helping reduce ozone pollution nationwide.
Fine particle pollution (or "particulate matter") can be emitted directly from cars, trucks, buses, power plants and industries, along with wildfires and woodstoves. But it also forms from chemical reactions of other pollutants in the air. Particle pollution can be high at different times of year, depending on where you live. In some areas, for example, colder winters can lead to increased particle pollution emissions from woodstove use, and stagnant weather conditions with calm and light winds can trap PM2.5 pollution near emission sources. Federal and state rules are helping reduce fine particle pollution, including clean diesel rules for vehicles and fuels, and rules to reduce pollution from power plants, industries, locomotives, and marine vessels, among others.
How Does the AQI Work?
Think of the AQI as a yardstick that runs from 0 to 500. The higher the AQI value, the greater the level of air pollution and the greater the health concern.
For example, an AQI value of 50 represents good air quality with little potential to affect public health, while an AQI value over 300 represents hazardous air quality.
An AQI value of 100 generally corresponds to the national air quality standard for the pollutant, which is the level EPA has set to protect public health. AQI values below 100 are generally thought of as satisfactory. When AQI values are above 100, air quality is considered to be unhealthy: at first for certain sensitive groups of people, then for everyone as AQI values get higher.
Understanding the AQI
The purpose of the AQI is to help you understand what local air quality means to your health. To make it easier to understand, the AQI is divided into six categories.
Air Quality Index (AQI) values, levels of health concern, and colors (when the AQI is in this range, air quality conditions are as follows, symbolized by this color):
- 0 to 50: Good (Green)
- 51 to 100: Moderate (Yellow)
- 101 to 150: Unhealthy for Sensitive Groups (Orange)
- 151 to 200: Unhealthy (Red)
- 201 to 300: Very Unhealthy (Purple)
- 301 to 500: Hazardous (Maroon)
Note: Values above 500 are considered Beyond the AQI. Follow recommendations for the Hazardous category. Additional information on reducing exposure to extremely high levels of particle pollution is available here.
Each category corresponds to a different level of health concern. The six levels of health concern and what they mean are:
- "Good" AQI is 0 to 50. Air quality is considered satisfactory, and air pollution poses little or no risk.
- "Moderate" AQI is 51 to 100. Air quality is acceptable; however, for some pollutants there may be a moderate health concern for a very small number of people. For example, people who are unusually sensitive to ozone may experience respiratory symptoms.
- "Unhealthy for Sensitive Groups" AQI is 101 to 150. Although the general public is not likely to be affected at this AQI range, people with lung disease, older adults and children are at a greater risk from exposure to ozone, whereas persons with heart and lung disease, older adults and children are at greater risk from the presence of particles in the air.
- "Unhealthy" AQI is 151 to 200. Everyone may begin to experience some adverse health effects, and members of the sensitive groups may experience more serious effects.
- "Very Unhealthy" AQI is 201 to 300. This would trigger a health alert signifying that everyone may experience more serious health effects.
- "Hazardous" AQI is greater than 300. This would trigger health warnings of emergency conditions. The entire population is more likely to be affected.
AQI colors
EPA has assigned a specific color to each AQI category to make it easier for people to understand quickly whether air pollution is reaching unhealthy levels in their communities. For example, the color orange means that conditions are "unhealthy for sensitive groups," while red means that conditions may be "unhealthy for everyone," and so on.
Air Quality Index levels of health concern, numerical values, and meanings:
- Good (0 to 50): Air quality is considered satisfactory, and air pollution poses little or no risk.
- Moderate (51 to 100): Air quality is acceptable; however, for some pollutants there may be a moderate health concern for a very small number of people who are unusually sensitive to air pollution.
- Unhealthy for Sensitive Groups (101 to 150): Members of sensitive groups may experience health effects. The general public is not likely to be affected.
- Unhealthy (151 to 200): Everyone may begin to experience health effects; members of sensitive groups may experience more serious health effects.
- Very Unhealthy (201 to 300): Health alert: everyone may experience more serious health effects.
- Hazardous (301 to 500): Health warnings of emergency conditions. The entire population is more likely to be affected.
Note: Values above 500 are considered Beyond the AQI. Follow recommendations for the "Hazardous" category.
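The six AQI categories described above can be expressed as a small lookup. This is a minimal sketch, assuming whole-number AQI values; the function and constant names are hypothetical, and mapping values above 500 to the Hazardous category follows the note about "Beyond the AQI" rather than any official API.

```python
# Upper bound of each AQI range with its level of health concern and color,
# taken from the category ranges listed in the text.
AQI_CATEGORIES = [
    (50, "Good", "Green"),
    (100, "Moderate", "Yellow"),
    (150, "Unhealthy for Sensitive Groups", "Orange"),
    (200, "Unhealthy", "Red"),
    (300, "Very Unhealthy", "Purple"),
    (500, "Hazardous", "Maroon"),
]


def aqi_category(aqi):
    """Return (level of health concern, color) for a whole-number AQI value."""
    if aqi < 0:
        raise ValueError("AQI values start at 0")
    for upper_bound, level, color in AQI_CATEGORIES:
        if aqi <= upper_bound:
            return level, color
    # Above 500 is "Beyond the AQI"; the text says to follow the Hazardous
    # recommendations, so this sketch reuses that category (a design choice).
    return "Hazardous", "Maroon"


print(aqi_category(42))   # ('Good', 'Green')
print(aqi_category(135))  # ('Unhealthy for Sensitive Groups', 'Orange')
```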
https://creativecommons.org/publicdomain/zero/1.0/
Stock market data can be interesting to analyze and, as a further incentive, strong predictive models can have a large financial payoff. The amount of financial data on the web is seemingly endless, but a large and well-structured dataset on a wide array of companies can be hard to come by. Here I provide a dataset with historical stock prices (last 5 years) for all companies currently found on the S&P 500 index.
The script I used to acquire all of these .csv files can be found in this GitHub repository. If you want a more up-to-date dataset in the future, the script can be used to acquire new versions of the .csv files.
The data is presented in a couple of formats to suit different individuals' needs or computational limitations. I have included files containing 5 years of stock data (in the all_stocks_5yr.csv and corresponding folder) and a smaller version of the dataset (all_stocks_1yr.csv) with only the past year's stock data for those wishing to use something more manageable in size.
The folder individual_stocks_5yr contains files of data for individual stocks, labelled by their stock ticker name. The all_stocks_5yr.csv and all_stocks_1yr.csv contain this same data, presented in merged .csv files. Depending on the intended use (graphing, modelling etc.) the user may prefer one of these given formats.
All the files have the following columns:
Date - in format: yy-mm-dd
Open - price of the stock at market open (this is NYSE data so all in USD)
High - highest price reached in the day
Low - lowest price reached in the day
Close - price of the stock at market close
Volume - number of shares traded
Name - the stock's ticker name
I scraped this data from Google Finance using the Python library 'pandas_datareader'. Special thanks to Kaggle, GitHub and The Market.
This dataset lends itself to some very interesting visualizations. One can look at simple things like how prices change over time, graph and compare multiple stocks at once, or generate and graph new metrics from the data provided. From these data, informative stock stats such as volatility and moving averages can be easily calculated. The million dollar question is: can you develop a model that can beat the market and allow you to make statistically informed trades?
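As an illustration of the moving averages and volatility mentioned above, here is a minimal sketch using only the Python standard library. The function names are hypothetical, the column names Name, Date and Close follow the column list in the description (the exact header capitalization in the .csv files is an assumption), and annualizing with 252 trading days is a common convention rather than something the dataset specifies.

```python
import csv
import math
import statistics


def load_closes(csv_file, ticker):
    """Read closing prices for one ticker from an open merged CSV file.

    Assumes header names "Name", "Date" and "Close", and a date format
    that sorts chronologically as text.
    """
    rows = [row for row in csv.DictReader(csv_file) if row["Name"] == ticker]
    rows.sort(key=lambda row: row["Date"])
    return [float(row["Close"]) for row in rows]


def moving_average(closes, window):
    """Simple moving average; one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(closes[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(closes))
    ]


def daily_returns(closes):
    """Day-over-day simple returns."""
    return [closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))]


def volatility(closes, trading_days=252):
    """Annualized volatility: sample std. dev. of daily returns, scaled.

    Needs at least three prices so that two returns are available.
    """
    return statistics.stdev(daily_returns(closes)) * math.sqrt(trading_days)
```

A typical call would open all_stocks_5yr.csv, pass the file object to load_closes with a ticker name, and feed the resulting list to moving_average or volatility.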
Number of infant deaths and infant mortality rates, by age group (neonatal and post-neonatal), 1991 to most recent year.
The Health and Demographic Surveillance System (HDSS) in Niakhar, a rural area of Senegal, is located 135 km east of Dakar. This HDSS was set up in 1962 by the Institut de Recherche pour le Développement (IRD) to address the shortcomings of the civil registration system and provide demographic indicators.
Some 65 villages were followed annually in the Niakhar area from 1962 to 1969. The study zone was reduced to eight villages from 1969 to 1983, and from then on the HDSS was extended to include 22 other villages, covering a total of 30 villages for a population estimated at 45,000 in December 2013. Thus 8 villages have been under demographic surveillance for almost 50 years and 30 villages for 30 years.
Vital events, migrations, marital changes, pregnancies and immunizations are routinely recorded (every four months). The database also includes epidemiological, economic and environmental information coming from specific surveys. Data were collected through annual rounds from 1962 to 1987; rounds became weekly from 1987 to 1997; routine visits were conducted every three months between 1997 and 2007 and every four months since then.
The current objectives are 1) to obtain a long-term assessment of demographic and socio-economic indicators necessary for bio-medical and social sciences research, 2) to keep up epidemiological and environmental monitoring, 3) to provide a research platform for clinical and interdisciplinary research (medical, social and environmental sciences). Research projects during the last 5 years are listed in Table 2. The Niakhar HDSS has institutional affiliation with the Institut de Recherche pour le Développement (IRD, formerly ORSTOM).
The Niakhar study zone is located in Senegal at 14.5ºN latitude and 16.5ºW longitude, in the department of Fatick (Sine-Saloum), 135 km east of Dakar. It covers 203 square kilometres and lies in the continental Sahelian-Sudanese climatic zone. For thirty years the region has suffered from drought. The average annual rainfall decreased from 800 mm in the 1950s to 500 mm in the 1980s. Increasing amounts of precipitation have been observed since the mid-2000s, with an average annual rainfall of 600 mm between 2005 and 2010.
Individual
Members of households reside within the demographic surveillance area. Inmigrants are defined by intention to become resident, but actual residence episodes of less than 180 days are censored. Outmigrants are defined by intention to become resident elsewhere, but actual periods of non-residence of less than 180 days are censored, except for seasonal work migrants, workers with a resident wife, pupils or students. Children born to resident women are considered resident by default, irrespective of actual place of birth. The dataset contains the events of all individuals ever resident during the study period (1 Jan 1990 to 31 Dec 2013).
The Niakhar HDSS collects the following basic data for each resident: individual, household and compound identifying information, mother and father identification, relationship to the head of household and spousal relationship. From 1983 to 2007, the HDSS routinely monitored deaths, pregnancies, births, miscarriages, stillbirths, weaning, migrations, changes of marital status, immunizations, and cases of measles and whooping cough. For the last 5 years, the HDSS only recorded demographic events related to each resident, including cause of death. Verbal autopsies have been conducted after all deaths except for those that occurred between 1999 and 2004, where only deaths of people aged 0-55 years were investigated. The Niakhar HDSS also registers visitors as well as all the demographic events related to them in case of in-migration. Household characteristics (living conditions, domestic equipment, etc.) were collected in 1998 and 2003, and community equipment (schools, boreholes, etc.) in 2003. Economic and environmental data will be collected in 2013. Table 3 presents further details on the data items collected. The Niakhar HDSS interviewers collect data with tablet PCs that are loaded with the most recently updated database, linked to a user-friendly interface indicating the household members and the questionnaire. Daily backups are performed on an external hard drive and weekly synchronizations are scheduled during the round, helping to update the database and check data consistency (i.e. residential moves within the study area or marriages). Applications are developed in Visual Basic .NET and the database is managed with Microsoft Access.
Event history data
This dataset contains rounds 1 to 18 of demographic surveillance data covering the period from 1 Jan 1983 to 31 December 2015.
From 1983 to 1987, data were collected through annual rounds during the dry season. Demographic events were collected by interviewers using a printed list of compound residents with their characteristics. From 1987 to 1997, rounds became weekly because of the need for continuous birth registration for vaccine trials. Annual censuses were carried out to check data collection, particularly relative to in- and out-migration. Routine visits were conducted in the 30 villages of the study area every three months between 1997 and 2007, every four months between 2008 and 2012, and every six months since then.
This dataset is not based on a sample; it contains information from the complete demographic surveillance area.
None
Proxy Respondent [proxy]
List of questionnaires:
Compound Registration or update Form
Household Registration or update Form
Household Membership Registration or update Form
External Migration Registration Form
Internal Migration Registration Form
Individual Registration Form
Birth Registration Form
Death Registration Form
On data entry, data consistency and plausibility were checked by 455 data validation rules at database level. If a data validation failure was due to a data collection error, the questionnaire was referred back to the field for revisit and correction. If the error was due to data inconsistencies that could not be directly traced to a data collection error, the record was referred to the data quality team under the supervision of the senior database scientist. This team could request further field level investigation by a team of trackers or could correct the inconsistency directly at database level.
No imputations were done on the resulting micro data set, except for:
a. If an out-migration (OMG) event is followed by a homestead entry event (ENT) and the gap between the OMG event and the ENT event is greater than 180 days, the ENT event was changed to an in-migration event (IMG).
b. If an out-migration (OMG) event is followed by a homestead entry event (ENT) and the gap between the OMG event and the ENT event is less than 180 days, the OMG event was changed to a homestead exit event (EXT) and the ENT event date changed to the day following the original OMG event.
c. If a homestead exit event (EXT) is followed by an in-migration event (IMG) and the gap between the EXT event and the IMG event is greater than 180 days, the EXT event was changed to an out-migration event (OMG).
d. If a homestead exit event (EXT) is followed by an in-migration event (IMG) and the gap between the EXT event and the IMG event is less than 180 days, the IMG event was changed to a homestead entry event (ENT) with a date equal to the day following the EXT event.
e. If the last recorded event for an individual is a homestead exit (EXT) and this event is more than 180 days prior to the end of the surveillance period, then the EXT event is changed to an out-migration event (OMG).
In the case of the village that was added (enumerated) in 2006, some individuals may have outmigrated from the original surveillance area and settled in the new village prior to the first enumeration. Where the records of such individuals have been linked, an individual can legitimately have an out-migration event (OMG) followed by an enumeration event (ENU). In a few cases a homestead exit event (EXT) was followed by an enumeration event. In these instances the EXT events were changed to out-migration events (OMG).
On average, the response rate is about 99% for each round over the years.
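Rules a to e above can be sketched in code as follows. This is an illustrative reading of the rules, not the HDSS team's actual procedure: the Event class and the reconcile function are hypothetical names, events are assumed to be sorted by date for a single individual, and pairs with a gap of exactly 180 days are left unchanged because the description does not say how they are handled.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta

THRESHOLD_DAYS = 180  # residency threshold used in rules a to e


@dataclass(frozen=True)
class Event:
    code: str   # "OMG", "IMG", "EXT" or "ENT" (other codes pass through)
    when: date


def reconcile(events, surveillance_end):
    """Apply rules a to e to one individual's date-ordered events."""
    out = list(events)
    for i in range(len(out) - 1):
        first, second = out[i], out[i + 1]
        gap = (second.when - first.when).days
        next_day = first.when + timedelta(days=1)
        if first.code == "OMG" and second.code == "ENT":
            if gap > THRESHOLD_DAYS:      # rule a
                out[i + 1] = replace(second, code="IMG")
            elif gap < THRESHOLD_DAYS:    # rule b
                out[i] = replace(first, code="EXT")
                out[i + 1] = replace(second, when=next_day)
        elif first.code == "EXT" and second.code == "IMG":
            if gap > THRESHOLD_DAYS:      # rule c
                out[i] = replace(first, code="OMG")
            elif gap < THRESHOLD_DAYS:    # rule d
                out[i + 1] = Event("ENT", next_day)
    # Rule e: a trailing exit long before the end of surveillance
    # is treated as an out-migration.
    if out and out[-1].code == "EXT":
        if (surveillance_end - out[-1].when).days > THRESHOLD_DAYS:
            out[-1] = replace(out[-1], code="OMG")
    return out
```

The special case of the village enumerated in 2006 (an EXT followed by an ENU event) is not covered by this sketch.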
Not Applicable
CentreId MetricTable QMetric Illegal Legal Total Metric RunDate
SN013 MicroDataCleaned Starts 86883 2017-05-19 15:12
SN013 MicroDataCleaned Transitions 241970 241970 0 2017-05-19 15:12
SN013 MicroDataCleaned Ends 86883 2017-05-19 15:12
SN013 MicroDataCleaned SexValues 32 241938 241970 0 2017-05-19 15:12
SN013 MicroDataCleaned DoBValues 241970 2017-05-19 15:12
NOTE: This dataset has been retired and marked as historical-only.
Only Chicago residents are included based on the home ZIP Code, as provided by the medical provider, or the address, as provided by the Cook County Medical Examiner.
Cases with a positive molecular (PCR) or antigen test are included in this dataset. Cases are counted on the date the test specimen was collected. Deaths are those occurring among cases based on the day of death. Hospitalizations are based on the date of first hospitalization. Only one hospitalization is counted for each case. Demographic data are based on what is reported by medical providers or collected by CDPH during follow-up investigation.
Because of the nature of data reporting to CDPH, hospitalizations will be blank for recent dates. They will fill in on later updates when the data are received, although, as for cases and deaths, they may continue to be updated as further data are received.
All data are provisional and subject to change. Information is updated as additional details are received and it is, in fact, very common for recent dates to be incomplete and to be updated as time goes on. At any given time, this dataset reflects data currently known to CDPH.
Numbers in this dataset may differ from other public sources due to definitions of COVID-19-related cases, deaths, and hospitalizations, sources used, how cases, deaths and hospitalizations are associated with a specific date, and similar factors.
Data Source: Illinois National Electronic Disease Surveillance System, Cook County Medical Examiner’s Office
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
EcoDRR global classification scheme based on spatial combination of ecosystem coverage and natural hazard physical exposure. The physical exposure data-set shows the product of hazard frequency and people exposed to this hazard in the same 100 square kilometer cell. For a specific natural hazard, a 0.01 degree resolution raster is generated, showing hazard annual frequency weighted with portion of pixel potentially affected. The population raster has the same resolution and represents the absolute number of inhabitants in a 0.01 degree cell. The physical exposure in a 100 km2 grid cell is the sum of the included physical exposure raster cells.
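The aggregation described above (frequency times population per fine cell, summed into a coarser grid cell) can be sketched as follows. This is a minimal illustration on nested lists, not the dataset's actual GIS processing; the function name physical_exposure is hypothetical, and the default of 10 by 10 fine cells per coarse cell is an assumption about how 0.01 degree cells map onto a cell of roughly 100 km2.

```python
def physical_exposure(frequency, population, block=10):
    """Sum frequency * population over block x block groups of fine cells.

    frequency:  2D list of annual hazard frequency per fine cell
    population: 2D list of inhabitants per fine cell (same shape)
    block:      fine cells per coarse cell side (assumed, not from the source)
    """
    rows, cols = len(frequency), len(frequency[0])
    if len(population) != rows or any(len(r) != cols for r in population):
        raise ValueError("rasters must have the same shape")
    if any(len(r) != cols for r in frequency):
        raise ValueError("frequency raster must be rectangular")

    coarse = []
    for r0 in range(0, rows, block):
        coarse_row = []
        for c0 in range(0, cols, block):
            total = 0.0
            # Partial blocks at the raster edge are summed as they are.
            for r in range(r0, min(r0 + block, rows)):
                for c in range(c0, min(c0 + block, cols)):
                    total += frequency[r][c] * population[r][c]
            coarse_row.append(total)
        coarse.append(coarse_row)
    return coarse
```

For real rasters an array library would be the usual choice; plain lists are used here only to keep the sketch self-contained.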
Sources: Tsunami frequency. It is based on two sources: 1) A comprehensive list of reports and scientific papers compiled and utilized in producing tsunami hazard maps as well as finding return periods of future events. 2) Applying numerical tsunami models and zooming on selected areas. Unit is expected affected percentage of each pixel over a minimum return period of 500 years. This product was designed by International Centre for Geohazards /NGI for the Global Assessment Report on Risk Reduction (GAR). It was modeled using global data. Credit: GIS processing International Centre for Geohazards /NGI.
GHS Population GRID. The spatial raster dataset depicts the distribution of population, expressed as the number of people per cell. Residential population estimates for target years 1975, 1990, 2000 and 2015 provided by CIESIN GPWv4.10 were disaggregated from census or administrative units to grid cells, informed by the distribution and density of built-up as mapped in the Global Human Settlement Layer (GHSL) global layer per corresponding epoch. Credit: European Commission, Joint Research Centre; Columbia University, Center for International Earth Science Information Network (2015): GHS-POP R2015A - GHS population grid, derived from GPW4, multitemporal (1975, 1990, 2000, 2015). European Commission, Joint Research Centre (JRC) [Dataset] PID: http://data.europa.eu/89h/jrc-ghsl-ghs_pop_gpw4_globe_r2015a
A range of indicators for a selection of cities from the New York City Global City database.
Dataset includes the following:
Geography
City Area (km2)
Metro Area (km2)
People
City Population (millions)
Metro Population (millions)
Foreign Born
Annual Population Growth
Economy
GDP Per Capita (thousands $, PPP rates, per resident)
Primary Industry
Secondary Industry
Share of Global 500 Companies (%)
Unemployment Rate
Poverty Rate
Transportation
Public Transportation
Mass Transit Commuters
Major Airports
Major Ports
Education
Students Enrolled in Higher Education
Percent of Population with Higher Education (%)
Higher Education Institutions
Tourism
Total Tourists Annually (millions)
Foreign Tourists Annually (millions)
Domestic Tourists Annually (millions)
Annual Tourism Revenue ($US billions)
Hotel Rooms (thousands)
Health
Infant Mortality (Deaths per 1,000 Births)
Life Expectancy in Years (Male)
Life Expectancy in Years (Female)
Physicians per 100,000 People
Number of Hospitals
Anti-Smoking Legislation
Culture
Number of Museums
Number of Cultural and Arts Organizations
Environment
Green Spaces (km2)
Air Quality
Laws or Regulations to Improve Energy Efficiency
Retrofitted City Vehicle Fleet
Bike Share Program
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
At the time this dataset was created in Kaggle (2016-09-09), the original version was hosted by Open Data by Socrata at https://opendata.socrata.com/Government/Airplane-Crashes-and-Fatalities-Since-1908/q2te-8cvq, but unfortunately it is no longer available. The dataset contains data on airplane accidents involving civil, commercial and military transport worldwide from 1908-09-17 to 2009-06-08.
While applying for a data scientist job opportunity, I was asked the following questions on this dataset:
My solution was:
The following bar charts display the answers requested by point 1. of the assignment, in particular:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F298505%2F37efb7629abf402544ddc46cc3a2d7bb%2F_results_0_0.png?generation=1587821759491827&alt=media
The following answers regard point 2 of the assignment
I identified 7 clusters by applying the k-means clustering technique to a matrix obtained from a text corpus, which was created using text analysis (plain text, punctuation removal, lowercasing, etc.). The following table summarizes the number of crashes and deaths for each cluster.
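The text preparation step mentioned above can be sketched as follows. This is not the author's code: it is a minimal standard-library illustration of lowercasing, punctuation removal and building a document-term count matrix, with hypothetical function names. The k-means step itself is not shown, and stop-word removal or stemming, which the original analysis may have used, are left out.

```python
import string
from collections import Counter


def tokenize(text):
    """Lowercase the text, strip ASCII punctuation, split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def document_term_matrix(documents):
    """Return (vocabulary, matrix) with one row of word counts per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not occur in this document.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix
```

Each row of the resulting matrix is a numeric vector for one crash summary, which is the kind of input a clustering algorithm such as k-means expects.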
The following picture shows clusters using the first 2 principal components:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F298505%2Fea73e0fe9ca12d594fd83f285d3eff62%2F_results_1_17.png?generation=1587821871806437&alt=media
For each cluster, I summarize the most used words and try to identify the cause of the crashes.
Cluster 1 (258):
aircraft, crashed, plane, shortly, taking
Not much information about this cluster can be deduced using text analysis
Cluster 2 (500):
aircraft, airport, altitude, crashed, crew, due, engine, failed, failure, fire, flight, landing, lost, pilot, plane, runway, takeoff, taking
Engine failure on the runway after landing or takeoff
Cluster 3 (211):
aircraft, crashed, fog
Crash caused by fog
Cluster 4 (1014):
aircraft, airport, attempting, cargo, crashed, fire, land, landing, miles, pilot, plane, route, runway, struck, takeoff
Struck a cargo during landing or takeoff
Cluster 5 (2749):
accident, aircraft, airport, altitude, approach, attempting, cargo, conditions, control, crashed, crew, due, engine, failed, failure, feet, fire, flight, flying, fog, ground, killed, land, landing, lost, low, miles, mountain, pilot, plane, poor, route, runway, short, shortly, struck, takeoff, taking, weather
Struck a cargo due to engine failure or bad weather conditions mainly fog
Cluster 6 (195):
aircraft, crashed, engine, failure, fire, flight, left, pilot, plane, runway
Engine failure on the runway
Cluster 7 (341):
accident, aircraft, altitude, cargo, control, crashed, crew, due, engine, failure, flight, landing, loss, lost, pilot, plane, takeoff
Engine failure during landing or takeoff
Better solutions are welcome.
Thanks, Sauro
In 2020, the research project '10 Years Up' (10yup.nl) started at Utrecht University. It is the effort of a multi-disciplinary team. The study follows a longitudinal cohort of 500 young adults from age 16 for 10 years. The participants were recruited by random selection from the general population of 16-year-old adolescents. Every three months, they use a mobile app to indicate which goals they are pursuing by swiping right or left, and answer questions about how they go about realizing them. They also fill in questionnaires about various other topics related to their current circumstances, health and well-being. This way, we will be able to determine antecedents and consequences of goal setting and striving as a self-regulation strategy over time. Finding out how young people's choices affect their health and well-being helps us guide them in the important transition from adolescent to adult. By knowing which goals are important to them in this period, we can support them in their development.
Data of the first year of collection can be found in this repository, split into separate files per wave:
Wave 1: October/November 2020
Wave 2: January 2021
Wave 3: April 2021
Wave 4: August 2021
For questions on this dataset, please contact the project manager at 10yup@uu.nl