The United States has recently become the country with the most reported cases of 2019 Novel Coronavirus (COVID-19). This dataset contains the daily updated numbers of reported cases and deaths in the US at the state and county level, as provided by the Johns Hopkins University. In addition, I provide matching demographic information for US counties.
The dataset consists of two main csv files: covid_us_county.csv and us_county.csv. See the column descriptions below for more detailed information. In addition, I've added US county shape files for geospatial plots: us_county.shp/dbf/prj/shx.
covid_us_county.csv: COVID-19 cases and deaths, which will be updated daily. The data is provided by the Johns Hopkins University through their excellent github repo. I combined the separate "confirmed cases" and "deaths" files into a single table, removed a few geo identifier columns that I believe to be redundant, and reshaped the data into long format with a single date column. The earliest recorded cases are from 2020-01-22.
us_counties.csv: Demographic information on the US county level based on the (most recent) 2014-18 release of the American Community Survey. Derived via the great tidycensus package.
COVID-19 dataset covid_us_county.csv:
fips: County code in numeric format (i.e. no leading zeros). A small number of cases have NA values here, but can still be used for state-wise aggregation. Currently, this only affects the states of Massachusetts and Missouri.
county: Name of the US county. This is NA for the (aggregated counts of the) territories of American Samoa, Guam, Northern Mariana Islands, Puerto Rico, and Virgin Islands.
state: Name of US state or territory.
state_code: Two-letter abbreviation of US state (e.g. "CA" for "California"). This feature has NA values for the territories listed above.
lat and long: Coordinates of the county or territory.
date: Reporting date.
cases & deaths: Cumulative numbers for cases & deaths.
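Because cases and deaths are cumulative, daily new counts have to be derived by differencing each county's series. The following is a minimal sketch of that step, assuming rows have already been read as (fips, date, cumulative) tuples with ISO-formatted dates; the function name and the sample values are made up for illustration and are not part of the dataset.

```python
from collections import defaultdict


def daily_increments(rows):
    """Turn cumulative counts into daily new counts per county.

    rows: iterable of (fips, iso_date, cumulative_count) tuples.
    Returns {fips: [(iso_date, new_count), ...]} sorted by date.
    """
    by_county = defaultdict(list)
    for fips, date, cumulative in rows:
        by_county[fips].append((date, cumulative))

    result = {}
    for fips, series in by_county.items():
        series.sort()  # ISO dates (YYYY-MM-DD) sort chronologically as strings
        previous = 0   # assumption: the series starts from zero
        increments = []
        for date, cumulative in series:
            increments.append((date, cumulative - previous))
            previous = cumulative
        result[fips] = increments
    return result


# Hypothetical sample rows, not real data.
sample = [
    (6037, "2020-03-01", 1),
    (6037, "2020-03-02", 4),
    (6037, "2020-03-03", 4),
    (6037, "2020-03-04", 9),
]
new_cases = daily_increments(sample)
```

Note that retrospective corrections in the source data can make a cumulative series decrease, in which case this sketch yields a negative daily value rather than hiding it.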
Demographic dataset us_counties.csv:
fips, county, state, state_code: Same as above. The county names are slightly different, but mostly the difference is that this dataset has the word "County" added. I recommend joining on fips.
male & female: Population numbers for male and female.
population: Total population for the county. Provided as a convenience feature; it is always the sum of male + female.
female_percentage: Another convenience feature: female / population in percent.
median_age: Overall median age for the county.
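As a rough illustration of the recommended join on fips, the sketch below links the two tables and computes cases per 100,000 residents. It uses only the Python standard library and small inline CSV snippets whose values are invented for the example; in practice the same join could be done with a data-frame library. Rows with an NA fips are skipped, since they cannot be matched to a county.

```python
import csv
import io

# Hypothetical excerpts with made-up numbers, using the column names described above.
covid_csv = """fips,county,state,date,cases,deaths
1001,Autauga,Alabama,2020-04-01,10,0
1003,Baldwin,Alabama,2020-04-01,23,1
NA,Unassigned,Massachusetts,2020-04-01,5,0
"""
demo_csv = """fips,county,state,population,median_age
1001,Autauga County,Alabama,50000,38.0
1003,Baldwin County,Alabama,200000,43.0
"""


def cases_per_100k(covid_text, demo_text):
    """Join the two tables on fips and return (fips, date, rate) tuples."""
    population = {
        row["fips"]: int(row["population"])
        for row in csv.DictReader(io.StringIO(demo_text))
    }
    rates = []
    for row in csv.DictReader(io.StringIO(covid_text)):
        fips = row["fips"]
        if fips == "NA" or fips not in population:
            continue  # no county-level match possible
        rate = int(row["cases"]) * 100_000 / population[fips]
        rates.append((fips, row["date"], rate))
    return rates


rates = cases_per_100k(covid_csv, demo_csv)
```

Joining on the county name instead would fail for most rows, because the demographic table appends the word "County" to the names.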
Data provided for educational and academic research purposes by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE).
The github repo states that:
This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
This is a monthly report on publicly funded community services for children, young people and adults using data from the Community Services Data Set (CSDS) reported in England for December 2017. The CSDS is a patient-level dataset providing information relating to publicly funded community services for children, young people and adults. These services can include health centres, schools, mental health trusts, and health visiting services. The data collected includes personal and demographic information, diagnoses including long-term conditions and disabilities and care events plus screening activities. It has been developed to help achieve better outcomes for children, young people and adults. It provides data that will be used to commission services in a way that improves health, reduces inequalities, and supports service improvement and clinical quality.
Prior to October 2017, the predecessor Children and Young People's Health Services (CYPHS) Data Set collected data for children and young people aged 0-18. The CSDS superseded the CYPHS data set to allow adult community data to be submitted, expanding the scope of the existing data set by removing the 0-18 age restriction. The structure and content of the CSDS remains the same as the previous CYPHS data set. Further information about the CYPHS and related statistical reports is available from https://digital.nhs.uk/data-and-information/data-collections-and-data-sets/data-sets/children-and-young-people-s-health-services-data-set
References to children and young people cover records submitted for 0-18 year olds and references to adults cover records submitted for those aged over 18. Where analysis for both groups has been combined, this is referred to as all patients.
These statistics are classified as experimental and should be used with caution. Experimental statistics are new official statistics undergoing evaluation.
They are published in order to involve users and stakeholders in their development and as a means to build in quality at an early stage. More information about experimental statistics can be found on the UK Statistics Authority website. This month's statistical release also includes a separate quarterly analysis focusing on 6-8 week breastfeeding status and 24, 27 and 30 month Ages and Stages (ASQ-3) scoring, October - December 2017. This file has been revised and is available as part of the March 2018 publication.
Abstract copyright UK Data Service and data collection copyright owner.
The Annual Population Survey (APS) is a major survey series, which aims to provide data that can produce reliable estimates at the local authority level. Key topics covered in the survey include education, employment, health and ethnicity. The APS comprises key variables from the Labour Force Survey (LFS), all its associated LFS boosts and the APS boost. The APS aims to provide enhanced annual data for England, covering a target sample of at least 510 economically active persons for each Unitary Authority (UA)/Local Authority District (LAD) and at least 450 in each Greater London Borough. In combination with local LFS boost samples, the survey provides estimates for a range of indicators down to Local Education Authority (LEA) level across the United Kingdom.
For further detailed information about methodology, users should consult the Labour Force Survey User Guide, included with the APS documentation. For variable and value labelling and coding frames that are not included either in the data or in the current APS documentation, users are advised to consult the latest versions of the LFS User Guides, which are available from the ONS Labour Force Survey - User Guidance webpages.
Occupation data for 2021 and 2022
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. None of ONS' headline statistics, other than those directly sourced from occupational data, are affected and you can continue to rely on their accuracy. The affected datasets have now been updated.
Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022.
APS Well-Being Datasets
From 2012-2015, the ONS published separate APS datasets aimed at providing initial estimates of subjective well-being, based on the Integrated Household Survey. In 2015 these were discontinued. A separate set of well-being variables and a corresponding weighting variable have been added to the April-March APS person datasets from A11M12 onwards. Further information on the transition can be found in the Personal well-being in the UK: 2015 to 2016 article on the ONS website.
APS disability variables
Over time, there have been some updates to disability variables in the APS. An article explaining the quality assurance investigations on these variables that have been conducted so far is available on the ONS Methodology webpage.
End User Licence and Secure Access APS data
Users should note that there are two versions of each APS dataset. One is available under the standard End User Licence (EUL) agreement, and the other is a Secure Access version. The EUL version includes Government Office Region geography, banded age, 3-digit SOC and industry sector for main, second and last job.
The Secure Access version contains more detailed variables relating to:
age: single year of age, year and month of birth, age completed full-time education and age obtained highest qualification, age of oldest dependent child and age of youngest dependent child
family unit and household: including a number of variables concerning the number of dependent children in the family according to their ages, relationship to head of household and relationship to head of family
nationality and country of origin
geography: including county, unitary/local authority, place of work, Nomenclature of Territorial Units for Statistics 2 (NUTS2) and NUTS3 regions, and whether lives and works in same local authority district
health: including main health problem, and current and past health problems
education and apprenticeship: including numbers and subjects of various qualifications and variables concerning apprenticeships
industry: including industry, industry class and industry group for main, second and last job, and industry made redundant from
occupation: including 4-digit Standard Occupational Classification (SOC) for main, second and last job and job made redundant from
system variables: including week number when interview took place and number of households at address
The Secure Access data have more restrictive access conditions than those made available under the standard EUL. Prospective users will need to gain ONS Accredited Researcher status, complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables. Users are strongly advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements. The SL access version of the APS April 2013 - March 2014 Subjective Well-Being dataset is held under SN 7566.
Main Topics: Topics covered include: household composition and relationships, housing tenure, nationality, ethnicity and residential history, employment and training (including government schemes), workplace and location, job hunting, educational background and qualifications. Many of the variables included in the survey are the same as those in the LFS.
Multi-stage stratified random sample
Face-to-face interview
Telephone interview
2013 2014 ACADEMIC ACHIEVEMENT ADULT EDUCATION AGE ANXIETY APPLICATION FOR EMP... APPOINTMENT TO JOB ATTITUDES BONUS PAYMENTS CHRONIC ILLNESS COHABITATION CONDITIONS OF EMPLO... DEBILITATIVE ILLNESS DEGREES DISABILITIES Demography population ECONOMIC ACTIVITY EDUCATIONAL BACKGROUND EDUCATIONAL COURSES EMPLOYEES EMPLOYMENT EMPLOYMENT HISTORY EMPLOYMENT PROGRAMMES EMPLOYMENT SERVICES ETHNIC GROUPS FAMILY BENEFITS FIELDS OF STUDY FULL TIME EMPLOYMENT FURNISHED ACCOMMODA... GENDER HAPPINESS HEADS OF HOUSEHOLD HEALTH HEALTH STATUS HIGHER EDUCATION HOME BASED WORK HOME OWNERSHIP HOURS OF WORK HOUSEHOLDS HOUSING HOUSING BENEFITS HOUSING TENURE INCOME INDUSTRIES JOB CHANGING JOB HUNTING JOB SEEKER S ALLOWANCE LANDLORDS LONGTERM UNEMPLOYMENT Labour and employment MANAGERS MARITAL STATUS MATERNITY BENEFITS NATIONAL IDENTITY NATIONALITY OCCUPATIONAL TRAINING OCCUPATIONS OLD AGE BENEFITS OVERTIME PART TIME COURSES PART TIME EMPLOYMENT PLACE OF BIRTH PLACE OF RESIDENCE PRIVATE SECTOR PUBLIC SECTOR QUALIFICATIONS RECREATIONAL EDUCATION RECRUITMENT REDUNDANCY REDUNDANCY PAY RELIGIOUS AFFILIATION RENTED ACCOMMODATION RESIDENTIAL MOBILITY SELF EMPLOYED SICK LEAVE SICK PAY SICKNESS AND DISABI... SMOKING SOCIAL HOUSING SOCIAL SECURITY BEN... SOCIO ECONOMIC STATUS STATE RETIREMENT PE... SUBSIDIARY EMPLOYMENT SUPERVISORS SUPERVISORY STATUS TEMPORARY EMPLOYMENT TERMINATION OF SERVICE TIED HOUSING TRAINING TRAINING COURSES UNEMPLOYED UNEMPLOYMENT UNEMPLOYMENT BENEFITS UNFURNISHED ACCOMMO...
UNWAGED WORKERS WAGES WELL BEING SOCIETY WORKING CONDITIONS WORKPLACE vital statistics an...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Population by Country - 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tanuprabhu/population-by-country-2020 on 28 January 2022.
--- Dataset description provided by original source is as follows ---
I always wanted to access a data set related to the world's population (country-wise). But I could not find a properly documented data set, so I just created one manually.
I knew I wanted to create a dataset, but I did not know how to do so. So I started to search the internet for the content (population of countries). Obviously, Wikipedia was my first stop, but for some reason the results were not acceptable, and I think it listed only around 190 countries. So I surfed the internet for quite some time until I stumbled upon a great website. You have probably heard of it: the name of the website is Worldometer. This is exactly the website I was looking for. It had more details than Wikipedia, and it also had more rows, I mean more countries with their population.
Once I had found the data, my next hard task was to download it. Of course, I could not get the raw form of the data, and I did not mail them about it. So I learned a new skill which is very important for a data scientist. I read somewhere that this is the technique to use for obtaining data from websites. Any guesses? Keep reading and you will find out in the next paragraph.
You are right, it's web scraping. I learned this so that I could convert the data into CSV format. I will give you the scraper code that I wrote, and I also found a way to directly convert the pandas data frame to a CSV (comma-separated values) file and store it on my computer. Just go through my code and you will know what I'm talking about.
The code that I used to scrape the data from the website was included as a screenshot in the original post.
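Since the author's scraper is only available as a screenshot, here is a small stand-in sketch of the general idea: pull the rows out of an HTML table and write them as CSV. This is not the author's code. It uses the Python standard library rather than pandas, and it parses an inline HTML snippet with invented country names instead of fetching the real Worldometer page.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Hypothetical page fragment; a real scraper would download the page first.
page = """
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Exampleland</td><td>1,234,567</td></tr>
  <tr><td>Samplestan</td><td>89,000</td></tr>
</table>
"""
csv_text = html_table_to_csv(page)
```

The CSV writer quotes the population values because they contain thousands separators; stripping the commas and converting to integers would be a sensible next cleaning step.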
Now I couldn't have got the data without Worldometer. So special thanks to the website. It is because of them I was able to get the data.
As far as I know, I don't have any questions to ask. You can find your own ways to use the data, and let me know via a kernel if you find something interesting.
--- Original source retains full ownership of the source dataset ---
This data package is comprised of three datasets all pertaining to two dominant palmetto species, Serenoa repens and Sabal etonia, at Archbold Biological Station in south-central Florida. The first dataset, palmetto_data, contains survival and growth data across multiple years, habitats and experimental treatments. The second dataset, seedlings_data, follows the fate of marked putative palmetto seedlings in the field to assess survivorship and growth. The final dataset, harvested_palmetto_data, contains size data and estimated dry mass (biomass in grams) of 33 destructively harvested palmetto plants (17 S. repens and 16 S. etonia) of varying sizes and across habitats. Thirty-two of these were used to calculate estimated biomass, using regression equations, for palmettos sampled in the palmetto_data. Below we summarize experimental setup and data collected for each dataset.
Palmetto data
Demographic data were collected as three separate components. The first component compared growth among habitats. Starting in 1981, equal numbers of both palmetto species were marked across scrubby flatwoods (oak scrub) and flatwoods habitats (3 sites per habitat) for a total of 240 marked plants. These habitats had not burned within the last decade, but historically had experienced a natural fire return interval of 5 - 20 years prior to this study's initiation. The second component added an additional 400 palmettos (200 of each species), which were marked in sand pine scrub (n = 200) in 1985 and sandhill habitat (n = 200) in 1989 on Archbold's Red Hill. At the time of this project's initiation, all Red Hill management units were last burned in 1927 and were considered long unburned. Part of Archbold's management plan included restoring fire into some management units while leaving others long unburned to serve as reference units.
Therefore, for our second component, we were able to create a 2x2 factorial design using habitat types on Red Hill and fire management as factors, with 100 palmettos in each category (50 of each species). The third component involved an experiment to examine the factorial effects of clipping and fertilizing on palmetto flowering. We marked 300 palmettos (150 of each species), all in sand pine scrub habitat on Red Hill, and used the 100 palmettos marked in 1985 as controls. Annual data measures included height, canopy length and width (all in cm), number of new and green leaves and flowering scapes. Data were collected continuously (not for all variables or sites) from 1981 through 1997 then again in 2001 and 2017. Data collection is ongoing at 5-year intervals. Data on the 100 plants in the experimental sandhill on Red Hill were not collected in 2017 due to the removal of marked stakes from roller chopping of the site as part of more recent sandhill restoration efforts. A subset of the plants in the clipping and fertilizing experiment were lost in 2013 when a plow line was established to stop the spread of a wildfire. The locations of all remaining plants were taken in 2017 using a Trimble GPS unit and are included as a separate data file (palmetto_location_data) and shapefile (palmetto_shape).
Seedling data
In January 1989, we marked 100 putative seedlings in flatwoods habitats and 87 in scrubby flatwoods habitats. Putative seedlings typically cannot be identified using morphology as either S. repens or S. etonia so sample sizes of each are unknown. Annual data recorded included survival, standing height (cm) and maximum crown diameter (cm). In 1991, we started measuring basal stem diameter (cm) with calipers. During annual visits, we noted if the species could be identified as S. repens or S. etonia. Data were collected continuously starting in 1989 through 1997, then again in 2001 and 2008. Data collection is not ongoing for this dataset.
Harvested Palmetto data
Thirty-three palmettos, 17 S. repens and 16 S. etonia, were destructively harvested at three different sites, from two habitats (scrubby flatwoods and sand pine scrub) in 1985. Basic size measures as taken for palmetto demography data were recorded including height, canopy length and width (all in cm) and the number of green leaves. Additional data measures were recorded on the largest leaf blade including maximum length and width of the palmetto leaf and petiole length and width. Finally, basal diameter at the ground level was recorded. Only 32 palmettos were used to develop biomass regressions (17 S. repens and 15 S. etonia). Biomass is the estimated dry mass (g) of each harvested palmetto. Fresh palmettos were divided into leaf and stem (both above- and below-ground), but roots were not harvested since they grow to depths of several meters, making recovery of all root tissues virtually impossible for fresh-mass determination. Subsamples of fresh mass were oven dried at 80 °C to constant mass for estimation of dry mass equivalent, which in turn was used to estimate the dry mass of the harvested palmettos.
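The subsample-based dry mass estimate described above can be illustrated with a short calculation. This is only a sketch of the usual dry-to-fresh ratio approach, assuming the ratio measured on a dried subsample applies to the whole component; the function name and all masses are hypothetical, and the data package does not state that this exact formula was used.

```python
def estimate_dry_mass(total_fresh_g, subsample_fresh_g, subsample_dry_g):
    """Scale a component's fresh mass by the dry/fresh ratio of its subsample."""
    if subsample_fresh_g <= 0:
        raise ValueError("subsample fresh mass must be positive")
    return total_fresh_g * subsample_dry_g / subsample_fresh_g


# Hypothetical masses in grams for one plant (not taken from the dataset).
leaf_dry_g = estimate_dry_mass(2000.0, 100.0, 45.0)  # leaves
stem_dry_g = estimate_dry_mass(3000.0, 200.0, 80.0)  # stem
plant_dry_g = leaf_dry_g + stem_dry_g
```

Estimating leaf and stem separately matters when the two tissues hold different amounts of water, as in these made-up numbers (45% versus 40% dry matter).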
https://www.bco-dmo.org/dataset/701751/license
Demographic data for introduced crab from multiple bays along the Central California coast, shallow subtidal (<3 m depth), in 2015.
access_formats=.htmlTable,.csv,.json,.mat,.nc,.tsv,.esriCsv,.geoJson
acquisition_description=We conducted monthly trappings of invasive European green crabs to gather demographic data from several bays in northern California: Bodega Harbor, Tomales Bay, Bolinas Lagoon, San Francisco Bay, and Elkhorn Slough. All sites were accessed by foot via shore entry. At each of four sites within each bay, we placed 5 baited traps (folding Fukui fish traps) and 5 baited minnow traps in shallow intertidal areas. Trap arrays were set with fish and minnow traps alternating and with each 20 m apart. Traps were retrieved 24 hours later and traps were rebaited and collected again the following day. Trapping was continued for three consecutive days with traps removed on the final day. Each day, data for crab species, size, sex, reproductive condition, and injuries were collected for all crabs in the field. Following data collection, all crabs were returned to the lab, and frozen overnight prior to disposal.
See Turner et al. (2016) Biological Invasions 18: 533-548 for
additional methodological details:
Turner, B.C., de Rivera, C.E., Grosholz, E.D., & Ruiz, G.M. 2016. Assessing
population increase as a possible outcome to management of invasive species.
Biological Invasions, 18(2), pp 533–548.
doi:10.1007/s10530-015-1026-9
awards_0_award_nid=699764
awards_0_award_number=OCE-1514893
awards_0_data_url=http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1514893
awards_0_funder_name=NSF Division of Ocean Sciences
awards_0_funding_acronym=NSF OCE
awards_0_funding_source_nid=355
awards_0_program_manager=David L. Garrison
awards_0_program_manager_nid=50534
cdm_data_type=Other
comment=Demographic data for introduced crab from multiple bays in 2015
PI: Edwin Grosholz (UC Davis)
Co-PI: Catherine de Rivera & Gregory Ruiz (Portland State University)
Version: 15 June 2017
Conventions=COARDS, CF-1.6, ACDD-1.3
data_source=extract_data_as_tsv version 2.3 19 Dec 2019
defaultDataQuery=&time<now
doi=10.1575/1912/bco-dmo.701751.1
Easternmost_Easting=-121.738422
geospatial_lat_max=38.316968
geospatial_lat_min=36.823953
geospatial_lat_units=degrees_north
geospatial_lon_max=-121.738422
geospatial_lon_min=-123.058725
geospatial_lon_units=degrees_east
infoUrl=https://www.bco-dmo.org/dataset/701751
institution=BCO-DMO
instruments_0_dataset_instrument_description=At each of four sites within each bay, we placed 5 baited traps (folding Fukui fish traps) and 5 baited minnow traps in shallow intertidal areas.
instruments_0_dataset_instrument_nid=701774
instruments_0_description=Fukui produces multi-species, multi-purpose collapsible or stackable fish traps, available in different sizes.
instruments_0_instrument_name=Fukui fish trap
instruments_0_instrument_nid=701772
instruments_0_supplied_name=folding Fukui fish traps
metadata_source=https://www.bco-dmo.org/api/dataset/701751
Northernmost_Northing=38.316968
param_mapping={'701751': {'lat': 'master - latitude', 'lon': 'master - longitude'}}
parameter_source=https://www.bco-dmo.org/mapserver/dataset/701751/parameters
people_0_affiliation=University of California-Davis
people_0_affiliation_acronym=UC Davis
people_0_person_name=Edwin Grosholz
people_0_person_nid=699768
people_0_role=Principal Investigator
people_0_role_type=originator
people_1_affiliation=Portland State University
people_1_affiliation_acronym=PSU
people_1_person_name=Catherine de Rivera
people_1_person_nid=699771
people_1_role=Co-Principal Investigator
people_1_role_type=originator
people_2_affiliation=Portland State University
people_2_affiliation_acronym=PSU
people_2_person_name=Gregory Ruiz
people_2_person_nid=471603
people_2_role=Co-Principal Investigator
people_2_role_type=originator
people_3_affiliation=Woods Hole Oceanographic Institution
people_3_affiliation_acronym=WHOI BCO-DMO
people_3_person_name=Shannon Rauch
people_3_person_nid=51498
people_3_role=BCO-DMO Data Manager
people_3_role_type=related
project=Invasive_predator_harvest
projects_0_acronym=Invasive_predator_harvest
projects_0_description=The usual expectation is that when populations of plants and animals experience repeated losses to predators or human harvest, they would decline over time. If instead these populations rebound to numbers exceeding their initial levels, this would seem counter-intuitive or even paradoxical. However, for several decades mathematical models of population processes have shown that this unexpected response, formally known as overcompensation, is not only possible, but even expected under some circumstances. In what may be the first example of overcompensation in a marine system, a dramatic increase in a population of the non-native European green crab was recently observed following an intensive removal program. This RAPID project will use field surveys and laboratory experiments to verify that this population explosion results from overcompensation. Data will be fed into population models to understand to what degree population processes such as cannibalism by adult crabs on juvenile crabs and changes in maturity rate of reproductive females are contributing to or modifying overcompensation. The work will provide important insights into the fundamental population dynamics that can produce overcompensation in both natural and managed populations. Broader Impacts include mentoring graduate trainees and undergraduate interns in the design and execution of field experiments as well as in laboratory culture and feeding experiments. The project will also involve a network of citizen scientists who are involved with restoration activities in this region and results will be posted on the European Green Crab Project website.
This project aims to establish the first example of overcompensation in marine systems. Overcompensation refers to the paradoxical process where reduction of a population due to natural or human causes results in a greater equilibrium population than before the reduction. A population explosion of green crabs has been recently documented in a coastal lagoon and there are strong indications that this may be the result of overcompensation. Accelerated maturation of females, which can accompany and modify the expression of overcompensation, has been observed. This RAPID project will collect field data from this unusual recruitment class and conduct targeted mesocosm experiments. These will include population surveys and mark-recapture studies to measure demographic rates across study sites. Laboratory mesocosm studies using this recruitment class will determine size-specific mortality. Outcomes will be used in population dynamics models to determine to what degree overcompensation has created this dramatic population increase. The project will seek answers to the following questions: 1) what are the rates of cannibalism by adult green crabs and large juveniles on different sizes of juvenile green crabs, 2) what are the consequences of smaller size at first reproduction for population dynamics and for overcompensation and 3) how quickly will the green crab population return to the levels observed prior to the eradication program five years earlier?
projects_0_end_date=2016-11
projects_0_geolocation=Europe
projects_0_name=RAPID: A rare opportunity to examine overcompensation resulting from intensive harvest of an introduced predator
projects_0_project_nid=699765
projects_0_start_date=2014-12
sourceUrl=(local files)
Southernmost_Northing=36.823953
standard_name_vocabulary=CF Standard Name Table v55
version=1
Westernmost_Easting=-123.058725
xml_source=osprey2erddap.update_xml() v1.3
This series has been discontinued.
Life expectancy at birth and age 65 by sex and ward, London borough, region, 1999/03 - 2010/14.
The population data used is revised 2002-2010 ONS mid year estimates (MYE) - revised post 2011 Census. Revised population estimates by single year of age for wards can also be found on the ONS website for 2002-2010, 2011, 2012, and 2013. These figures are consistent with the published revised mid-2002 to mid-2010 local authority estimates.
Rolling 5-year combined life expectancies are used for wards to reduce the effects of the variability in number of deaths in each year. The same method is applied to higher geographies to enable meaningful comparisons. However, 3-year combined expectancies are published separately on the Datastore for geographical areas that are local authority and above.
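The pooling step behind the rolling 5-year figures can be sketched as follows: deaths and population are summed over each window of five consecutive years before a rate is calculated, which damps year-to-year noise in small areas such as wards. This is an illustrative sketch for a single area and age band with made-up numbers; the published life expectancies come from the full life table calculation in the Public Health England tool, which is not reproduced here.

```python
def rolling_pooled_rates(deaths, population, window=5):
    """Pool deaths and population over rolling windows of consecutive years.

    deaths, population: dicts mapping year -> count for one area and age band.
    Returns {label: pooled death rate}, with labels such as "1999/03".
    Assumes the years present are consecutive.
    """
    years = sorted(deaths)
    pooled = {}
    for start in range(len(years) - window + 1):
        span = years[start:start + window]
        total_deaths = sum(deaths[year] for year in span)
        total_population = sum(population[year] for year in span)
        label = f"{span[0]}/{str(span[-1])[-2:]}"
        pooled[label] = total_deaths / total_population
    return pooled


# Hypothetical ward-level counts for one age band (not real data).
ward_deaths = {1999: 3, 2000: 7, 2001: 2, 2002: 5, 2003: 3, 2004: 10}
ward_population = {year: 1000 for year in ward_deaths}
pooled_rates = rolling_pooled_rates(ward_deaths, ward_population)
```

In this example the single-year rates range from 0.002 to 0.010, while the two pooled rates are 0.0040 and 0.0054, which shows how pooling narrows the spread.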
If the GLA publish revised 2002-2010 population data for wards then these life expectancy figures will also be revised to reflect them.
The ONS vital statistics mortality data breaks deaths into 10 year age bands. 5 year age band deaths were modelled using this data.
Vital Statistics: Population and Health Reference Tables are available on the ONS website: http://www.ons.gov.uk/ons/rel/vsob1/vital-statistics--population-and-health-reference-tables/index.html
The tool for calculating life expectancy is available from Public Health England.
The highest age band in the calculator is currently 85+. If the tool is updated with a higher upper age band (i.e. 90+), this data will be revised to reflect this change.
Healthy life expectancy and disability-free life expectancy (1999-2003) at birth have been calculated for wards in England and Wales. These can be found on the ONS website.
This data is also presented in the GLA ward profiles.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
We are enclosing the database used in our research titled "Concentration and Geospatial Modelling of Health Development Offices' Accessibility for the Total and Elderly Populations in Hungary", along with our statistical calculations. For the sake of reproducibility, further information can be found in the files Short_Description_of_Data_Analysis.pdf and Statistical_formulas.pdf.
The sharing of data is part of our aim to strengthen the base of our scientific research. As of March 7, 2024, the detailed submission and analysis of our research findings to a scientific journal has not yet been completed.
The dataset was expanded on 23rd September 2024 to include SPSS statistical analysis data, a heatmap, and buffer zone analysis around the Health Development Offices (HDOs) created in QGIS software.
Short Description of Data Analysis and Attached Files (datasets):
Our research utilised data from 2022, serving as the basis for statistical standardisation. The 2022 Hungarian census provided an objective basis for our analysis, with age group data available at the county level from the Hungarian Central Statistical Office (KSH) website. The 2022 demographic data provided an accurate picture compared to the data available from the 2023 microcensus. The calculation used is based on our standardisation of the 2022 data. For xlsx files, we used MS Excel 2019 (version: 1808, build: 10406.20006) with the SOLVER add-in.
The Hungarian Central Statistical Office served as the data source for population by age group, county, and region: https://www.ksh.hu/stadat_files/nep/hu/nep0035.html (accessed 04 Jan. 2024), with data recorded in MS Excel in the Data_of_demography.xlsx file.
In 2022, 108 Health Development Offices (HDOs) were operational, and it's noteworthy that no developments have occurred in this area since 2022. The availability of these offices and the demographic data from the Central Statistical Office in Hungary are considered public interest data, freely usable for research purposes without requiring permission.
The contact details for the Health Development Offices were sourced from the following page (Hungarian National Population Centre (NNK)): https://www.nnk.gov.hu/index.php/efi (n=107). The Semmelweis University Health Development Centre was not listed by NNK, hence it was separately recorded as the 108th HDO. More information about the office can be found here: https://semmelweis.hu/egeszsegfejlesztes/en/ (n=1). (accessed 05 Dec. 2023.)
Geocoordinates were determined using Google Maps (N=108): https://www.google.com/maps (accessed 02 Jan. 2024). The geocoordinates (latitude and longitude according to the WGS 84 standard), address data (postal code, town name, street, and house number), and name of each HDO were recorded in the Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file.
The foundational software for geospatial modelling and display (QGIS 3.34), an open-source software, can be downloaded from: https://qgis.org/en/site/forusers/download.html (accessed 04 Jan. 2024).
The HDOs_GeoCoordinates.gpkg QGIS project file contains Hungary's administrative map and the recorded addresses of the HDOs, imported from the Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file.
The OpenStreetMap tileset is directly accessible from www.openstreetmap.org in QGIS. (accessed 04 Jan. 2024.)
The Hungarian county administrative boundaries were downloaded from the following website: https://data2.openstreetmap.hu/hatarok/index.php?admin=6 (accessed 04 Jan. 2024.)
HDO_Buffers.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding buffer zones with a radius of 7.5 km.
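The idea behind a 7.5 km buffer zone can be illustrated without QGIS. The sketch below is not the authors' workflow: it assumes WGS 84 latitude/longitude pairs (as recorded in the geocoordinates file) and approximates the buffer with a great-circle (haversine) distance test. The function names and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def offices_within_buffer(point, offices, radius_km=7.5):
    """Return the names of offices whose buffer of radius_km contains point.

    point   -- (lat, lon) tuple
    offices -- dict mapping an office name to its (lat, lon) tuple
    """
    lat, lon = point
    return [name for name, (olat, olon) in offices.items()
            if haversine_km(lat, lon, olat, olon) <= radius_km]


# Hypothetical coordinates, for illustration only.
example_offices = {"Office A": (47.55, 19.0), "Office B": (47.60, 19.0)}
print(offices_within_buffer((47.5, 19.0), example_offices))
```

QGIS typically builds buffers in a projected coordinate system, so results near the buffer edge may differ slightly from this spherical approximation.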
Heatmap.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding heatmap (Kernel Density Estimation).
A brief description of the statistical formulas applied is included in the Statistical_formulas.pdf.
Recording of our base data for statistical concentration and diversification measurement was done using MS Excel 2019 (version: 1808, build: 10406.20006) in .xlsx format.
Using the SPSS 29.0.1.0 program, we performed the following statistical calculations with the databases Data_HDOs_population_without_outliers.sav and Data_HDOs_population.sav:
For easier readability, the files have been provided in both SPV and PDF formats.
The translation of these supplementary files into English was completed on 23rd Sept. 2024.
If you have any further questions regarding the dataset, please contact the corresponding author: domjan.peter@phd.semmelweis.hu
http://dcat-ap.ch/vocabulary/licenses/terms_by
Overview
The SUNWELL Modelling Environment is a combination of data and code that models electricity production from satellite-derived irradiance data and other spatial data sets for all of Switzerland. This ensemble accompanies the publication "The bright side of PV production in snow-covered mountains", published in the Proceedings of the National Academy of Sciences, and reproduces all of its results and figures. Code and resources are in their original form (with documentation). A new version with a more generalized application to PV modelling and with more flexibility in terms of input and output formats will be released in the coming months.
Format
All code is written in Matlab and has to be executed in Matlab. The input and output data sets are also in the Matlab-specific .mat format. Whenever publicly available, the original data is provided as GeoTIFF, .xlsx or another common format. This is the case for:
Measured irradiance for two validation sites (/Validation/ASRB)
The ‘Metadata’ documents in the respective folders provide further information about the data sources and processing. Figures are produced either in .pdf or .png format.
Structure
The central level of the SUNWELL environment holds the 5 Mains, which run the different modelling aspects of the paper; each is documented separately. Additional code is located in the ‘DataProcessing’ and ‘functions’ folders. Functions are called in the different Mains.
‘InputsFromMatlab’ contains the radiation and albedo input data sets in separate subfolders (SIS/SISDIR/ALB). The original data is not publicly available, but can be requested for research purposes free of charge. We provide a processed subset of the data set that was used to run the SUNWELL simulations. The MSG subfolder contains additional spatial input data sets.
‘Outputs’ contains the output files from the different Mains, with matching names (e.g. Main_CHallpixels.m and Prod_CHallpixels).
‘Publication_figures’ contains all individual figures from the PNAS publication, as well as the generating code (/code_plot) and the PowerPoint figures (/ppts) that provide the combined final figures.
‘Validation’ contains the data sets used in the model validation:
Electricity production from a validation site at Lac des Toules in Wallis (/LDT). This data set was provided under an NDA and cannot be made publicly available.
Paper Citation:
Annelen Kahl; Jérôme Dujardin; Michael Lehning (2018). Dataset on PV Production in Snow Covered Mountains. PNAS - Proceedings of the National Academy of Sciences. (in press)
Cause of death data based on verbal autopsy (VA) interviews were contributed by fourteen INDEPTH HDSS sites in sub-Saharan Africa and eight sites in Asia. The principles of the Network and its constituent population surveillance sites have been described elsewhere [1]. Each HDSS site is committed to long-term longitudinal surveillance of circumscribed populations, typically each covering around 50,000 to 100,000 people. Households are registered and visited regularly by lay field-workers, with a frequency varying from once per year to several times per year. All vital events are registered at each such visit, and any deaths recorded are followed up with verbal autopsy interviews, usually undertaken by specially trained lay interviewers. A few sites were already operational in the 1990s, but in this dataset 95% of the person-time observed related to the period from 2000 onwards, with 58% from 2007 onwards. Two sites, in Nairobi and Ouagadougou, followed urban populations, while the remainder covered areas that were generally more rural in character, although some included local urban centres. Sites covered entire populations, although the Karonga, Malawi, site only contributed VAs for deaths of people aged 12 years and older. Because the sites were not located or designed in a systematic way to be representative of national or regional populations, it is not meaningful to aggregate results over sites.
All cause of death assignments in this dataset were made using the InterVA-4 model version 4.02 [2]. InterVA-4 uses probabilistic modelling to arrive at likely cause(s) of death for each VA case, the workings of the model being based on a combination of expert medical opinion and relevant available data. InterVA-4 is the only model currently available that processes VA data according to the WHO 2012 standard and categorises causes of death according to ICD-10. Since the VA data reported here were collected before the WHO 2012 standard was formulated, they were all retrospectively transformed into the WHO 2012 and InterVA-4 input format for processing.
The InterVA-4 model was applied to the data from each site, yielding, for each case, up to three possible causes of death or an indeterminate result. Each cause for a case is a single record in the dataset. In a minority of cases, for example where symptoms were vague, contradictory or mutually inconsistent, it was impossible for InterVA-4 to determine a cause of death, and these deaths were attributed as entirely indeterminate. For the remaining cases, one to three likely causes and their likelihoods were assigned by InterVA-4, and if the sum of their likelihoods was less than one, the residual component was then assigned as being indeterminate. This was an important process for capturing uncertainty in cause of death outcome(s) from the model at the individual level, thus avoiding over-interpretation of specific causes. As a consequence there were three sources of unattributed cause of death: deaths registered for which VAs were not successfully completed; VAs completed but where the cause was entirely indeterminate; and residual components of deaths attributed as indeterminate.
In this dataset each case has between one and four records, each with its own cause and likelihood. Cases for which VAs were not successfully completed have a single record with the cause of death recorded as “VA not completed” and a likelihood of one. Thus the overall sum of the likelihoods equates to the total number of deaths. Each record also contains a population weighting factor reflecting the ratio of the population fraction for its site, age group, sex and year to the corresponding age group and sex fraction in the standard population (see section on weighting).
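The record structure described above can be sketched as follows. This is an illustration only, not the InterVA-4 implementation: the function name, the label used for the indeterminate residual, and the example cause names are assumptions.

```python
INDETERMINATE = "Indeterminate"        # assumed label for the residual record
VA_NOT_COMPLETED = "VA not completed"  # label quoted in the description


def build_cause_records(causes, va_completed=True, tol=1e-9):
    """Return the (cause, likelihood) records for one death.

    causes -- list of up to three (cause, likelihood) pairs from the model;
              an empty list means the result was entirely indeterminate.
    The likelihoods of the returned records always sum to one, so summing
    likelihoods over all records gives the total number of deaths.
    """
    if not va_completed:
        return [(VA_NOT_COMPLETED, 1.0)]
    if len(causes) > 3:
        raise ValueError("at most three causes are assigned per case")
    total = sum(likelihood for _, likelihood in causes)
    if total > 1.0 + tol:
        raise ValueError("likelihoods must not sum to more than one")
    records = list(causes)
    residual = 1.0 - total
    if residual > tol:
        # The unexplained share is recorded as an indeterminate component.
        records.append((INDETERMINATE, residual))
    return records


# Hypothetical example: two likely causes plus an indeterminate residual.
for cause, likelihood in build_cause_records([("Cause A", 0.6), ("Cause B", 0.25)]):
    print(cause, round(likelihood, 2))
```

Each case therefore yields between one and four records, matching the range stated above.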
In this context, all of these data are secondary datasets derived from primary data collected separately by each participating site. In all cases the primary data collection was covered by site-level ethical approvals relating to on-going demographic surveillance in those specific locations. No individual identity or household location data are included in this secondary data.
[1] Sankoh O, Byass P. The INDEPTH Network: filling vital gaps in global epidemiology. International Journal of Epidemiology 2012; 41:579-588.
[2] Byass P, Chandramohan D, Clark SJ, D’Ambruoso L, Fottrell E, Graham WJ, et al. Strengthening standardised interpretation of verbal autopsy data: the new InterVA-4 tool. Global Health Action 2012; 5:19281.
Demographic surveillance areas (countries from Africa, Asia and Oceania) of the following HDSSs:
Code    Country         INDEPTH Centre
BD011   Bangladesh      ICDDR-B : Matlab
BD012   Bangladesh      ICDDR-B : Bandarban
BD013   Bangladesh      ICDDR-B : Chakaria
BD014   Bangladesh      ICDDR-B : AMK
BF031   Burkina Faso    Nouna
BF041   Burkina Faso    Ouagadougou
CI011   Côte d'Ivoire   Taabo
ET031   Ethiopia        Kilite Awlaelo
GH011   Ghana           Navrongo
GH031   Ghana           Dodowa
GM011   The Gambia      Farafenni
ID011   Indonesia       Purworejo
IN011   India           Ballabgarh
IN021   India           Vadu
KE011   Kenya           Kilifi
KE021   Kenya           Kisumu
KE031   Kenya           Nairobi
MW011   Malawi          Karonga
SN011   Senegal         IRD : Bandafassi
VN012   Vietnam         Hanoi Medical University : Filabavi
ZA011   South Africa    Agincourt
ZA031   South Africa    Africa Centre
Death Cause
Surveillance population Deceased individuals Cause of death
Verbal autopsy-based cause of death data
The number of rounds per year varies between sites, from one to three.
No sampling; covers the total population in the demographic surveillance area.
Face-to-face [f2f]
The Verbal Autopsy Questionnaires used by the various sites differed, but in most cases they were a derivation from the original WHO Verbal Autopsy questionnaire.
http://www.who.int/healthinfo/statistics/verbalautopsystandards/en/index1.html
One cause of death record was inserted for every death where a verbal autopsy was not conducted. The cause of death assigned in these cases is "XX VA not completed".
This layer serves as the authoritative geographic data source for all school district area boundaries in California. School districts are single purpose governmental units that operate schools and provide public educational services to residents within geographically defined areas. Agencies considered school districts that do not use geographically defined service areas to determine enrollment are excluded from this data set. In order to view districts represented as point locations, please see the "California School District Offices" layer. The school districts in this layer are enriched with additional district-level attribute information from the California Department of Education's data collections. These data elements add meaningful statistical and descriptive information that can be visualized and analyzed on a map and used to advance education research or inform decision making.
School districts are categorized as either elementary (primary), high (secondary) or unified based on the general grade range of the schools operated by the district. Elementary school districts provide education to the lower grade/age levels and the high school districts provide education to the upper grade/age levels while unified school districts provide education to all grade/age levels in their service areas. Boundaries for the elementary, high and unified school district layers are combined into a single file. The resulting composite layer includes areas of overlapping boundaries since elementary and high school districts each serve a different grade range of students within the same territory. The 'DistrictType' field can be used to filter and display districts separately by type.
Boundary lines are maintained by the California Department of Education (CDE) and are effective in the 2023-24 academic year. The CDE works collaboratively with the US Census Bureau to update and maintain boundary information as part of the federal School District Review Program (SDRP).
The Census Bureau uses these school district boundaries to develop annual estimates of children in poverty to help the U.S. Department of Education determine the annual allocation of Title I funding to states and school districts. The National Center for Education Statistics (NCES) also uses the school district boundaries to develop a broad collection of district-level demographic estimates from the Census Bureau’s American Community Survey (ACS).
The school district enrollment and demographic information are based on student enrollment counts collected on Fall Census Day (first Wednesday in October) in the 2023-24 academic year. These data elements are collected by the CDE through the California Longitudinal Pupil Achievement Data System (CALPADS) and can be accessed as publicly downloadable files from the Data & Statistics web page on the CDE website https://www.cde.ca.gov/ds.
https://creativecommons.org/publicdomain/zero/1.0/
Federal Superfund sites are some of the most polluted in the United States. This dataset contains a multifaceted view of Superfunds, including free-form text descriptions, geography, demographics and socioeconomics.
The core data was scraped from the National Priorities List (NPL) provided by the U.S. Environmental Protection Agency (EPA). This table provides basic information such as site name, site score, date added, and links to a site description and current status. Apache Tika was used to extract text from the site description PDFs. The addresses were scraped from site status pages and geocoded to latitude, longitude, and Census block group. The block group assignment was used to join with the Census Bureau's planning database, a rich source of nationwide demographic and socioeconomic data. The full source code used to generate the data can be found here, on GitHub.
I have provided three separate downloads to explore:
Some caveats:
I would like to thank the EPA and the Census Bureau for making such detailed information publicly available. For relevant academic work, please see Burwell-Naney et al. (2013) and references, both to and therein.
Please let me know if you have any suggestions for improving the dataset!
Abstract copyright UK Data Service and data collection copyright owner.
The Annual Population Survey (APS) is a major survey series, which aims to provide data that can produce reliable estimates at the local authority level. Key topics covered in the survey include education, employment, health and ethnicity. The APS comprises key variables from the Labour Force Survey (LFS), all its associated LFS boosts and the APS boost. The APS aims to provide enhanced annual data for England, covering a target sample of at least 510 economically active persons for each Unitary Authority (UA)/Local Authority District (LAD) and at least 450 in each Greater London Borough. In combination with local LFS boost samples, the survey provides estimates for a range of indicators down to Local Education Authority (LEA) level across the United Kingdom.
For further detailed information about methodology, users should consult the Labour Force Survey User Guide, included with the APS documentation. For variable and value labelling and coding frames that are not included either in the data or in the current APS documentation, users are advised to consult the latest versions of the LFS User Guides, which are available from the ONS Labour Force Survey - User Guidance webpages.
Occupation data for 2021 and 2022
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. None of ONS' headline statistics, other than those directly sourced from occupational data, are affected and you can continue to rely on their accuracy. The affected datasets have now been updated.
Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022.
APS Well-Being Datasets
From 2012-2015, the ONS published separate APS datasets aimed at providing initial estimates of subjective well-being, based on the Integrated Household Survey. In 2015 these were discontinued. A separate set of well-being variables and a corresponding weighting variable have been added to the April-March APS person datasets from A11M12 onwards. Further information on the transition can be found in the Personal well-being in the UK: 2015 to 2016 article on the ONS website.
APS disability variables
Over time, there have been some updates to disability variables in the APS. An article explaining the quality assurance investigations on these variables that have been conducted so far is available on the ONS Methodology webpage.
End User Licence and Secure Access APS data
Users should note that there are two versions of each APS dataset. One is available under the standard End User Licence (EUL) agreement, and the other is a Secure Access version. The EUL version includes Government Office Region geography, banded age, 3-digit SOC and industry sector for main, second and last job.
The Secure Access version contains more detailed variables relating to:
· age: single year of age, year and month of birth, age completed full-time education and age obtained highest qualification, age of oldest dependent child and age of youngest dependent child
· family unit and household: including a number of variables concerning the number of dependent children in the family according to their ages, relationship to head of household and relationship to head of family
· nationality and country of origin
· geography: including county, unitary/local authority, place of work, Nomenclature of Territorial Units for Statistics 2 (NUTS2) and NUTS3 regions, and whether lives and works in same local authority district
· health: including main health problem, and current and past health problems
· education and apprenticeship: including numbers and subjects of various qualifications and variables concerning apprenticeships
· industry: including industry, industry class and industry group for main, second and last job, and industry made redundant from
· occupation: including 4-digit Standard Occupational Classification (SOC) for main, second and last job and job made redundant from
· system variables: including week number when interview took place and number of households at address
The Secure Access data have more restrictive access conditions than those made available under the standard EUL. Prospective users will need to gain ONS Accredited Researcher status, complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables. Users are strongly advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements.
Latest edition information
For the second edition (October 2024), smoking variables CIGEVER, CIGNOW and CIGSMK16 have been added to the data file.
Main Topics:
Topics covered include: household composition and relationships, housing tenure, nationality, ethnicity and residential history, employment and training (including government schemes), workplace and location, job hunting, educational background and qualifications. Many of the variables included in the survey are the same as those in the LFS.
Multi-stage stratified random sample
Face-to-face interview
Telephone interview
2023
ADULT EDUCATION AGE ANXIETY APPLICATION FOR EMP... APPOINTMENT TO JOB ATTITUDES BONUS PAYMENTS BUSINESSES CARE OF DEPENDANTS CHRONIC ILLNESS COHABITATION CONDITIONS OF EMPLO... COVID 19 DEBILITATIVE ILLNESS DEGREES DISABILITIES Demography population ECONOMIC ACTIVITY EDUCATIONAL BACKGROUND EDUCATIONAL COURSES EMPLOYEES EMPLOYER SPONSORED ... EMPLOYMENT EMPLOYMENT HISTORY EMPLOYMENT PROGRAMMES ETHNIC GROUPS FAMILIES FAMILY BENEFITS FIELDS OF STUDY FULL TIME EMPLOYMENT FURNISHED ACCOMMODA... FURTHER EDUCATION GENDER HAPPINESS HEADS OF HOUSEHOLD HEALTH HIGHER EDUCATION HOME OWNERSHIP HOURS OF WORK HOUSEHOLDS HOUSING HOUSING BENEFITS HOUSING TENURE INCOME INDUSTRIES JOB CHANGING JOB HUNTING JOB SEEKER S ALLOWANCE LANDLORDS Labour and employment MANAGERS MARITAL STATUS NATIONAL IDENTITY NATIONALITY OCCUPATIONS OVERTIME PART TIME COURSES PART TIME EMPLOYMENT PLACE OF BIRTH PLACE OF RESIDENCE PRIVATE SECTOR PUBLIC SECTOR RECRUITMENT REDUNDANCY REDUNDANCY PAY RELIGIOUS AFFILIATION RENTED ACCOMMODATION RESIDENTIAL MOBILITY SELF EMPLOYED SICK LEAVE SICKNESS AND DISABI... SOCIAL HOUSING SOCIAL SECURITY BEN... SOCIO ECONOMIC STATUS STATE RETIREMENT PE... STUDENTS SUBSIDIARY EMPLOYMENT SUPERVISORS SUPERVISORY STATUS TAX RELIEF TEMPORARY EMPLOYMENT TERMINATION OF SERVICE TIED HOUSING TRAINING TRAINING COURSES TRAVELLING TIME UNEMPLOYED UNEMPLOYMENT UNEMPLOYMENT BENEFITS UNFURNISHED ACCOMMO... UNWAGED WORKERS WAGES WELL BEING HEALTH WELSH LANGUAGE WORKING CONDITIONS WORKPLACE vital statistics an...
In cooperative breeding systems, inclusive fitness theory predicts that non-breeding helpers more closely related to the breeders should be more willing to provide costly alloparental care, and thus have more impact on breeder fitness. In the red-cockaded woodpecker (Dryobates borealis), most helpers are the breeders’ earlier offspring, but helpers do vary within groups in both relatedness to the breeders (some even being unrelated) and sex, and it can be difficult to parse their separate impacts on breeder fitness. Moreover, most support for inclusive fitness theory has come from positive associations between relatedness and behavior, rather than actual fitness consequences. We used functional linear models to evaluate the per capita effects of helpers of different relatedness on eight breeder fitness components measured for up to 41 years at three sites. In support of inclusive fitness theory, helpers more related to the breeding pair made greater contributions to six fitness components. H...
We used long-term demographic monitoring data collected over 28 to 41 consecutive years at three sites: the Sandhills region in south-central North Carolina (1980–2020), Marine Corps Base Camp Lejeune on the central coast of North Carolina (1986–2020), and Eglin Air Force Base in the western panhandle of Florida (1993–2020). Monitoring methods are described in detail by Walters et al. (1988) (see also Appendix A for more details on monitoring). See Walters and Garcia (2016) for how individuals are assigned breeder and helper status.
You will need both R and RStudio to use the dataset (and corresponding code).
Manuscript citation: Kerr, Morris, and Walters (2023) Inclusive fitness may explain some but not all benefits derived from social behavior in a cooperatively breeding bird. American Naturalist.
Archive citation: Kerr, Natalie; Morris, William; Walters, Jeffrey (Forthcoming 2023). Demographic data on the red-cockaded woodpecker [Dataset]. Dryad. https://doi.org/10.5061/dryad.3bk3j9kqs
Affiliated authors: Natalie Z. Kerr, William F. Morris, and Jeffrey R. Walters
Corresponding author details:
To run the code file ("Kerr-et-al_FLMs.rmd"), you will need to install R and RStudio, as well as install the packages (using install.packages()) listed below and at the beginning of the RMarkdown file.
List of packages and their versions
Note that we used these versions of these two packages for Kerr et al. 2023.
The RMarkdown file ...
https://datafinder.stats.govt.nz/license/attribution-4-0-international/
This dataset is the definitive version of the annually released meshblock boundaries as at 1 January 2024, as defined by Stats NZ. This version contains 57,539 meshblocks, including 16 with empty or null geometries (non-digitised meshblocks).
Stats NZ maintains an annual meshblock pattern for collecting and producing statistical data. This allows data to be compared over time.
A meshblock is the smallest geographic unit for which statistical data is collected and processed by Stats NZ. A meshblock is a defined geographic area, which can vary in size from part of a city block to a large area of rural land. The optimal size for a meshblock is 30–60 dwellings (containing approximately 60–120 residents).
Each meshblock borders on another to form a network covering all of New Zealand, including coasts and inlets, and extending out to the 200-mile exclusive economic zone (EEZ). Meshblocks are digitised to the 12-mile (19.3km) limit. Meshblocks are added together to build up larger geographic areas such as statistical area 1 (SA1), statistical area 2 (SA2), statistical area 3 (SA3), and urban rural (UR). They are also used to define electoral districts, territorial authorities, and regional councils.
Meshblock boundaries generally follow road centrelines, cadastral property boundaries, or topographical features such as rivers. Expanses of water in the form of lakes and inlets are defined separately from land.
Meshblock maintenance
Meshblock boundaries are amended by:
Reasons for meshblock splits and nudges can include:
· to maintain meshblock criteria rules
· to improve the size balance of meshblocks in areas where there has been population growth
· to maintain alignment to cadastre and other geographic features
· Stats NZ requests for boundary changes so that statistical geography boundaries can be moved
· external requests for boundary changes so that administrative or electoral boundaries can be moved
· to separate land and water: mainland, inland water, islands, inlets, and oceanic are defined separately
Meshblock changes are made throughout the year. A major release is made at 1 January each year with ad hoc releases available to users at other times.
While meshblock boundaries are continually under review, 'freezes' on changes to the boundaries are applied periodically. Such 'freezes' are imposed at the time of population censuses and during periods of intense electoral activity, for example, prior to and during general and local body elections.
Meshblock numbering
Meshblocks are not named and have seven-digit codes.
When meshblocks are split, each new meshblock is given a new code. The original meshblock codes no longer exist within that version and future versions of the meshblock classification. Meshblock codes do not change when a meshblock boundary is nudged.
Meshblocks that existed prior to 2015 and have not changed are numbered from 0000100 to 3210003. Meshblocks created from 2015 onwards are numbered from 4000000.
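The numbering rules above can be turned into a small validation helper. The following is a minimal sketch based only on the ranges stated in this description; the function name and the returned labels are hypothetical, and codes that fall outside the two documented ranges are reported as such rather than guessed at.

```python
def classify_meshblock_code(code: str) -> str:
    """Classify a seven-digit meshblock code by its documented number range."""
    if len(code) != 7 or not (code.isascii() and code.isdigit()):
        raise ValueError("meshblock codes are seven-digit strings")
    number = int(code)
    if number >= 4_000_000:
        return "created 2015 onwards"
    if 100 <= number <= 3_210_003:
        return "existed prior to 2015"
    return "outside documented ranges"


print(classify_meshblock_code("0000100"))  # lower bound of the pre-2015 range
print(classify_meshblock_code("4000000"))  # first code used from 2015 onwards
```

Codes are handled as strings so that leading zeros are preserved.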
Digitised and non-digitised meshblocks
The digital geographic boundaries are defined and maintained by Stats NZ.
Meshblocks cover the land area of New Zealand, the water area to the 12-mile limit, the Chatham Islands, Kermadec Islands, sub-Antarctic islands, offshore oil rigs, and Ross Dependency. The following 16 meshblocks are not held in digitised form.
Meshblock / Location (statistical area 2 name)
For more information please refer to the Statistical standard for geographic areas 2023.
High definition version
This high definition (HD) version is the most detailed geometry, suitable for use in GIS for geometric analysis operations and for the computation of areas, centroids and other metrics. The HD version is aligned to the LINZ cadastre.
Digital Data
Digital boundary data became freely available on 1 July 2007.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This data set is a collection of estimated daily mean and maximum values for a range of air quality and meteorological measurements and model forecasts for the UK and crown dependencies postcode districts (e.g. 'AB') for the years 2016-2019, inclusive.
The paper describing this dataset is available here: https://www.nature.com/articles/s41597-022-01135-6
The data uses a 'concentric regions' method to estimate the measurement for all regions, as follows. If measurements exist within the region, the mean of those measurements is used. If not, a ring of neighbouring postcode regions is selected, and the mean of their measurement values is used. If no measurement sites/data are found in the first ring, the process continues, taking the next ring of postcode district regions and working outwards until one or more sensors are found in a ring. As well as the measurement estimations, the number of rings required to find site data and make the estimations is also published. As a result, please note that estimations with higher ring counts ('rings') are likely to be calculated from more distant sensors. This distance depends upon the size of the postcode regions surrounding the location being estimated. Please use the ring count ('rings') to limit/filter estimations based on your required level of confidence.
The meteorological, pollen and air quality measurement data used to make the regional estimations can be found at this Zenodo archive. The data there contains Temperature, Relative Humidity, and Pressure data, downloaded from the Met Office MIDAS archives via the MEDMI server (https://www.data-mashup.org.uk/). Also downloaded from the MEDMI server are daily pollen measurements for the UK. The archive also contains PM10, PM2.5, NO2, NOx (as NO2), O3, and SO2 measurements from the DEFRA AURN network, and model forecasts of the same pollutants made using the EMEP model.
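The ring-expansion idea can be sketched in a few lines. This is an illustration under stated assumptions, not the published estimation code: region adjacency is given as a plain dictionary, all measurement values found in a ring are pooled into a single mean, and a ring count of 0 is used when the region has its own measurements. The function name and data layout are hypothetical.

```python
from statistics import mean


def estimate_region(region, neighbours, measurements):
    """Estimate a value for region by expanding rings of neighbouring regions.

    neighbours   -- dict: region id -> list of adjacent region ids
    measurements -- dict: region id -> list of measured values (may be missing)
    Returns (estimate, rings), or (None, None) if no data can be reached.
    """
    visited = {region}
    ring = [region]
    rings = 0
    while ring:
        values = [v for r in ring for v in measurements.get(r, [])]
        if values:
            return mean(values), rings
        # No data in this ring: collect the next ring of unvisited neighbours.
        next_ring = []
        for r in ring:
            for n in neighbours.get(r, []):
                if n not in visited:
                    visited.add(n)
                    next_ring.append(n)
        ring = next_ring
        rings += 1
    return None, None


# Hypothetical four-region layout in which only region "D" has sensors.
adjacency = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
sensor_values = {"D": [10.0, 20.0]}
print(estimate_region("A", adjacency, sensor_values))
```

In this example region "A" is two rings away from the nearest sensors, which is why filtering on the ring count is a reasonable proxy for confidence.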
The code used to make the estimations is available at this Zenodo archive.
The postcode data in postcode_district_data.csv are collated from several sources:
https://www.doogal.co.uk/UKPostcodes.php (population figures for the UK (UK Census 2011))
https://www.freemaptools.com/download-uk-postcode-outcode-boundaries.htm (postcode boundary polygons for UK and crown dependencies)
https://www.gov.gg/population (Guernsey (GY) population data for end June 2020)
https://www.gov.je/Government/JerseyInFigures/Population/Pages/Population.aspx (Jersey (JE) population data for end 2019)
https://www.gov.im/media/1369690/isle-of-man-in-numbers-july-2020.pdf (Isle of Man (IM) population data for April 2016)
The data-set is presented in CSV format, as six files:
postcode_district_data.csv: location metadata (region_id, geometry, description, population, country)
regional_site_counts.csv: a table showing the number of sites for each measurement (columns), for each region_id (rows). region_id's match those in the postcode_district_data.csv file.
turing_regional_estimates_aq_daily_met_pollen_pollution_imputed_data.csv: uses imputed site data (timestamp, region_id, ...[measurement name, rings]) ('rings' is the number of rings required to make the estimation)
turing_regional_estimates_aq_daily_met_pollen_pollution_original_data.csv: uses original site data (timestamp, region_id, ...[measurement name, rings]) ('rings' is the number of rings required to make the estimation)
turing_regional_estimates_aq_loc_type_daily_imputed_data.csv: uses imputed site data. Air quality regional estimates are calculated using specific AQ site location types* separately. (To prevent, for example, 'Traffic Urban' type sites being used to estimate 'non-traffic' or rural regions.)
turing_regional_estimates_aq_loc_type_daily_original_data.csv: uses original data. Air quality regional estimates are calculated using specific AQ site location types* separately. (To prevent, for example, 'Traffic Urban' type sites being used to estimate 'non-traffic' or rural regions.)
* AQ site location types:
Industrial: comprises 'urban industrial' (9 sites) and 'suburban industrial' (2 sites)
'Rural background' (14 sites)
'Urban background' (48 sites)
'Urban traffic' (47 sites)
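As a rough illustration of filtering the regional estimate files by ring count, the sketch below reads CSV rows with the standard library and keeps only estimates made within a chosen number of rings. The column names `no2` and `no2_rings`, the region IDs, and the sample rows are hypothetical placeholders; check the header of the actual file for the real measurement and ring column names.

```python
import csv
import io


def filter_by_rings(rows, value_col, rings_col, max_rings):
    """Keep rows whose ring count for one measurement is at most `max_rings`."""
    kept = []
    for row in rows:
        rings = row.get(rings_col, "")
        if rings == "":
            continue  # no estimate recorded for this row
        if int(float(rings)) <= max_rings:
            kept.append(
                {
                    "timestamp": row["timestamp"],
                    "region_id": row["region_id"],
                    value_col: float(row[value_col]),
                }
            )
    return kept


# Synthetic rows standing in for one of the regional estimate files.
sample = io.StringIO(
    "timestamp,region_id,no2,no2_rings\n"
    "2017-01-01,AB1,12.5,0\n"
    "2017-01-01,AB2,13.0,3\n"
    "2017-01-01,AB3,11.0,1\n"
)
rows = list(csv.DictReader(sample))
close = filter_by_rings(rows, "no2", "no2_rings", max_rings=1)
print([r["region_id"] for r in close])  # ['AB1', 'AB3']
```

A stricter threshold (for example `max_rings=0`) would keep only regions that contain their own measurement sites.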
https://spdx.org/licenses/CC0-1.0.html
Environmental volunteering can benefit participants and nature through improving physical and mental wellbeing while encouraging environmental stewardship. To enhance achievement of these outcomes, conservation organisations need to reach different groups of people to increase participation in environmental volunteering. This paper explores what engages communities searching online for environmental volunteering.
We conducted a literature review of 1032 papers to determine key factors fostering participation by existing volunteers in environmental projects. We found the most important factor was to tailor projects to the motivations of participants. Also important were: promoting projects to people with relevant interests; meeting the perceived benefits of volunteers and removing barriers to participation.
We then assessed the composition and factors fostering participation of the NatureVolunteers online community (n = 2216) of potential environmental volunteers and compared findings with those from the literature review. We asked whether projects advertised by conservation organisations meet motivations and interests of this online community.
Using Facebook insights and Google Analytics we found that the online community were on average younger than extant communities observed in studies of environmental volunteering. Their motivations were also different as they were more interested in physical activity and using skills and less in social factors. They also exhibited preference for projects which are outdoor based, and which offer close contact with wildlife. Finally, we found that the online community showed a stronger preference for habitat improvement projects over those involving species-survey based citizen science.
Our results demonstrate mis-matches between what our online community are looking for and what is advertised by conservation organisations. The online community are looking for projects which are more solitary, more physically active and more accessible by organised transport. We discuss how our results may be used by conservation organisations to better engage with more people searching for environmental volunteering opportunities online.
We conclude that there is a pool of young people attracted to environmental volunteering projects whose interests are different to those of current volunteers. If conservation organisations can develop projects that meet these interests, they can engage larger and more diverse communities in nature volunteering.
Methods
The data set consists of separate sheets for each set of results presented in the paper. Each sheet contains the full data, summary descriptive statistics, analysis and graphs presented in the paper. The method for collection and processing of the dataset in each sheet is as follows:
The data set for results presented in Figure 1 in the paper - Sheet: "Literature"
We conducted a review of literature on improving participation within nature conservation projects. This enabled us to determine what the most important factors were for participating in environmental projects, the composition of the populations sampled and the methods by which data were collected. The search terms used were (Environment* OR nature OR conservation) AND (Volunteer* OR “citizen science”) AND (Recruit* OR participat* OR retain* OR interest*). We reviewed all articles identified in the Web of Science database and the first 50 articles sorted for relevance in Google Scholar on the 22nd October 2019. Articles were first reviewed by title, secondly by abstract and thirdly by full text. They were retained or excluded according to criteria agreed by the authors of this paper. These criteria were as follows - that the paper topic was volunteering in the environment, including citizen science, community-based projects and conservation abroad, and included the study of factors which could improve participation in projects. Papers were excluded for topics irrelevant to this study, the most frequent being the outcomes of volunteering for participants (such as behavioural change and knowledge gain), improving citizen science data and the usefulness of citizen science data. The remaining final set of selected papers was then read to extract information on the factors influencing participation, the population sampled and the data collection methods. In total 1032 papers were reviewed of which 31 comprised the final selected set read in full. Four factors were identified in these papers which improve volunteer recruitment and retention. These were: tailoring projects to the motivations of participants, promoting projects to people with relevant hobbies and interests, meeting the perceived benefits of volunteers and removing barriers to participation.
The data set for results presented in Figure 2 and Figure 3 in the paper - Sheet "Demographics"
To determine if the motivations and interests expressed by volunteers in literature were representative of wider society, NatureVolunteers was exhibited at three UK public engagement events during May and June 2019; Hullabaloo Festival (Isle of Wight), The Great Wildlife Exploration (Bournemouth) and Festival of Nature (Bristol). This allowed us to engage with people who may not have ordinarily considered volunteering and encourage people to use the website. A combination of surveys and semi-structured interviews were used to collect information from the public regarding demographics and volunteering. In line with our ethics approval, no personal data were collected that could identify individuals and all participants gave informed consent for their anonymous information to be used for research purposes. The semi-structured interviews consisted of conducting the survey in a conversation with the respondent, rather than the respondent filling in the questionnaire privately and responses were recorded immediately by the interviewer. Hullabaloo Festival was a free discovery and exploration event where NatureVolunteers had a small display and surveys available. The Great Wildlife Exploration was a Bioblitz designed to highlight the importance of urban greenspaces where we had a stall with wildlife crafts promoting NatureVolunteers. The Festival of Nature was the UK’s largest nature-based festival in 2019 where we again had wildlife crafts available promoting NatureVolunteers. The surveys conducted at these events sampled a population of people who already expressed an interest in nature and the environment by attending the events and visiting the NatureVolunteers stand. In total 100 completed surveys were received from the events NatureVolunteers exhibited at; 21 from Hullabaloo Festival, 25 from the Great Wildlife Exploration and 54 from the Festival of Nature. 
At Hullabaloo Festival information on gender was not recorded for all responses and was consequently entered as “unrecorded”.
OVERALL DESCRIPTION OF METHOD DATA COLLECTION FOR ALL OTHER RESULTS (Figures 4-7 and Tables 1-2)
The remaining data were all collected from the NatureVolunteers website. The NatureVolunteers website https://www.naturevolunteers.uk/ was set up in 2018 with funding support from the Higher Education Innovation Fund to expand the range of people accessing nature volunteering opportunities in the UK. It is designed to particularly appeal to people who are new to nature volunteering, including young adults wishing to expand their horizons, families looking for ways to connect with nature to enhance well-being and older people wishing to share their time and life experiences to help nature. In addition, it was designed to be helpful to professionals working in the countryside & wildlife conservation sectors who wish to enhance their skills through volunteering. As part of the website’s development we created and used an online project database, www.naturevolunteers.uk (hereafter referred to as NatureVolunteers), to assess the needs and interests of our online community. Our research work was granted ethical approval by the Bournemouth University Ethics Committee. The website collects entirely anonymous data on our online community of website users that enables us to evaluate what sort of projects and project attributes most appeal to our online community. Visitors using the website to find projects are informed as part of the guidance on using the search function that this fully anonymous information is collected by the website to enhance and share research understanding of how conservation organisations can tailor their future projects to better match the interests of potential volunteers.
Our online community was built up over 2018-2019 through open advertising of the website nationally through the social media channels of our partner conservation organisations, through a range of public engagement in science events and nature-based festivals across southern England and through our extended network of friends and families, their own social media networks and the NatureVolunteers website’s own social network on Facebook and Twitter. There were 2216 searches for projects on NatureVolunteers from January 1st to October 25th, 2019.
The data set for results presented in Figure 2 and Figure 3 in the paper - Sheet "Demographics"
On the website, users searching for projects were firstly asked to specify their expectations of projects. These expectations encompass the benefits of volunteering by asking whether the project includes social interaction, whether particular skills are required or can be developed, and whether physical activity is involved. The barriers to participation are incorporated by asking whether the project is suitable for families, and whether organised transport is provided. Users were asked to rate the importance of the five project expectations on a Likert scale of 1 to 5 (Not at all = 1, Not really = 2, Neutral = 3, It
This publication provides separate monthly reports on NHS-funded maternity services in England for September and October 2015. This is the latest release from the new Maternity Services Data Set (MSDS) and will be published on a monthly basis.
The MSDS is a patient-level data set that captures key information at each stage of the maternity service care pathway in NHS-funded maternity services, such as those maternity services provided by GP practices and hospitals. The data collected includes mother’s demographics, booking appointments, admissions and re-admissions, screening tests, labour and delivery along with baby’s demographics, diagnoses and screening tests.
The MSDS has been developed to help achieve better outcomes of care for mothers, babies and children. As a ‘secondary uses’ data set, it re-uses clinical and operational data for purposes other than direct patient care, such as commissioning, clinical audit, research, service planning and performance management at both local and national level. It will provide comparative, mother and child-centric data that will be used to improve clinical quality and service efficiency, and to commission services in a way that improves health and reduces inequalities.
These statistics are classified as experimental and should be used with caution. Experimental statistics are new official statistics undergoing evaluation. They are published in order to involve users and stakeholders in their development and as a means to build in quality at an early stage. More information about experimental statistics can be found on the UK Statistics Authority website.
This report contains key information based on the submissions that have been made by providers and will focus on data relating to activity that occurred in September 2015.
This report contains key information based on the submissions that have been made by providers and will focus on data relating to activity that occurred in October 2015.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description of the dataset "Supplementary Data 3 - Study sites.csv"
The dataset
- is used in the paper "Unexpected diversity in socially synchronized rhythms of shorebirds" Nature 2016 by M. Bulla et al
- contains estimates of mean female and male wing length for each population of biparental shorebirds from a specific study site, plus the locations of the study site, whether the locations had tide, and whether the tide was used by the population for foraging, and how the incubation was monitored.
Questions can be directed to: Martin Bulla (bulla.mar@gmail.com)
Values are separated by comma.
1. scinam : scientific name of the species
2. sp : four letter abbreviation of the species' English name
3. study_site : name of the study site
4. site_abbreviation : four letter abbreviation of the study site
5. type : was the study site at the breeding ground (breeding) or not (wintering)
6. lat : latitude of the study site (decimal)
7. lon : longitude of the study site (decimal)
8. tidal_habitat : is the study site at primarily tidal habitat (y=yes, n=no)
9. tidal_used : if the study site is at primarily tidal habitat, do the birds use it for foraging (y=yes, n=no)
10. incubation_monitoring : method used to monitor incubation (for details see the paper's Extended Data Table 4)
11. sexing_method : identifies the method used to sex individuals to estimate the mean female and male wing length
12. pop_wing_f : mean female wing length for the population
13. f_wing_N : sample size used for the female mean estimate
14. pop_wing_m : mean male wing length for the population
15. m_wing_N : sample size used for the male mean estimate
16. data_source : is the mean wing estimate based on the primary data ("our primary data") or literature (citation)
WHEN USING THIS DATA, PLEASE CITE:
Bulla et al (2016). Supplementary Data 3 - Study sites: location, population wing length, monitoring method, tide. figshare. https://doi.org/10.6084/m9.figshare.1536260. Retrieved ADD DATETIME.
https://www.usa.gov/government-works
Note: After May 3, 2024, this dataset will no longer be updated because hospitals are no longer required to report data on COVID-19 hospital admissions, hospital capacity, or occupancy data to HHS through CDC’s National Healthcare Safety Network (NHSN). The related CDC COVID Data Tracker site was revised or retired on May 10, 2023.
Note: May 3, 2024: Due to incomplete or missing hospital data received for the April 21, 2024, through April 27, 2024, reporting period, the COVID-19 Hospital Admissions Level could not be calculated for CNMI and will be reported as “NA” or “Not Available” in the COVID-19 Hospital Admissions Level data released on May 3, 2024.
This dataset represents COVID-19 hospitalization data and metrics aggregated to county or county-equivalent, for all counties or county-equivalents (including territories) in the United States. COVID-19 hospitalization data are reported to CDC’s National Healthcare Safety Network, which monitors national and local trends in healthcare system stress, capacity, and community disease levels for approximately 6,000 hospitals in the United States. Data reported by hospitals to NHSN and included in this dataset represent aggregated counts and include metrics capturing information specific to COVID-19 hospital admissions, and inpatient and ICU bed capacity occupancy.
Reporting information:
Notes: June 1, 2023: Due to incomplete or missing hospital data received for the May 21, 2023, through May 27, 2023, reporting period, the COVID-19 Hospital Admissions Level could not be calculated for the Commonwealth of the Northern Mariana Islands (CNMI) and will be reported as “NA” or “Not Available” in the COVID-19 Hospital Admissions Level data released on June 1, 2023.
June 8, 2023: Due to incomplete or missing hospital data received for the May 28, 2023, through June 3, 2023, reporting period, the COVID-19 Hospital Admissions Level could not be calculated for CNMI and American Samoa (AS) and will be reported as “NA” or “Not Available” in the COVID-19 Hospital Admissions Level data released on June 8, 2023.
June 15, 2023: Due to incomplete or missing hospital data received for the June 4, 2023, through June 10, 2023, reporting period,
The United States has recently become the country with the most reported cases of 2019 Novel Coronavirus (COVID-19). This dataset contains daily updated numbers of reported cases & deaths in the US on the state and county level, as provided by the Johns Hopkins University. In addition, I provide matching demographic information for US counties.
The dataset consists of two main csv files: covid_us_county.csv and us_county.csv. See the column descriptions below for more detailed information. In addition, I've added US county shape files for geospatial plots: us_county.shp/dbf/prj/shx.
covid_us_county.csv: COVID-19 cases and deaths which will be updated daily. The data is provided by the Johns Hopkins University through their excellent GitHub repo. I combined the separate "confirmed cases" and "deaths" files into a single table, removed a few geo identifier columns that I believe to be redundant, and reshaped the data into long format with a single date column. The earliest recorded cases are from 2020-01-22.
us_counties.csv: Demographic information on the US county level based on the (most recent) 2014-18 release of the American Community Survey. Derived via the great tidycensus package.
COVID-19 dataset covid_us_county.csv:
fips: County code in numeric format (i.e. no leading zeros). A small number of rows have NA values here, but can still be used for state-wise aggregation. Currently, this only affects the states of Massachusetts and Missouri.
county: Name of the US county. This is NA for the (aggregated counts of the) territories of American Samoa, Guam, Northern Mariana Islands, Puerto Rico, and Virgin Islands.
state: Name of US state or territory.
state_code: Two letter abbreviation of US state (e.g. "CA" for "California"). This feature has NA values for the territories listed above.
lat and long: Coordinates of the county or territory.
date: Reporting date.
cases & deaths: Cumulative numbers for cases & deaths.
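Because cases and deaths are cumulative, daily new counts have to be derived by differencing consecutive dates within each county. The sketch below shows one way to do that with the standard library; the function name is hypothetical, it assumes dates are ISO strings such as 2020-01-22 (so they sort correctly as text), and it treats the first recorded value of each county as that day's new count.

```python
from collections import defaultdict


def daily_new(records, column="cases"):
    """Turn cumulative counts into per-day increments.

    records: iterable of dicts with keys 'fips', 'date' (ISO string) and `column`.
    Returns a dict mapping (fips, date) -> increase since the previous date.
    """
    by_county = defaultdict(list)
    for rec in records:
        by_county[rec["fips"]].append((rec["date"], int(rec[column])))

    increments = {}
    for fips, series in by_county.items():
        series.sort()  # ISO dates sort chronologically as strings
        previous = 0
        for date, cumulative in series:
            increments[(fips, date)] = cumulative - previous
            previous = cumulative
    return increments


# Made-up rows for a single county, for illustration only.
rows = [
    {"fips": "1001", "date": "2020-03-26", "cases": "6"},
    {"fips": "1001", "date": "2020-03-25", "cases": "4"},
    {"fips": "1001", "date": "2020-03-27", "cases": "6"},
]
print(daily_new(rows)[("1001", "2020-03-26")])  # 2
```

Rows with NA in fips would all fall into one group here, so they should be handled separately (for example by grouping on state instead). If a cumulative series is ever revised downward, the difference will be negative; this sketch leaves such values untouched.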
Demographic dataset us_counties.csv:
fips, county, state, state_code: Same as above. The county names are slightly different, but mostly the difference is that this dataset has the word "County" added. I recommend joining on fips.
male & female: Population numbers for male and female.
population: Total population for the county. Provided as a convenience feature; it is always the sum of male + female.
female_percentage: Another convenience feature: female / population in percent.
median_age: Overall median age for the county.
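A join on fips, as recommended above, might look like the following minimal sketch. It uses only the standard library and made-up rows; the function name and the derived `cases_per_100k` field are illustrative additions, not columns of the dataset. Rows without a usable fips value are skipped, and `int(float(...))` is used so that both "1001" and "1001.0" parse to the same key.

```python
def join_on_fips(covid_rows, county_rows):
    """Attach population to COVID rows via fips and compute cases per 100,000."""
    demographics = {int(float(r["fips"])): r for r in county_rows}
    joined = []
    for row in covid_rows:
        if row["fips"] in ("", "NA"):
            continue  # no county code: usable only for state-level aggregation
        match = demographics.get(int(float(row["fips"])))
        if match is None:
            continue  # county missing from the demographic table
        population = int(match["population"])
        joined.append(
            {
                "fips": int(float(row["fips"])),
                "date": row["date"],
                "cases": int(row["cases"]),
                "population": population,
                "cases_per_100k": 1e5 * int(row["cases"]) / population,
            }
        )
    return joined


# Made-up rows for illustration only.
covid_rows = [
    {"fips": "1001.0", "date": "2020-04-01", "cases": "50"},
    {"fips": "NA", "date": "2020-04-01", "cases": "7"},
]
county_rows = [{"fips": "1001", "population": "10000"}]
print(join_on_fips(covid_rows, county_rows)[0]["cases_per_100k"])  # 500.0
```

Joining on fips rather than on county avoids the naming differences between the two files noted above.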
Data provided for educational and academic research purposes by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE).
The github repo states that:
This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.