36 datasets found

Statistics review 2: Samples and populations
catalog.data.gov
data.virginia.gov
+1 more
Updated Jul 24, 2025
Cite
National Institutes of Health (2025). Statistics review 2: Samples and populations [Dataset]. https://catalog.data.gov/dataset/statistics-review-2-samples-and-populations
Dataset updated
Jul 24, 2025
Dataset provided by
National Institutes of Health
Description
The previous review in this series introduced the notion of data description and outlined some of the more common summary measures used to describe a dataset. However, a dataset is typically only of interest for the information it provides regarding the population from which it was drawn. The present review focuses on estimation of population values from a sample.
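As a rough illustration of estimating a population value from a sample, the sketch below computes a sample mean, its standard error, and an approximate 95% confidence interval. The measurements are invented, and the multiplier 1.96 assumes a normal approximation; for a sample this small a t-distribution multiplier would usually be more appropriate.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Return (mean, standard_error, (lower, upper)) for a numeric sample."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Sample standard deviation (n - 1 denominator).
    sd = statistics.stdev(sample)
    se = sd / math.sqrt(n)
    return mean, se, (mean - z * se, mean + z * se)


# Hypothetical measurements, invented for illustration only.
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.0]
mean, se, (lower, upper) = mean_confidence_interval(sample)
print(f"mean={mean:.3f} se={se:.3f} approx. 95% CI=({lower:.3f}, {upper:.3f})")
```

The interval describes uncertainty about the population mean, not the spread of individual values.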
Data from: Assessing cetacean populations using integrated population...
data.niaid.nih.gov
datadryad.org
zip
Updated Mar 13, 2020
Cite
Eiren Jacobson; Charlotte Boyd; Tamara McGuire; Kim Shelden; Gina Himes Boor; André Punt (2020). Assessing cetacean populations using integrated population models: an example with Cook Inlet beluga whales [Dataset]. http://doi.org/10.5061/dryad.9zw3r229w
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.9zw3r229w
Dataset updated
Mar 13, 2020
Dataset provided by
Montana State University
University of Washington
National Oceanic and Atmospheric Administration
Cook Inlet Beluga Whale Photo ID Project-Alaska WildLife Alliance*
University of St Andrews
Authors
Eiren Jacobson; Charlotte Boyd; Tamara McGuire; Kim Shelden; Gina Himes Boor; André Punt
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
Cook Inlet
Description
Effective conservation and management of animal populations requires knowledge of abundance and trends. For many species, these quantities are estimated using systematic visual surveys. Additional individual-level data are available for some species. Integrated population modelling (IPM) offers a mechanism for leveraging these datasets into a single estimation framework. IPMs that incorporate both population- and individual-level data have previously been developed for birds, but have rarely been applied to cetaceans. Here, we explore how IPMs can be used to improve the assessment of cetacean populations. We combined three types of data that are typically available for cetaceans of conservation concern: population-level visual survey data, individual-level capture-recapture data, and data on anthropogenic mortality. We used this IPM to estimate the population dynamics of the Cook Inlet population of beluga whales (CIBW; Delphinapterus leucas) as a case study. Our state-space IPM included a population process model and three observational submodels: 1) a group detection model to describe group size estimates from aerial survey data; 2) a capture-recapture model to describe individual photographic capture-recapture data; and 3) a Poisson regression model to describe historical hunting data. The IPM produces biologically plausible estimates of population trajectories consistent with all three datasets. The estimated population growth rate since 2000 is less than expected for a recovering population. The estimated juvenile/adult survival rate is also low compared to other cetacean populations, indicating that low survival may be impeding recovery. This work demonstrates the value of integrating various data sources to assess cetacean populations and serves as an example of how multiple, imperfect datasets can be combined to improve our understanding of a population of interest. 
The model framework is applicable to other cetacean populations and to other taxa for which similar data types are available.

Methods

/Data/CIBW_RSideCapHist_McGuire&Stephens.csv contains a matrix of right-side capture histories (1 = captured, 0 = not captured) for each individual (rows) and year (columns). Photographic capture-recapture data were collected by Tamara McGuire. These data are made available here without restriction, but anyone wishing to use them is requested to contact tamaracookinletbeluga@gmail.com for further information on how the raw data were processed to produce the capture histories.

/Data/CIBW_HuntData_Mahoney&Shelden2000.xlsx contains the minimum documented number of animals killed (MinKilled) for years between 1950 and 1998 as published in Mahoney and Shelden 2000. Entries which are NA indicate that no data were available for that year.

/Data/CIBW_Abundance_HobbsEtAl2015.xlsx contains the total group size estimates from Hobbs et al. 2015.

/Data/CIBW_Abundance_BoydEtAl2019.txt contains an array with dimensions [1:1000, 1:8, 1:11] containing 1000 posterior samples of total group size for up to 8 survey days over 11 years, as described in Boyd et al. 2019.
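A capture-history matrix of the kind described above (individuals in rows, years in columns, 1 = captured, 0 = not captured) can be summarised with a few lines of code. The sketch below uses a small invented table, not the deposited file, whose exact header and ID layout are not described here; the `id` column and the year labels are assumptions.

```python
import csv
import io

# Hypothetical stand-in for a capture-history file. The header names and
# values are invented; the real file's layout may differ.
EXAMPLE_CSV = """id,2005,2006,2007,2008
A,1,0,1,1
B,0,1,1,0
C,1,1,0,0
"""


def summarize_capture_histories(text):
    """Return (captures per year, captures per individual)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    years = header[1:]
    per_year = {year: 0 for year in years}
    per_individual = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        ident = row[0]
        flags = [int(value) for value in row[1:]]
        per_individual[ident] = sum(flags)
        for year, flag in zip(years, flags):
            per_year[year] += flag
    return per_year, per_individual


per_year, per_individual = summarize_capture_histories(EXAMPLE_CSV)
print(per_year)
print(per_individual)
```

Counts like these are only descriptive; the capture-recapture submodel in the study additionally accounts for imperfect detection.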
Respondent-Driven Sampling and Total Population Data from a Rural Ugandan...
b2find.eudat.eu
Updated Nov 9, 2011
Cite
(2011). Respondent-Driven Sampling and Total Population Data from a Rural Ugandan Cohort, 2010: Special Licence Access - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/a5bd5cb6-6712-5850-97b4-7f0c7b7b9281
Dataset updated
Nov 9, 2011
Area covered
Uganda
Description
Abstract copyright UK Data Service and data collection copyright owner. This is a mixed-methods data collection. This study used Respondent Driven Sampling (RDS) methodology, which is a sampling method designed to generate unbiased estimates of population characteristics for populations where a sampling frame is not available. It is a form of snowball or link-tracing sampling, where respondents are given coupons to recruit other members of the target population, and where respondents are rewarded for both participating and for recruiting others. In addition to variables of interest, data are collected on the number of members of the target population each participant knows. Estimation methods are then applied to account for the non-random sample selection in an attempt to generate unbiased estimates for the target population. In 2010, the researchers conducted an RDS study in a rural Ugandan population where total population data were available. The aim of this study was to evaluate whether RDS could generate representative data on a rural Ugandan population by comparing estimates from an RDS survey with total-population data. The data used to define the target population (male household heads) were available from an ongoing general population cohort of 25 villages in rural Masaka, Uganda covering an area of approximately 38km. Annually, households in the study villages are mapped and after obtaining consent, a total-population household census and an individual questionnaire are administered and blood taken for HIV-1 testing. A random sample of eligible men in the target population who were not recruited during the RDS study were also interviewed, using the same RDS questionnaire. Finally, 49 qualitative interviews (of which summaries have been deposited) were conducted with a range of people (men and women) including RDS participants and non-participants, and RDS interviewers. 
These data can be used to evaluate the RDS sampling method, and to test new RDS estimators. Further information may be found in the documentation and in the journal articles listed in the Publications section.

Special Licence access and geographic data

This data collection is subject to Special Licence access conditions (see Access section for details). Data are analysable at individual village level, and GPS point data are available for the villages and interview sites. Finer detail geographic variables may be available for certain research questions. If these are required, users should request this when making their Special Licence application.

Main Topics: Quantitative data: demographic characteristics of the individual, including household composition, age, HIV status, tribe, religion, relationship between target population sample member and contacts, geographic data. Qualitative interview summaries: respondents' opinions of the study, the conduct of the research and the incentives used. Respondent Driven Sampling methods were used - see Abstract and documentation for details.
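To make the idea of correcting for non-random recruitment concrete, the sketch below applies inverse-degree weighting, a common RDS estimator often called RDS-II or Volz-Heckathorn. It is shown only as an example of the general approach; the description does not say which estimators the study evaluated, and the respondent data here are invented.

```python
def rds_ii_proportion(has_trait, degrees):
    """Estimate a population proportion using inverse-degree weights.

    has_trait: list of booleans, one per respondent.
    degrees:   reported number of target-population members each
               respondent knows (must be positive).
    """
    if len(has_trait) != len(degrees):
        raise ValueError("has_trait and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every reported degree must be positive")
    # Well-connected respondents are more likely to be recruited,
    # so each respondent is weighted by 1 / degree.
    weights = [1.0 / d for d in degrees]
    weighted_with_trait = sum(w for w, t in zip(weights, has_trait) if t)
    return weighted_with_trait / sum(weights)


# Hypothetical respondents, invented for illustration only.
has_trait = [True, False, True, False]
degrees = [10, 2, 5, 4]

naive = sum(has_trait) / len(has_trait)
estimate = rds_ii_proportion(has_trait, degrees)
print(f"naive={naive:.3f} inverse-degree weighted={estimate:.3f}")
```

In this toy sample the respondents with the trait report larger networks, so the weighted estimate falls below the naive sample proportion.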
NYSERDA Low- to Moderate-Income New York State Census Population Analysis...
catalog.data.gov
datasets.ai
+3 more
Updated Jun 28, 2025
+ more versions
Cite
data.ny.gov (2025). NYSERDA Low- to Moderate-Income New York State Census Population Analysis Dataset: Average for 2013-2015 [Dataset]. https://catalog.data.gov/dataset/nyserda-low-to-moderate-income-new-york-state-census-population-analysis-dataset-aver-2013
Dataset updated
Jun 28, 2025
Dataset provided by
data.ny.gov
Area covered
New York
Description
How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov.

The Low- to Moderate-Income (LMI) New York State (NYS) Census Population Analysis dataset is derived from the LMI market database designed by APPRISE as part of the NYSERDA LMI Market Characterization Study (https://www.nyserda.ny.gov/lmi-tool). All data are derived from the U.S. Census Bureau’s American Community Survey (ACS) 1-year Public Use Microdata Sample (PUMS) files for 2013, 2014, and 2015. Each row in the LMI dataset is an individual record for a household that responded to the survey and each column is a variable of interest for analyzing the low- to moderate-income population. The LMI dataset includes: county/county group, households with elderly, households with children, economic development region, income groups, percent of poverty level, low- to moderate-income groups, household type, non-elderly disabled indicator, race/ethnicity, linguistic isolation, housing unit type, owner-renter status, main heating fuel type, home energy payment method, housing vintage, LMI study region, LMI population segment, mortgage indicator, time in home, head of household education level, head of household age, and household weight. The LMI NYS Census Population Analysis dataset is intended for users who want to explore the underlying data that support the LMI Analysis Tool. Most users interested in LMI statistics and custom charts should use the interactive LMI Analysis Tool at https://www.nyserda.ny.gov/lmi-tool. This underlying LMI dataset is intended for users with experience working with survey data files and producing weighted survey estimates using statistical software packages (such as SAS, SPSS, or Stata).
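The sketch below illustrates a weighted survey estimate of the kind the description refers to. Each household record contributes its household weight, not a count of one, to the category totals. The field names (`heating_fuel`, `household_weight`) and the records are hypothetical and are not taken from the actual file.

```python
from collections import defaultdict

# Hypothetical household records; field names and values are illustrative.
households = [
    {"heating_fuel": "gas", "household_weight": 120.0},
    {"heating_fuel": "electric", "household_weight": 80.0},
    {"heating_fuel": "gas", "household_weight": 100.0},
    {"heating_fuel": "oil", "household_weight": 100.0},
]


def weighted_shares(records, category_key, weight_key):
    """Return the weighted share of each category (shares sum to 1)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[category_key]] += record[weight_key]
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}


shares = weighted_shares(households, "heating_fuel", "household_weight")
for fuel, share in sorted(shares.items()):
    print(f"{fuel}: {share:.1%}")
```

This yields point estimates only. Standard errors for survey data need additional design information that is not shown here.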
American Community Survey
gstore.unm.edu
csv, geojson, gml +5
Updated Mar 6, 2020
+ more versions
Cite
Earth Data Analysis Center (2020). American Community Survey [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/cd10009e-a79f-4de5-a12c-87bb5b499e9f/metadata/FGDC-STD-001-1998.html
Available download formats: json(5), gml(5), xls(5), geojson(5), kml(5), zip(1), csv(5), shp(5)
Dataset updated
Mar 6, 2020
Dataset provided by
Earth Data Analysis Center
Time period covered
2017
Area covered
New Mexico (bounding coordinates: West -109.05017, East -103.00196, North 37.000293, South 31.33217)
Description
A broad and generalized selection of 2013-2017 US Census Bureau 2017 5-year American Community Survey population data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection is not comprehensive, but allows a first-level characterization of total population, male and female, and both broad and narrowly-defined age groups. In addition to the standard selection of age-group breakdowns (by male or female), the dataset provides supplemental calculated fields which combine several attributes into one (for example, the total population of persons under 18, or the number of females over 65 years of age). The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users.

The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households.

The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates.
Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups. Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries.
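For readers who obtain margins of error from the fuller ACS tables, the sketch below shows two routine calculations. It converts a published MOE (reported at the 90% confidence level) to a standard error, and it approximates the MOE of a sum of estimates, such as a combined age group, as the root sum of squares of the component MOEs. All estimates and MOEs in the example are invented, since this dataset carries no MOE fields.

```python
import math

Z_90 = 1.645  # ACS margins of error are published at the 90% confidence level


def standard_error(moe):
    """Convert a 90%-level margin of error to a standard error."""
    return moe / Z_90


def combined_moe(moes):
    """Approximate MOE of a sum of estimates (root sum of squares)."""
    return math.sqrt(sum(m * m for m in moes))


# Hypothetical estimates and MOEs for two age groups in one county.
estimates = [1200, 800]
moes = [90.0, 120.0]

total = sum(estimates)
total_moe = combined_moe(moes)
interval = (total - total_moe, total + total_moe)
print(f"total={total} MOE=+/-{total_moe:.0f} 90% interval={interval}")
print(f"standard error of total ~ {standard_error(total_moe):.1f}")
```

The root-sum-of-squares rule is an approximation that ignores covariance between the component estimates.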
National Population Database
data.wu.ac.at
gimi9.com
wms
Updated Apr 20, 2018
+ more versions
Cite
Health and Safety Laboratory (2018). National Population Database [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/NzJkOGJmNjMtN2NjMi00OGI2LThkOTctYTg1ZDQ4MmJmMjlj
Available download formats: wms
Dataset updated
Apr 20, 2018
Dataset provided by
Health and Safety Laboratory
Description
The National Population Database (NPD) is a point-based Geographical Information System (GIS) dataset that combines locational information from providers like the Ordnance Survey with population information about those locations, mainly sourced from Government statistics. The points (and sometimes polygons) represent individual buildings, so the NPD allows detailed local analysis for anywhere in Great Britain.

The Health & Safety Laboratory (HSL) working with Staffordshire University originally created the NPD in 2004 to help its parent organisation, the Health and Safety Executive (HSE), assess the risks to society of major hazard sites e.g. oil refineries, chemical works and gas holders. Of particular interest to HSE were 'sensitive' populations e.g. schools and hospitals where the people at those locations may be more vulnerable to harm and potentially harder to evacuate in an emergency. The data is split into 5 themes: residential, sensitive populations, transport, workplaces and leisure.

More information about the NPD can be found here:

https://www.hsl.gov.uk/what-we-do/better-decisions/geoanalytics/national-population-database

The NPD was created using various datasets available within Government as part of the Public Sector Mapping Agreement (PSMA) and contains other intellectual property so is only available under license and for a fee. Please contact the HSL GIS Team if you would like to discuss gaining access to the sample or full dataset.
Population 24/7 Near Real Time: Data Library, Sample Outputs and Batch Files...
b2find.eudat.eu
Updated Apr 29, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2023). Population 24/7 Near Real Time: Data Library, Sample Outputs and Batch Files for England, 2011 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/03b17c24-1611-558a-a6a8-7c7878f40139
Dataset updated
Apr 29, 2023
Area covered
England
Description
This data collection comprises a data library, sample outputs, batch files and accompanying documentation from the ESRC-funded project “Population247NRT: Near real-time spatiotemporal population estimates for health, emergency response and national security”. The data comprise a structured set of input data for use with the authors’ SurfaceBuilder247 software and sample outputs which estimate the population distribution of England at specific times on specific dates, referenced to 2011 census population totals. The sample output files (provided as GeoTIFFs) contain population estimates in 200m grid cells, based on the British National Grid, for 02:00 (2am) and 14:00 (2pm) on a typical weekday in University and school term-time and out of term-time. The estimates are broken down by seven age/economic activity sub-groups for term-time and six for out of term-time, and include estimates of population activity in residential, workplace, education, healthcare and road transportation domains. The data library, which has been constructed entirely using open data sources, comprises population estimates, by age/economic activity sub-groups, for point locations (typically population-weighted centroids of census output areas and workplace zones, or postcode centroids of sites such as schools or hospitals); time profiles representing usual patterns of population activity at these sites during a 24-hour period; and background grid layers representing the land surface area and major road network. SurfaceBuilder247 uses the data library to generate time-specific gridded population estimates by redistributing the population of each sub-group across the available locations and background grid in accordance with the reference time profiles. 
The sample output grids provided in this resource may be used directly in GIS software or, alternatively, the input data library may be reprocessed using SurfaceBuilder247 to generate estimates for specific dates and times of interest to the user. Sample batch and session parameter files are included in the resource.

Decision-making and policy formulation in sectors such as health, emergency/crisis response and national security ideally require accurate dynamic information on the number of people in specific places at specific times of the day, week, season or year. Traditional census data do not provide this level of detail but are often used for such policy and planning purposes. The ESRC-funded Population247 programme of research (Martin et al, 2015) developed a framework, methodology and software tool (SurfaceBuilder247) for integrating diverse contemporary data sources to produce enhanced time-specific population estimates for small geographical areas. Its usefulness has since been demonstrated for flooding and radiation emergency response/planning, through collaborations with HR Wallingford and Public Health England. These models have primarily involved the integration of open administrative data for activities such as place of residence, work, education and health. Now, new and emerging forms of data, such as sensor data, live and static data feeds provided via the internet, and various commercial datasets which were not previously available, provide exciting opportunities to enhance these population estimates. Such new and emerging datasets are useful because they provide near real-time information on population activity in sectors which are particularly dynamic and have previously been difficult to model, such as retail, leisure and transport.
However, extracting useful intelligence from these sources, and integrating and calibrating them with existing data sources, poses significant challenges for researchers and practitioners seeking to employ them in the creation of time-specific population estimates. This project will combine new, emerging and existing datasets in order to produce enhanced time-specific population estimates for more informed decision-making and policy formulation in the health, emergency/crisis response and national security sectors. It is a collaborative project between University of Southampton, Public Health England (PHE), Health and Safety Executive (HSE) and Defence Science and Technology Laboratory (Dstl). The project will enhance existing methods and tools for harvesting, processing, integrating and calibrating new, emerging and existing data sources in order to produce time-specific population estimates. It will deliver two substantive policy demonstrator case studies with the project partners. The first case study will demonstrate the potential for using time-specific population estimates for near real-time response in emergencies; the second will explore their usefulness for modelling variation in 'normal' population distributions through space and time in order to inform longer-term planning and policy formulation. Importantly, the project will also encourage the sharing of knowledge and expertise between academia and the public sector through joint design and implementation of the case studies, internal seminars and a jointly organised stakeholder workshop. Invitees to the workshop will be key stakeholders in policy and practice from within and beyond the partners' sectors. The workshop will showcase the data, methods and tools developed by the project, discuss the opportunities and challenges involved in implementing these for decision-making and policy formulation, and identify how such methods might realistically be scaled up within these sectors. 
Ultimately, the aim of the project is to help partners such as PHE, HSE and Dstl carry out their remits more effectively and efficiently through the provision of better time-specific population estimates. The data library and sample output files provided in this data collection have been generated by processing a range of open data sources including residential and workplace populations from the 2011 Census, school and college pupil numbers from the school census and services such as the government’s ‘Get Information About Schools’, university student numbers from the Higher Education Statistics Agency, hospital patient numbers and attendance time profiles from NHS Digital, road traffic estimates from the Department for Transport National Transportation Model, and GIS road network, inland water and coastline layers from Ordnance Survey and the Office for National Statistics. Information from the 2015 Time Use Survey has been used in the estimation of typical time profiles for workplace activities. GIS processing has been undertaken to estimate typical catchment area sizes for locations such as schools and hospitals. The principal input data are population counts for 2011 census output areas in England, which determine the base populations of all the estimates produced. The project team have georeferenced, reformatted and integrated all the input sources to create an input data library for the SurfaceBuilder247 software. All the necessary input files are provided, together with sample outputs for selected times of interest.
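The general idea of combining point-location populations with 24-hour time profiles to obtain a gridded estimate for a chosen time can be sketched as follows. This is a toy illustration and not the SurfaceBuilder247 algorithm: the site coordinates, populations, and profiles are invented, and the sketch simply drops people who are not on site instead of redistributing them across other locations and a background grid as the real tool is described as doing.

```python
from collections import defaultdict

CELL_SIZE_M = 200  # grid resolution mentioned in the description


def make_profile(present_hours, fraction_present):
    """Hypothetical 24-value profile: fraction of base population on site per hour."""
    return [fraction_present if h in present_hours else 0.0 for h in range(24)]


# Hypothetical sites with invented coordinates (metres) and populations.
sites = [
    {"x": 450_050, "y": 120_130, "population": 300,
     "profile": make_profile(range(0, 8), 1.0)},    # residential-like
    {"x": 450_150, "y": 120_020, "population": 500,
     "profile": make_profile(range(9, 17), 0.8)},   # workplace-like
    {"x": 450_450, "y": 120_300, "population": 200,
     "profile": make_profile(range(9, 16), 1.0)},   # school-like
]


def gridded_population(sites, hour):
    """Sum the on-site population at `hour` into 200 m grid cells."""
    grid = defaultdict(float)
    for site in sites:
        cell = (site["x"] // CELL_SIZE_M, site["y"] // CELL_SIZE_M)
        grid[cell] += site["population"] * site["profile"][hour]
    return dict(grid)


night = gridded_population(sites, 2)   # 02:00
day = gridded_population(sites, 14)    # 14:00
print("02:00", night)
print("14:00", day)
```

The first two sites fall in the same 200 m cell, so its estimate changes from the residential population at night to the workplace population in the afternoon.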
National Population Database Northern Ireland
data.wu.ac.at
gimi9.com
+1 more
html, wms
Updated Feb 10, 2016
+ more versions
Cite
Health and Safety Laboratory (2016). National Population Database Northern Ireland [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/NjkwMzExY2MtN2JkNi00OGY0LWJhZWQtNjQ3ZTBjYTAwNTU2
Available download formats: wms, html
Dataset updated
Feb 10, 2016
Dataset provided by
Health and Safety Laboratory
Area covered
Northern Ireland, Ireland
Description
The National Population Database (NPD) for Northern Ireland is a point-based Geographical Information System (GIS) dataset that combines locational information from Ordnance Survey Northern Ireland (OSNI) with population information about those locations, mainly sourced from Northern Irish government statistics. The points represent individual buildings allowing the NPD NI to provide detailed local analysis for anywhere in Northern Ireland.

The Health and Safety Laboratory (HSL) working with Staffordshire University originally created the NPD for Great Britain in 2004 to help its parent organisation, the Health and Safety Executive (HSE), assess the risks to society of major hazard sites e.g. oil refineries, chemical works and gas holders. Of particular interest to HSE were ‘sensitive’ populations e.g. schools and hospitals where the people at those locations may be more vulnerable to harm and potentially harder to evacuate in an emergency. The data for the NPD NI includes residential, schools and colleges, hospitals and workplace layers.

The NPD NI was created using various datasets from OSNI and government organisations and contains other intellectual property so is only available under a license and for a fee. Please contact the HSL GIS team if you would like to discuss gaining access to the sample or full dataset.
Annual Population Survey Two-Year Longitudinal Dataset, January 2022 -...
beta.ukdataservice.ac.uk
Updated 2024
+ more versions
Cite
Social Survey Division Office For National Statistics (2024). Annual Population Survey Two-Year Longitudinal Dataset, January 2022 - December 2023 [Dataset]. http://doi.org/10.5255/ukda-sn-9274-2
Unique identifier
https://doi.org/10.5255/ukda-sn-9274-2
Dataset updated
2024
Dataset provided by
DataCite (https://www.datacite.org/)
UK Data Service (https://ukdataservice.ac.uk/)
Authors
Social Survey Division Office For National Statistics
Description
The Annual Population Survey (APS) is a major survey series, which aims to provide data that can produce reliable estimates at local authority level. Key topics covered in the survey include education, employment, health and ethnicity. The APS comprises key variables from the Labour Force Survey (LFS), all its associated LFS boosts and the APS boost.
The APS allows for analysis to be carried out on detailed subgroups and below regional level. In recent years (particularly with the sample size of the LFS 5 quarter dataset reducing) there has been some interest in producing a two year APS longitudinal dataset to look at any trends that may occur over a year. The APS Two-Year Longitudinal Datasets, covering 2012/13 onwards, have been deposited as a result of this work. Person- and Household-level APS datasets are also available.

For further detailed information about methodology, users should consult the Labour Force Survey User Guide, included with the APS documentation.
Occupation data for 2021 and 2022
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. None of ONS' headline statistics, other than those directly sourced from occupational data, are affected and you can continue to rely on their accuracy. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022

Latest edition information
For the second edition (August 2024), a revised version of the data was deposited with additional education variables added, including HIQUAL221, HIQUAL222, HIQUL22D1, HIQUL22D2, LEVQUL221 and LEVQUL222.
Annual Population Survey Two-Year Longitudinal Dataset, January 2022 -...
b2find.eudat.eu
Updated Dec 15, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2023). Annual Population Survey Two-Year Longitudinal Dataset, January 2022 - December 2023 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/c32753b1-f8b7-5f19-92fc-15583b95b5c2
Dataset updated
Dec 15, 2023
Description
Abstract copyright UK Data Service and data collection copyright owner.

The Annual Population Survey (APS) is a major survey series, which aims to provide data that can produce reliable estimates at local authority level. Key topics covered in the survey include education, employment, health and ethnicity. The APS comprises key variables from the Labour Force Survey (LFS), all its associated LFS boosts and the APS boost.

The APS allows for analysis to be carried out on detailed subgroups and below regional level. In recent years (particularly with the sample size of the LFS 5 quarter dataset reducing) there has been some interest in producing a two year APS longitudinal dataset to look at any trends that may occur over a year. The APS Two-Year Longitudinal Datasets, covering 2012/13 onwards, have been deposited as a result of this work. Person- and Household-level APS datasets are also available. For further detailed information about methodology, users should consult the Labour Force Survey User Guide, included with the APS documentation.

Occupation data for 2021 and 2022

The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. None of ONS' headline statistics, other than those directly sourced from occupational data, are affected and you can continue to rely on their accuracy. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022
Datasets, models and demos associated to "Celldetective: an AI-enhanced...
zenodo.org
zip
Updated Apr 12, 2024
Cite
Rémy Torro; Beatriz Diaz-Bello; Dalia El Arawi; Lorna Ammer; Patrick Chames; Kheya Sengupta; Laurent Limozin (2024). Datasets, models and demos associated to "Celldetective: an AI-enhanced image analysis tool for unraveling dynamic cell interactions" [Dataset]. http://doi.org/10.5281/zenodo.10650279
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10650279
Dataset updated
Apr 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rémy Torro; Beatriz Diaz-Bello; Dalia El Arawi; Lorna Ammer; Patrick Chames; Kheya Sengupta; Laurent Limozin
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
Overview

This repository contains datasets, models and demos associated with Celldetective, a software tool for single-cell analysis of multimodal time-lapse microscopy images.

Demos

Cell-cell interaction assay: ADCC

We imaged a co-culture of MCF-7 breast cancer cells (targets) and human primary NK cells (effectors), interacting in the presence of bispecific antibodies, to measure antibody-dependent cellular cytotoxicity (ADCC). The nuclei of all cells are marked with the Hoechst nuclear stain, the dead nuclei with the propidium iodide nuclear stain, and the cytoplasm of the NK cells with CFSE. The system is imaged in epifluorescence and brightfield at either 20 or 40X magnification. We provide a single-position demo for the ADCC assay, as "demo_adcc.zip". After unzipping, the demo_adcc folder can be loaded in Celldetective for testing.

Cell-surface interaction assay: RICM

We imaged human primary NK cells spreading on a surface coated with a bispecific antibody similar to the one used in the ADCC assay (replacing the target cells with a flat surface). The system is imaged using the RICM technique. Images are normalized by computing a median estimate of the background, pooled from all the positions in a well, and dividing the images by this estimate. Here, we provide a single-position demo for the cell-surface interaction assay imaged in RICM, as "demo_ricm.zip". As above, after unzipping, the experiment can be tested and processed in Celldetective.
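The background normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the description does not say whether the median is taken per pixel or as a single value, so a per-pixel median across positions is assumed here, and the function names and tiny images are hypothetical rather than part of Celldetective.

```python
from statistics import median

def median_background(images):
    """Per-pixel median across a list of equally sized 2D images (assumed form)."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[median(img[r][c] for img in images) for c in range(cols)]
            for r in range(rows)]

def normalize(image, background):
    """Divide an image by the background estimate, pixel by pixel."""
    return [[px / bg for px, bg in zip(img_row, bg_row)]
            for img_row, bg_row in zip(image, background)]

# Three tiny hypothetical "positions" from one well
positions = [
    [[100.0, 102.0], [98.0, 100.0]],
    [[100.0, 100.0], [100.0, 50.0]],   # a dark object at the last pixel
    [[101.0, 101.0], [99.0, 100.0]],
]
background = median_background(positions)
flat = normalize(positions[1], background)
```

After normalization, background pixels sit near 1 while the dark object keeps a value well below 1, which is what makes a fixed threshold usable across positions.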

Datasets

Image annotations for segmentation

Cell-cell interaction assay: ADCC

We generated two sets of annotations from images of a co-culture of MCF-7 breast cancer cells and human primary NK cells, interacting in the presence of bispecific antibodies, to measure antibody-dependent cellular cytotoxicity (ADCC). Since there are two separate cell populations of interest, the targets (MCF-7) and effectors (NK cells), we curated two datasets. Each sample in a dataset consists of a multichannel image (up to five channels in the context of ADCC, among brightfield, Hoechst nuclear stain, PI nuclear stain, CFSE, LAMP1), the associated instance segmentation annotation for the population of interest, and a json file summarizing the content of each channel and the spatial calibration of the image. These sample data are generated directly in Celldetective, using a custom napari plugin.

db_mcf7_nuclei_w_primary_NK: MCF-7 cell nuclei are annotated specifically on images where primary NK cells and RBCs co-exist. The annotation exploits up to four channels simultaneously.

db_primary_NK_w_mcf7: human primary NK cells, with annotated cytoplasm (mostly from CFSE) but exploiting brightfield and Hoechst to segment out of focus or poorly labelled cells.

These datasets are used to train several segmentation models to segment on one hand the MCF-7 nuclei and on the other hand the primary NK cells.

Single-cell signal annotations for classification and regression

Cell-cell interaction assay: ADCC

We generated several signal classification/regression datasets with Celldetective to characterize the ADCC assay. Briefly, for a given event, cells can be classified as "the event occurred during the observation", "no event occurred during the observation", or "the event already occurred prior to observation". If the event occurred during the observation, we can estimate when (the regression). Each single cell is a dictionary with a collection of signals. The attribute "class" sets the class and "t0" the time of the event (default is -1 for absence of event).
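As a rough illustration, one such single-cell record might look like the Python dictionary below. Only the "class" and "t0" attribute names and the -1 default come from the description; the signal names, the integer class codes and the helper function are hypothetical and may not match the actual files.

```python
# Hypothetical single-cell record: only "class" and "t0" (default -1) are
# named in the dataset description; everything else is illustrative.
EVENT, NO_EVENT, PRIOR_EVENT = 0, 1, 2  # assumed class codes

cell = {
    "PI_intensity": [0.1, 0.1, 0.2, 0.9, 1.0],   # hypothetical signal
    "nuclear_area": [210, 208, 205, 150, 140],   # hypothetical signal
    "class": EVENT,
    "t0": 3.0,   # time of event, in frames
}

def event_time(record):
    """Return the event time, or None when no event time is set (t0 == -1)."""
    t0 = record.get("t0", -1)
    return None if t0 == -1 else t0
```

A classifier would be trained on the "class" label, and a regressor on "t0" for the cells where an event occurred during the observation.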

db-si-NucPI: classification and regression of single-cells with respect to lysis events characterized by a strong PI increase upon lysis (also associated with decreasing nuclear area and sometimes a decreasing Hoechst)

db-si-NucCondensation: classification and regression of single-cells with respect to nucleus shrinking events characterized by a decreasing nuclear area

Models

Segmentation models

Generalist models

We integrated into Celldetective select published models for cellular segmentation from StarDist and Cellpose. We wrapped the models with an input configuration to help Celldetective handle the normalization, rescaling and channel selection upon inference.

Cellpose [1,2]: cellpose, cyto, cyto2, livecell, tissuenet, nuclei

StarDist [3]: paper_dsb2018, versatile_fluo, versatile_he

If you use any of these models in your research, don't forget to cite the StarDist or Cellpose papers accordingly!

ADCC models

MCF-7 (in the presence of NKs): MCF7_bf_pi_cfse_h, MCF7_bf_h_pi, MCF7_h_pi, MCF7_h_versatile

NKs (in the presence of MCF-7): primNK_multimodal, primNK_SD, primNK_cfse

Signal analysis models

We developed Deep Learning models that classify and regress the time of events from single-cell signals, applied to the ADCC assay.

lysis detection: lysis_H_PI, lysis_PI_area. Detect lysis events characterized at least by an increase of PI from one or more measurements (respectively PI+Hoechst and PI+nucleus area, trained on db-si-NucPI)

nucleus shrinking detection: NucCond. Detect nucleus shrinking events from nuclear area signal (db-si-NucCondensation)

References

[1] Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a generalist algorithm for cellular segmentation. Nat Methods 18, 100–106 (2021).

[2] Pachitariu, M. & Stringer, C. Cellpose 2.0: how to train your own model. Nat Methods 19, 1634–1641 (2022).

[3] Schmidt, U., Weigert, M., Broaddus, C. & Myers, G. Cell Detection with Star-Convex Polygons. in Medical Image Computing and Computer Assisted Intervention – MICCAI 2018 (eds. Frangi, A. F., Schnabel, J. A., Davatzikos, C., Alberola-López, C. & Fichtinger, G.) 265–273 (Springer International Publishing, Cham, 2018). doi:10.1007/978-3-030-00934-2_30.
Data from: Porpoise Observation Database (NRM)
gbif.org
researchdata.se
+2more
Updated Dec 18, 2024
Cite
Linnea Cervin (2024). Porpoise Observation Database (NRM) [Dataset]. http://doi.org/10.15468/yrxfxp
Explore at:
Unique identifier
https://doi.org/10.15468/yrxfxp
Dataset updated
Dec 18, 2024
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
Swedish Museum of Natural History
Authors
Linnea Cervin
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
This data set contains observations of dead or alive harbor porpoises made by the public, mostly around the Swedish coast. A few observations are from Norwegian, Danish, Finnish and German waters. Each observation of harbor porpoise is verified at the Swedish Museum of Natural History before it is approved and published on the web. The verification consists of checking the accuracy of the number of animals sighted, whether the coordinates are correct and, if pictures are attached, whether they really show a porpoise and not another species. If any of these three seem unlikely, the reporter is contacted and asked more detailed questions. The report is approved or denied depending on the answers given. Pictures and movies that can’t be uploaded to the database due to size problems are saved on the museum server and marked with the identification number given by the database. By the end of the year the data is submitted to HELCOM, which then summarizes all the member states’ data from the Baltic proper to the Kattegat basin. The porpoise is one of the smallest toothed whales in the world and the only whale species that breeds in Swedish waters. Porpoises are found in temperate water in the northern hemisphere, where they live in small groups of 1-3 individuals. The females give birth to a calf in the summer months, which then suckles for about 10 months before it is left on its own and the female has a new calf. The porpoises around Sweden are divided into three groups that don’t mix very often. The North Sea population is found on the west coast in Skagerrak down to the Falkenberg area. The Belt Sea population is found from a bit north of Falkenberg down to the Blekinge archipelago in the Baltic. The Baltic proper population is the smallest population, consists of only a few hundred animals, and is considered an endangered subspecies. These animals are most commonly found from the Blekinge archipelago up to the Åland Sea, with a hot spot area south of Gotland at Hoburg’s bank and the Mid-Sea bank.
The Porpoise Observation Database was started in 2005 at the request of the Swedish Environmental Protection Agency to get a better understanding of where to find porpoises, with the idea of using the public to expand the “survey area”. In the first year, 26 sightings were reported, of which 4 were from the Baltic Sea. The museum is particularly interested in sightings from the Baltic Sea due to the low numbers of animals and the lack of data and knowledge about this group. In the beginning only live sightings were reported, but animals found dead were added later. Some of the animals that are reported dead are collected. Depending on where it is found and its state of decay, the animal can be subsampled in the field. A piece of blubber and some teeth are then sent in by mail and stored in the Environmental Specimen Bank at the Swedish Museum of Natural History in Stockholm. If the whole animal is collected, an autopsy is performed at the National Veterinary Institute in Uppsala to try to determine the cause of death. Organs, teeth and parasites are sampled and saved at the Environmental Specimen Bank as well. Information about the animal, i.e. location, date found, sex, age, length, weight and blubber thickness, as well as the type of organ and the amount that is sampled, is then added to the Specimen Bank database. Anyone interested in getting samples or data from the Specimen Bank has to send an application to the Department of Environmental research and monitoring, stating the purpose of the study and the amount of samples needed.
National Sustainable Development Plan Baseline Survey 2019, Household Income...
microdata.pacificdata.org
Updated Oct 9, 2020
+ more versions
Cite
Vanuatu National Statistics Office (2020). National Sustainable Development Plan Baseline Survey 2019, Household Income and Expenditure Survey 2019 - Vanuatu [Dataset]. https://microdata.pacificdata.org/index.php/catalog/742
Explore at:
Dataset updated
Oct 9, 2020
Dataset authored and provided by
Vanuatu National Statistics Office
Time period covered
2019 - 2020
Area covered
Vanuatu
Description
Abstract

The National Sustainable Development Plan (NSDP) Baseline Survey 2019 is an expanded Household Income and Expenditure Survey (HIES) that includes health, educational, cultural, and productive dimensions previously uncollected or in need of updating. The results of this survey will directly inform more than 30 key indicators listed in the NSDP M&E (Monitoring and Evaluation) Framework, as well as more than 40 of the listed indicators for the United Nations Sustainable Development Goals (SDGs). The NSDP Baseline Survey also presents an opportunity for Vanuatu to establish a comprehensive Melanesian Wellbeing baseline, as well as an updated baseline for calculating the Consumer Price Index (CPI) and revising National Accounts.

Geographic coverage

National coverage. Below are the details of this national coverage: 1. National (Vanuatu); 2. Provinces (Torba, Sanma, Penama, Malampa, Shefa, Tafea); 4. Area Councils (Torres Area council right to Futuna & Aneityum Area Council); 5. Villages / Towns; 6. Urban/Rural.

Analysis unit

Household and Individual.

Universe

All de jure residents.

Kind of data

Sample survey data [ssd]

Sampling procedure

The sample size for this survey was determined using the outputs of the previous (2010) Household Income and Expenditure Survey (HIES), especially the per capita monthly total expenditure. From the 2010 HIES, the mean, standard deviation and standard error of per capita expenditure were computed, and the distribution of the population across the 6 provinces of Vanuatu from the 2016 Census was used as a base. According to the accuracy of this variable of interest within each province, the sample size per province was adjusted in order to get an expected sampling error of around 5% within each province. The sampling frame is the 2016 Vanuatu census, which was used to compute the probability of selection of the Enumeration Areas (EAs). Selection started with the random selection of EAs with probability proportional to size. Then, within each selected EA, 10 households were randomly selected using uniform sampling. Within each selected EA, the household listing was updated by the team before random selection and interview.

i) The only variable considered is per capita total household expenditure (variable of interest), as in addition to being one of the main indicators derived from the Household Income and Expenditure Survey (HIES), it is likely highly correlated with many other variables of interest (e.g. poverty). From the 2010 HIES dataset, using this variable of interest, a list of relevant indicators was calculated. Those indicators provide information on:
- (a) the status of the household expenditure distribution within each province
- (b) the efficiency provided by the 2010 HIES sample design
- (c) the accuracy of the estimates calculated from the 2010 HIES dataset (especially the per capita household expenditure, our variable of interest)

ii) The original dataset has been trimmed using the variable of interest, the lowest and the highest percentiles (the 1% households with the lowest and highest per capita total household expenditure) were removed from the analysis (outliers). The dataset ends up with 4,289 households (given 4,377 households were completed).

iii) The 2010 Vanuatu HIES sample was based on a stratified multi-stage selection:
- Stratification: geographical provinces (by urban / rural locations)
- First stage of selection: Enumeration Areas (EAs) with probability of selection proportional to size
- Second stage: households, with uniform probability of selection within the EAs
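The two selection stages can be sketched as follows. This is a simplified illustration, not the survey's actual procedure: the first stage here draws EAs with replacement using `random.choices` (the documentation does not say which probability-proportional-to-size variant was used), stratification is omitted, and the frame, EA identifiers and household identifiers are hypothetical.

```python
import random

def select_eas_pps(ea_sizes, n_eas, rng):
    """First stage: draw EAs with probability proportional to size.

    Drawn with replacement for simplicity, so a large EA can appear twice."""
    eas = list(ea_sizes)
    weights = [ea_sizes[ea] for ea in eas]
    return rng.choices(eas, weights=weights, k=n_eas)

def select_households(household_ids, per_ea, rng):
    """Second stage: equal-probability draw of households within one EA."""
    return rng.sample(household_ids, min(per_ea, len(household_ids)))

rng = random.Random(2019)  # fixed seed so the sketch is reproducible
# Hypothetical frame: EA id -> number of listed households
frame = {"EA-01": 120, "EA-02": 45, "EA-03": 300, "EA-04": 80}
selected_eas = select_eas_pps(frame, n_eas=2, rng=rng)
sample = {
    ea: select_households([f"{ea}-HH{i:03d}" for i in range(frame[ea])], 10, rng)
    for ea in selected_eas
}
```

In a real design the same two functions would be applied separately within each stratum (province by urban/rural location).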

iv) The mean and standard deviation indicate the status of the variable of interest within each strata. The intracluster correlation (p), and the design effect (DEFF) highlight the efficiency of the sampling strategy, and the standard error/relative standard error (SE/RSE) of the variable of interest show its accuracy.

v) The purpose of this analysis is to get some insights from the 2010 HIES sample design in order to improve the 2019 survey. There is no point to improve the sample size in strata where the sample is not efficient (the gain in accuracy will be minor compared to the related cost).

vi) The challenges in the 2019 Vanuatu baseline survey:
- Meet precision targets in each stratum (provincial level), including Penama, where Ambae island had been evacuated at the time of the sample design
- Keep an acceptable sample size (due to budget constraints)
- Follow international recommendations (12 months of field operation)
- Enhance the monitoring and supervision of the field staff and simplify management of the logistics in the field

==> Optimize the variance/cost ratio of the survey design

vii) Table 1 from the Document Sample Design (provided as External Resources) presents the Vanuatu 2010 HIES survey specifications, efficiency and accuracy in each strata (for the variable of interest). It shows that some improvements can be made in Torba and Shefa rural (where the RSE is higher than 5%), and it shows a high intraclass correlation in Malampa, Shefa rural and Tafea (which leads to a high design effect in those strata). In Torba, the high design effect comes from the high number of households interviewed in each selected EA (on average 33 households per selected EA in this strata were interviewed).
- Torba: the sample size is good, there is just a need to reduce the number of households to interview within each strata (and in order to keep a similar sample size the number of EAs to select in the province will be increased)
- Malampa: given the high intracluster correlation in this province, a higher number of EAs to select is required (with the same number of households per EA to interview)
- Shefa rural: keep the same number of households to interview within each EA, and increase the number of EAs to select (this will lead to a higher sample size)
- Tafea: similar to Malampa province, the high intraclass correlation indicates that the number of EAs to select has to be increased (therefore the sample size as well)
The sample size has to be increased in Malampa, Shefa rural and Tafea; for the rest, the 2019 design will have to be similar to 2010 (in order to provide at least the same level of accuracy).

viii) The 2019 Vanuatu baseline survey follows the international recommendations in terms of data collection schedule (12-month coverage) and considers a better management and supervision of the field staff. In this context, the field staff will work by team, given that:
- A team is made of 1 supervisor (team leader) and 2 or 3 interviewers
- Each interviewer will be responsible for 5 interviews per round
- A round of survey is a 1-week period
- 1 EA is covered during 1 round; after the round is completed, the team moves to the next EA for the next round
- A team completes 32 rounds during the 12-month field operation period (roughly, every 2 rounds/2 weeks of work are followed by 1 round/1 week of rest)

ix) Table 3 from the Document Sample Design (provided as External Resources) presents a survey schedule starting February 2019 and ending February 2020. During this period the teams will be in the field for 32 working weeks (corresponding to 32 different selected EAs), with a 3-week period of rest during the Christmas period.

x) The number of interviewers per team and the number of teams per province will determine the total sample size within each province. A team made of 3 interviewers can achieve 480 households over the period, while a team of 2 interviewers can achieve only 320 cases.
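These team totals follow directly from the workload figures given in the text (5 interviews per interviewer per round, 32 rounds per team). A small sketch of the arithmetic, with a hypothetical function name:

```python
INTERVIEWS_PER_ROUND = 5   # per interviewer, from the text
ROUNDS = 32                # rounds completed per team over the field period

def households_per_team(interviewers):
    """Households one team can complete over the whole field operation."""
    return interviewers * INTERVIEWS_PER_ROUND * ROUNDS

print(households_per_team(3))  # 480
print(households_per_team(2))  # 320
```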

xi) The intraclass correlation is used to calculate the precision loss due to clustering. Like the standard deviation, the intracluster correlation is considered to be a true population parameter, and therefore transferable between designs. We have to accept the hypothesis that this correlation factor has not changed during the period 2010-2019, and therefore can be used to predict DEFF and RSE for the next survey given an adjusted design (based on the conclusions provided by the 2010 design). Table 2 from the Document Sample Design (provided as External Resources) predicts the design effect and sampling error of the variable of interest given the new sample design, which is based on:
- the sample size within each strata
- the number of teams within each strata
- the number of interviewers per team
In order to allow more flexibility in the sample size, it is preferable to set up some teams of 3 interviewers, which can achieve 480 households (a good sample size for Torba and Sanma urban), and some teams of 2 interviewers, which will achieve 320 households each (2 teams will be required in other provinces).
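To make the role of the intraclass correlation concrete, here is a sketch using the standard Kish approximation DEFF = 1 + (m - 1)ρ for clusters of m households. The survey documentation does not state which formula was used for Table 2, so this is an assumption, and all numeric inputs below are hypothetical rather than taken from the tables.

```python
import math

def design_effect(households_per_ea, rho):
    """Kish approximation: DEFF = 1 + (m - 1) * rho for clusters of size m."""
    return 1 + (households_per_ea - 1) * rho

def predicted_rse(mean, sd, n_households, households_per_ea, rho):
    """Relative standard error of the mean under the clustered design."""
    se_srs = sd / math.sqrt(n_households)            # simple random sampling
    se = se_srs * math.sqrt(design_effect(households_per_ea, rho))
    return se / mean

# Hypothetical stratum values, not taken from Table 1 or Table 2
rho = 0.10
deff_33 = design_effect(33, rho)   # many households per EA
deff_10 = design_effect(10, rho)   # 10 households per EA
rse = predicted_rse(mean=100.0, sd=80.0, n_households=480,
                    households_per_ea=10, rho=rho)
```

For a fixed ρ, interviewing fewer households in more EAs lowers the design effect, which is the reasoning behind the adjustments proposed for Torba, Malampa, Shefa rural and Tafea.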

xii) The proposed design in Table 2 from the Document Sample Design (provided as External Resources) shows a total sample size of 4,640 households and a higher level of accuracy of the estimate of the variable of interest in all the strata. Only Shefa rural shows an RSE higher than 5%, which is still acceptable. The high intraclass correlation in Shefa rural inflates the variance of the estimates; reducing it would require either increasing the sample size or decreasing the number of households interviewed per EA, which is not recommended for logistical and financial reasons.

Mode of data collection

Computer Assisted Personal Interview [capi]

Research instrument

The questionnaire was developed in English using the World Bank software Survey Solutions. This questionnaire is divided into 18 modules that are detailed below.

-Introduction (geographic areas, list of household members) -Module 1: Demographic characteristics: ethnicity, marital status; -Module 2: Wellbeing: culture
VA Personal Health Record Sample Data
catalog.data.gov
datahub.va.gov
+3more
Updated Aug 2, 2025
Cite
Department of Veterans Affairs (2025). VA Personal Health Record Sample Data [Dataset]. https://catalog.data.gov/dataset/va-personal-health-record-sample-data
Explore at:
Dataset updated
Aug 2, 2025
Dataset provided by
United States Department of Veterans Affairs (http://va.gov/)
Description
My HealtheVet (www.myhealth.va.gov) is a Personal Health Record portal designed to improve the delivery of health care services to Veterans, to promote health and wellness, and to engage Veterans as more active participants in their health care. The My HealtheVet portal enables Veterans to create and maintain a web-based PHR that provides access to patient health education information and resources, a comprehensive personal health journal, and electronic services such as online VA prescription refill requests and Secure Messaging. Veterans can visit the My HealtheVet website and self-register to create an account, although registration is not required to view the professionally-sponsored health education resources, including topics of special interest to the Veteran population. Once registered, Veterans can create a customized PHR that is accessible from any computer with Internet access.
American Community Survey
gstore.unm.edu
csv, geojson, gml +5
Updated Mar 6, 2020
+ more versions
Cite
Earth Data Analysis Center (2020). American Community Survey [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/75447ccc-1b99-4481-b1f7-b2e821c14864/metadata/FGDC-STD-001-1998.html
Explore at:
Available download formats: csv(5), geojson(5), zip(1), json(5), shp(5), xls(5), gml(5), kml(5)
Dataset updated
Mar 6, 2020
Dataset provided by
Earth Data Analysis Center
Time period covered
2016
Area covered
New Mexico, West Bounding Coordinate -109.05017 East Bounding Coordinate -103.00196 North Bounding Coordinate 37.000293 South Bounding Coordinate 31.33217
Description
A broad and generalized selection of 2012-2016 US Census Bureau 2016 5-year American Community Survey poverty data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of populations living below the poverty level, as grouped by age, sex, education, workforce status, and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.
South African Social Giving Survey 2003 - South Africa
datafirst.uct.ac.za
Updated May 23, 2020
+ more versions
Cite
National Development Agency (NDA) (2020). South African Social Giving Survey 2003 - South Africa [Dataset]. https://www.datafirst.uct.ac.za/dataportal/index.php/catalog/329
Explore at:
Dataset updated
May 23, 2020
Dataset provided by
National Development Agency (NDA)
Centre for Civil Society (CCS)
Southern African Grantmakers’ Association (SAGA)
Time period covered
2003
Area covered
South Africa
Description
Abstract

The State of Giving project, established by the Centre for Civil Society (CCS) at the University of KwaZulu-Natal (UKZN), the Southern African Grantmakers’ Association (SAGA) and the National Development Agency (NDA), was initiated to generate information on and analyse the resource flows to poverty alleviation and development in South Africa. One component of the broader project was a focus on individual-level giving, which involved the design, implementation and analysis of a national sample survey on individual level giving behaviour. It thus speaks to both the urban and rural and the formal and informal dimensions of our social context. The survey collected data on who gives, why and how much they give, as well as what they give and the recipients of their giving.

Geographic coverage

The sample, a random stratified one comprising 3000 respondents, is representative of all South Africans aged 18 and above.

Analysis unit

Individuals

Universe

The population of interest in the survey was all South Africans aged 18 and above.

Kind of data

Sample survey data

Sampling procedure

A random stratified survey sample was drawn by Ross Jennings at S&T. The sample was stratified by race and province at the first level, and then by area (rural/urban/etc.) at the second level. The sample frame comprised 3000 respondents, yielding an error bar of 1.8%. The results are representative of all South Africans aged 18 and above, in all parts of the country, including formal and informal dwellings. Unlike many surveys, the project partners ensured that the rural component of the sample (commonly the most expensive for logistical reasons) was large and did not require heavy weighting (where a small number of respondents have to represent the views of a far larger community).
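The quoted error bar of 1.8% is consistent with the usual 95% margin of error for a proportion near 50% under simple random sampling with n = 3000. Whether the authors computed it this way is an assumption; the sketch below ignores stratification and any design effect.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion p, sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(3000)
print(f"{moe:.1%}")  # 1.8%
```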

Randomness was built into the selection of starting points (from which fieldworkers begin their work) - every 5th dwelling was selected, after a randomly selected starting point had been identified - and into the selection of respondents, where the birthday rule was applied. That is, a household roster was completed, all those aged 18 and above were listed, and the householder whose birthday came next was identified as the respondent. Three call-backs were undertaken to interview the selected respondent; if s/he was unavailable, the household was substituted.
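The two selection rules described above can be sketched as follows. The data structures are hypothetical: the survey documentation does not describe how dwellings or rosters were recorded, and the random start is modelled here simply as a random offset within the first five dwellings.

```python
import random
from datetime import date

def every_kth(dwellings, k, rng):
    """Systematic selection: random start, then every k-th dwelling."""
    start = rng.randrange(k)
    return dwellings[start::k]

def next_birthday_respondent(roster, today):
    """Birthday rule: the adult (18+) whose birthday comes next after today.

    roster is a list of dicts with "name", "age" and "birthday" as (month, day)."""
    adults = [p for p in roster if p["age"] >= 18]
    today_md = (today.month, today.day)
    # Birthdays still to come this year sort first, then those already passed.
    return min(adults, key=lambda p: (p["birthday"] < today_md, p["birthday"]))

rng = random.Random(0)
dwellings = [f"dwelling-{i}" for i in range(20)]   # hypothetical listing
visited = every_kth(dwellings, 5, rng)

roster = [
    {"name": "A", "age": 40, "birthday": (3, 1)},
    {"name": "B", "age": 17, "birthday": (6, 15)},   # under 18, not eligible
    {"name": "C", "age": 25, "birthday": (11, 20)},
]
respondent = next_birthday_respondent(roster, date(2003, 6, 1))
```

Because the birthday rule depends only on the calendar, the fieldworker has no discretion over who is interviewed within a household.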

A second sample was drawn, specifically to boost the minority religious groups – namely Hindus, Jews and Muslims. They are separately analysed and reported as part of the broader project, since area sampling was used, disallowing us from incorporating them into the national survey dataset.

Mode of data collection

Face-to-face [f2f]

Research instrument

A set of focus groups were staged across the country in order to inform questionnaire design. Groups were recruited across a range of criteria, including demographic and religious differences, in order to ensure a wide range of views were canvassed. Direct input from focus group participants informed a series of robust design sessions with all the project partners, from which a draft questionnaire was designed. The questionnaire was piloted in two provinces, involving urban and rural respondents and covering all four race groups. The pilot included testing specific questions, and the overall methodological approach, namely our ability to quantify giving. After the pilot results had been assessed, the questionnaire was revised before going into field.

Sampling error estimates

"0" values in some variables: Many of the variables have a "0" value in addition to the values for responses, e.g. variables with yes/no responses are coded "0", "1", "2". There is no indication that the 0 represents "missing" (only Q75 specifies the use of "0" for none/nobody).

Variable Q9 (Question 9): Q8 lists the number of resident children under the age of 18. Q9 refers to this question with: "of these children aged below 16 living in your household". This should probably be "aged below 18", in line with Q8. The data only reflects children under 16, so the question should probably have been "of these children, how many below the age of 16 are (Q9A) children of the head of the household and (Q9B) children not born to the head of household, i.e. children born to others". It seems, though, that Q8 and Q9 should match, with Q8 identifying children and Q9 identifying children of the household head. If specifying 16 rather than 18 in Q9 is an error, then this has been reflected in the data. This means that household members 17-18 years are listed, but the data does not record whether they are children of the household head.

Variable Q21 (Question 21): “What do you think is the most deserving cause that you support or would support if you could?” There are 14 values for Q21 (1-14). According to the report (Everatt, D. and G. Solanki. 2005. A Nation of givers: Social giving amongst South Africans), this and other open-ended questions were later categorised and given numeric codes. However, a codebook was not included with the documentation provided to DataFirst.

Variable Q22 (Question 22) “Is there one cause or charity or organisation you would definitely NOT give money to?” There are 14 values for Q22 (1-14). Again, this requires a code list for explanation.

Variable Q29 (Question 29) Q28 deals with the giving of goods/food/clothes. Q29 provides a breakdown of these items, and Q28Q29L lists time/labour as one of these. It seems that Q29L is incorrectly listed as a sub-set of goods/food/clothes. Also, giving time to causes is dealt with extensively in Q30A-Q and Q31A-Q, so this variable seems out of place.

Variable Q39 (Question 36) This concerns the giving of food, goods, or other forms of help to beggars/street children/people asking for help, but the question text does not specifically mention these forms of help, so can be misleading.

Variable Q44 (Question 44) Q44 asks the respondent to complete the sentence "Help the poor because…." There are 8 values for this variable (0-7 and 11). Again, a code list is required to explain these values.

Variable Q59 (Question 59): This question has three coded responses (1-3) so should have three values (or 4, with a “missing” value). There are 12 values for this variable, though (59A-59L). It is possible that this variable has been swopped with Q60 (however, Q60 only has 11 options in the questionnaire).

Variable Q60 (Question 60): The variable from this question only has 4 values, but there are 11 possible responses to this question (60A-60K). This variable could have been swopped with Q59 (in which case the extra value needs explanation, as Q59 only has 11 options in the questionnaire).

Variables Q67 - Q82 From this point on the order of variables seems wrong, as the responses don't match the number of values listed in the questionnaire. The variables seem to refer to the next question along, e.g. Variable Q67 seems to have data emanating from Question 68, and so on. The data in the revised dataset has been corrected to reflect this.

There is no variable Q83 in the dataset, although there is a question 83 in the questionnaire. This seems to support the above explanation. Data users are requested to provide any additional findings on this that come to light in their research.
American Community Survey
gstore.unm.edu
csv, geojson, gml +5
Updated Mar 6, 2020
Cite
Earth Data Analysis Center (2020). American Community Survey [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/b815471c-fad7-4f65-b3c1-f1fbb6fa36ec/metadata/FGDC-STD-001-1998.html
Available download formats: shp(5), zip(1), geojson(5), kml(5), gml(5), json(5), csv(5), xls(5)
Dataset updated
Mar 6, 2020
Dataset provided by
Earth Data Analysis Center
Time period covered
2016
Area covered
New Mexico, West Bounding Coordinate -109.05017 East Bounding Coordinate -103.00196 North Bounding Coordinate 37.000293 South Bounding Coordinate 31.33217
Description
A broad and generalized selection of US Census Bureau 2016 5-year (2012-2016) American Community Survey education data estimates, obtained via the Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection is not comprehensive, but allows a first-level characterization of educational attainment by grade level and sex (for all persons 25 years and older), plus enrollment estimates at key educational levels (for the universe of all persons 3+ years old). The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time.
The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups. Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries.
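For users who do download the margins of error from the full Data Profiles, the sketch below shows how an MOE relates to a standard error and a confidence interval. It assumes the MOE is published at the 90 percent confidence level (critical value 1.645), as is the Census Bureau's practice for the ACS; the function names and the example figures are hypothetical and do not come from this dataset.

```python
Z_90 = 1.645  # critical value for a 90 percent confidence level (assumed for the MOE)


def standard_error(moe):
    """Convert a 90 percent margin of error into a standard error."""
    return moe / Z_90


def confidence_interval(estimate, moe):
    """Return the lower and upper bounds implied by the margin of error."""
    return (estimate - moe, estimate + moe)


def coefficient_of_variation(estimate, moe):
    """Standard error as a percentage of the estimate (a rough reliability gauge)."""
    return 100.0 * standard_error(moe) / estimate


# Hypothetical county estimate: 1,200 persons with a margin of error of 150.
low, high = confidence_interval(1200, 150)
cv = coefficient_of_variation(1200, 150)
```

A smaller coefficient of variation indicates a more reliable estimate, which is the sense in which multiyear estimates help for less populated areas.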
Data from: Enabling Lipidomic Biomarker Studies for Protected Populations by...
acs.figshare.com
xlsx
Updated Jan 4, 2024
Cite
Madeline Isom; Eden P. Go; Heather Desaire (2024). Enabling Lipidomic Biomarker Studies for Protected Populations by Combining Noninvasive Fingerprint Sampling with MS Analysis and Machine Learning [Dataset]. http://doi.org/10.1021/acs.jproteome.3c00368.s001
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acs.jproteome.3c00368.s001
Dataset updated
Jan 4, 2024
Dataset provided by
ACS Publications
Authors
Madeline Isom; Eden P. Go; Heather Desaire
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Triacylglycerols and wax esters are two lipid classes that have been linked to diseases, including autism, Alzheimer’s disease, dementia, cardiovascular disease, dry eye disease, and diabetes, and thus are molecules worthy of biomarker exploration studies. Since triacylglycerols and wax esters make up the majority of skin-surface lipid secretions, a viable sampling method for these potential biomarkers would be that of groomed latent fingerprints. Currently, however, blood-based sampling protocols predominate in the field. The invasiveness of a blood draw limits its utility to protected populations, including children and the elderly. Herein we describe a noninvasive means for sample collection (from fingerprints) paired with fast MS data-acquisition (MassIVE data set MSV000092742) and efficient data analysis via machine learning. Using both supervised and unsupervised classification, we demonstrate the usefulness of this method in determining whether a variable of interest imparts measurable change within the lipidomic data set. As a proof-of-concept, we show that the method is capable of distinguishing between the fingerprints of different individuals as well as between anatomical sebum collection regions. This noninvasive, high-throughput approach enables future lipidomic biomarker researchers to more easily include underrepresented, protected populations, such as children and the elderly, thus moving the field closer to definitive disease diagnoses that apply to all.
Data from: Species occurrence data from the Range-Wide Bull Trout eDNA...
catalog.data.gov
datasetcatalog.nlm.nih.gov
+4 more
Updated Apr 21, 2025
Cite
U.S. Forest Service (2025). Species occurrence data from the Range-Wide Bull Trout eDNA Project [Dataset]. https://catalog.data.gov/dataset/species-occurrence-data-from-the-range-wide-bull-trout-edna-project-6332e
Dataset updated
Apr 21, 2025
Dataset provided by
U.S. Forest Service
Description
These data include 2015 - 2018 eDNA field sample points indicating lab results for presence or absence of bull trout. Sample sites are spaced at a 1 kilometer interval throughout the historical range of bull trout. eDNA stream samples are collected and species presence/absence is determined by analyses at the National Genomics Center. Results are recorded in the feature attribute table of the eDNA sample site shapefile. One point feature in the shapefile was generated for each 1 kilometer sample point in the bull trout eDNA feature class. Where multiple samples were collected at a single eDNA sample site, replicate point features will occur at a single location in the shapefile. The bull trout is an ESA-listed species with a historical range that encompasses many waters across the Northwest. Though once abundant, bull trout have declined in many locations and are at risk from a changing climate, nonnative species, and habitat degradation. Informed conservation planning relies on sound and precise information about the distribution of bull trout in thousands of streams, but gathering this information is a daunting and expensive task. To overcome this problem, we coupled 1) predictions from the range-wide, spatially precise Climate Shield model on the location of natal habitats of bull trout with 2) a sampling template for every 8-digit hydrologic unit in the historical range of bull trout, based on the probability of detecting bull trout presence using environmental DNA (eDNA) sampling (McKelvey et al. 2016). The template consists of a master set of geospatially referenced sampling locations at 1-kilometer intervals within each cold-water habitat. We also identified sampling locations at this same interval based on the U.S. Fish and Wildlife Service's (USFWS) designation of critical spawning and rearing habitat. 
Based on field tests of eDNA detection probabilities conducted by the National Genomics Center for Wildlife and Fish Conservation, this sampling approach will reliably determine the presence of populations of bull trout, as well as provide insights on non-spawning habitats used by adult and subadult fish. The completed bull trout eDNA survey results are available through an interactive ArcGIS Online Map. The map provides the ability to zoom in and look at an area of interest, as well as to create queries or select an area to download points as a shapefile.
Expenditure and Consumption Survey, 1998 - West Bank and Gaza
dev.ihsn.org
datacatalog.ihsn.org
+1 more
Updated Apr 25, 2019
Cite
Palestinian Central Bureau of Statistics (2019). Expenditure and Consumption Survey, 1998 - West Bank and Gaza [Dataset]. https://dev.ihsn.org/nada/catalog/73907
Dataset updated
Apr 25, 2019
Dataset authored and provided by
Palestinian Central Bureau of Statistics, http://pcbs.gov.ps/
Time period covered
1998
Area covered
Palestine, West Bank
Description
Abstract

The basic goal of this survey is to provide the necessary database for formulating national policies at various levels. It represents the contribution of the household sector to the Gross National Product (GNP). Household Surveys help as well in determining the incidence of poverty, and providing weighted data which reflects the relative importance of the consumption items to be employed in determining the benchmark for rates and prices of items and services. Generally, the Household Expenditure and Consumption Survey is a fundamental cornerstone in the process of studying the nutritional status in the Palestinian territory.

The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality. Data is a public good, in the interest of the region, and it is consistent with the Economic Research Forum's mandate to make micro data available, aiding regional research on this important topic.

Geographic coverage

The target population in the sample survey comprises all private households living in the West Bank and Gaza Strip, excluding nomads and students.

Analysis unit

1- Household/families. 2- Individuals.

Universe

The survey covered a national sample of households and all permanently residing individuals in surveyed households.

Kind of data

Sample survey data [ssd]

Sampling procedure

Sample and Frame:

The target population in the survey sample comprises all households living in the West Bank and Gaza Strip, excluding nomads and students. The sample design is a stratified two-stage design for households selected to be interviewed. At the first stage, a sample of cells (PSUs) was selected from the PCBS master sample frame. At the second stage, a sample of households was selected after a complete household listing of the sampled cells.

Sample design:

Four levels of stratification have been made, in the following order:

1. Stratification by district.

2. Stratification by place of residence, which comprises: (a) municipalities, (b) villages, (c) refugee camps.

3. Stratification by locality size.

4. Stratification by cell identification.

Target cluster size:

The target cluster size or "sample-take" is the average number of households to be selected per PSU. In this survey, the sample take is around 10 households.

Sample size:

The total sample size collected, after excluding non-response and related losses, is 2851 households.

Detailed information/formulas on the sampling design are available in the user manual.
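The two-stage selection described above can be sketched as follows. This is an illustration only: the source does not state how cells or households were drawn at each stage, so simple random sampling without replacement is assumed at both stages, and the frame layout, function name, and parameter names are hypothetical.

```python
import random


def two_stage_sample(frame, psus_per_stratum, sample_take=10, seed=0):
    """Draw a stratified two-stage sample.

    frame maps stratum -> {psu_id: [household ids]} (a hypothetical layout).
    Returns a list of (stratum, psu_id, household_id) tuples.
    """
    rng = random.Random(seed)
    selected = []
    for stratum, psus in frame.items():
        # Stage 1: select cells (PSUs) within the stratum.
        # Assumption: simple random sampling; the survey may have used another scheme.
        n_psus = min(psus_per_stratum, len(psus))
        for psu in rng.sample(sorted(psus), n_psus):
            households = psus[psu]
            # Stage 2: select households from the complete listing of the cell.
            take = min(sample_take, len(households))
            for household in rng.sample(households, take):
                selected.append((stratum, psu, household))
    return selected


# Toy frame: 2 strata, 5 cells each, 30 listed households per cell.
toy_frame = {
    s: {f"{s}-cell{c}": [f"{s}-cell{c}-hh{h}" for h in range(30)] for c in range(5)}
    for s in ("stratum1", "stratum2")
}
sample = two_stage_sample(toy_frame, psus_per_stratum=2, sample_take=10)
```

With a sample take of 10, each selected cell contributes 10 households, so the toy call above yields 2 strata × 2 cells × 10 households.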

Sampling deviation

The standard errors for the main survey estimates were calculated to give the user an idea of their reliability or precision. The variance was calculated using the method of ultimate clusters within each domain of estimation.

Detailed information on the sampling design deviation and calculation of the variance is available in the user manual.
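As a rough illustration of the ultimate cluster method, the sketch below estimates the variance of a weighted total. It uses the common textbook form, in which the weighted PSU totals z_hi within each stratum h (with n_h sampled PSUs) are compared with their stratum mean: v = Σ_h [n_h / (n_h − 1)] Σ_i (z_hi − z̄_h)². The exact formulas used for this survey are in the user manual and may differ; the record layout and function name here are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def ultimate_cluster_variance(records):
    """Variance of an estimated total from (stratum, psu, weight, value) records."""
    # Weighted total for every PSU (ultimate cluster), grouped by stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, value in records:
        psu_totals[stratum][psu] += weight * value

    variance = 0.0
    for totals in psu_totals.values():
        z = list(totals.values())
        n = len(z)
        if n < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean = sum(z) / n
        # Between-PSU variation within the stratum.
        variance += n / (n - 1) * sum((zi - mean) ** 2 for zi in z)
    return variance


# Made-up records: (stratum, psu, household weight, household expenditure).
records = [
    ("A", 1, 2.0, 3.0), ("A", 1, 2.0, 2.0),  # PSU total 10
    ("A", 2, 2.0, 7.0),                      # PSU total 14
    ("B", 1, 1.0, 5.0), ("B", 2, 5.0, 1.0),  # PSU totals 5 and 5
]
variance = ultimate_cluster_variance(records)
standard_error = sqrt(variance)
```

Only variation between PSU totals enters the calculation, which is why the method needs at least two sampled PSUs in every stratum.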

Mode of data collection

Face-to-face [f2f]

Research instrument

The PECS questionnaire consists of two main sections:

First section: Certain articles/provisions of the form are filled in at the beginning of the month, and the remainder are filled in at the end of the month. The questionnaire includes the following provisions:

Cover sheet: Contains detailed particulars of the family, the date of visit, particulars of the field/office work team, and the number and sex of the family members.

Statement of the family members: Contains social, economic and demographic particulars of the selected family.

Statement of the long-lasting commodities and income generation activities: Includes a number of basic and indispensable items (e.g., livestock or agricultural land).

Housing Characteristics: Includes information and data pertaining to the housing conditions, including type of shelter, number of rooms, ownership, rent, water, electricity supply, connection to the sewer system, source of cooking and heating fuel, and remoteness/proximity of the house to education and health facilities.

Monthly and Annual Income: Data pertaining to the income of the family is collected from different sources at the end of the registration / recording period.

Second section: The second section of the questionnaire includes a list of 54 consumption and expenditure groups, itemized and serially numbered according to their importance to the family. Each of these groups contains important commodities. Across all groups, there are 707 commodity and service items. Groups 1-21 include food, drink, and cigarettes. Group 22 includes homemade commodities. Groups 23-45 include all items except for food, drink and cigarettes. Groups 50-54 include all of the long-lasting commodities. Data on each of these groups was collected over different intervals of time so as to reflect expenditure over a period of one full year.

Cleaning operations

Raw Data

Harmonized Data

The Statistical Package for Social Science (SPSS) is used to clean and harmonize the datasets.

The harmonization process starts with cleaning all raw data files received from the Statistical Office.

Cleaned data files are then all merged to produce one data file on the individual level containing all variables subject to harmonization.

A country-specific program is generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.

A post-harmonization cleaning process is run on the data.

Harmonized data is saved at both the household and the individual level in SPSS format and then converted to Stata format.
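The sequence of steps above can be pictured with a small sketch. The actual work was done in SPSS with country-specific programs; the Python below is only an illustration of the same flow (clean, merge to the individual level, rename and recode, post-clean, save at two levels), and every variable name, code list, and rule in it is hypothetical.

```python
RENAME = {"sx": "sex", "ag": "age"}       # hypothetical raw -> harmonized names
SEX_CODES = {1: "male", 2: "female"}      # hypothetical code list


def clean(rows):
    """Drop rows without a household id and strip stray whitespace from strings."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
        if row.get("hh_id") not in (None, "")
    ]


def merge_to_individuals(individuals, households):
    """Attach household variables to every individual record."""
    by_id = {h["hh_id"]: h for h in households}
    return [{**by_id.get(p["hh_id"], {}), **p} for p in individuals]


def harmonize(row):
    """Rename variables and recode values to the harmonized scheme."""
    out = {RENAME.get(k, k): v for k, v in row.items()}
    out["sex"] = SEX_CODES.get(out.get("sex"))
    return out


def post_clean(rows):
    """Post-harmonization check: keep rows with a plausible age (hypothetical rule)."""
    return [r for r in rows if isinstance(r.get("age"), int) and 0 <= r["age"] <= 120]


def household_level(rows):
    """Collapse the individual file to one row per household."""
    sizes = {}
    for r in rows:
        sizes[r["hh_id"]] = sizes.get(r["hh_id"], 0) + 1
    return [{"hh_id": h, "hh_size": n} for h, n in sizes.items()]


households = [{"hh_id": "H1", "region": "West Bank "}, {"hh_id": "", "region": "x"}]
individuals = [{"hh_id": "H1", "sx": 1, "ag": 34}, {"hh_id": "H1", "sx": 2, "ag": 31}]

merged = merge_to_individuals(clean(individuals), clean(households))
individual_file = post_clean([harmonize(r) for r in merged])
household_file = household_level(individual_file)
```

Keeping each step as its own function mirrors the description above, where a separate country-specific program handles the renaming and recoding for each dataset.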
