The Health Statistics and Health Research Database is Estonia's largest set of health-related statistics and survey results. It is administered by the National Institute for Health Development. Use of the database is free of charge.
The database consists of eight main areas divided into sub-areas. The data tables in each sub-area are assigned unique codes. The data tables can be viewed online or downloaded in several file formats (.px, .xlsx, .csv, .json). You can download the detailed database user manual here (.pdf).
The database is constantly updated with new data. Dates of updating the existing data tables and adding new data are provided in the release calendar. The date of the last update to each table is provided after the title of the table in the list of data tables.
A contact person for each sub-area is listed under that sub-area's "Definitions and Methodology" link. Contact this person with further questions about the published data or with data requests.
Read more about the publication of health statistics by the National Institute for Health Development in Health Statistics Dissemination Principles.
The objective of the endline surveys in 2016 was to gather household, biomedical, and cognition data in order to evaluate the long-term impact of home supplementation with micronutrient powders (MNP), when combined with seasonal malaria chemoprevention (SMC) and early stimulation, delivered through community preschools and parenting sessions, on the health and cognitive development of children during the first five years of life.
The trial consisted of 3 arms. First, 60 villages with established Early Childhood Development centres (ECD) were randomised to 1 of 2 arms:
1) Children living in villages in the ECD control arm received SMC as part of national health programming and a national parenting intervention delivered by ECD center staff trained and supported by Save the Children, with ALL resident children eligible to participate in the interventions irrespective of enrolment in ECD program (ECD Control group).
2) Children living in villages in the intervention arm also received the SMC and parenting interventions described above, but additionally were eligible to receive home supplementation with micronutrient powders (MNP intervention arm).
3) Second, a third, non-randomised arm was recruited, comprising children living in 30 randomly selected villages where there were no ECD centers in place and thus both the parenting interventions and MNPs were absent. These children received SMC only, as part of national health programming (non-ECD comparison arm).
Trial arm and Interventions received:
T1. MNP intervention arm: 30 villages with ECD centre (randomised); MNP-Yes, Parenting-Yes, SMC-Yes
C1. ECD control arm: 30 villages with ECD centre (randomised); MNP-No, Parenting-Yes, SMC-Yes
C2. Non-ECD comparison arm: 30 villages without ECD centre (not randomised); MNP-No, Parenting-No, SMC-Yes
Three cross-sectional endline surveys took place during the period May-August 2016, three years after the original MNP intervention began, and consisted of the following questionnaires and assessments in two age groups of children, 3 year olds and 5 year olds:
i) A household questionnaire was used to collect data from the primary adult caregiver of the child on home environment, exposure to the interventions, and reported practice outcomes of relevance to the parenting intervention.
ii) Biomedical outcomes were measured in children through laboratory and clinical assessment.
iii) A battery of tests was used to assess cognitive performance and school readiness in children, using a different age-specific test battery for each age group, adapted for local language and culture.
Note: Household and cognitive performance data were gathered from participants in all three arms. Biomedical data were only collected from children in the two randomised arms, to evaluate impact of MNP supplementation on anaemia (primary biomedical outcome) in children who received MNPs and those who did not, using a robust study design.
Districts (cercles) of Sikasso and Yorosso, Region of Sikasso
Individuals and communities
Random sample of target population for the intervention in the 90 communities that consented to participate in the trial, namely pre-school children 0-6 years.
Sample survey data [ssd]
The target population for the interventions comprised all children aged 3 months to 6 years, who were resident in the 90 study communities participating in the trial; the primary sampling unit is the individual child.
Sample Frame:
To identify the number of target beneficiaries, a complete census of all children of eligible age was carried out in the 90 study villages in August 2013. The census listing from 2013 thus defined the population of children who were eligible to have received the interventions every year for the three years between 2013 and 2016, and was used as the sampling frame of children in whom the impact after three years of implementation of the interventions was evaluated. The intention was to evaluate study outcomes in the same child one year after the start of the MNP intervention (May 2014) and again after three years of the intervention (2016).
A random sample of children was drawn from all children listed in the census for each community participating in the trial, according to the following age criteria:
Date of Birth, or Age in August 2013 (Age group in 2016 surveys):
(i) Born between 1 Jan 2013 – 30 June 2013, or aged <1 year in 2013 census if DOB not known (3 years)
(ii) Born between 1 May 2010 – 30 April 2011, or aged 2 years in census if DOB not known (5 years)
Thus, all children previously randomly selected and enrolled in the evaluation cohort in 2014 were, if still resident in the village and present on the day of the survey, re-surveyed in May 2016.
Sample Size:
Power analysis was undertaken for a comparison of two arms, taking account of clustering by community. Survey data on biomedical and cognitive outcomes collected in 2014 were used to inform sample size assumptions, including prevalence of primary outcomes, intraclass correlation (ICC) and number of children recruited per cluster. Prevalence of anaemia amongst 3-year old children in 2014 was found to be 61.6% and 64.0% in the intervention and control arms respectively (p=0.618) and 53.8% and 51.9% respectively amongst 5-year old children (p=0.582). The observed ICC for anaemia endpoint at baseline was 0.08 in 3-year old children and 0.06 in 5-year old children. Observed ICC for cognitive outcomes measured in 2014 was 0.09, ranging from 0.05 to 0.16 for individual tasks within the cognitive battery.
Sample Size Estimation for Health Outcomes:
Approximately 20-25 children per cluster were recruited into each age cohort in 2013. Power calculations for anaemia (primary endpoint) were undertaken for three alternative scenarios at endline: (i) to allow for the possibility of up to 20% loss to follow up between 2014 and 2016, power calculations were performed for a sample size at endline of 16 children per cluster; (ii) a smaller cluster size of 14 children sampled per village, under a scenario of 30% loss to follow-up; and (iii) unequal clusters, to allow for the possibility that variation in losses to follow-up between villages could result in an unequal number of children sampled in each village. In this case, cluster size is the mean number of children sampled per cluster.
Thus, assuming a conservative prevalence of anaemia of 50% in the control group and an ICC of 0.08, a sample size of 30 communities per arm with 14-20 children sampled per community will, under all of these scenarios, provide 80% power to detect a reduction in anaemia of at least 28% at the 5% level of significance.
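The power statement above can be roughly checked with a design-effect calculation. The sketch below is an approximation under stated assumptions, not the study team's actual method: it treats the 28% figure as a relative reduction (50% falling to 36%), uses a two-sided normal approximation for two proportions, and inflates the variance by the standard design effect 1 + (m - 1) x ICC for equal cluster sizes. The function names are illustrative.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from sampling children within villages."""
    return 1.0 + (cluster_size - 1.0) * icc


def power_two_proportions(p_control: float, relative_reduction: float,
                          clusters_per_arm: int, cluster_size: float,
                          icc: float, z_alpha: float = 1.959964) -> float:
    """Approximate power to detect a relative reduction in prevalence.

    Assumes equal cluster sizes and a two-sided test at the 5% level.
    """
    p_treat = p_control * (1.0 - relative_reduction)
    n_per_arm = clusters_per_arm * cluster_size
    # Effective sample size after accounting for clustering.
    n_effective = n_per_arm / design_effect(cluster_size, icc)
    se = math.sqrt(p_control * (1.0 - p_control) / n_effective
                   + p_treat * (1.0 - p_treat) / n_effective)
    return normal_cdf(abs(p_control - p_treat) / se - z_alpha)


# Cluster sizes taken from the scenarios described in the text.
for m in (14, 16, 20):
    power = power_two_proportions(0.50, 0.28, 30, m, 0.08)
    print(f"{m} children per village: approximate power {power:.2f}")
```

Under these assumptions the approximate power stays above 0.80 for cluster sizes of 14, 16, and 20, which is in line with the text. The unequal-cluster scenario would need an additional correction that this sketch does not include.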
Sample Size Estimation for Cognitive Outcomes:
Power calculations for cognitive outcomes explored: (i) a smaller cluster size of 14 children sampled per village, for example resulting from a higher than expected loss to follow-up of 30%; (ii) statistical analysis of differences between arms which does not adjust for baseline, a scenario which allows for the possibility of increasing the sample size to compensate for losses to follow-up by recruiting new children for whom no baseline data would be available; and (iii) the effect of unequal clusters. Thus, for cognitive-linguistic skills, a sample size of 30 communities per arm with 14-20 children in each age cohort sampled per community will provide 80% power to detect an effect size between 0.27-0.29 at the 5% level of significance, assuming an ICC of 0.10 and that individual, household and community-level factors account for at least 25% of variation in cognitive foundation skills. For a similar sample size of 30 communities per arm with 14-20 children sampled per community and an ICC of 0.10, a statistical analysis which does not adjust for baseline will provide 80% power to detect an effect size between 0.28-0.30 at the 5% level of significance.
The sample at endline in May 2016 thus comprised a total of up to 600 children aged 3y and 600 children aged 5y at endline in each arm:
T1 Intervention group (with ECD): 30 communities, with approx. 40 randomly selected children in each community (20 aged 3y; 20 aged 5y).
C1 ECD control group (with ECD): 30 communities, with approx. 40 randomly selected children in each community (20 aged 3y; 20 aged 5y).
C2 Comparison group (without ECD): 30 communities, with approx. 40 randomly selected children in each community (20 aged 3y; 20 aged 5y).
Strategy for Absent Respondents/Not Found/Refusals:
Every effort was made to trace children previously recruited into the evaluation cohort. Since some losses to follow-up (for example, due to child deaths or outward migration) were expected between 2014 and 2016, the primary strategy was to oversample in 2014. However, for villages where loss to follow-up was higher than expected and it was not possible to trace a sufficient number of children remaining from the original sample to meet the required sample size per cluster, additional children were recruited into the evaluation survey in 2016. New recruits were selected at random from the children listed as resident in the village at the time of the original census in 2013. All new recruits had thus been resident in the village and exposed to the interventions throughout the three preceding years.
Face-to-face [f2f]
The questionnaires for the parent interview were structured questionnaires. A questionnaire was administered to the child’s primary caregiver
As a source of animal and plant population data, the Global Population Dynamics Database (GPDD) is unrivalled. Nearly five thousand separate time series are available here. In addition to all the population counts, there are taxonomic details of over 1400 species. The type of data contained in the GPDD varies enormously, from annual counts of mammals or birds at individual sampling sites, to weekly counts of zooplankton and other marine fauna. The project commenced in October 1994, following discussions on ways in which the collaborating partners could make a practical and enduring contribution to research into population dynamics. A small team was assembled and, with assistance and advice from numerous interested parties, we decided to construct the database using the popular Microsoft Access platform. After an initial design phase, the major task has been that of locating, extracting, entering and validating the data in all the various tables. Now, nearly 5000 individual datasets have been entered into the GPDD. The Global Population Dynamics Database comprises six tables of data and information. The tables are linked to each other as shown in figure 3 of the GPDD User Guide (GPDD-User-Guide.pdf). Referential integrity is maintained through record ID numbers which are held, along with other information, in the Main Table. Its structure obeys all the rules of a standard relational database.
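As a rough illustration of how record ID numbers held in a main table can enforce referential integrity, here is a minimal sketch using Python's built-in sqlite3 module rather than Microsoft Access. The table and column names (`main_table`, `data_table`, `main_id`, and so on) are hypothetical and do not reproduce the actual GPDD schema, which is documented in the GPDD User Guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical two-table layout: a main table of series, and a data
# table whose rows must point at an existing main-table record ID.
conn.executescript("""
CREATE TABLE main_table (
    main_id    INTEGER PRIMARY KEY,
    taxon_name TEXT NOT NULL
);
CREATE TABLE data_table (
    data_id     INTEGER PRIMARY KEY,
    main_id     INTEGER NOT NULL REFERENCES main_table(main_id),
    sample_year INTEGER NOT NULL,
    population  REAL NOT NULL
);
""")

conn.execute(
    "INSERT INTO main_table (main_id, taxon_name) VALUES (1, 'Example species')"
)
conn.executemany(
    "INSERT INTO data_table (main_id, sample_year, population) VALUES (?, ?, ?)",
    [(1, 1990, 120.0), (1, 1991, 135.0)],  # made-up counts
)


def orphan_insert_rejected(connection: sqlite3.Connection) -> bool:
    """Return True if a count row with an unknown main_id is refused."""
    try:
        connection.execute(
            "INSERT INTO data_table (main_id, sample_year, population) "
            "VALUES (99, 1990, 1.0)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


print(orphan_insert_rejected(conn))
```

The point of the sketch is only that every count row is tied to a record in the main table, so a time series cannot exist without its descriptive metadata.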
Annual Resident Population Estimates by Age Group, Sex, Race, and Hispanic Origin: April 1, 2010 to July 1, 2018 // Source: U.S. Census Bureau, Population Division // The contents of this file are released on a rolling basis from December through June. // Note: 'In combination' means in combination with one or more other races. The sum of the five race-in-combination groups adds to more than the total population because individuals may report more than one race. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. For more information, see https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/modified-race-summary-file-method/mrsf2010.pdf. // The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. // For detailed information about the methods used to create the population estimates, see https://www.census.gov/programs-surveys/popest/technical-documentation/methodology.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2017) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. 
As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: https://www.census.gov/programs-surveys/popest.html.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of household types in 22 countries from IPUMS data. See github repo for code. See pdf of poster for full factsheet.
Purpose: Around 5% of United States (U.S.) population identifies as Sexual and Gender Diverse (SGD), yet there is limited research around cancer prevention among these populations. We present multi-pronged, low-cost, and systematic recruitment strategies used to reach SGD communities in New Mexico (NM), a state that is both largely rural and racially/ethnically classified as a “majority-minority” state.
Methods: Our recruitment focused on using: (1) Every Door Direct Mail (EDDM) program, by the United States Postal Services (USPS); (2) Google and Facebook advertisements; (3) Organizational outreach via emails to publicly available SGD-friendly business contacts; (4) Personal outreach via flyers at clinical and community settings across NM. Guided by previous research, we provide detailed descriptions on using strategies to check for fraudulent and suspicious online responses, that ensure data integrity.
Results: A total of 27,369 flyers were distributed through the EDDM program and 436,177 impressions were made through the Google and Facebook ads. We received a total of 6,920 responses on the eligibility survey. For the 5,037 eligible respondents, we received 3,120 (61.9%) complete responses. Of these, 13% (406/3120) were fraudulent/suspicious based on research-informed criteria and were removed. Final analysis included 2,534 respondents, of which the majority (59.9%) reported hearing about the study from social media. Of the respondents, 49.5% were between 31-40 years, 39.5% were Black, Hispanic, or American Indian/Alaskan Native, and 45.9% had an annual household income below $50,000. Over half (55.3%) were assigned male, 40.4% were assigned female, and 4.3% were assigned intersex at birth. Transgender respondents made up 10.6% (n=267) of the respondents. In terms of sexual orientation, 54.1% (n=1371) reported being gay or lesbian, 30% (n=749) bisexual, and 15.8% (n=401) queer. A total of 756 (29.8%) respondents reported receiving a cancer diagnosis and among screen-eligible respondents, 66.2% reported ever having a Pap, 78.6% reported ever having a mammogram, and 84.1% reported ever having a colonoscopy. Over half of eligible respondents (58.7%) reported receiving Human Papillomavirus vaccinations.
Conclusion: Study findings showcase effective strategies to reach communities, maximize data quality, and prevent the misrepresentation of data critical to improve health in SGD communities.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data that is collected at the individual level from mobile phones is typically aggregated to the population level for privacy reasons. If we are interested in answering questions regarding the mean, or working with groups appropriately modeled by a continuum, then this data is immediately informative. However, coupling such data regarding a population to a model that requires information at the individual level raises a number of complexities. This is the case if we aim to characterize human mobility and simulate the spatial and geographical spread of a disease by dealing in discrete, absolute numbers. In this work, we highlight the hurdles faced and outline how they can be overcome to effectively leverage a specific dataset: the Google COVID-19 Aggregated Mobility Research Dataset (GAMRD). Using a case study of Western Australia, which has many sparsely populated regions with incomplete data, we first demonstrate how to overcome these challenges to approximate the absolute flow of people around a transport network from the aggregated data. Overlaying this evolving mobility network with a compartmental model for disease that incorporates vaccination status, we run simulations and draw meaningful conclusions about the spread of COVID-19 throughout the state without de-anonymizing the data. We can see that towns in the Pilbara region are highly vulnerable to an outbreak originating in Perth. Further, we show that regional restrictions on travel are not enough to stop the spread of the virus from reaching regional Western Australia. The methods explained in this paper can therefore be used to analyze disease outbreaks in similarly sparse populations. We demonstrate that this data, used appropriately, can inform public health policies and have an impact on pandemic responses.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the data for the Richmond, VA population pyramid, which represents the Richmond population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey 5-Year estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.
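The ratios named above can be computed directly from age-group totals. The following is a minimal sketch assuming the commonly used age bands of 0-14 (youth), 15-64 (working age), and 65 and over (older); the dataset itself may group ages differently, and the counts in the example are made up rather than taken from the Richmond data. The function name `dependency_ratios` is illustrative.

```python
def dependency_ratios(youth: float, working_age: float, older: float) -> dict:
    """Compute dependency ratios from three age-band population totals.

    Assumed bands: youth = ages 0-14, working_age = ages 15-64,
    older = ages 65 and over. Dependency ratios are expressed per 100
    working-age people; the potential support ratio is working-age
    people per older person.
    """
    if working_age <= 0 or older <= 0:
        raise ValueError("working_age and older must be positive")
    youth_ratio = 100.0 * youth / working_age
    old_age_ratio = 100.0 * older / working_age
    return {
        "youth_dependency": youth_ratio,
        "old_age_dependency": old_age_ratio,
        "total_dependency": youth_ratio + old_age_ratio,
        "potential_support": working_age / older,
    }


# Hypothetical counts for illustration only.
example = dependency_ratios(youth=200, working_age=1000, older=250)
for name, value in example.items():
    print(f"{name}: {value:.1f}")
```

To apply this to the pyramid table, the male and female counts for each age group would first be summed into the three bands.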
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Richmond Population by Age. You can refer to it here.
Genomics can reveal essential features about the demographic evolution of a population that may not be apparent from historical elements. In recent years, there has been a significant increase in the number of studies applying genomic epidemiological approaches to understand the genetic structure and diversity of human populations in the context of demographic history and for implementing precision medicine. These efforts have traditionally been applied predominantly to populations of European origin. More recently, initiatives in the United States and Africa are including more diverse populations, establishing new horizons for research in human populations with African and/or Native ancestries. Still, even in the most recent projects, the under-representation of genomic data from Latin America and the Caribbean (LAC) is remarkable. In addition, because the region presents the most recent global miscegenation, genomics data from LAC may add relevant information to understand population admixture better. Admixture in LAC started during the colonial period, in the 15th century, with intense miscegenation between European settlers, mainly from Portugal and Spain, local indigenous peoples, and sub-Saharan Africans brought through the slave trade. Descendants of formerly enslaved and Native American populations still live in the LAC territory and are considered vulnerable populations because of their history and current living conditions. In this context, studying LAC Native American and African descendant populations is important for several reasons. First, studying human populations from different origins makes it possible to understand the diversity of the human genome better. Second, it also has an immediate application to these populations, such as empowering communities with the knowledge of their ancestral origins. 
Furthermore, because knowledge of the population genomic structure is an essential requirement for implementing genomic medicine and precision health practices, population genomics studies may ensure that these communities have access to genomic information for risk assessment, prevention, and the delivery of optimized treatment; thus, helping to reduce inequalities in the Western Hemisphere. Hoping to set the stage for future studies, we review different aspects related to genetic and genomic research in vulnerable populations from LAC countries.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset provides data on the prevalence of normal weight, overweight, and obesity among adults aged 20 and over, segmented by various population characteristics. The data is sourced from the National Health and Nutrition Examination Survey (NHANES) conducted by the National Center for Health Statistics (NCHS). This dataset is invaluable for understanding the distribution and trends of weight-related health metrics across different demographics in the United States.
Source:
- National Health and Nutrition Examination Survey (NHANES): Conducted by NCHS.
- Supporting Documentation: Refer to the HUS 2019 Data Finder for detailed definitions, measures, and changes over time.
- Appendix Entry: Additional information available in the corresponding Appendix entry.

Source URLs:
- HUS 2019 Data Finder
- Appendix Entry
- Data.gov Dataset
This dataset includes data collected over multiple time periods, providing insights into the weight distribution among adults aged 20 and over. Key features include segmentation by sex and specific age ranges.
Column Name | Description |
---|---|
INDICATOR | Indicator for the data type, e.g., Normal weight |
PANEL | Panel identifier for the survey |
PANEL_NU | Numerical value representing the panel |
UNIT | Unit of measurement, e.g., Percent of population |
UNIT_NU | Numerical value representing the unit |
STUB_NA | Stub name for category, e.g., Total |
STUB_LA | Label for the stub category, e.g., All persons |
YEAR | The year or period the data was recorded |
YEAR_NUM | Numerical value representing the year or period |
AGE | Age group category, e.g., 20 years and over |
AGE_NUM | Numerical value representing the age group |
ESTIMATE | Estimated percentage |
SE | Standard error of the estimate |
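Because each row carries both an ESTIMATE and its SE, an approximate confidence interval can be derived per row. The sketch below assumes a simple normal approximation (estimate plus or minus 1.96 standard errors), which may differ from the method NCHS uses for published intervals. The CSV rows are placeholders invented for illustration and are not NHANES values; only the column names come from the table above.

```python
import csv
import io

# Placeholder rows, invented for illustration; they are not NHANES estimates.
SAMPLE_CSV = """INDICATOR,YEAR,AGE,ESTIMATE,SE
Normal weight,placeholder period A,20 years and over,30.0,1.0
Normal weight,placeholder period B,20 years and over,25.0,2.0
Overweight,placeholder period A,20 years and over,40.0,1.5
"""


def confidence_interval(estimate: float, se: float, z: float = 1.96):
    """Normal-approximation interval: estimate +/- z * standard error."""
    margin = z * se
    return estimate - margin, estimate + margin


def intervals_for(indicator: str, csv_text: str) -> list:
    """Return (YEAR, low, high) for every row matching the indicator."""
    reader = csv.DictReader(io.StringIO(csv_text))
    results = []
    for row in reader:
        if row["INDICATOR"] == indicator:
            low, high = confidence_interval(
                float(row["ESTIMATE"]), float(row["SE"])
            )
            results.append((row["YEAR"], low, high))
    return results


for year, low, high in intervals_for("Normal weight", SAMPLE_CSV):
    print(f"{year}: {low:.2f} to {high:.2f} percent")
```

With the real file, the same function could be pointed at the downloaded CSV text; rows with missing ESTIMATE or SE values would need to be skipped first.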
Data Source: CA Department of Finance, Demographic Research Unit
Report P-3: Population Projections, California, 2010-2060 (Baseline 2019 Population Projections; Vintage 2020 Release). Sacramento: California. July 2021.
This data biography shares the how, who, what, where, when, and why about this dataset. We, the epidemiology team at Napa County Health and Human Services Agency, Public Health Division, created it to help you understand where the data we analyze and share comes from. If you have any further questions, we can be reached at epidemiology@countyofnapa.org.
Data dashboard featuring this data: Napa County Demographics https://data.countyofnapa.org/stories/s/bu3n-fytj
How was the data collected? Population projections use the following demographic balancing equation: Current Population = Previous Population + (Births - Deaths) + Net Migration
Previous Population: the starting point for the population projection estimates is the 2020 US Census, informed by the Population Estimates Program data.
Births and Deaths: birth and death totals came from the California Department of Public Health, Vital Statistics Branch, which maintains birth and death records for California.
Net Migration: multiple sources of administrative records were used to estimate net migration, including driver’s license address changes, IRS tax return data, Medicare and Medi-Cal enrollment, federal immigration reports, elementary school enrollments, and group quarters population.
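The balancing equation and its three components can be expressed as a short calculation. This is a minimal sketch with made-up inputs, not Napa County or Department of Finance figures, and the function names are illustrative. The actual projections apply the components by age, sex, and race/ethnicity, which this sketch does not attempt.

```python
def project_population(previous: float, births: float, deaths: float,
                       net_migration: float) -> float:
    """Apply the demographic balancing equation for one period."""
    return previous + (births - deaths) + net_migration


def project_series(start: float, periods: list) -> list:
    """Chain the equation across periods.

    Each element of `periods` is a (births, deaths, net_migration) tuple;
    the result of one period becomes the previous population of the next.
    """
    populations = [start]
    for births, deaths, net_migration in periods:
        populations.append(
            project_population(populations[-1], births, deaths, net_migration)
        )
    return populations


# Hypothetical inputs for illustration only.
series = project_series(1000.0, [(20.0, 15.0, -3.0), (18.0, 16.0, 5.0)])
print(series)
```

Net migration can be negative, as in the first hypothetical period, when more people leave than arrive.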
Who was included and excluded from the data? Previous Population: The goal of the US Census is to reflect all populations residing in a given geographic area. Results of two analyses done by the US Census Bureau showed that the 2020 Census total population counts were consistent with recent counts despite the challenges added by the pandemic. However, some populations were undercounted (the Black or African American population, the American Indian or Alaska Native population living on a reservation, the Hispanic or Latino population, and people who reported being of Some Other Race), and some were overcounted (the Non-Hispanic White population and the Asian population). Children, especially children younger than 4, were also undercounted.
Births and Deaths: Birth records include all people who are born in California as well as births to California residents that happened out of state. Death records include people who died while in California, as well as deaths of California residents that occurred out of state. Because birth and death record data comes from a registration process, the demographic information provided may not be accurate or complete.
Net Migration: each of the multiple sources of administrative records that were used to estimate net migration include and exclude different groups. For details about methodology, see https://dof.ca.gov/wp-content/uploads/sites/352/2023/07/Projections_Methodology.pdf.
Where was the data collected? Data is collected throughout California. This subset of data includes Napa County.
When was the data collected? This subset of Napa County data is from Report P-3: Population Projections, California, 2010-2060 (Baseline 2019 Population Projections; Vintage 2020 Release). Sacramento: California. July 2021.
These 2019 baseline projections incorporate the latest historical population, birth, death, and migration data available as of July 1, 2020. Historical trends from 1990 through 2020 for births, deaths, and migration are examined. County populations by age, sex, and race/ethnicity are projected to 2060.
Why was the data collected? The population projections were prepared under the mandate of the California Government Code (Cal. Gov't Code § 13073, 13073.5).
Where can I learn more about this data? https://dof.ca.gov/Forecasting/Demographics/Projections/ https://dof.ca.gov/wp-content/uploads/sites/352/Forecasting/Demographics/Documents/P3_Dictionary.txt https://dof.ca.gov/wp-content/uploads/sites/352/2023/07/Projections_Methodology.pdf
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Clinical research is pivotal in assessing the safety and efficacy of new treatments in healthcare. However, the success of such research depends on the inclusion of a diverse and representative participant sample, which is currently lacking. This lack of diversity in biomedical research participants has significant repercussions, limiting the real-world applicability and accessibility of medical interventions, especially for underrepresented groups. Barriers to diverse participation include historical mistrust, logistical challenges, and financial constraints. Recent guidelines by government agencies and funding bodies emphasize the need for diversity in clinical trials, but specific strategies for inclusive recruitment are often lacking. This paper explores the use of digital methods to enhance diversity and inclusion in research recruitment. Digital tools, such as electronic medical records, social media, research registries, and mobile applications, offer promising opportunities for reaching diverse populations. Strategies include culturally tailored messaging, collaborations with community organizations, and the use of SEO to improve visibility and engagement. However, challenges such as privacy concerns, digital literacy gaps, and ethical considerations must be addressed. The promotion of diversity in clinical research recruitment is crucial for advancing health equity. By leveraging digital tools and adopting inclusive strategies, study teams can improve the diversity of study participants, ultimately leading to more applicable and equitable healthcare outcomes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GENERAL INFORMATION
Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation
Date of data collection: January to March 2022
Collection instrument: SurveyMonkey
Funding: Alfred P. Sloan Foundation
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license
Links to publications that cite or use the data:
Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437
Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266
DATA & FILE OVERVIEW
File List
Additional related data collected that was not included in the current data package: open-ended questions asked to respondents
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data:
The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.
We received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses and an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
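As a quick arithmetic check on the figures above, here is a minimal Python sketch. The helper name `rate` is ours and not part of the dataset documentation. It suggests that the reported 68.6% corresponds to the 2,492 responses in the final dataset rather than to the 2,509 initially counted as completed. The number of invitations sent is not stated here, so the response rates are not recomputed.

```python
def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


responses_received = 3632  # all responses, as reported in the text
marked_complete = 2509     # responses reported as completed
final_complete = 2492      # complete responses in the final dataset

print(rate(marked_complete, responses_received))  # 69.1
print(rate(final_complete, responses_received))   # 68.6
```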
Methods for processing the data:
Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.
Instrument- or software-specific information needed to interpret the data:
The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret the values.
DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata
Number of variables: 95
Number of cases/rows: 2,492
Missing data codes: 999 Not asked
Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
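For readers working with the coded CSV, the sketch below shows one possible way to treat the documented 999 "Not asked" code as missing, using only the Python standard library. The column names `resp_id` and `q1_reuse` and the function `read_coded_csv` are hypothetical, chosen for illustration. The real variable names are in the codebook.

```python
import csv
import io

NOT_ASKED = "999"  # missing-data code documented for this dataset


def read_coded_csv(handle, keep_as_is=()):
    """Read coded rows, replacing the 999 "Not asked" code with None.

    Columns named in keep_as_is (for example an ID column, where 999
    could be a real value) are left unchanged.
    """
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            name: value if name in keep_as_is or value != NOT_ASKED else None
            for name, value in row.items()
        })
    return rows


# Hypothetical excerpt; the real variable names are listed in the codebook.
sample = io.StringIO("resp_id,q1_reuse\n998,2\n999,999\n")
rows = read_coded_csv(sample, keep_as_is=("resp_id",))
print(rows)
```

Values stay as strings in this sketch. Converting them to numbers or labels would depend on the codebook.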
Data supports Working Paper 681, "Interstate Migration Has Fallen Less Than You Think: Consequences of Hot Deck Imputation in the Current Population Survey." https://www.minneapolisfed.org/research/wp/wp681.pdf
Description: The 2005 HSRC Master Sample was used for SABSSM 2008 and 2012, the SANHANES study in 2012 and SASAS 2007-2010 (adjacent EAs) to obtain an understanding of geographical spread of HIV/AIDS, perceptions and attitudes of people and other health related studies over time. Abstract: A sample can be defined as a subset containing the characteristics of a larger population. Samples are used in statistical testing when population sizes are too large for the test to include all possible members or observations. A sample should represent the whole population and not reflect bias toward a specific attribute.[1] One of the most crucial aspects of sample design in household surveys is its frame. The sampling frame has significant implications on the cost and the quality of any survey, household or otherwise.[2] The sampling frame .... in a household survey must cover the entire target population. When that frame is used for multiple surveys or multiple rounds of the same survey it is known as a master sample frame or .... master sample.[3] A master sample is a sample drawn from a population for use on a number of future occasions, so as to avoid ad hoc sampling on each occasion. Sometimes the master sample is large and subsequent inquiries are based on a sub-sample from it.[4] The HSRC compiles master samples in order to construct samples for various HSRC research studies. The 2005 HSRC Master Sample was used for SABSSM 2008 and 2012, SASAS 2007-2010 and the SANHANES study in 2012 to obtain an understanding of geographical spread of HIV/AIDS, perceptions and attitudes of people and other health related studies over time. The 2005 HSRC Master Sample was created in the following way: South Africa was delineated into EAs according to municipality and province. Municipal boundaries were obtained from the Municipal Demarcation Board. 
An Enumeration area (EA) is the smallest geographical unit (piece of land) into which the country is divided for census or survey enumeration.[5] The concepts and definitions of terms used for Census 2001 comply in most instances with United Nations standards for censuses. A total of 1,000 census enumeration areas (EAs) from the 2001 population census were randomly selected using probability proportional to size and stratified by province, locality type and race in urban areas from a database of 80 787 EAs that were mapped using aerial photography to develop an HSRC master sample for selecting households. The ideal frame would be complete with respect to the target population if all of its members (the universe) are covered by the frame. Ideal characteristics of a master sample: The master frame should be as complete, accurate and current as practicable. A master sample frame for household surveys is typically developed from the most recent census, just as a regular sample frame is. Because the master frame may be used during an entire intercensal (between census) period, however, it will usually require periodic and regular updating such as every 2-3 years. This is in contrast to a regular frame which is more likely to be up-dated on an ad hoc basis and only when a particular survey is being planned[6] [1] http://www.investopedia.com/terms/s/sample.asp [2] http://unstats.un.org/unsd/demographic/meetings/egm/sampling_1203/docs/no_3.pdf [3] http://unstats.un.org/unsd/demographic/meetings/egm/sampling_1203/docs/no_3.pdf [4] A Dictionary of Statistical Terms, 5th edition, prepared for the International Statistical Institute by F.H.C. Marriott. Published for the International Statistical Institute by Longman Scientific and Technical. 
http://stats.oecd.org/glossary/detail.asp?ID=3708 [5] http://africageodownloads.info/128_mokgokolo.pdf [6] http://unstats.un.org/unsd/demographic/meetings/egm/sampling_1203/docs/no_3.pdf
All enumeration areas (80 787 EAs) within the South African borders during the 2001 Census.
The whole country was delimited into EAs according to municipality and province. Municipal boundaries were obtained from the Municipal Demarcation Board. A total of 1,000 census enumeration areas (EAs) from the 2001 population census were randomly selected using probability proportional to size and stratified by province, locality type and race in urban areas from a database of 80 787 EAs that were mapped using aerial photography to develop an HSRC master sample for selecting households.
The first digit represents the province. The second and third digits represent the municipality.
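The probability-proportional-to-size (PPS) selection of EAs mentioned in this description can be illustrated with a short Python sketch. It uses systematic PPS sampling, which is one common way to draw such a sample. The text does not say which PPS procedure the HSRC used, so this is an assumption. The function name, the seed and the EA sizes are hypothetical. In a stratified design, the function would be applied separately within each stratum.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, rng):
    """Select n unit indices with probability proportional to size.

    Systematic PPS: lay the units end to end on a line of length
    sum(sizes), pick a random start in [0, step), then take every
    step-th point and record the unit each point falls in.
    """
    total = sum(sizes)
    step = total / n
    start = rng.random() * step
    cumulative = list(accumulate(sizes))
    picks, i = [], 0
    for k in range(n):
        point = start + k * step
        # Advance to the first unit whose cumulative size exceeds the point.
        while i < len(cumulative) - 1 and cumulative[i] <= point:
            i += 1
        picks.append(i)
    return picks


rng = random.Random(2005)  # fixed seed so the sketch is reproducible
ea_households = [120, 80, 0, 200, 150, 50]  # hypothetical EA sizes in one stratum
print(pps_systematic(ea_households, 2, rng))
```

A unit whose size exceeds the sampling step can be picked more than once. Such units are usually handled separately as certainty selections.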
Purpose
Around 5% of the United States (U.S.) population identifies as Sexual and Gender Diverse (SGD), yet there is limited research around cancer prevention among these populations. We present multi-pronged, low-cost, and systematic recruitment strategies used to reach SGD communities in New Mexico (NM), a state that is both largely rural and racially/ethnically classified as a “majority-minority” state.
Methods
Our recruitment focused on using: (1) the Every Door Direct Mail (EDDM) program of the United States Postal Service (USPS); (2) Google and Facebook advertisements; (3) organizational outreach via emails to publicly available SGD-friendly business contacts; (4) personal outreach via flyers at clinical and community settings across NM. Guided by previous research, we provide detailed descriptions of strategies for checking for fraudulent and suspicious online responses in order to ensure data integrity.
Results
A total of 27,369 flyers were distributed through the EDDM program and 436,177 impressions were made through the Google and Facebook ads. We received a total of 6,920 responses on the eligibility survey. For the 5,037 eligible respondents, we received 3,120 (61.9%) complete responses. Of these, 13% (406/3120) were fraudulent/suspicious based on research-informed criteria and were removed. Final analysis included 2,534 respondents, of which the majority (59.9%) reported hearing about the study from social media. Of the respondents, 49.5% were between 31 and 40 years old, 39.5% were Black, Hispanic, or American Indian/Alaskan Native, and 45.9% had an annual household income below $50,000. Over half (55.3%) were assigned male, 40.4% were assigned female, and 4.3% were assigned intersex at birth. Transgender respondents made up 10.6% (n=267) of the respondents. In terms of sexual orientation, 54.1% (n=1371) reported being gay or lesbian, 30% (n=749) bisexual, and 15.8% (n=401) queer. A total of 756 (29.8%) respondents reported receiving a cancer diagnosis and among screen-eligible respondents, 66.2% reported ever having a Pap test, 78.6% reported ever having a mammogram, and 84.1% reported ever having a colonoscopy. Over half of eligible respondents (58.7%) reported receiving Human Papillomavirus vaccinations.
Conclusion
Study findings showcase effective strategies to reach communities, maximize data quality, and prevent the misrepresentation of data critical to improve health in SGD communities.
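The first two percentages in the recruitment funnel reported above can be reproduced with a few lines of Python. This is only an arithmetic sketch. The variable and function names are ours. The final analytic sample of 2,534 is taken as reported, and the sketch does not try to reconstruct it.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


eligible = 5037  # eligible respondents
complete = 3120  # complete responses among eligible respondents
flagged = 406    # complete responses flagged as fraudulent/suspicious

print(pct(complete, eligible))  # 61.9
print(pct(flagged, complete))   # 13.0
```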
2000 Population Census Data for Baltimore, Maryland. Refer to the 2000.pdf enclosures for more information. This is part of a collection of 221 Baltimore Ecosystem Study metadata records that point to a geodatabase. The geodatabase is available online and is considerably large. Upon request, and under certain arrangements, it can be shipped on media such as a USB hard drive. The geodatabase is roughly 51.4 Gb in size, consisting of 4,914 files in 160 folders. Although this metadata record and the others like it are not rich in attributes, they are made available because the data they represent could indeed be useful.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Open Science in (Higher) Education – data of the February 2017 survey
This data set contains:
Survey structure
The survey includes 24 questions and its structure can be separated into five major themes: material used in courses (5), OER awareness, usage and development (6), collaborative tools used in courses (2), assessment and participation options (5), demographics (4). The last two questions are an open-text question about general issues on the topics and individual open education experiences, and a request for the respondent’s e-mail address for follow-up questions. The online survey was created with Limesurvey[1]. Several questions include filters, i.e. these questions were only shown if a participant had chosen a specific answer beforehand ([n/a] in the Excel file, [.] in SPSS).
Demographic questions
Demographic questions asked about the current position, the discipline, birth year and gender. The classification of research disciplines was adapted to general disciplines at German higher education institutions. As we wanted to have a broad classification, we summarised several disciplines and came up with the following list, including the option “other” for respondents who do not feel confident with the proposed classification:
The current job position classification was also chosen according to common positions in Germany, including positions with a teaching responsibility at higher education institutions. Here, we also included the option “other” for respondents who do not feel confident with the proposed classification:
We chose a free-text (numerical) field for the respondent’s year of birth because we did not want to pre-classify respondents into age intervals. This leaves us the option of running different analyses of the answers and of possible correlations with respondents’ age. We did not ask about the country because the survey was designed for academics in Germany.
Remark on OER question
Data from earlier surveys revealed that academics are confused about the proper definition of OER[2]. Some seem to understand OER as free resources, or only refer to open source software (Allen & Seaman, 2016, p. 11). Allen and Seaman (2016) decided to give a broad explanation of OER, avoiding details so as not to tempt the participant to claim “aware”. Giving an explanation thus carries a risk of bias. We decided not to give an explanation and to keep this question simple. We assume that someone either knows about OER or does not. Respondents who had not heard of the term before probably do not use OER (at least not consciously) or create them.
Data collection
The target group of the survey was academics at German institutions of higher education, mainly universities and universities of applied sciences. To reach them, we sent the survey to various internal and external institutional mailing lists and distributed it via personal contacts. The lists included discipline-based lists, lists from higher education and higher education didactics communities, and lists from open science and OER communities. Additionally, personal e-mails were sent to presidents and contact persons from those communities, and Twitter was used to spread the survey.
The survey was online from 6 February to 3 March 2017. E-mails were mainly sent at the beginning and around the midpoint of that period.
Data clearance
We received 360 responses, of which Limesurvey counted 208 as complete and 152 as incomplete. Two responses were marked as incomplete but turned out to be complete on inspection, so we added them to the complete responses. This data set therefore includes 210 complete responses. Of the remaining 150 incomplete responses, 58 respondents did not answer the first question and 40 discontinued after the first question. The data show a steady decline in responses over the course of the survey; we did not detect any single question with a strikingly high dropout rate. Incomplete responses were deleted and are not part of this data set.
Due to data privacy reasons, we deleted seven variables automatically assigned by Limesurvey: submitdate, lastpage, startlanguage, startdate, datestamp, ipaddr, refurl. We also deleted answers to question No 24 (email address).
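The variable removal described above can be sketched in Python as follows. The seven Limesurvey variable names come from the text. The function name `strip_private`, the example row, and the column names `id`, `q1` and `q24_email` are hypothetical and only illustrate the idea. This is not the authors' actual processing script.

```python
# Variables automatically assigned by Limesurvey, as listed in the text.
PRIVACY_VARIABLES = {
    "submitdate", "lastpage", "startlanguage", "startdate",
    "datestamp", "ipaddr", "refurl",
}


def strip_private(rows, extra=()):
    """Return copies of the rows without privacy-sensitive variables."""
    drop = PRIVACY_VARIABLES | set(extra)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


# Hypothetical raw export row; "q24_email" stands in for question No 24.
raw = [{
    "id": "1",
    "ipaddr": "192.0.2.1",
    "startdate": "2017-02-06",
    "q1": "yes",
    "q24_email": "a@example.org",
}]
clean = strip_private(raw, extra=["q24_email"])
print(clean)  # [{'id': '1', 'q1': 'yes'}]
```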
References
Allen, E., & Seaman, J. (2016). Opening the Textbook: Educational Resources in U.S. Higher Education, 2015-16.
First results of the survey are presented in the poster:
Heck, Tamara, Blümel, Ina, Heller, Lambert, Mazarakis, Athanasios, Peters, Isabella, Scherp, Ansgar, & Weisel, Luzian. (2017). Survey: Open Science in Higher Education. Zenodo. http://doi.org/10.5281/zenodo.400561
Contact:
Open Science in (Higher) Education working group, see http://www.leibniz-science20.de/forschung/projekte/laufende-projekte/open-science-in-higher-education/.
[1] https://www.limesurvey.org
[2] The survey question about the awareness of OER gave a broad explanation, avoiding details to not tempt the participant to claim “aware”.
As of February 2025, 5.56 billion individuals worldwide were internet users, which amounted to 67.9 percent of the global population. Of this total, 5.24 billion, or 63.9 percent of the world's population, were social media users.
Global internet usage
Connecting billions of people worldwide, the internet is a core pillar of the modern information society. Northern Europe ranked first among worldwide regions by the share of the population using the internet in 2025. In the Netherlands, Norway and Saudi Arabia, 99 percent of the population used the internet as of February 2025. North Korea was at the opposite end of the spectrum, with virtually no internet usage penetration among the general population, ranking last worldwide. Eastern Asia was home to the largest number of online users worldwide, over 1.34 billion at the latest count. Southern Asia ranked second, with around 1.2 billion internet users. China, India, and the United States rank ahead of other countries worldwide by the number of internet users.
Worldwide internet user demographics
As of 2024, the share of female internet users worldwide was 65 percent, five percentage points lower than that of men. Gender disparity in internet usage was bigger in African countries, with a difference of around ten percentage points. Regions such as the Commonwealth of Independent States and Europe showed a smaller usage gap between the two genders. As of 2024, global internet usage was higher among individuals between 15 and 24 years old across all regions, with young people in Europe representing the most significant usage penetration, 98 percent. In comparison, the worldwide average for the age group 15–24 years was 79 percent. The income level of the countries was also an essential factor for internet access, as 93 percent of the population of high-income countries reportedly used the internet, as opposed to only 27 percent in low-income markets.