Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GENERAL INFORMATION
Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation
Date of data collection: January to March 2022
Collection instrument: SurveyMonkey
Funding: Alfred P. Sloan Foundation
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license
Links to publications that cite or use the data:
Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437
Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266
DATA & FILE OVERVIEW
File List
Additional related data that were collected but are not included in the current data package: open-ended questions asked of respondents
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data:
The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.
We received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final dataset contains 2,492 complete responses, an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
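The rates above differ only in their numerator and denominator. The following is a minimal sketch of that arithmetic, not the authors' script. The number of invitations sent is not stated here, so `invited` is a hypothetical placeholder and the two response rates it produces are illustrative only. The quoted completion rate of 68.6% corresponds to the 2,492 responses in the final dataset divided by the 3,632 responses received.

```python
def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Figures quoted in the text above.
responses_received = 3632
complete_responses = 2492
undeliverable_or_opt_out = 5201

completion_rate = percentage(complete_responses, responses_received)

# Hypothetical placeholder: the real number of invitations is not given here.
invited = 100_000
uncorrected_rate = percentage(complete_responses, invited)
# The corrected rate removes invalid emails, bounces and opt-outs from the denominator.
corrected_rate = percentage(complete_responses, invited - undeliverable_or_opt_out)
```

Because the correction only shrinks the denominator, the corrected response rate is always slightly higher than the uncorrected one.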
Methods for processing the data:
Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.
Instrument- or software-specific information needed to interpret the data:
The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret the values.
DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata
Number of variables: 94
Number of cases/rows: 2,492
Missing data codes: 999 Not asked
Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
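When working with the coded CSV, the 999 "Not asked" code should be treated as missing rather than as a numeric answer. The sketch below shows one way to do that with the Python standard library. It assumes the CSV has a header row; the column names `id`, `Q1` and `Q2` and the in-memory sample are invented for illustration and do not come from the codebook.

```python
import csv
import io

NOT_ASKED = "999"  # missing-data code listed above


def read_survey_rows(handle):
    """Yield one dict per respondent, mapping 'Not asked' cells to None."""
    for row in csv.DictReader(handle):
        yield {
            name: (None if (value or "").strip() == NOT_ASKED else value)
            for name, value in row.items()
        }


# Tiny in-memory stand-in for the real file; the column names are hypothetical.
sample = io.StringIO("id,Q1,Q2\n1,3,999\n2,999,1\n")
rows = list(read_survey_rows(sample))
```

With the real file, the same function can be given a handle opened with `open(path, newline="")`; the file path and its encoding would need to be checked against the downloaded package.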
The data was collected using the High Frequency Survey (HFS), the new regional data collection tool & methodology launched in the Americas. The survey allowed for better reaching populations of interest with new remote modalities (phone interviews and self-administered surveys online) and improved sampling guidance and strategies. It includes a set of standardized regional core questions while allowing for operation-specific customizations. The core questions revolve around populations of interest's demographic profile, difficulties during their journey, specific protection needs, access to documentation & regularization, health access, coverage of basic needs, coping capacity & negative mechanisms used, and well-being & local integration. The data collected has been used by countries in their protection monitoring analysis and vulnerability analysis.
National coverage
Household
All people of concern.
Sample survey data [ssd]
In the absence of a well-developed sampling frame for forcibly displaced populations in the Americas, the High Frequency Survey employed a multi-frame sampling strategy in which respondents entered the sample through one of three channels: (i) those who opted in to complete an online self-administered version of the questionnaire, which was widely circulated through refugee social media; (ii) persons identified through UNHCR and partner databases who were interviewed remotely by phone; and (iii) random selection from the cases approaching UNHCR for registration or assistance. The total sample size was 829 households. At the time of the survey, the population of concern was estimated at around 500,000 individuals.
Other [oth]
The questionnaire contained the following sections: journey, family composition, vulnerability, basic needs, coping capacity, well-being, COVID-19 impact.
This statistic shows the results of a survey on the level of interest in quiz shows on television in Germany from 2019 to 2023. In 2023, 12.7 million Germans were highly interested in watching quiz shows on TV. The Allensbach Market and Advertising Media Analysis (Allensbacher Markt- und Werbeträgeranalyse or AWA in German) determines attitudes, consumer habits and media usage of the population in Germany on a broad statistical basis.
In 2023, around 31.6 percent of women and 20 percent of men in Germany were especially interested in information about social interactions and psychology. This data is based on a survey conducted in Germany that year. The Allensbach Market and Advertising Media Analysis (Allensbacher Markt- und Werbeträgeranalyse or AWA in German) determines attitudes, consumer habits and media usage of the population in Germany on a broad statistical basis.
The data was collected using the High Frequency Survey (HFS), the new regional data collection tool & methodology launched in the Americas. The survey allowed for better reaching populations of interest with new remote modalities (phone interviews and self-administered surveys online) and improved sampling guidance and strategies. It includes a set of standardized regional core questions while allowing for operation-specific customizations. The core questions revolve around populations of interest's demographic profile, difficulties during their journey, specific protection needs, access to documentation & regularization, health access, coverage of basic needs, coping capacity & negative mechanisms used, and well-being & local integration. The data collected has been used by countries in their protection monitoring analysis and vulnerability analysis.
Whole country
Household
All people of concern.
Sample survey data [ssd]
In the absence of a well-developed sampling frame for forcibly displaced populations in the Americas, the High Frequency Survey employed a multi-frame sampling strategy in which respondents entered the sample through one of three channels: (i) those who opted in to complete an online self-administered version of the questionnaire, which was widely circulated through refugee social media; (ii) persons identified through UNHCR and partner databases who were interviewed remotely by phone; and (iii) random selection from the cases approaching UNHCR for registration or assistance. The total sample size was 236 households. At the time of the survey, the population of concern was estimated at around 1,600,000 individuals.
Other [oth]
The questionnaire contained the following sections: journey, family composition, vulnerability, basic needs, coping capacity, well-being, COVID-19 impact.
The data was collected using the High Frequency Survey (HFS), the new regional data collection tool & methodology launched in the Americas. The survey allowed for better reaching populations of interest with new remote modalities (phone interviews and self-administered surveys online) and improved sampling guidance and strategies. It includes a set of standardized regional core questions while allowing for operation-specific customizations. The core questions revolve around populations of interest’s demographic profile, difficulties during their journey, specific protection needs, access to documentation & regularization, health access, coverage of basic needs, coping capacity & negative mechanisms used, and well-being & local integration. The data collected has been used by countries in their protection monitoring analysis and vulnerability analysis.
National coverage
Household
All people of concern.
Sample survey data [ssd]
In the absence of a well-developed sampling frame for forcibly displaced populations in the Americas, the High Frequency Survey employed a multi-frame sampling strategy in which respondents entered the sample through one of three channels: (i) those who opted in to complete an online self-administered version of the questionnaire, which was widely circulated through refugee social media; (ii) persons identified through UNHCR and partner databases who were interviewed remotely by phone; and (iii) random selection from the cases approaching UNHCR for registration or assistance. The total sample size was 183 refugee households.
Other [oth]
The questionnaire contained the following sections: journey, family composition, vulnerability, basic needs, coping capacity, well-being, COVID-19 impact.
https://borealisdata.ca/api/datasets/:persistentId/versions/3.1/customlicense?persistentId=doi:10.5683/SP3/9TET2T
This new product will present data for specific census topics and population groups according to selected demographic, cultural, and socio-economic characteristics. These detailed 'profile-type' tables expand the analytical depth of basic census information. Special interest profiles include: ethnic groups, Aboriginal peoples, occupation, industry, and place of work.
This statistic shows the results of a survey on the level of interest in watching the news on television in Germany from 2018 to 2023. In 2023, 23.47 million Germans were highly interested in watching the news on TV. The Allensbach Market and Advertising Media Analysis (Allensbacher Markt- und Werbeträgeranalyse or AWA in German) determines attitudes, consumer habits and media usage of the population in Germany on a broad statistical basis.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Users can download data or view data tables on topics related to the labor force of the United States.
Background
The Current Population Survey (CPS) is a joint effort between the Bureau of Labor Statistics and the Census Bureau. It provides information and data on the labor force of the United States, such as employment, unemployment, earnings, hours of work, school enrollment, health, employee benefits and income.
User Functionality
Users can download data sets or view data tables on their topic of interest. Data can be organized by a variety of demographic variables, including sex, age, race, marital status and educational attainment. Data is available on a national or state level.
Data Notes
The CPS is conducted monthly and has a sample of approximately 50,000 households. It is representative of the non-institutionalized US population. The sample provides estimates for the nation as a whole and serves as part of model-based estimates for individual states and other geographic areas.
The data was collected using the High Frequency Survey (HFS), the new regional data collection tool & methodology launched in the Americas. The survey allowed for better reaching populations of interest with new remote modalities (phone interviews and self-administered surveys online) and improved sampling guidance and strategies. It includes a set of standardized regional core questions while allowing for operation-specific customizations. The core questions revolve around populations of interest's demographic profile, difficulties during their journey, specific protection needs, access to documentation & regularization, health access, coverage of basic needs, coping capacity & negative mechanisms used, and well-being & local integration. The data collected has been used by countries in their protection monitoring analysis and vulnerability analysis.
National coverage
Household
All people of concern.
Sample survey data [ssd]
In the absence of a well-developed sampling frame for forcibly displaced populations in the Americas, the High Frequency Survey employed a multi-frame sampling strategy in which respondents entered the sample through one of three channels: (i) those who opted in to complete an online self-administered version of the questionnaire, which was widely circulated through refugee social media; (ii) persons identified through UNHCR and partner databases who were interviewed remotely by phone; and (iii) random selection from the cases approaching UNHCR for registration or assistance. The total sample size was 129 households. At the time of the survey, the population of concern was estimated at around 11,000 individuals.
Other [oth]
The questionnaire contained the following sections: journey, family composition, vulnerability, basic needs, coping capacity, well-being, COVID-19 impact.
All taxpayers with investment income have been sorted into deciles by the total value of their investment income.
HMRC holds comprehensive income information for taxpayers but not for the rest of the UK population. Therefore the SPI is not a suitably representative data source for non-taxpayers and no attempt has been made to estimate the number of non-taxpayers or the amount of their income. Some interest and dividend income data may be incomplete and it is necessary to impute these amounts in a manner consistent with information from external survey data and the National Accounts.
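As an illustration of what sorting into deciles involves, the sketch below ranks a list of income values and assigns each one a decile from 1 (lowest tenth) to 10 (highest tenth). It is a generic rank-based approach under simple assumptions, not HMRC's published method, and the income figures are invented.

```python
def assign_deciles(values):
    """Return a decile number (1..10) for each value, where 1 is the lowest tenth."""
    count = len(values)
    # Indices of the values in ascending order of value.
    order = sorted(range(count), key=lambda index: values[index])
    deciles = [0] * count
    for rank, index in enumerate(order):
        # Ranks 0..count-1 are split into ten equal-sized bands.
        deciles[index] = rank * 10 // count + 1
    return deciles


# Invented investment incomes for twenty taxpayers.
incomes = [50, 1200, 30, 700, 15, 9000, 400, 250, 80, 3100,
           10, 640, 2200, 95, 5, 130, 1750, 320, 45, 510]
income_deciles = assign_deciles(incomes)
```

Tied values are separated by their original order here; a production analysis would need an explicit rule for ties and for weighting sampled records.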
https://datafinder.stats.govt.nz/license/attribution-4-0-international/
This dataset is the definitive version of the annually released statistical area 1 (SA1) boundaries as at 1 January 2025, as defined by Stats NZ. This version contains 33,164 SA1s (33,148 digitised and 16 with empty or null geometries (non-digitised)).
SA1 is an output geography that allows the release of more low-level data than is available at the meshblock level. Built by joining meshblocks, SA1s have an ideal size range of 100–200 residents, and a maximum population of about 500. This is to minimise suppression of population data in multivariate statistics tables.
The SA1 should:
form a contiguous cluster of one or more meshblocks,
be either urban, rural, or water in character,
be small enough to:
allow flexibility for aggregation to other statistical geographies,
allow users to aggregate areas into their own defined communities of interest,
form a nested hierarchy with statistical output geographies and administrative boundaries. It must:
be built from meshblocks,
either define or aggregate to define SA2s, urban rural areas, territorial authorities, and regional councils.
SA1s generally have a population of 100–200 residents, with some exceptions:
SA1s with nil or nominal resident populations are created to represent remote mainland areas, unpopulated islands, inland water, inlets, or oceanic areas.
Some SA1s in remote rural areas and urban industrial or business areas have fewer than 100 residents.
Some SA1s that contain apartment blocks, retirement villages, and large non-residential facilities (prisons, boarding schools, etc.) have more than 500 residents.
SA1 numbering
SA1s are not named. SA1 codes have seven digits starting with a 7 and are numbered approximately north to south. Non-digitised codes start with 79.
As new SA1s are created, they are given the next available numeric code. If the composition of an SA1 changes through splitting or amalgamating different meshblocks, the SA1 is given a new code. The previous code no longer exists within that version and future versions of the SA1 classification.
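The numbering rule above can be checked mechanically. The following sketch validates the seven-digit format and flags codes with the non-digitised prefix. The function names are hypothetical, and treating every code that starts with 79 as non-digitised is an assumption drawn from the sentence above rather than a rule confirmed by Stats NZ.

```python
import re

# Seven digits in total, the first of which is 7.
_SA1_PATTERN = re.compile(r"7[0-9]{6}")


def is_sa1_code(code: str) -> bool:
    """Return True if code has the SA1 format: seven digits starting with 7."""
    return _SA1_PATTERN.fullmatch(code) is not None


def is_non_digitised(code: str) -> bool:
    """Return True for a well-formed SA1 code with the 79 prefix (assumed sufficient)."""
    return is_sa1_code(code) and code.startswith("79")
```

For example, `is_non_digitised("7999901")` returns `True`, while a code such as `"7000001"` (an invented value) passes the format check but is not flagged as non-digitised. The check covers format only; it cannot tell whether a code exists in a given version of the classification.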
Digitised and non-digitised SA1s
The digital geographic boundaries are defined and maintained by Stats NZ.
Aggregated from meshblocks, SA1s cover the land area of New Zealand, the water area to the 12-mile limit, the Chatham Islands, Kermadec Islands, sub-Antarctic islands, off-shore oil rigs, and Ross Dependency. The following 16 SA1s are held in non-digitised form:
7999901; New Zealand Economic Zone, 7999902; Oceanic Kermadec Islands, 7999903; Kermadec Islands, 7999904; Oceanic Oil Rig Taranaki, 7999905; Oceanic Campbell Island, 7999906; Campbell Island, 7999907; Oceanic Oil Rig Southland, 7999908; Oceanic Auckland Islands, 7999909; Auckland Islands, 7999910; Oceanic Bounty Islands, 7999911; Bounty Islands, 7999912; Oceanic Snares Islands, 7999913; Snares Islands, 7999914; Oceanic Antipodes Islands, 7999915; Antipodes Islands, 7999916; Ross Dependency.
High-definition version
This high definition (HD) version is the most detailed geometry, suitable for use in GIS for geometric analysis operations and for the computation of areas, centroids and other metrics. The HD version is aligned to the LINZ cadastre.
Macrons
Names are provided with and without tohutō/macrons. The column name for those without macrons is suffixed ‘ascii’.
Digital data
Digital boundary data became freely available on 1 July 2007.
Further information
To download geographic classifications in table formats such as CSV, please use Ariā.
For more information please refer to the Statistical standard for geographic areas 2023.
Contact: geography@stats.govt.nz
The data was collected using the High Frequency Survey (HFS), the new regional data collection tool & methodology launched in the Americas. The survey allowed for better reaching populations of interest with new remote modalities (phone interviews and self-administered surveys online) and improved sampling guidance and strategies. It includes a set of standardized regional core questions while allowing for operation-specific customizations. The core questions revolve around populations of interest’s demographic profile, difficulties during their journey, specific protection needs, access to documentation & regularization, health access, coverage of basic needs, coping capacity & negative mechanisms used, and well-being & local integration. The data collected has been used by countries in their protection monitoring analysis and vulnerability analysis.
National coverage
Household
All people of concern.
Sample survey data [ssd]
In the absence of a well-developed sampling frame for forcibly displaced populations in the Americas, the High Frequency Survey employed a multi-frame sampling strategy in which respondents entered the sample through one of three channels: (i) those who opted in to complete an online self-administered version of the questionnaire, which was widely circulated through refugee social media; (ii) persons identified through UNHCR and partner databases who were interviewed remotely by phone; and (iii) random selection from the cases approaching UNHCR for registration or assistance. The total sample size was 388 refugee households.
Other [oth]
The questionnaire contained the following sections: journey, family composition, vulnerability, basic needs, coping capacity, well-being, COVID-19 impact.
This statistic shows the results of a survey on the level of interest in action series and movies on television in Germany from 2019 to 2023. In 2023, 12.43 million Germans were highly interested in watching action, adventure, thriller, horror and war series or movies on TV. The Allensbach Market and Advertising Media Analysis (Allensbacher Markt- und Werbeträgeranalyse or AWA in German) determines attitudes, consumer habits and media usage of the population in Germany on a broad statistical basis.
The data was collected using the High Frequency Survey (HFS), the new regional data collection tool & methodology launched in the Americas. The survey allowed for better reaching populations of interest with new remote modalities (phone interviews and self-administered surveys online) and improved sampling guidance and strategies. It includes a set of standardized regional core questions while allowing for operation-specific customizations. The core questions revolve around populations of interest's demographic profile, difficulties during their journey, specific protection needs, access to documentation & regularization, health access, coverage of basic needs, coping capacity & negative mechanisms used, and well-being & local integration. The data collected has been used by countries in their protection monitoring analysis and vulnerability analysis.
National coverage
Household
All people of concern.
Sample survey data [ssd]
In the absence of a well-developed sampling frame for forcibly displaced populations in the Americas, the High Frequency Survey employed a multi-frame sampling strategy in which respondents entered the sample through one of three channels: (i) those who opted in to complete an online self-administered version of the questionnaire, which was widely circulated through refugee social media; (ii) persons identified through UNHCR and partner databases who were interviewed remotely by phone; and (iii) random selection from the cases approaching UNHCR for registration or assistance. The total sample size was 79 households. At the time of the survey, the population of concern was estimated at around 25,000 individuals.
Other [oth]
The questionnaire contained the following sections: journey, family composition, vulnerability, basic needs, coping capacity, well-being, COVID-19 impact.
The data was collected using the High Frequency Survey (HFS), the new regional data collection tool & methodology launched in the Americas. The survey allowed for better reaching populations of interest with new remote modalities (phone interviews and self-administered surveys online) and improved sampling guidance and strategies. It includes a set of standardized regional core questions while allowing for operation-specific customizations. The core questions revolve around populations of interest's demographic profile, difficulties during their journey, specific protection needs, access to documentation & regularization, health access, coverage of basic needs, coping capacity & negative mechanisms used, and well-being & local integration. The data collected has been used by countries in their protection monitoring analysis and vulnerability analysis.
Whole country
Household
All people of concern.
Sample survey data [ssd]
In the absence of a well-developed sampling frame for forcibly displaced populations in the Americas, the High Frequency Survey employed a multi-frame sampling strategy in which respondents entered the sample through one of three channels: (i) those who opted in to complete an online self-administered version of the questionnaire, which was widely circulated through refugee social media; (ii) persons identified through UNHCR and partner databases who were interviewed remotely by phone; and (iii) random selection from the cases approaching UNHCR for registration or assistance. The total sample size was 4,121 households. At the time of the survey, the population of concern was estimated at around 110,000 individuals.
Other [oth]
The questionnaire contained the following sections: journey, family composition, vulnerability, basic needs, coping capacity, well-being, COVID-19 impact.
Survey research can generate knowledge that is central to the study of collective action, public opinion, and political participation. Unfortunately, many populations—from undocumented migrants to right-wing activists and oligarchs—are hidden, lack sampling frames, or are otherwise hard to survey. An approach to hard-to-survey populations commonly taken by researchers in other disciplines is largely missing from the toolbox of political science methods: respondent-driven sampling (RDS). By leveraging relations of trust, RDS accesses hard-to-survey populations; it also promotes representativeness, systematizes data collection, and, notably, supports population inference. In approximating probability sampling, RDS makes strong assumptions. Yet if strengthened by integrative multi-method research, the method can shed light on otherwise concealed—and critical—political preferences and behaviors among many populations of interest. Through describing one of the first correct applications of RDS in political science, this paper provides empirically grounded guidance via a study of activist refugees from Syria. Refugees are prototypical hard-to-survey populations, and mobilized ones even more so; yet the study demonstrates that RDS can provide a systematic and representative account of a vulnerable population engaged in major political phenomena.
This data collection comprises a data library, sample outputs, batch files and accompanying documentation from the ESRC-funded project “Population247NRT: Near real-time spatiotemporal population estimates for health, emergency response and national security”. The data comprise a structured set of input data for use with the authors’ SurfaceBuilder247 software and sample outputs which estimate the population distribution of England at specific times on specific dates, referenced to 2011 census population totals. The sample output files (provided as GeoTIFFs) contain population estimates in 200m grid cells, based on the British National Grid, for 02:00 (2am) and 14:00 (2pm) on a typical weekday in University and school term-time and out of term-time. The estimates are broken down by seven age/economic activity sub-groups for term-time and six for out of term-time, and include estimates of population activity in residential, workplace, education, healthcare and road transportation domains. The data library, which has been constructed entirely using open data sources, comprises population estimates, by age/economic activity sub-groups, for point locations (typically population-weighted centroids of census output areas and workplace zones, or postcode centroids of sites such as schools or hospitals); time profiles representing usual patterns of population activity at these sites during a 24-hour period; and background grid layers representing the land surface area and major road network. SurfaceBuilder247 uses the data library to generate time-specific gridded population estimates by redistributing the population of each sub-group across the available locations and background grid in accordance with the reference time profiles. 
The sample output grids provided in this resource may be used directly in GIS software or, alternatively, the input data library may be reprocessed using SurfaceBuilder247 to generate estimates for specific dates and times of interest to the user. Sample batch and session parameter files are included in the resource.
Decision-making and policy formulation in sectors such as health, emergency/crisis response and national security ideally require accurate dynamic information on the number of people in specific places at specific times of the day, week, season or year. Traditional census data do not provide this level of detail but are often used for such policy and planning purposes. The ESRC-funded Population247 programme of research (Martin et al., 2015) developed a framework, methodology and software tool (SurfaceBuilder247) for integrating diverse contemporary data sources to produce enhanced time-specific population estimates for small geographical areas. Its usefulness has since been demonstrated for flooding and radiation emergency response/planning, through collaborations with HR Wallingford and Public Health England. These models have primarily involved the integration of open administrative data for activities such as place of residence, work, education and health. Now, new and emerging forms of data, such as sensor data, live and static data feeds provided via the internet, and various commercial datasets which were not previously available, provide exciting opportunities to enhance these population estimates. Such new and emerging datasets are useful because they provide near real-time information on population activity in sectors which are particularly dynamic and have previously been difficult to model, such as retail, leisure and transport.
However, extracting useful intelligence from these sources, and integrating and calibrating them with existing data sources, poses significant challenges for researchers and practitioners seeking to employ them in the creation of time-specific population estimates. This project will combine new, emerging and existing datasets in order to produce enhanced time-specific population estimates for more informed decision-making and policy formulation in the health, emergency/crisis response and national security sectors. It is a collaborative project between University of Southampton, Public Health England (PHE), Health and Safety Executive (HSE) and Defence Science and Technology Laboratory (Dstl). The project will enhance existing methods and tools for harvesting, processing, integrating and calibrating new, emerging and existing data sources in order to produce time-specific population estimates. It will deliver two substantive policy demonstrator case studies with the project partners. The first case study will demonstrate the potential for using time-specific population estimates for near real-time response in emergencies; the second will explore their usefulness for modelling variation in 'normal' population distributions through space and time in order to inform longer-term planning and policy formulation. Importantly, the project will also encourage the sharing of knowledge and expertise between academia and the public sector through joint design and implementation of the case studies, internal seminars and a jointly organised stakeholder workshop. Invitees to the workshop will be key stakeholders in policy and practice from within and beyond the partners' sectors. The workshop will showcase the data, methods and tools developed by the project, discuss the opportunities and challenges involved in implementing these for decision-making and policy formulation, and identify how such methods might realistically be scaled up within these sectors. 
Ultimately, the aim of the project is to help partners such as PHE, HSE and Dstl carry out their remits more effectively and efficiently through the provision of better time-specific population estimates. The data library and sample output files provided in this data collection have been generated by processing a range of open data sources including residential and workplace populations from the 2011 Census, school and college pupil numbers from the school census and services such as the government’s ‘Get Information About Schools’, university student numbers from the Higher Education Statistics Agency, hospital patient numbers and attendance time profiles from NHS Digital, road traffic estimates from the Department for Transport National Transportation Model, and GIS road network, inland water and coastline layers from Ordnance Survey and the Office for National Statistics. Information from the 2015 Time Use Survey has been used in the estimation of typical time profiles for workplace activities. GIS processing has been undertaken to estimate typical catchment area sizes for locations such as schools and hospitals. The principal input data are population counts for 2011 census output areas in England, which determine the base populations of all the estimates produced. The project team have georeferenced, reformatted and integrated all the input sources to create an input data library for the SurfaceBuilder247 software. All the necessary input files are provided, together with sample outputs for selected times of interest.
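The role of a time profile in the data library described above can be illustrated with a toy calculation. The sketch below is not the SurfaceBuilder247 algorithm, which also distributes people across a background grid and road network. It only shows how a 24-hour profile of fractions splits the population associated with one site into those on site and those elsewhere at a given hour. The school, its pupil count and its profile are invented; the hours 02:00 and 14:00 are chosen to match the sample output times.

```python
from typing import Dict, List


def population_at_hour(site_total: float, profile: List[float], hour: int) -> Dict[str, float]:
    """Split a site's associated population into 'on_site' and 'elsewhere' for one hour.

    profile holds 24 fractions between 0 and 1, one per hour of the day,
    giving the share of site_total present on site during that hour.
    """
    if len(profile) != 24:
        raise ValueError("profile must contain 24 hourly values")
    if not 0 <= hour < 24:
        raise ValueError("hour must be between 0 and 23")
    on_site = site_total * profile[hour]
    return {"on_site": on_site, "elsewhere": site_total - on_site}


# Hypothetical school: empty overnight, 95% of pupils present from 09:00 to 14:59.
school_profile = [0.0] * 9 + [0.95] * 6 + [0.0] * 9
night = population_at_hour(800, school_profile, 2)
afternoon = population_at_hour(800, school_profile, 14)
```

The total for the site is conserved at every hour, which mirrors the way the estimates described above are referenced to fixed base population totals.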
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section.
Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, for 2010, the 2010 Census provides the official counts of the population and housing units for the nation, states, counties, cities and towns. For 2006 to 2009, the Population Estimates Program provides intercensal estimates of the population for the nation, states, and counties.
Explanation of Symbols:
An '**' entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
An '-' entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
An '-' following a median estimate means the median falls in the lowest interval of an open-ended distribution.
An '+' following a median estimate means the median falls in the upper interval of an open-ended distribution.
An '***' entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
An '*****' entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
An 'N' entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
An '(X)' means that the estimate is not applicable or not available.
Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2000 data. Boundaries for urban areas have not been updated since Census 2000. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.
While the 2006-2010 American Community Survey (ACS) data generally reflect the December 2009 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas; in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.
Source: U.S. Census Bureau, 2006-2010 American Community Survey
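The margin of error described above can be turned into confidence bounds and a standard error with a few lines of arithmetic. The sketch below assumes the usual normal approximation, with z = 1.645 for the 90 percent level; the function names and the sample estimate are hypothetical and are not taken from any ACS table.

```python
# Z-scores for two-sided confidence intervals under a normal approximation.
Z_90 = 1.645  # level at which the margins of error above are published
Z_95 = 1.960


def confidence_bounds(estimate: float, moe_90: float) -> tuple[float, float]:
    """Lower and upper 90 percent bounds: estimate minus/plus the margin of error."""
    return estimate - moe_90, estimate + moe_90


def standard_error(moe_90: float) -> float:
    """Standard error implied by a 90 percent margin of error."""
    return moe_90 / Z_90


def rescale_moe(moe_90: float, z_target: float = Z_95) -> float:
    """Margin of error at another confidence level (95 percent by default)."""
    return moe_90 * z_target / Z_90


# Hypothetical estimate and margin of error, for illustration only.
estimate, moe = 12_500.0, 329.0
lower, upper = confidence_bounds(estimate, moe)
print(f"90% bounds: {lower:.0f} to {upper:.0f}")
print(f"standard error: {standard_error(moe):.1f}")
print(f"95% margin of error: {rescale_moe(moe):.1f}")
```

These bounds reflect sampling variability only; nonsampling error is not captured by this calculation.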
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We would like to inform you that the updated GlobPOP dataset (2021-2022) is available in version 2.0. The GlobPOP dataset (2021-2022) in the current version is not recommended for your work. The GlobPOP dataset (1990-2020) in the current version is the same as in version 1.0.
Thank you for your continued support of GlobPOP.
If you encounter any issues, please contact us via email at lulingliu@mail.bnu.edu.cn.
Continuously monitoring global population spatial dynamics is essential for implementing effective policies related to sustainable development, in areas such as epidemiology, urban planning, and global inequality.
Here, we present GlobPOP, a new continuous global gridded population product with a high spatial resolution of 30 arcseconds, covering 1990 to 2020. Our data-fusion framework is based on cluster analysis and statistical learning approaches and fuses five existing products (Global Human Settlements Layer Population (GHS-POP), Global Rural Urban Mapping Project (GRUMP), Gridded Population of the World Version 4 (GPWv4), LandScan Population datasets and WorldPop datasets) into a new continuous global gridded population product (GlobPOP). The spatial validation results demonstrate that the GlobPOP dataset is highly accurate. To validate the temporal accuracy of GlobPOP at the country level, we have developed an interactive web application, accessible at https://globpop.shinyapps.io/GlobPOP/, where data users can explore the country-level population time-series curves of interest and compare them with census data.
With the GlobPOP dataset available in both population count and population density formats, researchers and policymakers can use it to conduct time-series analysis of population and explore the spatial patterns of population development at various scales, ranging from national to city level.
The product is produced at 30 arc-second resolution (approximately 1 km at the equator) and is made available in GeoTIFF format. There are two population formats: 'Count' (population count per grid cell) and 'Density' (population count per square kilometer in each grid cell).
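The "approximately 1 km" figure can be checked with a short calculation. The sketch below assumes a spherical Earth with the WGS84 equatorial radius (6378.137 km) and computes only the east-west width of a 30 arc-second cell; the function name is hypothetical and this is not part of the GlobPOP code.

```python
import math

EQUATORIAL_RADIUS_KM = 6378.137  # WGS84 semi-major axis (spherical approximation assumed)
CELL_SIZE_DEG = 30 / 3600        # 30 arc-seconds expressed in degrees


def cell_width_km(latitude_deg: float) -> float:
    """Approximate east-west width of one 30 arc-second cell at a given latitude."""
    km_per_degree = 2 * math.pi * EQUATORIAL_RADIUS_KM / 360
    return km_per_degree * CELL_SIZE_DEG * math.cos(math.radians(latitude_deg))


for lat in (0, 30, 60):
    print(f"latitude {lat:>2} deg: {cell_width_km(lat):.3f} km")
```

The cell width shrinks with the cosine of latitude, so the ground area covered by a cell is not constant across the grid. This matters when comparing the 'Count' and 'Density' formats away from the equator.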
Each GeoTIFF filename has 5 fields that are separated by an underscore "_". A filename extension follows these fields. The fields are described below with the example filename:
GlobPOP_Count_30arc_1990_I32
Field 1: GlobPOP (Global gridded population)
Field 2: Pixel unit is population "Count" or population "Density"
Field 3: Spatial resolution is 30 arc seconds
Field 4: Year "1990"
Field 5: Data type is I32 (Int32) or F32 (Float32)
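The naming convention above can be split into its five fields programmatically. The following is a minimal sketch, not part of the GlobPOP distribution: the class and function names are hypothetical, and the ".tiff" extension in the usage line is an assumption, since the example filename above is shown without its extension.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobPopName:
    product: str     # Field 1: always "GlobPOP"
    unit: str        # Field 2: "Count" or "Density"
    resolution: str  # Field 3: e.g. "30arc"
    year: int        # Field 4: e.g. 1990
    dtype: str       # Field 5: "I32" or "F32"


def parse_globpop_name(filename: str) -> GlobPopName:
    """Split a GlobPOP filename into its five underscore-separated fields."""
    stem, _extension = os.path.splitext(os.path.basename(filename))
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {filename!r}")
    product, unit, resolution, year, dtype = parts
    if product != "GlobPOP":
        raise ValueError(f"unexpected product field: {product!r}")
    if unit not in {"Count", "Density"}:
        raise ValueError(f"unexpected unit field: {unit!r}")
    if dtype not in {"I32", "F32"}:
        raise ValueError(f"unexpected data type field: {dtype!r}")
    if not year.isdigit():
        raise ValueError(f"unexpected year field: {year!r}")
    return GlobPopName(product, unit, resolution, int(year), dtype)


# The ".tiff" extension here is assumed for illustration.
print(parse_globpop_name("GlobPOP_Count_30arc_1990_I32.tiff"))
```

Validating each field against the values listed above makes it easier to catch files from other products when looping over a download directory.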
Please refer to the paper for detailed information:
Liu, L., Cao, X., Li, S. et al. A 31-year (1990–2020) global gridded population dataset generated by cluster analysis and statistical learning. Sci Data 11, 124 (2024). https://doi.org/10.1038/s41597-024-02913-0.
The fully reproducible code is publicly available on GitHub: https://github.com/lulingliu/GlobPOP.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GENERAL INFORMATION
Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation
Date of data collection: January to March 2022
Collection instrument: SurveyMonkey
Funding: Alfred P. Sloan Foundation
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license
Links to publications that cite or use the data:
Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437
Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266
DATA & FILE OVERVIEW
File List
Additional related data collected that was not included in the current data package: Open-ended questions asked to respondents
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data:
The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.
The survey received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses and an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
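The rates in the paragraph above can be recomputed from the reported counts. The sketch below is only an arithmetic check: the number of invitations sent is not stated in this record, so it is back-derived here from the stated 1.57% rate and should be read as an assumption, not as a reported figure. With these inputs, the 68.6% completion rate is reproduced when the 2,492 responses in the final dataset are divided by the 3,632 responses received.

```python
# Counts and rates as reported in the description above.
responses_received = 3632
complete_in_dataset = 2492
excluded_contacts = 5201          # invalid emails, bounced emails and opt-outs
stated_uncorrected_rate = 0.0157  # 1.57 percent

# Completion rate: complete responses in the dataset over responses received.
completion_rate = complete_in_dataset / responses_received

# ASSUMPTION: the number of invitations is back-derived from the stated
# uncorrected rate because it is not given in this record.
implied_invitations = complete_in_dataset / stated_uncorrected_rate

# Corrected rate: remove contacts that could not or would not respond.
corrected_rate = complete_in_dataset / (implied_invitations - excluded_contacts)

print(f"completion rate: {completion_rate:.1%}")
print(f"implied invitations: {implied_invitations:,.0f}")
print(f"corrected response rate: {corrected_rate:.2%}")
```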
Methods for processing the data:
Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.
Instrument- or software-specific information needed to interpret the data:
The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret the values.
DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata
Number of variables: 94
Number of cases/rows: 2,492
Missing data codes: 999 Not asked
Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
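When reading the coded CSV, the missing data code 999 ("Not asked") should be treated as missing and not as a numeric answer. The following is a minimal sketch using only the Python standard library; the column names and sample rows are hypothetical, and the Codebook remains the authoritative source for variable names and value labels.

```python
import csv
import io

NOT_ASKED = "999"  # missing data code listed above


def load_rows(handle) -> list[dict]:
    """Read coded survey rows, replacing the 'Not asked' code with None."""
    reader = csv.DictReader(handle)
    return [
        {name: (None if value == NOT_ASKED else value) for name, value in row.items()}
        for row in reader
    ]


# Hypothetical sample standing in for the real CSV file.
sample = io.StringIO("respondent_id,q1,q2\n1,3,999\n2,999,5\n")
rows = load_rows(sample)

# Count how many respondents were actually asked each question.
asked = {name: sum(r[name] is not None for r in rows) for name in ("q1", "q2")}
print(rows)
print(asked)
```

Because the survey has separate branches for researchers who reuse data and those who do not, many variables are expected to carry the "Not asked" code for a large share of rows. Identifier columns should be excluded from this replacement if they can legitimately contain the value 999.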