Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data in social and behavioral sciences are routinely collected using questionnaires, and each domain of interest is tapped by multiple indicators. Structural equation modeling (SEM) is one of the most widely used methods to analyze such data. However, conventional methods for SEM face difficulty when the number of variables (p) is large even when the sample size (N) is also rather large. This article addresses the issue of model inference with the likelihood ratio statistic Tml. Using the method of empirical modeling, mean-and-variance corrected statistics for SEM with many variables are developed. Results show that the new statistics not only perform much better than Tml but also are substantial improvements over other corrections to Tml. When combined with a robust transformation, the new statistics also perform well with non-normally distributed data.
https://spdx.org/licenses/CC0-1.0.html
Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. 
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion: After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
The General Household Survey (GHS) was a continuous national survey of people living in private households conducted on an annual basis, by the Social Survey Division of the Office for National Statistics (ONS). The main aim of the survey was to collect data on a range of core topics, covering household, family and individual information. This information was used by government departments and other organisations for planning, policy and monitoring purposes, and to present a picture of households, family and people in Great Britain. From 2008, the General Household Survey became a module of the Integrated Household Survey (IHS). In recognition, the survey was renamed the General Lifestyle Survey (GLF). The GLF closed in 2011.
Secure Access GLF
The Secure Access version includes additional, detailed variables not included in the standard 'End User Licence' (EUL) version (see under GN 33090). Not all variables are available for all years, but extra variables that can typically be found in the Secure Access version but not in the EUL version relate to:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract The objective of this study was to verify possible differences in performance between boys and girls in the Zulliger Comprehensive System (ZSC). The sample consisted of 623 children aged from 6 to 14, from the Southeast region of Brazil, divided into four age groups: six to seven years, eight to nine, ten to eleven, and twelve to fourteen years. The means were compared using the t-test. The results indicated that some differences remained significant even after the Bonferroni correction, although the number of variables was reduced considerably when compared to the literature. The findings are discussed together with studies with projective techniques as well as other personality techniques. It was concluded that, although many variables were corroborated in the literature, more studies with more homogenous samples are needed, including, for example, control for the cognitive level and sociodemographic variables.
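The Bonferroni step mentioned in the abstract can be illustrated with a minimal sketch. The p-values below are hypothetical and are not taken from the study; the function name and the alpha level of 0.05 are likewise assumptions made for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return one boolean per p-value: True if it survives Bonferroni correction."""
    m = len(p_values)
    if m == 0:
        return []
    # Each individual test is judged at alpha divided by the number of tests.
    threshold = alpha / m
    return [p < threshold for p in p_values]


# Hypothetical p-values from four t-tests (not results from the study).
example_p_values = [0.001, 0.012, 0.030, 0.200]
flags = bonferroni_significant(example_p_values)
```

With four tests the per-test threshold is 0.05 / 4 = 0.0125, so a comparison that is significant at 0.05 on its own (such as p = 0.030) no longer counts as significant after correction.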
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a collection of two datasets: one sourced from CPM data (bham_gcmx-4x_12em_psl-sphum4th-temp4th-vort4th_eqvt_random-season.tar.gz) and one sourced from GCM data (bham_60km-4x_12em_psl-sphum4th-temp4th-vort4th_eqvt_random-season.tar.gz). Each dataset is made up of climate model variables extracted from the Met Office's storage system, combining many variables over many years. Each consists of 3 NetCDF files (train.nc, test.nc and val.nc), a YML ds-config.yml file and a README (similar to this one but tailored to the source of the data). Code used to create the dataset can be found here: https://github.com/henryaddison/mlde-data (specifically the v0.1.0 tag: https://github.com/henryaddison/mlde-data/tree/v0.1.0).
The YML file contains the configuration for the creation of the dataset, including the variables, scenario, ensemble members, spatial domain and resolution, and the scheme for splitting the data across the three subsets.
Each NetCDF file contains the same variables but split into different subsets (train, val and test) based on the time dimension.
Otherwise the NetCDF files have the same dimensions and coordinates for ensemble_member, grid_longitude and grid_latitude.
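As a minimal sketch of how the layout described above might be checked after unpacking one of the archives, the helper below reports which of the named files are absent. The function name and the idea of a single unpacked directory are assumptions for illustration; the README is left out because its exact file name is not stated here.

```python
from pathlib import Path

# File names taken from the dataset description above.
EXPECTED_FILES = ("train.nc", "val.nc", "test.nc", "ds-config.yml")


def missing_dataset_files(dataset_dir):
    """Return the expected file names that are absent from dataset_dir.

    dataset_dir is assumed to be the directory produced by unpacking
    one of the .tar.gz archives (a hypothetical location).
    """
    root = Path(dataset_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means all four files are present. If the xarray library is installed, each NetCDF file could then be read with `xarray.open_dataset`, though that is not needed for the check itself.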
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics of new MSK dataset vs. incomplete observed dataset (for more details on the variables, please consult S2 Table).
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The variables contained in the data sets are primarily concerned with perinatal outcomes and maternal health. A number of variables with respect to the social and economic status of the mothers and their families were also included (i.e., occupation, marital status, region). While all nine data sets are centered around these common themes and hold many variables in common, each data set has a unique combination of variables. The types of fields are wide-ranging but are primarily concerned with infant birth, maternal health, and socioeconomic status. The Dublin patients are a random sample of those found in the clinical records of the hospital. Case files were compiled from two sources, the Register of Patients, which included the administrative record of each patient, and the Master’s Ward Book, which noted the medical circumstances of each case. These records exist in continuous series during the years with which this study is concerned, and only minor changes occurred in the categories of information collected. Most of these documents were held by the Rotunda Hospital when they were consulted for this project, but all of them have now been transferred to the Public Record Office of Ireland in Dublin. As birth weights were first recorded in July 1869, 100 cases were selected for that year. In all subsequent years 200 cases were chosen. The preliminary database consisted of 12,454 cases. The weight and length means in the sample are accurate to 84 grams and 0.4 centimeter at a confidence level of 95 percent.
'amip4xco2' is an experiment of the CMIP5 - Coupled Model Intercomparison Project Phase 5 (https://pcmdi.llnl.gov/mips/cmip5). CMIP5 is meant to provide a framework for coordinated climate change experiments for the next five years and thus includes simulations for assessment in the AR5 as well as others that extend beyond the AR5.
6.5 amip4xco2 (6.5 4xCO2 AMIP) - Version 1: Identical to expt. 6.2b, but with AMIP SSTs prescribed as in expt. 3.3 (which is the control for this run).
Experiment design: https://pcmdi.llnl.gov/mips/cmip5/experiment_design.html List of output variables: https://pcmdi.llnl.gov/mips/cmip5/datadescription.html Output: time series per variable in model grid spatial resolution in netCDF format Earth System model and the simulation information: CIM repository
Entry name/title of data are specified according to the Data Reference Syntax (https://pcmdi.llnl.gov/mips/cmip5/docs/cmip5_data_reference_syntax.pdf) as activity/product/institute/model/experiment/frequency/modeling realm/MIP table/ensemble member/version number/variable name/CMOR filename.nc.
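The component order listed above can be sketched as a small parser that splits an entry name into named fields. The field names are shortened forms of the components in the text, and the example path (including the institute, model and file name) is entirely hypothetical; the Data Reference Syntax document itself remains the authority on allowed values.

```python
# Component order as listed in the entry description above.
DRS_FIELDS = (
    "activity", "product", "institute", "model", "experiment", "frequency",
    "modeling_realm", "mip_table", "ensemble_member", "version",
    "variable", "filename",
)


def parse_drs_path(path):
    """Split a slash-separated entry name into a dict keyed by DRS_FIELDS."""
    parts = path.strip("/").split("/")
    if len(parts) != len(DRS_FIELDS):
        raise ValueError(
            f"expected {len(DRS_FIELDS)} components, got {len(parts)}"
        )
    return dict(zip(DRS_FIELDS, parts))


# Hypothetical entry name, made up for illustration only.
example_path = (
    "cmip5/output1/EXAMPLE-INST/EXAMPLE-MODEL/amip4xCO2/mon/"
    "atmos/Amon/r1i1p1/v1/tas/tas_example.nc"
)
parsed = parse_drs_path(example_path)
```

Here `parsed["experiment"]` holds the experiment component and `parsed["variable"]` the variable name; a path with the wrong number of components raises `ValueError` rather than being silently misaligned.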
"esmrcp85" is an experiment of the CMIP5 - Coupled Model Intercomparison Project Phase 5 (https://pcmdi.llnl.gov/mips/cmip5). CMIP5 is meant to provide a framework for coordinated climate change experiments for the next five years and thus includes simulations for assessment in the AR5 as well as others that extend beyond the AR5.
5.3 esmrcp85 (5.3 ESM RCP8.5): Future projection (2006-2100) forced by RCP8.5. As in experiment 4.2_RCP8.5 but emissions-forced (with atmospheric CO2 determined by the model itself).
Experiment design: https://pcmdi.llnl.gov/mips/cmip5/experiment_design.html List of output variables: https://pcmdi.llnl.gov/mips/cmip5/datadescription.html Output: time series per variable in model grid spatial resolution in netCDF format Earth System model and the simulation information: CIM repository
Entry name/title of data are specified according to the Data Reference Syntax (https://pcmdi.llnl.gov/mips/cmip5/docs/cmip5_data_reference_syntax.pdf) as activity/product/institute/model/experiment/frequency/modeling realm/MIP table/ensemble member/version number/variable name/CMOR filename.nc.
These are the number of fish species and environmental data used to estimate numbers of fish species in the Western Indian Ocean. The results are published in: Modeling the spatial distribution of numbers of coral reef fish species and community types in the Western Indian Ocean faunal province - https://doi.org/10.3354/meps14538. ABSTRACT: Predicting and mapping coral reef diversity at moderate scales can assist spatial planning and prioritizing conservation activities. We made coarse-scale (6.25 km2) predictive models for numbers of coral reef fish species and community composition starting with a spatially complete database of 70 environmental variables available for 7039 mapped reef cells in the Western Indian Ocean. An ensemble model was created from a process of variable elimination and selectivity to make the best predictions irrespective of human influences. This best model was compared to models using preselected variables commonly used to evaluate climate change and human fishing and water quality influences. Many variables (~27) contributed to the best number of species and community composition models, but local variables of biomass, depth, and retention connectivity were dominant predictors. The key human-influenced variables included fish biomass and distance to human populations, with weaker associations with sediments and nutrients. Climate-influenced variables were generally weaker and included median sea surface temperature (SST) with contributions in declining order from SST kurtosis, bimodality, excess summer heat, SST skewness, SST rate of rise, and coral cover. Community composition variability was best explained by 2 dominant community richness axes of damselfishes–angelfishes and butterflyfishes–parrotfishes. Numbers of damselfish–angelfish species were ecologically separated by depth, and damselfishes declined with increasing depth, median temperature, cumulative excess heat, rate of temperature rise, and chronic temperature stresses. 
Species of butterflyfish–parrotfish separated by median temperature, and butterflyfish numbers declined with increasing temperature, chronic and acute temperature variability, and the rate of temperature rise. Several fish diversity hotspots were found in the East African Coastal Current Ecoregion centered in Tanzania, followed by Mayotte, southern Kenya, and northern Mozambique. If biomass can be maintained, the broad distributions of species combined with compensatory community responses should maintain high diversity and ecological resilience to climate change and other human stressors.
"historical" is an experiment of the CMIP5 - Coupled Model Intercomparison Project Phase 5 (https://pcmdi.llnl.gov/mips/cmip5). CMIP5 is meant to provide a framework for coordinated climate change experiments for the next five years and thus includes simulations for assessment in the AR5 as well as others that extend beyond the AR5.
3.2 historical (3.2 Historical): Simulation of recent past (1850 to 2005). Impose changing conditions (consistent with observations).
Experiment design: https://pcmdi.llnl.gov/mips/cmip5/experiment_design.html List of output variables: https://pcmdi.llnl.gov/mips/cmip5/datadescription.html Output: time series per variable in model grid spatial resolution in netCDF format Earth System model and the simulation information: CIM repository
Entry name/title of data are specified according to the Data Reference Syntax (https://pcmdi.llnl.gov/mips/cmip5/docs/cmip5_data_reference_syntax.pdf) as activity/product/institute/model/experiment/frequency/modeling realm/MIP table/ensemble member/version number/variable name/CMOR filename.nc.
'amip' is an experiment of the CMIP5 - Coupled Model Intercomparison Project Phase 5 (https://pcmdi.llnl.gov/mips/cmip5). CMIP5 is meant to provide a framework for coordinated climate change experiments for the next five years and thus includes simulations for assessment in the AR5 as well as others that extend beyond the AR5.
3.3 amip (3.3 AMIP) - Version 1: AMIP (1979 - at least 2008). Impose SSTs and sea ice from observations but with other conditions as in experiment 3.2 historical.
Experiment design: https://pcmdi.llnl.gov/mips/cmip5/experiment_design.html List of output variables: https://pcmdi.llnl.gov/mips/cmip5/datadescription.html Output: time series per variable in model grid spatial resolution in netCDF format Earth System model and the simulation information: CIM repository
Entry name/title of data are specified according to the Data Reference Syntax (https://pcmdi.llnl.gov/mips/cmip5/docs/cmip5_data_reference_syntax.pdf) as activity/product/institute/model/experiment/frequency/modeling realm/MIP table/ensemble member/version number/variable name/CMOR filename.nc.
'historical' is an experiment of the CMIP5 - Coupled Model Intercomparison Project Phase 5 (https://pcmdi.llnl.gov/mips/cmip5). CMIP5 is meant to provide a framework for coordinated climate change experiments for the next five years and thus includes simulations for assessment in the AR5 as well as others that extend beyond the AR5.
3.2 historical (3.2 Historical) - Version 1: Simulation of recent past (1850 to 2005). Impose changing conditions (consistent with observations).
Experiment design: https://pcmdi.llnl.gov/mips/cmip5/experiment_design.html
List of output variables: https://pcmdi.llnl.gov/mips/cmip5/datadescription.html
Output: time series per variable in model grid spatial resolution in netCDF format
Earth System Model and the simulation information: CIM repository
Entry name/title of data are specified according to the Data Reference Syntax (https://pcmdi.llnl.gov/mips/cmip5/docs/cmip5_data_reference_syntax.pdf) as activity/product/institute/model/experiment/frequency/modeling realm/MIP table/ensemble member/version number/variable name/CMOR filename.nc.
'aqua4xco2' is an experiment of the CMIP5 - Coupled Model Intercomparison Project Phase 5 (https://pcmdi.llnl.gov/mips/cmip5). CMIP5 is meant to provide a framework for coordinated climate change experiments for the next five years and thus includes simulations for assessment in the AR5 as well as others that extend beyond the AR5.
aqua4xco2 (6.7b 4xCO2 aqua planet) - Version 2: Consistent with CFMIP requirements, impose a 4xCO2 on zonally uniform SSTs of expt. 6.7a (which is the control for this run).
Experiment design: https://pcmdi.llnl.gov/mips/cmip5/experiment_design.html List of output variables: https://pcmdi.llnl.gov/mips/cmip5/datadescription.html Output: time series per variable in model grid spatial resolution in netCDF format Earth System model and the simulation information: CIM repository
Entry name/title of data are specified according to the Data Reference Syntax (https://pcmdi.llnl.gov/mips/cmip5/docs/cmip5_data_reference_syntax.pdf) as activity/product/institute/model/experiment/frequency/modeling realm/MIP table/ensemble member/version number/variable name/CMOR filename.nc.
'rcp45' is an experiment of the CMIP5 - Coupled Model Intercomparison Project Phase 5 (https://pcmdi.llnl.gov/mips/cmip5). CMIP5 is meant to provide a framework for coordinated climate change experiments for the next five years and thus includes simulations for assessment in the AR5 as well as others that extend beyond the AR5.
4.1 rcp45 (4.1 RCP4.5) - Version 1: Future projection (2006-2100) forced by RCP4.5. RCP4.5 is a representative concentration pathway which approximately results in a radiative forcing of 4.5 W m-2 at year 2100, relative to pre-industrial conditions. RCPs are time-dependent, consistent projections of emissions and concentrations of radiatively active gases and particles.
Experiment design: https://pcmdi.llnl.gov/mips/cmip5/experiment_design.html List of output variables: https://pcmdi.llnl.gov/mips/cmip5/datadescription.html Output: time series per variable in model grid spatial resolution in netCDF format Earth System model and the simulation information: CIM repository
Entry name/title of data are specified according to the Data Reference Syntax (https://pcmdi.llnl.gov/mips/cmip5/docs/cmip5_data_reference_syntax.pdf) as activity/product/institute/model/experiment/frequency/modeling realm/MIP table/ensemble member/version number/variable name/CMOR filename.nc.
'abrupt4xco2' is an experiment of the CMIP5 - Coupled Model Intercomparison Project Phase 5 (https://pcmdi.llnl.gov/mips/cmip5). CMIP5 is meant to provide a framework for coordinated climate change experiments for the next five years and thus includes simulations for assessment in the AR5 as well as others that extend beyond the AR5.
6.3 abrupt4xco2 (6.3 Abrupt 4XCO2) - Version 1: Impose an instantaneous quadrupling of CO2, then hold fixed.
Experiment design: https://pcmdi.llnl.gov/mips/cmip5/experiment_design.html List of output variables: https://pcmdi.llnl.gov/mips/cmip5/datadescription.html Output: time series per variable in model grid spatial resolution in netCDF format Earth System model and the simulation information: CIM repository
Entry name/title of data are specified according to the Data Reference Syntax (https://pcmdi.llnl.gov/mips/cmip5/docs/cmip5_data_reference_syntax.pdf) as activity/product/institute/model/experiment/frequency/modeling realm/MIP table/ensemble member/version number/variable name/CMOR filename.nc.