Replication Data for the manuscript "Does peer review improve the statistical content of manuscripts? A study on 31,794 manuscripts submitted to four journals". Includes the dataset for replicating the study and the generated LIWC style dictionary.
The Stanford Federal Statistical Research Data Center (FSRDC) allows qualified researchers to securely use restricted-access data from the U.S. Census Bureau, the National Center for Health Statistics (NCHS), the Agency for Healthcare Research and Quality (AHRQ), and the Bureau of Labor Statistics. These data are extraordinarily rich and virtually the only source for many important questions in health and social sciences. For example, researchers can access detailed geographic indicators that are not publicly available in data such as the National Health Interview Survey (NHIS) and National Health and Nutrition Examination Survey (NHANES).
PHS does not host FSRDC data. If you wish to use FSRDC data for a health-related project, please reach out to the Stanford FSRDC: https://iriss.stanford.edu/fsrdc
All manuscripts (and other items you'd like to publish) must be submitted to
phsdatacore@stanford.edu for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
CSV Data set. The Data Dictionary (Part 1) and Statistical Code (Part 2) are in ServCat Reference....
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The first sheet of the file contains solving times of each of the 3 spatial tasks on 6 stimuli variants of every study participant (identified by participant code). Times in red mean that the task was not solved within the time limit. Cells filled with green indicate that the particular stimuli variant was shown first to the participant. The second sheet contains statistical information from the questionnaire. Cells filled with gray indicate participants with adventitious blindness. Date Submitted: 2022-02-16
These are the data and statistical code needed to reproduce the analysis in the manuscript describing validation of RhizoVision Explorer, an open-source software for image analysis of plant roots. This code validates measurements such as length, average diameter, surface area, and volume against ground truth data from a copper wire image set, and compares these measurements against other image analysis software for a simulated image set and with real root scans from various plant species. Please cite this repository if the code is used for your work. Citation of the manuscript: Seethepalli, A., Dhakal, K., Griffiths, M., Guo, H., Freschet, G. T., York, L. M. (2021). RhizoVision Explorer: Open-source software for root image analysis and measurement standardization. AoB PLANTS, plab056, https://doi.org/10.1093/aobpla/plab056 Instructions: Download and extract the ZIP file. The .R file is a text file containing the code for R. All required packages are described in the script, and the code is designed to use the RStudio API to automatically set the working directory to the one that contains the script. The needed data is in the 'Input files' directory and will automatically be loaded using relative paths. As long as the needed packages are installed, the code should run and produce all graphical output and statistics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datafiles and JASP files for the project "AffixStat" - investigating the origin of a typological suffixing bias in world languages
https://www.datainsightsmarket.com/privacy-policy
The global journal review service market is experiencing robust growth, driven by the increasing volume of research publications and the rising demand for high-quality peer review to ensure scientific rigor. The market's expansion is fueled by several key trends, including the growing adoption of online submission and review platforms, the increasing specialization within scientific disciplines demanding expert review, and the rising pressure on researchers to publish in high-impact journals. While the precise market size in 2025 requires further specification, a reasonable estimate, considering typical growth rates in related sectors, places it at around $2 billion. Considering a conservative Compound Annual Growth Rate (CAGR) of 8% over the forecast period (2025-2033), the market is projected to reach approximately $4 billion by 2033. This growth, however, is not without its challenges. Constraints include the rising cost of peer review, concerns about bias and conflicts of interest within the review process, and the ongoing debate about the efficiency and effectiveness of the current peer-review system. The market is segmented by service type (e.g., editorial services, language editing, statistical analysis, manuscript formatting), target audience (researchers, publishers), and geographic region. Key players in the market, including Editorpages, Genex Services, and Research Square, are constantly innovating to address these challenges and improve the efficiency and transparency of the journal review process. The competitive landscape is characterized by a mix of large, established companies and smaller, specialized service providers. Larger companies leverage their established networks and diverse service offerings, while smaller players often focus on niche expertise or specific regions. 
Future market success will depend on the ability to offer innovative solutions that streamline the review process, enhance transparency, and address the growing concerns around fairness and efficiency. Furthermore, investment in advanced technologies, such as artificial intelligence (AI) for manuscript screening and plagiarism detection, is likely to play a significant role in shaping the future of the journal review service market. Continued growth will also hinge on addressing concerns regarding reviewer compensation and workload to maintain a sustainable and high-quality peer review ecosystem.
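The projection quoted above can be checked with the standard compound-growth formula. The following is a minimal sketch, assuming the figures given in the text (a 2025 base of about $2 billion and an 8% CAGR) and assuming eight compounding years between 2025 and 2033; the function name `project_market_size` is hypothetical and not part of the report.

```python
def project_market_size(base_value: float, cagr: float, years: int) -> float:
    """Compound a base value forward at a constant annual growth rate."""
    return base_value * (1.0 + cagr) ** years


# Figures taken from the text: about $2 billion in 2025 and a CAGR of 8%.
base_2025_billion = 2.0
cagr = 0.08
# Assumption: the forecast period 2025-2033 spans eight compounding years.
years = 2033 - 2025

projected_2033_billion = project_market_size(base_2025_billion, cagr, years)
print(f"Projected 2033 market size: ${projected_2033_billion:.2f} billion")
```

Under these assumptions the result is about $3.7 billion, which is consistent with the rounded figure of approximately $4 billion.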
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Although peer review is widely considered to be the most credible way of selecting manuscripts and improving the quality of accepted papers in scientific journals, there is little evidence to support its use. Our aim was to estimate the effects on manuscript quality of either adding a statistical peer reviewer or suggesting the use of checklists such as CONSORT or STARD to clinical reviewers or both. Methodology and Principal Findings: Interventions were defined as 1) the addition of a statistical reviewer to the clinical peer review process, and 2) suggesting reporting guidelines to reviewers; with “no statistical expert” and “no checklist” as controls. The two interventions were crossed in a 2×2 balanced factorial design including original research articles consecutively selected, between May 2004 and March 2005, by the Medicina Clinica (Barc) editorial committee. We randomized manuscripts to minimize differences in terms of baseline quality and type of study (intervention, longitudinal, cross-sectional, others). Sample-size calculations indicated that 100 papers provide an 80% power to test a 55% standardized difference. We specified the main outcome as the increment in quality of papers as measured on the Goodman Scale. Two blinded evaluators rated the quality of manuscripts at initial submission and final post peer review version. Of the 327 manuscripts submitted to the journal, 131 were accepted for further review, and 129 were randomized. Of those, 14 that were lost to follow-up showed no differences in initial quality to the followed-up papers. Hence, 115 were included in the main analysis, with 16 rejected for publication after peer review. 21 (18.3%) of the 115 included papers were interventions, 46 (40.0%) were longitudinal designs, 28 (24.3%) cross-sectional and 20 (17.4%) others. The 16 (13.9%) rejected papers had a significantly lower initial score on the overall Goodman scale than accepted papers (difference 15.0, 95% CI: 4.6–24.4).
Suggesting a guideline to the reviewers had no statistically significant effect on change in overall quality as measured by the Goodman scale (0.9, 95% CI: −0.3–+2.1). The estimated effect of adding a statistical reviewer was 5.5 (95% CI: 4.3–6.7), showing a significant improvement in quality. Conclusions and Significance: This prospective randomized study shows the positive effect of adding a statistical reviewer to the field-expert peers in improving manuscript quality. We did not find a statistically significant positive effect by suggesting reviewers use reporting guidelines.
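The sample-size statement in the abstract above (100 papers giving 80% power for a standardized difference of 0.55) can be approximated with a normal-theory power calculation. This is only a sketch under assumptions the abstract does not state: a two-sided test at α = 0.05, and each main effect of the 2×2 design comparing two groups of 50 papers. The function name `two_sample_power` is hypothetical, and the authors' actual calculation may have differed.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test for a standardized difference."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # The standard error of a standardized mean difference is sqrt(2 / n),
    # so the test statistic is centered at effect_size * sqrt(n / 2).
    noncentrality = effect_size * sqrt(n_per_group / 2.0)
    # Probability of rejecting in either tail.
    upper_tail = std_normal.cdf(noncentrality - z_crit)
    lower_tail = std_normal.cdf(-noncentrality - z_crit)
    return upper_tail + lower_tail


# Assumption: 100 papers split into two groups of 50 for each main effect.
power = two_sample_power(effect_size=0.55, n_per_group=50)
print(f"Approximate power: {power:.2f}")
```

Under these assumptions the approximation gives a power just under 0.80, in line with the figure reported in the abstract.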
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the protein, clinical and statistical datasets used in the manuscript entitled 'Plasma proteome profiles associated with early development of lung injury in extremely preterm infants'.
Manuscript authors: Pereira-Fantini, Byars, Kamlin, Manley, Davis and Tingay.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These are the 28 files that are used for the analysis in the submitted manuscript entitled: An Error in a Recent Paper: gazeNet: End-to-end eye-movement event detection with deep neural Networks (Zemblys, Niehorster, & Holmqvist, 2019)
NA. This dataset is not publicly accessible because: The data used in this manuscript were obtained under Data Use Agreements with the NCS Vanguard Data and Sample Archive and Access System and the NICHD Data and Specimen Hub (DASH). Because of the requirements of the DUA, we are unable to provide raw data; thus, the summary data are provided that are included in the manuscript. It can be accessed through the following means: The manuscript contains tables of the summary statistics. For the original data, users must have an approved DUA with NICHD DASH. Format: Word file of tables with summary statistics for maternal blood Pb, urine Pb, Pb surface wipe loading and Pb vacuum bag dust. This dataset is associated with the following publication: Stanek, L., N. Grokhowsky, B. George, and K. Thomas. Assessing lead exposure in U.S. pregnant women using biological and residential measurements. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, (905): 167135, (2023).
These files contain the GWAS summary statistics as described in van der Zee, M.D., et al. (2021, manuscript submitted). If summary statistics are used, please make sure to include a reference to the original manuscript.
Attribution-NoDerivs 3.0 (CC BY-ND 3.0) https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Statistics illustrating the consumption, production, prices, and trade of Paper Binders, Folders and File Covers in Jamaica from 2007 to 2024.
Attribution-NoDerivs 3.0 (CC BY-ND 3.0) https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Statistics illustrating the consumption, production, prices, and trade of Paper Binders, Folders and File Covers in Democratic People's Republic of Korea from 2007 to 2024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication Package for A Study on the Pythonic Functional Constructs' Understandability
This package contains several folders and files with code and data used in the study.
examples/
Contains the code snippets used as objects of the study, named as reported in Table 1, summarizing the experiment design.
RQ1-RQ2-files-for-statistical-analysis/
Contains three .csv files used as input for conducting the statistical analysis and drawing the graphs for addressing the first two research questions of the study. Specifically:
- ConstructUsage.csv contains the declared usage frequency of the three functional constructs under study. This file is used to draw Figure 4.
- RQ1.csv contains the collected data used for the mixed-effect logistic regression relating the use of functional constructs with the correctness of the change task, and the logistic regression relating the use of map/reduce/filter functions with the correctness of the change task.
- RQ1Paired-RQ2.csv contains the collected data used for the ordinal logistic regression of the relationship between the perceived ease of understanding of the functional constructs and (i) participants' usage frequency, and (ii) constructs' complexity (except for map/reduce/filter).
inter-rater-RQ3-files/
Contains four .csv files used as input for computing the inter-rater agreement for the manual labeling used for addressing RQ3. Specifically, you will find one file for each functional construct, i.e., comprehension.csv, lambda.csv, and mrf.csv, and a different file used for highlighting the reasons why participants prefer to use the procedural paradigm, i.e., procedural.csv.
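As an illustration of what an inter-rater agreement computation on such label files might look like, here is a minimal sketch using Cohen's kappa for two raters. The package description does not say which agreement statistic the authors used, so kappa is an assumption here; the function name and the category labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used a single identical label throughout.
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four answers; real input would come from the .csv files.
labels_rater_1 = ["readability", "conciseness", "readability", "habit"]
labels_rater_2 = ["readability", "conciseness", "habit", "habit"]

kappa = cohen_kappa(labels_rater_1, labels_rater_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is a common choice for manual labeling tasks of this kind.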
Questionnaire-Example.pdf
This file contains the questionnaire submitted to one of the ten experimental groups within our controlled experiment. Other questionnaires are similar, except for the code snippets used for the first section, i.e., change tasks, and the second section, i.e., comparison tasks.
RQ2ManualValidation.csv
This file contains the results of the manual validation performed to sanitize the answers provided by our participants and used for addressing RQ2. Specifically, we coded the behavior description using four different levels: (i) correct, (ii) somewhat correct, (iii) wrong, and (iv) automatically generated.
RQ3ManualValidation.xlsx
This file contains the results of the open coding applied to address our third research question. Specifically, you will find four sheets, one for each functional construct and one for the procedural paradigm. For each sheet, you will find the provided answers together with the categories assigned to them.
Appendix.pdf
This file contains the results of the logistic regression relating the use of map, filter, and reduce functions with the correctness of the change task, not shown in the paper.
FuncConstructs-Statistics.r
This file contains an R script that you can reuse to re-run all the analyses conducted and discussed in the paper.
FuncConstructs-Statistics.ipynb
This file contains the code to re-execute all the analyses conducted in the paper as a notebook.
All data contained in this dataset has been used and analyzed in detail in the manuscript "The sensitivity of convective cloud ensemble statistics to horizontal resolution in idealized RCE simulations" (J. Savre and G. Craig) submitted to the Journal of the Atmospheric Sciences in August 2022. The present README file describes briefly the files (naming convention, structure and content) gathered in the folder. The dataset has been produced using the MISU-MIT Cloud and Aerosol (MIMICA) model, a limited area, high-resolution cloud resolving model dedicated to atmospheric applications. The model was configured to run a case of Radiative Convective Equilibrium (RCE) in a 128 × 128 km² periodic domain over an ocean surface at fixed temperature (300 K). 20 numerical experiments were designed based on this initial setup in which both the model's horizontal resolution and the fixed cooling rate were simultaneously varied. The data is thus collected in 20 different folders:
- Parent directories RCE2, RCE4, RCE8, RCE12 gather data produced at cooling rates of -2 K/d, -4 K/d, -8 K/d and -12 K/d, respectively
- Sub-directories 2km, 1km, 500m, 250m, 125m gather data generated at the corresponding cooling rates and horizontal resolutions of 2 km, 1 km, 500 m, 250 m and 125 m.
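The folder layout described above (four cooling-rate directories, each with five resolution sub-directories) can be enumerated programmatically. The sketch below assumes the directory names exactly as listed and builds paths relative to the dataset root, whose location is not given in the description; the variable names are hypothetical.

```python
from itertools import product
from pathlib import PurePosixPath

# Parent directories mapped to their fixed cooling rates in K/d, as listed above.
cooling_rate_dirs = {"RCE2": -2, "RCE4": -4, "RCE8": -8, "RCE12": -12}
# Sub-directories named after the horizontal resolution.
resolution_dirs = ["2km", "1km", "500m", "250m", "125m"]

# One folder per (cooling rate, resolution) combination, relative to the dataset root.
experiment_dirs = [
    PurePosixPath(parent) / sub
    for parent, sub in product(cooling_rate_dirs, resolution_dirs)
]

for path in experiment_dirs:
    print(path, "cooling rate:", cooling_rate_dirs[path.parts[0]], "K/d")
```

The four parent directories crossed with the five sub-directories give the 20 experiment folders mentioned in the description.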
Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations th...
Data and statistical analysis scripts for the manuscript on wheat root response to nitrate using X-ray CT and OpenSimRoot: "X-ray CT reveals 4D root system development and lateral root responses to nitrate in soil" [https://doi.org/10.1002/ppj2.20036]. The ZIP file contains:
- MCT1_Rcode.R - Statistics script for candidate single-timepoint experiment. Requires all CSV data files in the directory. User needs to set working directory to location of this script and the CSV data files before running.
- MCT1... .csv - 3 CSV data files required by the R script.
- MCT2_Rcode.R - Statistics script for time-series experiment. Requires all CSV data files in the directory. User needs to set working directory to location of this script and the CSV data files before running.
- MCT2... .csv - 3 CSV data files required by the R script.
- R_RooThProcessing.R - R code for aggregating root traits from RooTh software.
- Modelling folder - OpenSimRoot with model parameters and root data used in manuscript.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset and accompanying R code, provided in the R Markdown file, are linked to the following manuscript submitted for review: Altwegg et al. "Emerging topics and new directions in statistical ecology". Abstract of linked material: Ecological science relies on robust estimates of the abundance, diversity, and spatial distribution of individuals and species, but these quantities are notoriously difficult to observe directly. Data collected on these quantities not only reflect the ecological processes giving rise to them but also the observation process, which is often biased by factors such as uneven sampling effort or imperfect detection. Furthermore, collecting data according to standard sampling designs is often not possible. Statistical ecology as a research field specialises in developing statistical methods for analysing such complex ecological data. Here, we apply text analysis tools to the abstracts submitted to eight International Statistical Ecology Conferences between 2008 and 2022 to guide a review of recent topics in statistical ecology. Results show that estimating various aspects of demography (including survival, recruitment, abundance, density and movement) and spatial distribution remains a key area of research. The field has benefited from and embraced new data collection methods such as automated recorders and rapidly developing remote sensing techniques. How to integrate data from different sources is a central challenge that spans multiple areas of statistical ecology. The statistical ecology community strives to be inclusive. It also promotes robust data analysis strategies that underpin reproducible research and transparent conservation decisions. With the increasing pressure of human society on nature, we feel statistical ecology is becoming an ever more important research field. Files: Data_Altwegg_et_al_JSTP_2025.csv, Data_Altwegg_et_al_JSTP_2025.rmd