While classical measurement error in the dependent variable in a linear regression framework results only in a loss of precision, nonclassical measurement error can lead to estimates that are biased and inference that lacks power. Here, we consider a particular type of nonclassical measurement error: skewed errors. Unfortunately, skewed measurement error is likely to be a relatively common feature of many outcomes of interest in political science research. This study highlights the bias that can result even from relatively "small" amounts of skewed measurement error, particularly if the measurement error is heteroskedastic. We also assess potential solutions to this problem, focusing on the stochastic frontier model and nonlinear least squares. Simulations and three replications highlight the importance of thinking carefully about skewed measurement error, as well as appropriate solutions.
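The contrast between classical and skewed, heteroskedastic measurement error can be illustrated with a small simulation. This is only a minimal sketch under assumed settings, not the simulation design used in the study: the data-generating process, the exponential error distribution, and the names `ols_slope`, `simulate`, and `skew_scale` are all hypothetical choices made for illustration.

```python
import random


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, beta=1.0, skew_scale=0.5, seed=0):
    """Compare OLS slopes under classical and skewed, heteroskedastic error."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    # True outcome: linear in x with symmetric noise (hypothetical DGP).
    y_true = [beta * x + rng.gauss(0.0, 1.0) for x in xs]
    # Classical error: mean zero, symmetric, unrelated to x.
    y_classical = [y + rng.gauss(0.0, 1.0) for y in y_true]
    # Skewed error: exponential with mean skew_scale * (0.1 + x), so both
    # its mean and its variance grow with x (assumed form of the error).
    y_skewed = [
        y + rng.expovariate(1.0 / (skew_scale * (0.1 + x)))
        for x, y in zip(xs, y_true)
    ]
    return {
        "classical": ols_slope(xs, y_classical),
        "skewed": ols_slope(xs, y_skewed),
    }


if __name__ == "__main__":
    result = simulate()
    print("true slope: 1.0")
    print(f"slope with classical error: {result['classical']:.3f}")
    print(f"slope with skewed error:    {result['skewed']:.3f}")
```

In this setup the conditional mean of the skewed error is `skew_scale * (0.1 + x)`, so the fitted slope is pushed upward by roughly `skew_scale`, while the classical error only adds noise around the true slope.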
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
"NewEngland_pkflows.PRT" is a text file that contains results of flood-frequency analysis of annual peak flows from 186 selected streamflow gaging stations (streamgages) operated by the U.S. Geological Survey (USGS) in the New England region (Maine, Connecticut, Massachusetts, Rhode Island, New York, New Hampshire, and Vermont). Only streamgages in the region that were also in the USGS "GAGES II" database (https://water.usgs.gov/GIS/metadata/usgswrd/XML/gagesII_Sept2011.xml) were considered for use in the study. The file was generated by combining PeakFQ output (.PRT) files created using version 7.0 of USGS software PeakFQ (https://water.usgs.gov/software/PeakFQ/; Veilleux and others, 2014) to conduct flood-frequency analyses using the Expected Moments Algorithm (England and others, 2018). The peak-flow files used as input to PeakFQ were obtained from the USGS National Water Information System (NWIS) database (https://nwis.waterdata.usgs.gov/usa/nwis/peak) and contained annual ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Supplementary Material 3: A supplementary file with examples of SAS script for all models that have been fitted in this paper.
Observed phenotypic responses to selection in the wild often differ from predictions based on measurements of selection and genetic variance. An overlooked hypothesis to explain this paradox of stasis is that a skewed phenotypic distribution affects natural selection and evolution. We show through mathematical modelling that, when a trait selected for an optimum phenotype has a skewed distribution, directional selection is detected even at evolutionary equilibrium, where it causes no change in the mean phenotype. When environmental effects are skewed, Lande and Arnold’s (1983) directional gradient is in the direction opposite to the skew. In contrast, skewed breeding values can displace the mean phenotype from the optimum, causing directional selection in the direction of the skew. These effects can be partitioned out using alternative selection estimates based on average derivatives of individual relative fitness, or additive genetic covariances between relative fitness and trait (Robe...
https://www.usa.gov/government-works
This dataset is part of a series of datasets, where batteries are continuously cycled with randomly generated current profiles. Reference charging and discharging cycles are also performed after a fixed interval of randomized usage to provide reference benchmarks for battery state of health.
In this dataset, four 18650 Li-ion batteries (Identified as RW25, RW26, RW27 and RW28) were continuously operated by repeatedly charging them to 4.2V and then discharging them to 3.2V using a randomized sequence of discharging currents between 0.5A and 5A. This type of discharging profile is referred to here as random walk (RW) discharging. A customized probability distribution is used in this experiment to select a new load setpoint every 1 minute during RW discharging operation. The custom probability distribution was designed to be skewed towards selecting higher currents. The ambient temperature at which the batteries are cycled was held at approximately 40C for these experiments.
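The random walk (RW) load selection described above can be sketched as follows. The dataset description does not give the actual probability distribution, so the setpoint grid (`SETPOINTS_A`) and the linearly increasing `WEIGHTS` below are hypothetical stand-ins that merely skew selection toward higher currents; the sketch also ignores the voltage limits that end a discharge.

```python
import random

# Hypothetical setpoints: 0.5 A to 5.0 A in 0.5 A steps.
SETPOINTS_A = [0.5 * k for k in range(1, 11)]
# Hypothetical weights that grow with the current, skewing toward high loads.
WEIGHTS = list(range(1, 11))
# A new load setpoint is selected every 1 minute.
STEP_SECONDS = 60


def random_walk_profile(n_steps, seed=None):
    """Return (time_in_seconds, current_in_amps) pairs for an RW discharge."""
    rng = random.Random(seed)
    currents = rng.choices(SETPOINTS_A, weights=WEIGHTS, k=n_steps)
    return [(i * STEP_SECONDS, amps) for i, amps in enumerate(currents)]


if __name__ == "__main__":
    for t, amps in random_walk_profile(5, seed=1):
        print(f"t = {t:4d} s, load = {amps:.1f} A")
```

With these assumed weights the average selected current sits above the 2.75 A midpoint of the range, which is the sense in which the distribution is skewed toward higher currents.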
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Reproducibility package for the article:
Reaction times and other skewed distributions: problems with the mean and the median
Guillaume A. Rousselet & Rand R. Wilcox
preprint: https://psyarxiv.com/3y54r
doi: 10.31234/osf.io/3y54r
This package contains all the code and data to reproduce the figures and analyses in the article.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
We generate 10 model trees for a given number of genomes (). The number of false positives (FP), the number of false negatives (FN), and the execution time (time) in a cell are the average of the finished computations (finished: the number of finished computations within 24 hours) out of 10 trials using 10 different model trees. , , and in the tables are hours, minutes, and seconds, respectively. is the number of genes in a genome, which is 100 in our experiments.
https://spdx.org/licenses/CC0-1.0.html
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This paper uses extreme value theory to study the implications of skewness risk for nominal loan contracts in a production economy. Productivity and inflation innovations are drawn from generalized extreme value distributions. The model is solved using a third-order perturbation and estimated by the simulated method of moments. Results show that the data reject the hypothesis that innovations are drawn from normal distributions and favor instead the alternative that they are drawn from asymmetric distributions. Estimates indicate that skewness risk accounts for 12% of the risk premia and reduces bond yields by approximately 55 basis points. For a bond that pays 1 dollar at maturity, the adjustment factor associated with skewness risk ranges from 0.15 cents for a 3-month bond to 2.05 cents for a 5-year bond.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
GC skew denotes the relative excess of G nucleotides over C nucleotides on the leading versus the lagging replication strand of eubacteria. While the effect is small, typically around 2.5%, it is robust and pervasive. GC skew and the analogous TA skew are a localized deviation from Chargaff's second parity rule, which states that G and C, and T and A occur with (mostly) equal frequency even within a strand.
Most bacteria also show the analogous TA skew. Different phyla show different kinds of skew and differing relations between TA and GC skew.
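As a rough illustration of how a GC skew value can be computed, the sketch below uses one commonly used definition, (G − C) / (G + C), evaluated over a sequence or over consecutive windows. This is an assumption for illustration: the exact computation used by SkewDB may differ, and the function names are hypothetical.

```python
from itertools import accumulate


def gc_skew(seq):
    """Return (G - C) / (G + C) for a DNA string, or 0.0 if it has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_gc_skew(seq, window):
    """GC skew in consecutive, non-overlapping windows of the given size."""
    return [
        gc_skew(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


def cumulative_gc_skew(seq, window):
    """Running sum of the windowed skew along the sequence."""
    return list(accumulate(windowed_gc_skew(seq, window)))


if __name__ == "__main__":
    toy = "GGGGCCCC"  # toy sequence, not real genomic data
    print(windowed_gc_skew(toy, 4))
    print(cumulative_gc_skew(toy, 4))
```

The same function applied to T and A counts gives the analogous TA skew under the same assumed definition.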
This article introduces an open access database (https://skewdb.org) of GC and 10 other skews for over 28,000 chromosomes and plasmids. Further details like codon bias, strand bias, strand lengths and taxonomic data are also included.
The SkewDB database can be used to generate or verify hypotheses. Since the origins of both the second parity rule, as well as GC skew itself, are not yet satisfactorily explained, such a database may enhance our understanding of microbial DNA.
Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
The dataset contains all the data produced by running the research software for the study: "Open Science for Social Sciences and Humanities: Open Access availability and distribution across disciplines and Countries in OpenCitations Meta".
Disclaimer: these results are not considered to be representative, because we have found that Mega Journals significantly skewed some of the data. The result datasets without Mega Journals are published here.
Description of datasets:
SSH_Publications_in_OC_Meta_and_Open_Access_status.csv: containing information about OpenCitations Meta coverage of ERIH PLUS Journals as well as their Open Access availability. In this dataset, every row holds data for a Journal of ERIH PLUS also covered by the OpenCitations Meta database. It is structured with the following columns: "EP_id", the internal ERIH PLUS identifier; "Publications_in_venue", the number of Publications counted in each venue; "OC_omid", the internal OpenCitations Meta identifier for the venue; "issn", the ISSN values associated with the venue; "Open Access", a value to represent if the journal is OA or not, either "True" or "Unknown".
SSH_Publications_by_Discipline.csv: containing information about the number of publications per discipline (the number of journals per discipline is also included). The dataset has three columns: the first, labeled "Discipline", contains the single disciplines of the ERIH classification; the second and the third, labeled "Journal_count" and "Publication_count", contain respectively the number of Journals and the number of Publications counted for each discipline.
SSH_Publications_and_Journals_by_Country: containing information about the number of publications and journals per country. The dataset has three columns: the first, labeled "Country", contains the single countries of the ERIH classification; the second and the third, labeled "Journal_count" and "Publication_count", contain respectively the number of Journals and the number of Publications counted for each country.
result_disciplines.json: the dictionary containing all disciplines as key and a list of related ERIH PLUS venue identifiers as value.
result_countries.json: the dictionary containing all countries as key and a list of related ERIH PLUS venue identifiers as value.
duplicate_omids.csv: a dataset containing the duplicated Journal entries in OpenCitations Meta, structured with two columns: "OC_omid", the internal OC Meta identifier; "issn", the ISSN values associated with that identifier.
eu_data.csv: contains the data specific for European countries' SSH Journals covered in OCMeta. It is structured with the following columns: "EP_id", the internal ERIH PLUS identifier; "Publications_in_venue", the numbers of Publications counted in each venue; "Original_Title", "Country_of_Publication","ERIH_PLUS_Disciplines", "disc_count", the number of disciplines per Journal.
eu_disciplines_count.csv: containing information about the number of publications per discipline and the number of journals per discipline for European countries. The dataset has three columns: the first, labeled "Discipline", contains the single disciplines of the ERIH classification; the second and the third, labeled "Journal_count" and "Publication_count", contain respectively the number of Journals and the number of Publications counted for each discipline.
meta_coverage_eu.csv: contains the data specific for European countries' SSH Journals covered in OCMeta. It is structured with the following columns: "EP_id", the internal ERIH PLUS identifier; "Publications_in_venue", the number of Publications counted in each venue; "OC_omid", the internal OpenCitations Meta identifier for the venue; "issn", the ISSN values associated with the venue; "Open Access", a value to represent if the journal is OA or not, either "True" or "Unknown".
us_data.csv: contains the data specific for the United States' SSH Journals covered in OCMeta. It is structured with the following columns: "EP_id", the internal ERIH PLUS identifier; "Publications_in_venue", the numbers of Publications counted in each venue; "Original_Title", "Country_of_Publication","ERIH_PLUS_Disciplines", "disc_count", the number of disciplines per Journal.
us_disciplines_count.csv: containing information about the number of publications per discipline and the number of journals per discipline for the United States. The dataset has three columns: the first, labeled "Discipline", contains the single disciplines of the ERIH classification; the second and the third, labeled "Journal_count" and "Publication_count", contain respectively the number of Journals and the number of Publications counted for each discipline.
meta_coverage_us.csv: contains the data specific for the United States' SSH Journals covered in OCMeta. It is structured with the following columns: "EP_id", the internal ERIH PLUS identifier; "Publications_in_venue", the number of Publications counted in each venue; "OC_omid", the internal OpenCitations Meta identifier for the venue; "issn", the ISSN values associated with the venue; "Open Access", a value to represent if the journal is OA or not, either "True" or "Unknown".
Abstract of the research:
Purpose: this study aims to investigate the representation and distribution of Social Science and Humanities (SSH) journals within the OpenCitations Meta database, with a particular emphasis on their Open Access (OA) status, as well as their spread across different disciplines and countries. The underlying premise is that open infrastructures play a pivotal role in promoting transparency, reproducibility, and trust in scientific research. Study Design and Methodology: the study is grounded on the premise that open infrastructures are crucial for ensuring transparency, reproducibility, and fostering trust in scientific research. The research methodology involved the use of secondary data sources, namely the OpenCitations Meta database, the ERIH PLUS bibliographic index, and the DOAJ index. Custom research software was developed in Python to facilitate the processing and analysis of the data. Findings: the results reveal that 78.1% of SSH journals listed in the European Reference Index for the Humanities (ERIH-PLUS) are included in the OpenCitations Meta database. The discipline of Psychology has the highest number of publications. The United States and the United Kingdom are the leading contributors in terms of the number of publications. However, the study also uncovers that only 38% of the SSH journals in the OpenCitations Meta database are OA. Originality: this research adds to the existing body of knowledge by providing insights into the representation of SSH in open bibliographic databases and the role of open access in this domain. The study highlights the necessity of advocating OA practices within SSH and the significance of open data for bibliometric studies. It further encourages additional research into the impact of OA on various facets of citation patterns and the factors leading to disparity across disciplinary representation.
Related resources:
Ghasempouri S., Ghiotto M., & Giacomini S. (2023). Open Science for Social Sciences and Humanities: Open Access availability and distribution across disciplines and Countries in OpenCitations Meta - RESEARCH ARTICLE. https://doi.org/10.5281/zenodo.8263908
Ghasempouri, S., Ghiotto, M., Giacomini, S., (2023). Open Science for Social Sciences and Humanities: Open Access availability and distribution across disciplines and Countries in OpenCitations Meta - DATA MANAGEMENT PLAN (Version 4). Zenodo. https://doi.org/10.5281/zenodo.8174644
Ghasempouri, S., Ghiotto, M., Giacomini, S. (2023e). Open Science for Social Sciences and Humanities: Open Access availability and distribution across disciplines and Countries in OpenCitations Meta - PROTOCOL. V.5. (https://dx.doi.org/10.17504/protocols.io.5jyl8jo1rg2w/v5)
Skewed adult sex ratios sometimes occur in populations of free-living animals yet the proximate mechanisms, timing of sex-biases, and the selective agents contributing to skew remain a source of debate with contradictory evidence from different systems. We investigated potential mechanisms contributing to sex biases in a population of herring gulls with an apparent female skew in the adult population. Theory predicts that skewed adult sex ratios will adaptively lead to skewed offspring sex ratios to restore balance in the effective breeding population. Parents may also adaptively bias offspring sex ratios to increase their own fitness in response to environmental factors. Therefore, we expected to detect skewed sex ratios either at hatching or at fledging as parents invest differentially in offspring of different sexes. We sampled complete clutches (n = 336 chicks) at hatching to quantify potential skews in sex ratios by position in the hatch order, time of season, year, and nesting con...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The increased focus on addressing severe maternal morbidity and maternal mortality has led to studies investigating patient and hospital characteristics associated with longer hospital stays. Length of stay (LOS) for delivery hospitalizations has a strongly skewed distribution, with the vast majority of LOS lasting two to three days in the United States. Prior studies typically focused on common LOSs and dealt with the long LOS distribution tail in ways that fit conventional statistical analyses (e.g., log transformation, trimming). This study demonstrates the use of Gamma mixture models to analyze the skewed LOS distribution. Gamma mixture models are flexible and do not require data transformation or removal of outliers to accommodate many outcome distribution shapes. These models allow for the analysis of patients staying in the hospital for a longer time, which often includes those women experiencing worse outcomes. Random effects are included in the model to account for patients being treated within the same hospitals. Further, the role and influence of differing placements of covariates on the results is discussed in the context of distinct model specifications of the Gamma mixture regression model. The application of these models shows that they are robust to the placement of covariates and random effects. Using New York State data, the models showed that longer LOS for childbirth hospitalizations were more common in hospitals designated to accept more complicated deliveries, across hospital types, and among Black women. Primary insurance also was associated with LOS. Substantial variation between hospitals suggests the need to investigate protocols to standardize evidence-based medical care.
MIT License: https://opensource.org/licenses/MIT
One common strategy to test the positivity effect in emotional information processing is to examine memory for positive information. Participants with better recall or recognition of positive (compared to negative or neutral) information may have attended more closely to positive information during the task, leading to better encoding and retrieval. Memory probes are periodic trials in which participants are asked to recognize information from a previous trial and can be used to examine which specific information is attended to in a decision-making paradigm. Participants will complete a variant of the skewed gambling task to explore the role of memory for positive information in decision making. Interspersed among the 90 gambles will be 24 recognition probe trials. Participants will be asked in recognition memory probe trials if the amount (or probability) of a gain (or a loss) was part of the gamble on the previous trial. Probe trials will be distributed equally after positively- and negatively-skewed gambles and half of the probes will be correct and half will be incorrect. Thus, memory for each type of information (gain amount, loss amount, gain probability, or loss probability) will be probed 6 times.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Conditional heteroskedasticity, skewness and leverage effects are well-known features of financial returns. The literature on factor models has often made assumptions that preclude the three effects from occurring simultaneously. In this paper I propose a conditionally heteroskedastic factor model that takes into account the presence of both the conditional skewness and leverage effects. This model is specified in terms of conditional moment restrictions, and unconditional moment conditions are proposed allowing inference by the generalized method of moments (GMM). The model is also shown to be closed under temporal aggregation. An application to daily excess returns on sectorial indices from the UK stock market provides strong evidence for dynamic conditional skewness and leverage with a sharp efficiency gain resulting from accounting for both effects. The estimated volatility persistence from the proposed model is lower than that estimated from models that rule out such effects. I also find that the longer the returns' horizon, the fewer conditionally heteroskedastic factors may be required for suitable modeling and the less strong is the evidence for dynamic leverage. Some of these results are in line with the main findings of Harvey and Siddique (1999) and Jondeau and Rockinger (2003), namely that accounting for conditional skewness impacts the persistence in the conditional variance of the return process.
U.S. Government Works: https://www.usa.gov/government-works
This dataset contains site information, basin characteristics, results of flood-frequency analysis, and a generalized (regional) flood skew for 76 selected streamgages operated by the U.S. Geological Survey (USGS) in the upper White River basin (4-digit hydrologic unit 1101) in southern Missouri and northern Arkansas. The Little Rock District U.S. Army Corps of Engineers (USACE) needed updated estimates of streamflows corresponding to selected annual exceedance probabilities (AEPs) and a basin-specific regional flood skew. USGS selected 111 candidate streamgages in the study area that had 20 or more years of gaged annual peak-flow data available through the 2020 water year. After screening for regulation, urbanization, redundant/nested basins, drainage areas greater than 2,500 square miles, and streamgage basins located in the Mississippi Alluvial Plain (8-digit hydrologic unit 11010013), 77 candidate streamgages remained. After conducting the initial flood-frequency analysis ...
https://www.immport.org/agreement
Objectives: Convenience sampling is an imperfect but important tool for seroprevalence studies. For COVID-19, local geographic variation in cases or vaccination can confound studies that rely on the geographically skewed recruitment inherent to convenience sampling. The objectives of this study were: (1) quantifying how geographically skewed recruitment influences SARS-CoV-2 seroprevalence estimates obtained via convenience sampling and (2) developing new methods that employ Global Positioning System (GPS)-derived foot traffic data to measure and minimise bias and uncertainty due to geographically skewed recruitment. Design: We used data from a local convenience-sampled seroprevalence study to map the geographic distribution of study participants' reported home locations and compared this to the geographic distribution of reported COVID-19 cases across the study catchment area. Using a numerical simulation, we quantified bias and uncertainty in SARS-CoV-2 seroprevalence estimates obtained using different geographically skewed recruitment scenarios. We employed GPS-derived foot traffic data to estimate the geographic distribution of participants for different recruitment locations and used this data to identify recruitment locations that minimise bias and uncertainty in resulting seroprevalence estimates. Results: The geographic distribution of participants in convenience-sampled seroprevalence surveys can be strongly skewed towards individuals living near the study recruitment location. Uncertainty in seroprevalence estimates increased when neighbourhoods with higher disease burden or larger populations were undersampled. Failure to account for undersampling or oversampling across neighbourhoods also resulted in biased seroprevalence estimates. GPS-derived foot traffic data correlated with the geographic distribution of serosurveillance study participants. 
Conclusions: Local geographic variation in seropositivity is an important concern in SARS-CoV-2 serosurveillance studies that rely on geographically skewed recruitment strategies. Using GPS-derived foot traffic data to select recruitment sites and recording participants' home locations can improve study design and interpretation.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The partitioning of reproduction among individuals in communally breeding animals varies greatly among species, from the monopolization of reproduction (high reproductive skew) to similar contribution to the offspring in others (low skew). Reproductive skew models explain how relatedness or ecological constraints affect the magnitude of reproductive skew. They typically assume that individuals are capable of flexibly reacting to social and environmental changes. Most models predict a decrease of skew when benefits of staying in the group are reduced. In the ant Leptothorax acervorum, queens in colonies from marginal habitats form dominance hierarchies and only the top-ranking queen lays eggs ("functional monogyny"). In contrast, queens in colonies from extended coniferous forests throughout the Palaearctic rarely interact aggressively and all lay eggs ("polygyny"). An experimental increase of queen:worker ratios in colonies from low-skew populations elicits queen–queen aggression similar to that in functionally monogynous populations. Here, we show that this manipulation also results in increased reproductive inequalities among queens. Queens from natural overwintering colonies differed in the number of developing oocytes in their ovaries. These differences were greatly augmented in queens from colonies with increased queen:worker ratios relative to colonies with a low queen:worker ratio. As assumed by models of reproductive skew, L. acervorum colonies thus appear to be capable of flexibly adjusting reproductive skew to social conditions, yet in the opposite way than predicted by most models.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
No description was included in this Dataset collected from the OSF