https://spdx.org/licenses/CC0-1.0.html
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A common descriptive statistic in cluster analysis is the $R^2$ that measures the overall proportion of variance explained by the cluster means. This note highlights properties of the $R^2$ for clustering. In particular, we show that the $R^2$ can generally be artificially inflated by linearly transforming the data ("stretching") and by projecting. Also, the $R^2$ for clustering will often be a poor measure of clustering quality in high-dimensional settings. We also investigate the $R^2$ for clustering under misspecified models. Several simulation illustrations are provided highlighting weaknesses of the clustering $R^2$, especially in high-dimensional settings. A functional data example is given showing how the $R^2$ for clustering can vary dramatically depending on how the curves are estimated.
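A minimal sketch (ours, not the note's code; assuming Python with NumPy and scikit-learn) of how this clustering $R^2$ is computed as one minus the within-cluster over total sum of squares, and how stretching one coordinate can inflate it on structureless data:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))            # pure noise: no real cluster structure

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
wss = km.inertia_                        # within-cluster sum of squares
tss = ((X - X.mean(axis=0)) ** 2).sum()  # total sum of squares
print(f"clustering R^2 = {1.0 - wss / tss:.3f}")   # nonzero even without clusters

# "Stretching": scaling one coordinate changes R^2 without adding cluster structure.
Xs = X * np.array([10.0, 1.0])
km2 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Xs)
print(f"after stretching = {1.0 - km2.inertia_ / ((Xs - Xs.mean(axis=0)) ** 2).sum():.3f}")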
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary Material 3: A supplementary file with examples of SAS script for all models that have been fitted in this paper.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database includes simulated data showing the accuracy of estimated probability distributions of project durations when limited data are available for the project activities. The base project networks are taken from PSPLIB. Then, various stochastic project networks are synthesized by changing the variability and skewness of project activity durations.
Number of variables: 20
Number of cases/rows: 114240
Variable List:
• Experiment ID: The ID of the experiment
• Experiment for network: The ID of the experiment for each of the synthesized networks
• Network ID: ID of the synthesized network
• #Activities: Number of activities in the network, including start and finish activities
• Variability: Variance of the activity durations in the network. This value can be high, low, medium, or rand, where rand denotes a random combination of low, medium, and high variance across the network activities.
• Skewness: Skewness of the activity durations in the network. Skewness can be right, left, none, or rand, where rand denotes a random combination of right-skewed, left-skewed, and non-skewed activity durations in the network.
• Fitted distribution type: Distribution type fitted to the sampled data
• Sample size: Number of sampled data points used in the experiment, resembling the limited-data condition
• Benchmark 10th percentile: 10th percentile of project duration in the benchmark stochastic project network
• Benchmark 50th percentile: 50th percentile of project duration in the benchmark stochastic project network
• Benchmark 90th percentile: 90th percentile of project duration in the benchmark stochastic project network
• Benchmark mean: Mean project duration in the benchmark stochastic project network
• Benchmark variance: Variance of project duration in the benchmark stochastic project network
• Experiment 10th percentile: 10th percentile of project duration distribution for the experiment
• Experiment 50th percentile: 50th percentile of project duration distribution for the experiment
• Experiment 90th percentile: 90th percentile of project duration distribution for the experiment
• Experiment mean: Mean of project duration distribution for the experiment
• Experiment variance: Variance of project duration distribution for the experiment
• K-S: Kolmogorov–Smirnov test statistic comparing the benchmark distribution and the project duration distribution of the experiment (see the sketch after this list)
• P_value: The p-value based on the distance calculated in the K-S test
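A hedged illustration (our code, not the dataset authors'; Python with NumPy and SciPy) of how the K-S and P_value columns can arise: fit a distribution to a small sample of durations, then compare the fitted distribution with the benchmark.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
benchmark = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)  # stand-in benchmark durations

sample = rng.choice(benchmark, size=30, replace=False)       # the limited-data condition
shape, loc, scale = stats.lognorm.fit(sample)                # the fitted distribution type
experiment = stats.lognorm.rvs(shape, loc=loc, scale=scale, size=10_000, random_state=rng)

ks_stat, p_value = stats.ks_2samp(benchmark, experiment)     # the K-S and P_value columns
p10, p50, p90 = np.percentile(experiment, [10, 50, 90])      # the experiment percentiles
print(ks_stat, p_value, p10, p50, p90)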
https://spdx.org/licenses/CC0-1.0.html
Observed phenotypic responses to selection in the wild often differ from predictions based on measurements of selection and genetic variance. An overlooked hypothesis to explain this paradox of stasis is that a skewed phenotypic distribution affects natural selection and evolution. We show through mathematical modelling that, when a trait selected for an optimum phenotype has a skewed distribution, directional selection is detected even at evolutionary equilibrium, where it causes no change in the mean phenotype. When environmental effects are skewed, Lande and Arnold’s (1983) directional gradient is in the direction opposite to the skew. In contrast, skewed breeding values can displace the mean phenotype from the optimum, causing directional selection in the direction of the skew. These effects can be partitioned out using alternative selection estimates based on average derivatives of individual relative fitness, or additive genetic covariances between relative fitness and trait (Robertson-Price identity). We assess the validity of these predictions using simulations of selection estimation under moderate sample sizes. Ecologically relevant traits may commonly have skewed distributions, as we here exemplify with avian laying date, which has repeatedly been described as more evolutionarily stable than expected, so this skewness should be accounted for when investigating evolutionary dynamics in the wild.
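An illustrative sketch (our construction under stated assumptions, not the authors' model; Python with NumPy/SciPy): for a right-skewed trait whose mean sits exactly at the optimum of a Gaussian fitness function, the covariance between relative fitness and the trait (the Robertson-Price identity) is nonzero, i.e. directional selection is detected at equilibrium.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
z = stats.skewnorm.rvs(a=5.0, size=100_000, random_state=rng)  # right-skewed phenotypes
z = z - z.mean()                                               # centre the mean on the optimum

theta, omega = 0.0, 2.0                                        # optimum and selection width (assumed)
W = np.exp(-((z - theta) ** 2) / (2 * omega ** 2))             # Gaussian stabilising fitness
w = W / W.mean()                                               # relative fitness

S = np.cov(w, z)[0, 1]                                         # selection differential, cov(w, z)
print(f"selection differential S = {S:+.4f}")                  # nonzero despite mean at optimum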
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper evaluates the claim that Welch’s t-test (WT) should replace the independent-samples t-test (IT) as the default approach for comparing sample means. Simulations involving unequal and equal variances, skewed distributions, and different sample sizes were performed. For normal distributions, we confirm that the WT maintains the false positive rate close to the nominal level of 0.05 when sample sizes and standard deviations are unequal. However, the WT was found to yield inflated false positive rates under skewed distributions, even with relatively large sample sizes, whereas the IT avoids such inflation. A complementary empirical study based on gender differences in two psychological scales corroborates these findings. Finally, we contend that the null hypothesis of unequal variances together with equal means lacks plausibility, and that empirically, a difference in means typically coincides with differences in variance and skewness. An additional analysis using the Kolmogorov-Smirnov and Anderson-Darling tests demonstrates that examining entire distributions, rather than just their means, can provide a more suitable alternative when facing unequal variances or skewed distributions. Given these results, researchers should remain cautious with software defaults, such as R favoring Welch’s test.
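A quick sketch (ours, not the paper's code; Python with SciPy) of the kind of simulation described: both groups are drawn from the same skewed distribution, so the null hypothesis of equal means is true and any rejections are false positives.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n1, n2, reps, alpha = 20, 80, 20_000, 0.05
fp_student = fp_welch = 0
for _ in range(reps):
    x = rng.lognormal(0.0, 1.0, n1)   # skewed null distribution
    y = rng.lognormal(0.0, 1.0, n2)   # same distribution, so H0 (equal means) holds
    fp_student += stats.ttest_ind(x, y, equal_var=True).pvalue < alpha
    fp_welch += stats.ttest_ind(x, y, equal_var=False).pvalue < alpha
print(f"Student's t false positive rate: {fp_student / reps:.3f}")
print(f"Welch's t false positive rate:   {fp_welch / reps:.3f}")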
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary Material 2: A supplementary file with examples of STATA script for all models that have been fitted in this paper.
https://spdx.org/licenses/CC0-1.0.html
Included is the supplementary data for Smith, B. T., Mauck, W. M., Benz, B., & Andersen, M. J. (2018). Uneven missing data skews phylogenomic relationships within the lories and lorikeets. bioRxiv, 398297. The resolution of the Tree of Life has accelerated with advances in DNA sequencing technology. To achieve dense taxon sampling, it is often necessary to obtain DNA from historical museum specimens to supplement modern genetic samples. However, DNA from historical material is generally degraded, which presents various challenges. In this study, we evaluated how the coverage at variant sites and missing data among historical and modern samples impact phylogenomic inference. We explored these patterns in the brush-tongued parrots (lories and lorikeets) of Australasia by sampling ultraconserved elements in 105 taxa. Trees estimated from low-coverage characters had several clades whose relationships appeared to be influenced by whether samples came from historical or modern specimens; these clades were not observed when more stringent filtering was applied. To assess whether the topologies were affected by missing data, we performed an outlier analysis of sites and loci, and a data reduction approach in which we excluded sites based on data completeness. Depending on the outlier test, 0.15% of total sites or 38% of loci were driving the topological differences among trees, and at these sites, historical samples had 10.9x more missing data than modern ones. In contrast, 70% data completeness was necessary to avoid spurious relationships. Predictive modeling found that outlier analysis scores were correlated with parsimony informative sites in the clades whose topologies changed the most by filtering. After accounting for biased loci and understanding the stability of relationships, we inferred a more robust phylogenetic hypothesis for lories and lorikeets.
CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.
Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. Associated with each edge of a certain type, we calculate a real-valued weight as an approximation of the similarity level of that type. It is important to note that the constructed edges do not represent an exhaustive list of all possible edges due to incomplete information. Consequently, this leads to a sparse and occasionally skewed distribution of edges for individual relation/edge types. Such characteristics pose additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.
Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.
Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated three evaluation tasks:
Similarity Prediction (SP). To assess the accuracy of pairwise company similarity, we constructed the SP evaluation set comprising 3,219 pairs of companies that are labeled either as positive (similar, denoted by "1") or negative (dissimilar, denoted by "0"). Of these pairs, 1,522 are positive and 1,697 are negative.
Competitor Retrieval (CR). Each sample contains one target company and one of its direct competitors. The set contains 76 distinct target companies, each with 5.3 competitors annotated on average. For a given target company A with N direct competitors in this CR evaluation set, we expect a competent method to retrieve all N competitors when searching for similar companies to A.
Similarity Ranking (SR) is designed to assess the ability of any method to rank candidate companies (numbered 0 and 1) based on their similarity to a query company. Paid human annotators, with backgrounds in engineering, science, and investment, were tasked with determining which candidate company is more similar to the query company. This resulted in an evaluation set comprising 1,856 rigorously labeled ranking questions. We retained 20% (368 samples) of this set as a validation set for model development.
Edge Prediction (EP) evaluates a model's ability to predict future or missing relationships between companies, providing forward-looking insights for investment professionals. The EP dataset, derived (and sampled) from new edges collected between April 6, 2023, and May 25, 2024, includes 40,000 samples, with edges not present in the pre-existing CompanyKG (a snapshot up until April 5, 2023).
Background and Motivation
In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone to successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company, and the complex and diversified relationships among them.
While there is no universally agreed definition of company similarity, researchers and practitioners in the PE industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.
In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Then companies that are similar to the seed companies can be searched in the embedding space using distance metrics like cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. These models, such as ChatGPT, can be employed to answer questions related to similar company discovery and quantification in a Q&A format.
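A sketch of this embedding-based workflow (our illustration; the embedding values and dimensions below are random stand-ins, not CompanyKG's released vectors):

import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 384))    # stand-in for mSBERT/ADA2/SimCSE/PAUSE embeddings
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

seed = embeddings[0]                          # a seed company of interest
scores = embeddings @ seed                    # cosine similarity, since rows are unit-normalised
top10 = np.argsort(-scores)[1:11]             # most similar companies, excluding the seed itself
print(top10, scores[top10])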
However, a graph is still the most natural choice for representing and learning diverse company relations, owing to its ability to model complex relationships between a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Utilizing this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns to facilitate similar company analysis. While there are various company datasets (mostly commercial/proprietary and non-relational) and graph datasets available (mostly for single link/node/graph-level predictions), there is a scarcity of datasets and benchmarks that combine both to create a large-scale KG dataset expressing rich pairwise company relations.
Source Code and Tutorial: https://github.com/llcresearch/CompanyKG2
Paper: to be published
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary Material 1: A supplementary file with examples of R script for all models that have been fitted in this paper.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We introduce a novel framework that leverages the Box-Cox transformation to address the skewed distribution of samples and incorporates additional critical predictors to enhance the accuracy of soil salinity (measured as electrical conductivity of the saturated soil extract, ECe) and sodicity (measured as exchangeable sodium percentage, ESP) estimates.
We provide high-resolution (1 km × 1 km) global maps of soil salinity and sodicity from 1980 to 2022 in GeoTIFF file format for each year.
Note: the scale factor for ECe and ESP maps is 0.001
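A hedged sketch (our code; variable names and example values are ours, not the dataset's) of the two mechanics described above: a Box-Cox transform of a skewed sample, and applying the stated 0.001 scale factor to stored map values.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
ece_like = rng.lognormal(mean=1.0, sigma=1.2, size=5_000)  # simulated, skewed ECe-like sample

transformed, lam = stats.boxcox(ece_like)                  # lambda chosen by maximum likelihood
print(f"lambda = {lam:.3f}, skewness before = {stats.skew(ece_like):.2f}, "
      f"after = {stats.skew(transformed):.2f}")

stored = np.array([1500, 250, 32000])                      # hypothetical raw pixel values
physical = stored * 0.001                                  # apply the documented scale factor
print(physical)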
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A skewed exponential power distribution, with parameters defining kurtosis and skewness, is introduced as a way to visualize Type II error in normality tests. By varying these parameters, a mosaic of distributions is built, ranging from double exponential to uniform or from positive to negative exponential; the normal distribution is a particular case located in the center of the mosaic. Using a sequential color scheme, a different color is assigned to each distribution in the mosaic depending on the probability of committing a Type II error. This graph gives a visual representation of the power of the performed test. This way of representing results facilitates the comparison of the power of various tests and of the influence of sample size. A script to perform this graphical representation, programmed in the R statistical software, is available online as supplementary material.
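A minimal sketch of the computation behind one cell of such a mosaic (the published R script is the authoritative version; here in Python, with a skew-normal stand-in for the skewed exponential power distribution): estimate the Type II error of a normality test by simulation.

import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, reps, alpha = 50, 2_000, 0.05

rejections = 0
for _ in range(reps):
    x = stats.skewnorm.rvs(a=4.0, size=n, random_state=rng)  # one non-normal alternative
    rejections += stats.shapiro(x).pvalue < alpha            # Shapiro-Wilk normality test

power = rejections / reps
type_ii = 1.0 - power   # probability of failing to reject this non-normal alternative
print(f"power = {power:.3f}, Type II error = {type_ii:.3f}")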
This dataset provides information about earnings of employees who are working in an area, who are on adult rates and whose pay for the survey pay-period was not affected by absence.
Tables provided here include total gross weekly earnings, and full time weekly earnings with breakdowns by gender, and annual median, mean and lower quartile earnings by borough and UK region. These are provided both in nominal and real terms.
Real earnings figures are on sheets labelled "real"; they are in 2016 prices and are calculated by applying ONS’s annual CPI index series for April to ASHE data.
Annual Survey of Hours and Earnings (ASHE) is based on a sample of employee jobs taken from HM Revenue & Customs PAYE records. Information on earnings and hours is obtained in confidence from employers. ASHE does not cover the self-employed nor does it cover employees not paid during the reference period.
The earnings information presented relates to gross pay before tax, National Insurance or other deductions, and excludes payments in kind.
The confidence figure is the coefficient of variation (CV) of that estimate. The CV is the ratio of the standard error of an estimate to the estimate itself and is expressed as a percentage. The smaller the coefficient of variation the greater the accuracy of the estimate. The true value is likely to lie within +/- twice the CV.
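A worked example of this relationship (illustrative numbers, not taken from the tables), in Python:

estimate = 600.0          # e.g. a median weekly earnings estimate, in pounds
cv = 5.0                  # published coefficient of variation, in per cent
se = estimate * cv / 100  # the standard error implied by the CV
low, high = estimate - 2 * se, estimate + 2 * se
print(f"true value likely within {low:.0f} to {high:.0f}")  # +/- twice the CV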
Results for 2003 and earlier exclude supplementary surveys. In 2006 a number of methodological changes were made. For further details go to: http://www.nomisweb.co.uk/articles/341.aspx.
The headline statistics for ASHE are based on the median rather than the mean. The median is the value below which 50 per cent of employees fall. It is ONS's preferred measure of average earnings as it is less affected by a relatively small number of very high earners and the skewed distribution of earnings. It therefore gives a better indication of typical pay than the mean.
These are survey data from a sample frame; use caution if using them for performance measurement and trend analysis.
'#' These figures are suppressed as statistically unreliable.
! Estimate and confidence interval not available since the group sample size is zero or disclosive (0-2).
Furthermore, data from the Abstract of Regional Statistics, the New Earnings Survey and ASHE have been combined to create a long-run historical series of full-time weekly earnings for London and Great Britain, stretching back to 1965 and broken down by sex.
https://spdx.org/licenses/CC0-1.0.html
In this paper, we show that the individual skewness, defined as the average of monthly skewness across firms, performs very well at predicting the return of S&P 500 index futures. This result holds after controlling for the liquidity risk or for the current business cycle conditions. We also find that individual skewness performs very well at predicting index futures returns out-of-sample.
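A sketch of how such a predictor can be constructed (our construction for illustration, not the paper's code; returns are simulated):

import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
dates = pd.bdate_range("2020-01-01", periods=252)
returns = pd.DataFrame(rng.standard_t(df=5, size=(252, 100)), index=dates)  # 100 firms, fat tails

firm_skew = returns.groupby(returns.index.to_period("M")).skew()  # per-firm skewness, each month
individual_skewness = firm_skew.mean(axis=1)                      # average across firms
print(individual_skewness.head())                                 # the monthly predictor series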
https://spdx.org/licenses/CC0-1.0.html
Proteinoids are artificial polymers that imitate certain characteristics of natural proteins, including self-organization, catalytic activity, and responsiveness to external stimuli. This paper investigates the potential of proteinoids as organic audio signal processors. We convert sounds of the English alphabet into waveforms of electrical potential, feed the waveforms into proteinoid solutions and record the electrical responses of the proteinoids. We also undertake a detailed comparison of the proteinoids' electrical responses (frequencies, periods, and amplitudes) with the original input signals. We found that, compared with the input signals, the responses of the proteinoids are less regular and have a lower dominant frequency, a wider distribution of periods, and a less skewed distribution of amplitudes. We found that letters of the English alphabet map uniquely onto patterns of electrical activity of a proteinoid ensemble; that is, the proteinoid ensembles recognise spoken letters of the English alphabet. These findings will be used in further designs of organic electronic devices, based on ensembles of proteinoids, for sound processing and speech recognition.

Methods: We used a high-resolution data logger, specifically the ADC-24 from Pico Technology in the UK, along with iridium-coated stainless steel sub-dermal needle electrodes from Spes Medica S.r.l. in Italy, to accurately measure the electrical activity of the proteinoids. To measure the potential difference, pairs of electrodes were arranged with a distance of approximately 10 mm between each electrode. All electrical activity was captured at a rate of one sample per second. The data logger captured multiple measurements, up to 600 per second, and saved their average for further study. The BK 4060B function generator, referred to as the "device" in this paper, was used to generate electrical spikes. The device is a dual-channel function/arbitrary waveform generator capable of generating a wide range of waveforms with high accuracy, including sine, square, triangle, pulse, and arbitrary waveforms. In real point-by-point arbitrary mode, the maximum waveform generation rate is 75 MSa/s; in direct digital synthesis mode, it increases to 300 MSa/s. A setup was designed for an experiment to analyse how proteinoids respond to audio signals. The audio signals were captured using the laptop's microphone and then processed in Matlab, yielding CSV files consisting of a single column of potential values. The audio signals consisted of the letters A to Z of the English alphabet, pronounced by a male speaker, from the vocal database available at the following URL: https://freesound.org/people/dersuperanton/sounds/434730/. The database contains recordings of individual letters spoken in isolation as well as with different tones. We then used electrodes connected to a BK Precision 4053B function generator to apply the CSV audio files to the proteinoids. A PicoScope 4000 series oscilloscope was used to record the responses of the proteinoids to the audio signals. The data obtained were saved as CSV files for further analysis.
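A hedged sketch of one analysis step described above (our code; "response.csv" is a hypothetical single-column recording, not a file from the dataset): estimating the dominant frequency of a recorded potential trace with an FFT.

import numpy as np

fs = 1.0                                  # logger rate stated above: one sample per second
signal = np.loadtxt("response.csv")       # hypothetical single-column CSV of potentials
signal = signal - signal.mean()           # remove the DC offset before the FFT

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
dominant = freqs[np.argmax(spectrum[1:]) + 1]   # skip the zero-frequency bin
print(f"dominant frequency: {dominant:.4f} Hz")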
https://spdx.org/licenses/CC0-1.0.html
Multilocus sequence data provide far greater power to resolve species limits than the single-locus data typically used for broad surveys of clades. However, current statistical methods based on a multispecies coalescent framework are computationally demanding, because of the number of possible delimitations that must be compared and the time-consuming likelihood calculations involved. New methods are therefore needed to open up the power of multilocus approaches to larger systematic surveys. Here, we present a rapid and scalable method that introduces 2 new innovations. First, the method reduces the complexity of likelihood calculations by decomposing the tree into rooted triplets. The distribution of topologies for a triplet across multiple loci has a uniform trinomial distribution when the 3 individuals belong to the same species, but a skewed distribution if they belong to separate species, with a form that is specified by the multispecies coalescent. A Bayesian model comparison framework was developed and the best delimitation found by comparing the product of posterior probabilities of all triplets. The second innovation is a new dynamic programming algorithm for finding the optimum delimitation from all those compatible with a guide tree by successively analyzing subtrees defined by each node. This algorithm removes the need for the heuristic searches used by current methods, guarantees that the best solution is found, and could potentially be used in other systematic applications. We assessed the performance of the method with simulated, published, and newly generated data. Analyses of simulated data demonstrate that the combined method has favorable statistical properties and scalability with increasing sample sizes. Analyses of empirical data from both eukaryotes and prokaryotes demonstrate its potential for delimiting species in real cases.
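A sketch of the triplet distribution at the heart of the method (our illustration of the standard multispecies coalescent result, not the authors' implementation): with an internal branch of length t in coalescent units separating species, the concordant rooted topology has probability 1 - (2/3)e^{-t} and each discordant topology (1/3)e^{-t}; at t = 0 (all three individuals conspecific) this reduces to the uniform trinomial.

import numpy as np

def triplet_probs(t: float) -> np.ndarray:
    # returns P(concordant), P(discordant A), P(discordant B) for internal branch length t
    e = np.exp(-t)
    return np.array([1.0 - 2.0 * e / 3.0, e / 3.0, e / 3.0])

print(triplet_probs(0.0))  # same species: [1/3, 1/3, 1/3]
print(triplet_probs(2.0))  # distinct species: the concordant topology dominates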
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The within-host evolutionary dynamics of TB remain unclear, and underlying biological characteristics render standard population genetic approaches based upon the Wright-Fisher model largely inappropriate. In addition, the compact genome combined with an absence of recombination is expected to result in strong purifying selection effects. Thus, it is imperative to establish a biologically-relevant evolutionary framework incorporating these factors in order to enable an accurate study of this important human pathogen. Further, such a model is critical for inferring fundamental evolutionary parameters related to patient treatment, including mutation rates and the severity of infection bottlenecks. We here implement such a model and infer the underlying evolutionary parameters governing within-patient evolutionary dynamics. Results demonstrate that the progeny skew associated with the clonal nature of TB severely reduces genetic diversity and that the neglect of this parameter in previous studies has led to significant mis-inference of mutation rates. As such, our results suggest an underlying de novo mutation rate that is considerably faster than previously inferred, and a progeny distribution differing significantly from Wright-Fisher assumptions. This inference represents a more appropriate evolutionary null model, against which the periodic effects of positive selection, associated with drug-resistance for example, may be better assessed.
Sediment particle size frequency distributions from the USNL (United States Naval Laboratory) box cores were determined optically using a Malvern Mastersizer 2000 He-Ne LASER diffraction sizer and were used to resolve mean particle size, sorting, skewness and kurtosis.
Samples were collected on cruises JR16006 and JR17007.
Funding was provided by ''The Changing Arctic Ocean Seafloor (ChAOS) - how changing sea ice conditions impact biological communities, biogeochemical processes and ecosystems'' project (NE/N015894/1 and NE/P006426/1, 2017-2021), part of the NERC funded Changing Arctic Ocean programme.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Estimated confidence interval for Virginia ground water well yields data (in gal/min/ft).
The grain-size distribution of 223 unconsolidated sediment samples from four DSDP sites at the mouth of the Gulf of California was determined using sieve and pipette techniques. Shepard's (1954) and Inman's (1952) classification schemes were used for all samples. Most of the sediments are hemipelagic with minor turbidites of terrigenous origin. Sediment texture ranges from silty sand to silty clay. On the basis of grain-size parameters, the sediments can be divided into the following groups: (1) poorly to very poorly sorted coarse and medium sand; and (2) poorly to very poorly sorted fine to very fine sand and clay.
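A hedged sketch of Inman's (1952) graphic grain-size parameters, as commonly given (our code; the percentile formulas are the textbook ones, and the example class data are hypothetical):

import numpy as np

def inman_parameters(phi, weights):
    # phi: grain-size classes in phi units; weights: fraction of the sample in each class
    order = np.argsort(phi)
    cum = np.cumsum(np.asarray(weights)[order]) / np.sum(weights)
    p16, p50, p84 = np.interp([0.16, 0.50, 0.84], cum, np.asarray(phi)[order])
    mean = (p16 + p84) / 2        # graphic mean
    sorting = (p84 - p16) / 2     # graphic standard deviation (sorting)
    return {
        "median_phi": p50,
        "mean_phi": mean,
        "sorting_phi": sorting,
        "skewness_phi": (mean - p50) / sorting,
    }

phi = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])          # example size classes (phi units)
weights = np.array([5.0, 20.0, 35.0, 25.0, 10.0, 5.0])  # per-class percentages
print(inman_parameters(phi, weights))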
https://spdx.org/licenses/CC0-1.0.html