Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Reproducibility package for the article "Reaction times and other skewed distributions: problems with the mean and the median" by Guillaume A. Rousselet & Rand R. Wilcox. Preprint: https://psyarxiv.com/3y54r. DOI: 10.31234/osf.io/3y54r. This package contains all the code and data to reproduce the figures and analyses in the article.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COVID-19 prediction has been essential in aiding the prevention and control of the disease. The motivation of this case study is to develop predictive models for COVID-19 cases and deaths based on a cross-sectional data set with a total of 28,955 observations and 18 variables, compiled from 5 data sources on Kaggle. A two-part modeling framework, in which the first part is a logistic classifier and the second part applies machine learning or statistical smoothing methods, is introduced to model the highly skewed distribution of COVID-19 cases and deaths. We also aim to understand which factors are most relevant to COVID-19's occurrence and fatality. Evaluation criteria such as root mean squared error (RMSE) and mean absolute error (MAE) are used. We find that the two-part XGBoost model performs best at predicting the entire distribution of COVID-19 cases and deaths. The most important factors relevant to either COVID-19 cases or deaths include population and the rate of primary care physicians.
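A minimal sketch of such a two-part (hurdle-style) setup, under assumptions not stated in the abstract: scikit-learn's logistic regression for the occurrence part and XGBoost for the magnitude part, with a log transform to tame the skew. The study's actual features, target handling, and hyperparameters are not reproduced here.

```python
# Hedged sketch of a two-part ("hurdle") model: a logistic classifier for whether
# any cases/deaths occur, and an XGBoost regressor for the size of the positive,
# highly skewed counts. Feature/target handling and hyperparameters are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from xgboost import XGBRegressor

def fit_two_part(X_train, y_train):
    occurred = (y_train > 0).astype(int)
    part1 = LogisticRegression(max_iter=1000).fit(X_train, occurred)        # P(count > 0)
    part2 = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
    part2.fit(X_train[y_train > 0], np.log1p(y_train[y_train > 0]))         # fit positives on log scale
    return part1, part2

def predict_two_part(part1, part2, X):
    p_positive = part1.predict_proba(X)[:, 1]
    conditional_count = np.expm1(part2.predict(X))
    return p_positive * conditional_count   # expected count = P(>0) * E[count | count > 0]

# Predictions would then be scored against held-out counties with RMSE and MAE.
```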
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
https://spdx.org/licenses/CC0-1.0.html
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Observed phenotypic responses to selection in the wild often differ from predictions based on measurements of selection and genetic variance. An overlooked hypothesis to explain this paradox of stasis is that a skewed phenotypic distribution affects natural selection and evolution. We show through mathematical modelling that, when a trait selected for an optimum phenotype has a skewed distribution, directional selection is detected even at evolutionary equilibrium, where it causes no change in the mean phenotype. When environmental effects are skewed, Lande and Arnold's (1983) directional gradient is in the direction opposite to the skew. In contrast, skewed breeding values can displace the mean phenotype from the optimum, causing directional selection in the direction of the skew. These effects can be partitioned out using alternative selection estimates based on average derivatives of individual relative fitness, or on additive genetic covariances between relative fitness and trait (Robertson-Price identity). We assess the validity of these predictions using simulations of selection estimation under moderate sample sizes. Ecologically relevant traits may commonly have skewed distributions, as we exemplify here with avian laying date (repeatedly described as more evolutionarily stable than expected), so this skewness should be accounted for when investigating evolutionary dynamics in the wild.
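For reference, the selection estimates contrasted above can be written in their standard forms (the notation below is assumed here, not taken from the paper): z is the trait, w relative fitness, P the phenotypic (co)variance, s the selection differential, and sigma_A the additive genetic covariance.

```latex
% Standard forms of the selection estimates referenced above (assumed notation):
s = \operatorname{cov}(w, z), \qquad
\beta_{\mathrm{LA}} = P^{-1} s \quad \text{(Lande--Arnold directional gradient)},
\qquad
\beta_{\mathrm{AD}} = \mathbb{E}\!\left[\frac{\partial w(z)}{\partial z}\right]
  \quad \text{(average-derivative gradient)},
\qquad
\Delta\bar{z} = \sigma_A(w, z) \quad \text{(Robertson--Price identity)}.
```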
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
This dataset contains site information, basin characteristics, results of flood-frequency analysis, and a generalized (regional) flood skew for 76 selected streamgages operated by the U.S. Geological Survey (USGS) in the upper White River basin (4-digit hydrologic unit 1101) in southern Missouri and northern Arkansas. The Little Rock District U.S. Army Corps of Engineers (USACE) needed updated estimates of streamflows corresponding to selected annual exceedance probabilities (AEPs) and a basin-specific regional flood skew. USGS selected 111 candidate streamgages in the study area that had 20 or more years of gaged annual peak-flow data available through the 2020 water year. After screening for regulation, urbanization, redundant/nested basins, drainage areas greater than 2,500 square miles, and streamgage basins located in the Mississippi Alluvial Plain (8-digit hydrologic unit 11010013), 77 candidate streamgages remained. After conducting the initial flood-frequency analysis ...
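Purely as an illustration of the screening criteria listed above (not the USGS workflow itself), a hypothetical site table could be filtered as follows; every column name here is an assumption, not a field of the published dataset.

```python
# Hypothetical screening of candidate streamgages, mirroring the criteria above.
# Column names (peak_years, regulated, urbanized, redundant, drainage_area_sqmi, huc8)
# are illustrative assumptions.
import pandas as pd

def screen_candidates(sites: pd.DataFrame) -> pd.DataFrame:
    keep = (
        (sites["peak_years"] >= 20)               # 20 or more years of annual peak-flow record
        & ~sites["regulated"]                     # exclude regulated basins
        & ~sites["urbanized"]                     # exclude urbanized basins
        & ~sites["redundant"]                     # exclude redundant/nested basins
        & (sites["drainage_area_sqmi"] <= 2500)   # drainage area limit
        & (sites["huc8"] != "11010013")           # exclude the Mississippi Alluvial Plain
    )
    return sites.loc[keep]
```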
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
As global climate continues to change, so too will phenology of a wide range of insects. Changes in flight season usually are characterised as shifts to earlier dates or means, with attention less often paid to flight season breadth or whether seasons are now skewed. We amassed flight season data for the insect order Odonata, the dragonflies and damselflies, for Norway over the past century-and-a-half to examine the form of flight season change. By means of Bayesian analyses that incorporated uncertainty relative to annual variability in survey effort, we estimated shifts in flight season mean, breadth, and skew. We focussed on flight season breadth, positing that it will track documented growing season expansion. A specific mechanism explored was shifts in voltinism, the number of generations per year, which tends to increase with warming. We found strong evidence for an increase in flight season breadth but much less for a shift in mean, with any shift of the latter tending toward a later mean. Skew has become rightward for suborder Zygoptera, the damselflies, but not for Anisoptera, the dragonflies, or for the Odonata as a whole. We found weak support for voltinism as a predictor of broader flight season; instead, voltinism acted interactively with use of human-modified habitats, including decrease in shading (e.g., from timber extraction). Other potential mechanisms that link warming with broadening of flight season include protracted emergence and cohort splitting, both of which have been documented in the Odonata. It is likely that warming-induced broadening of flight seasons of these widespread insect predators will have wide-ranging consequences for freshwater ecosystems.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The reliability and consistency of the many measures proposed to quantify sexual selection have been questioned for decades. Realized selection on quantitative characters measured by the selection differential i was approximated by metrics based on variance in breeding success, using either the opportunity for sexual selection Is or indices of inequality. There is no consensus about which metric best approximates realized selection on sexual characters. Recently, the opportunity for selection on character mean OSM was proposed to quantify the maximum potential selection on characters. Using 21 years of data on bighorn sheep (Ovis canadensis), we investigated the correlations between seven indices of inequality, Is, OSM and i on horn length of males. Bighorn sheep are ideal for this comparison because they are highly polygynous, sexually dimorphic, ram horn length is under strong sexual selection, and we have detailed knowledge of individual breeding success. Different metrics provided conflicting information, potentially leading to spurious conclusions about selection patterns. Iδ, an index of breeding inequality, and to a lesser extent Is, showed the highest correlation with i on horn length, suggesting that these indices document breeding inequality in a selection context. OSM on horn length was strongly correlated with i, Is, and indices of inequality. By integrating information on both realized sexual selection and breeding inequality, OSM appeared to be the best proxy of sexual selection and may be best suited to explore its ecological bases.
CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.
Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. Associated with each edge of a certain type, we calculate a real-numbered weight as an approximation of the similarity level of that type. It is important to note that the constructed edges do not represent an exhaustive list of all possible edges due to incomplete information. Consequently, this leads to a sparse and occasionally skewed distribution of edges for individual relation/edge types. Such characteristics pose additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.
Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.
Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated four evaluation tasks:
Similarity Prediction (SP). To assess the accuracy of pairwise company similarity, we constructed the SP evaluation set comprising 3,219 pairs of companies that are labeled either as positive (similar, denoted by "1") or negative (dissimilar, denoted by "0"). Of these pairs, 1,522 are positive and 1,697 are negative.
Competitor Retrieval (CR). Each sample contains one target company and one of its direct competitors. The CR evaluation set contains 76 distinct target companies, each with 5.3 annotated competitors on average. For a given target company A with N direct competitors in this CR evaluation set, we expect a competent method to retrieve all N competitors when searching for companies similar to A.
Similarity Ranking (SR) is designed to assess the ability of any method to rank candidate companies (numbered 0 and 1) based on their similarity to a query company. Paid human annotators, with backgrounds in engineering, science, and investment, were tasked with determining which candidate company is more similar to the query company. This resulted in an evaluation set comprising 1,856 rigorously labeled ranking questions. We retained 20% (368 samples) of this set as a validation set for model development.
Edge Prediction (EP) evaluates a model's ability to predict future or missing relationships between companies, providing forward-looking insights for investment professionals. The EP dataset, derived (and sampled) from new edges collected between April 6, 2023, and May 25, 2024, includes 40,000 samples, with edges not present in the pre-existing CompanyKG (a snapshot up until April 5, 2023).
Background and Motivation
In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone to successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company, and the complex and diversified relationships among them.
While there is no universally agreed definition of company similarity, researchers and practitioners in the PE industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.
In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Companies similar to the seed companies can then be retrieved in the embedding space using distance metrics such as cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. These models, such as ChatGPT, can be employed to answer questions related to similar-company discovery and quantification in a Q&A format.
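A minimal sketch of this embedding-based retrieval, assuming the precomputed node embeddings have already been loaded into a NumPy array aligned with node IDs; the variable names and the loading step are assumptions, and the repository tutorial documents the actual API.

```python
# Hedged sketch: expand a seed list of companies to their nearest neighbours in
# embedding space using cosine similarity. `embeddings` is assumed to be an
# (n_companies, dim) array of one of the provided text embeddings (e.g. mSBERT).
import numpy as np

def expand_seed_companies(embeddings: np.ndarray, seed_ids, top_k: int = 10):
    # L2-normalise so that a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = normed[list(seed_ids)].mean(axis=0)   # centroid of the seed companies
    query /= np.linalg.norm(query)
    scores = normed @ query                       # cosine similarity to the centroid
    scores[list(seed_ids)] = -np.inf              # exclude the seeds themselves
    return np.argsort(-scores)[:top_k]            # node IDs of the most similar companies
```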
However, a graph remains the most natural choice for representing and learning diverse company relations, owing to its ability to model complex relationships between a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Utilizing this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns that facilitate similar-company analysis. While there are various company datasets (mostly commercial/proprietary and non-relational) and graph datasets available (mostly for single link/node/graph-level predictions), there is a scarcity of datasets and benchmarks that combine both to create a large-scale KG dataset expressing rich pairwise company relations.
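As a small illustration of forming such a KG from an edge list, the sketch below uses networkx; the (source, target, edge_type, weight) tuple layout is an assumption for illustration, and the repository itself targets graph-learning frameworks rather than networkx.

```python
# Hedged sketch: assemble an undirected, weighted, typed company graph with networkx.
# A MultiGraph is used because a company pair may be linked by several relation types.
import networkx as nx

def build_company_kg(edge_list):
    kg = nx.MultiGraph()
    for src, dst, edge_type, weight in edge_list:
        kg.add_edge(src, dst, key=edge_type, weight=weight)
    return kg

# Example (illustrative IDs, types, and weights):
# kg = build_company_kg([(0, 1, "industry_sector", 0.8), (0, 1, "news_cooccurrence", 0.3)])
```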
Source Code and Tutorial: https://github.com/llcresearch/CompanyKG2
Paper: to be published
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The data show the station codes of all 20 sites, identified as K1 to K20. The values Ø5, Ø16, Ø25, Ø50, Ø75, Ø84, Ø95 and Ø99 for all 20 stations are shown in the table, along with the statistical parameters MEAN, STANDARD DEVIATION, SKEWNESS and KURTOSIS for each station.
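The tabulated statistics are presumably the standard graphic measures computed from these Ø (phi) percentiles; if so (an assumption, since the table does not state the method), the Folk and Ward (1957) formulas are:

```latex
% Folk and Ward (1957) graphic grain-size measures from phi percentiles
% (assumed method; the dataset description does not state how the statistics were computed).
M_z = \frac{\phi_{16} + \phi_{50} + \phi_{84}}{3}, \qquad
\sigma_I = \frac{\phi_{84} - \phi_{16}}{4} + \frac{\phi_{95} - \phi_{5}}{6.6}, \qquad
Sk_I = \frac{\phi_{16} + \phi_{84} - 2\phi_{50}}{2(\phi_{84} - \phi_{16})}
     + \frac{\phi_{5} + \phi_{95} - 2\phi_{50}}{2(\phi_{95} - \phi_{5})}, \qquad
K_G = \frac{\phi_{95} - \phi_{5}}{2.44\,(\phi_{75} - \phi_{25})}.
```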
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cumulative COVID-19 cases and deaths summary statistics.
https://vocab.nerc.ac.uk/collection/L08/current/UN/
This database, and the accompanying website called 'SurgeWatch' (http://surgewatch.stg.rlp.io), provides a systematic UK-wide record of high sea level and coastal flood events over the last 100 years (1915-2014). Derived using records from the National Tide Gauge Network, a dataset of exceedance probabilities from the Environment Agency and meteorological fields from the 20th Century Reanalysis, the database captures information on 96 storm events that generated the highest sea levels around the UK since 1915. For each event, the database contains information about: (1) the storm that generated that event; (2) the sea levels recorded around the UK during the event; and (3) the occurrence and severity of coastal flooding as a consequence of the event. The data are presented to be easily accessible and understandable to a wide range of interested parties.

The database contains 100 files: four CSV files and 96 PDF files. Two CSV files contain the meteorological and sea level data for each of the 96 events. A third file contains the list of the top 20 largest skew surges at each of the 40 study tide gauge sites. In the file containing the sea level and skew surge data, the tide gauge sites are numbered 1 to 40. A fourth accompanying CSV file lists, for reference, the site name and location (longitude and latitude). A description of the parameters in each of the four CSV files is given in the table below. There are also 96 separate PDF files containing the event commentaries. For each event these contain a concise narrative of the meteorological and sea level conditions experienced during the event, and a succinct description of the evidence available in support of coastal flooding, with a brief account of the recorded consequences to people and property. In addition, these contain a graphical representation of the storm track and mean sea level pressure and wind fields at the time of maximum high water, the return period and skew surge magnitudes at sites around the UK, and a table of the date and time, offset return period, water level, predicted tide and skew surge for each site where the 1 in 5 year threshold was reached or exceeded for each event. A detailed description of how the database was created is given in Haigh et al. (2015).

Coastal flooding caused by extreme sea levels can be devastating, with long-lasting and diverse consequences. The UK has a long history of severe coastal flooding, and the 2013-14 winter in particular produced a sequence of some of the worst coastal flooding the UK has experienced in the last 100 years. At present 2.5 million properties and £150 billion of assets are potentially exposed to coastal flooding. Yet despite these concerns, there is no formal, national framework in the UK to record flood severity and consequences and thus support an understanding of coastal flooding mechanisms and consequences. Without a systematic record of flood events, assessment of coastal flooding around the UK coast is limited.

The database was created at the School of Ocean and Earth Science, National Oceanography Centre, University of Southampton, with help from the Faculty of Engineering and the Environment, University of Southampton, the National Oceanography Centre and the British Oceanographic Data Centre. Collation of the database and the development of the website were funded through a Natural Environment Research Council (NERC) impact acceleration grant.
The database contributes to the objectives of UK Engineering and Physical Sciences Research Council (EPSRC) consortium project FLOOD Memory (EP/K013513/1).
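For readers unfamiliar with the quantity tabulated per site and event, skew surge follows the standard definition used in this line of work (consistent with Haigh et al., 2015):

```latex
% Standard definition of skew surge:
\text{skew surge} = \max(\text{observed water level}) - \max(\text{predicted astronomical tide}),
% with both maxima taken over the same tidal cycle, regardless of when each occurs.
```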
The grain-size distribution of 223 unconsolidated sediment samples from four DSDP sites at the mouth of the Gulf of California was determined using sieve and pipette techniques. Shepard's (1954) and Inman's (1952) classification schemes were used for all samples. Most of the sediments are hemipelagic with minor turbidites of terrigenous origin. Sediment texture ranges from silty sand to silty clay. On the basis of grain-size parameters, the sediments can be divided into the following groups: (1) poorly to very poorly sorted coarse and medium sand; and (2) poorly to very poorly sorted fine to very fine sand and clay.
Hunting returns data (hunting.returns.txt). Delimited plain text file with the following columns: date, age_cat, sex, foraging_activity, total_foraging_time, total_kcal. These correspond to: (1) date of focal follow, (2) age category of focal forager, (3) sex of focal forager, (4) type of hunt, (5) total hunt time in min, (6) total energetic returns in kcal.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Estimated confidence intervals and lengths for the common mean of chloride concentration (in mg/litre) in water.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mean skewness and kurtosis for simulated data scenarios.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The rapid advancement of additive manufacturing (AM) requires researchers to keep pace by continually improving AM processes. Improving manufacturing processes involves evaluating the process outputs and their conformity to the required specifications. Process capability indices, calculated using critical quality characteristics (QCs), have long been used in the evaluation process due to their proven effectiveness. AM processes typically involve multiple correlated critical QCs, indicating the need for a multivariate process capability index (MPCI); a univariate capability index may lead to misleading results. In this regard, this study proposes a general methodological framework for evaluating AM processes using an MPCI. The proposed framework starts by identifying the AM process and product design. Fused Deposition Modeling (FDM) is chosen for this investigation. Then, the specification limits associated with the critical QCs are established. To ensure that the MPCI assumptions are met, the critical QC data are examined for normality, stability, and correlation. Additionally, the MPCI is estimated by simulating a large sample using the properties of the collected QC data and determining the percent nonconforming (PNC). Furthermore, the FDM process and its capable tolerance limits are then assessed using the proposed MPCI. Finally, the study presents a sensitivity analysis of the FDM process and suggestions for improvement based on the analysis of assignable causes of variation. The results revealed that the process mean is shifted for all QCs, and that most of the variation is associated with the part diameter data. Moreover, the process data are not normally distributed, and the proposed transformation algorithm performs well in reducing data skewness. The performance of the FDM process was also estimated for different designations of the specification limits. The results showed that the FDM process is not capable for the designs considered, except under very coarse specifications.
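A minimal sketch of the simulation step described above, under assumptions not stated in the abstract: a large sample is drawn from a multivariate normal fitted to the (possibly transformed) QC data, PNC is the fraction of simulated parts outside any specification limit, and one common PNC-based index converts the conforming proportion back to a capability value. The spec limits, QC values, and the particular index formula below are illustrative assumptions.

```python
# Hedged sketch: Monte Carlo estimate of percent nonconforming (PNC) and a
# PNC-based multivariate capability index (MCp = Phi^{-1}((1 + p_conform)/2) / 3,
# an assumed formula for illustration).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Fitted properties of the collected QC data (illustrative numbers only).
mean = np.array([10.02, 5.01, 2.49])           # e.g. length, width, diameter
cov = np.array([[0.010, 0.002, 0.001],
                [0.002, 0.008, 0.001],
                [0.001, 0.001, 0.012]])
lsl = np.array([9.90, 4.90, 2.40])             # lower specification limits
usl = np.array([10.10, 5.10, 2.60])            # upper specification limits

# Simulate a large sample and count parts with every QC inside its limits.
sample = rng.multivariate_normal(mean, cov, size=1_000_000)
conforming = np.all((sample >= lsl) & (sample <= usl), axis=1)
p_conform = conforming.mean()
pnc = 1.0 - p_conform

# PNC-based index: capability of a centred normal process with the same PNC.
mcp = norm.ppf((1.0 + p_conform) / 2.0) / 3.0
print(f"PNC = {pnc:.4%}, MCp = {mcp:.3f}")
```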
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics of the response style measures: the overall mean, median, reliability measures alpha and omega, and skewness of all response styles.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this paper, we investigate the distributional properties of the estimated tangency portfolio (TP) weights, assuming that the asset returns follow a matrix variate closed skew-normal distribution. We establish a stochastic representation of a linear combination of the estimated TP weights that fully characterizes its distribution. Using this stochastic representation, we derive the mean and variance of the estimated TP weights, which are of key importance in portfolio analysis. Furthermore, we provide the asymptotic distribution of a linear combination of the estimated TP weights under the high-dimensional asymptotic regime, i.e., when the portfolio dimension p and the sample size n tend to infinity such that p/n → c ∈ (0,1). The good performance of the theoretical findings is documented in a simulation study. In an empirical study, we apply the theoretical results to real data on the stocks included in the S&P 500 index.
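For context, the tangency portfolio weights referenced above are conventionally defined as below, with the sample estimator obtained by plugging in the sample mean vector and sample covariance matrix; the notation (risk aversion γ, risk-free rate r_f, vector of ones 1) is assumed here and may differ from the paper's.

```latex
% Conventional definition of tangency portfolio (TP) weights and their sample estimator
% (assumed notation, for context only).
w_{\mathrm{TP}} = \frac{1}{\gamma}\,\Sigma^{-1}\left(\mu - r_f \mathbf{1}\right),
\qquad
\hat{w}_{\mathrm{TP}} = \frac{1}{\gamma}\,S^{-1}\left(\bar{x} - r_f \mathbf{1}\right).
```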
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Posterior mean odds ratios and 95% credible intervals of the regression coefficients for the binary longitudinal models with response: CD4 counts ≥500 cells/μL.