35 datasets found
  1. Reaction times and other skewed distributions: problems with the mean and...

    • figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Guillaume Rousselet; Rand Wilcox (2023). Reaction times and other skewed distributions: problems with the mean and the median [Dataset]. http://doi.org/10.6084/m9.figshare.6911924.v4
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Guillaume Rousselet; Rand Wilcox
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Reproducibility package for the article "Reaction times and other skewed distributions: problems with the mean and the median" by Guillaume A. Rousselet & Rand R. Wilcox. Preprint: https://psyarxiv.com/3y54r (doi: 10.31234/osf.io/3y54r). This package contains all the code and data to reproduce the figures and analyses in the article.

  2. Model evaluation for positive COVID-19 cases.

    • plos.figshare.com
    xls
    Updated Jun 6, 2024
    + more versions
    Cite
    Teresa-Thuong Le; Xiyue Liao (2024). Model evaluation for positive COVID-19 cases. [Dataset]. http://doi.org/10.1371/journal.pone.0302324.t002
    Available download formats: xls
    Dataset updated
    Jun 6, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Teresa-Thuong Le; Xiyue Liao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    COVID-19 prediction has been essential in aiding the prevention and control of the disease. The motivation of this case study is to develop predictive models for COVID-19 cases and deaths based on a cross-sectional data set with a total of 28,955 observations and 18 variables, compiled from 5 data sources on Kaggle. A two-part modeling framework, in which the first part is a logistic classifier and the second part includes machine learning or statistical smoothing methods, is introduced to model the highly skewed distribution of COVID-19 cases and deaths. We also aim to understand which factors are most relevant to COVID-19's occurrence and fatality. Evaluation criteria such as root mean squared error (RMSE) and mean absolute error (MAE) are used. We find that the two-part XGBoost model performs best at predicting the entire distribution of COVID-19 cases and deaths. The most important factors relevant to either COVID-19 cases or deaths include population and the rate of primary care physicians.
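    The two evaluation criteria named in this description can be illustrated with a minimal sketch. The case counts below are hypothetical and are not taken from the dataset; the point is only that RMSE penalizes the single large miss in a skewed sample far more than MAE does.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical, highly skewed case counts (illustration only).
actual = [0, 0, 3, 10, 250]
predicted = [0, 1, 2, 12, 200]

rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)
print(f"RMSE = {rmse_value:.3f}, MAE = {mae_value:.3f}")
```

    Reporting both metrics side by side is a common way to see how much of the error is driven by a few extreme observations.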

  3. Data from: Improving structured population models with more realistic...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    Updated Jun 1, 2022
    Cite
    Megan L. Peterson; William Morris; Cristina Linares; Daniel Doak (2022). Data from: Improving structured population models with more realistic representations of non-normal growth [Dataset]. http://doi.org/10.5061/dryad.t6c3573
    Dataset updated
    Jun 1, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Megan L. Peterson; William Morris; Cristina Linares; Daniel Doak
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description
    1. Structured population models are among the most widely used tools in ecology and evolution. Integral projection models (IPMs) use continuous representations of how survival, reproduction, and growth change as functions of state variables such as size, requiring fewer parameters to be estimated than projection matrix models (PPMs). Yet almost all published IPMs make an important assumption: that size-dependent growth transitions are or can be transformed to be normally distributed. In fact, many organisms exhibit highly skewed size transitions. Small individuals can grow more than they can shrink, and large individuals may often shrink more dramatically than they can grow. Yet the implications of such skew for inference from IPMs have not been explored, nor have general methods been developed to incorporate skewed size transitions into IPMs, or deal with other aspects of real growth rates, including bounds on possible growth or shrinkage.
    2. Here we develop a flexible approach to modeling skewed growth data using a modified beta regression model. We propose that sizes first be converted to a (0,1) interval by estimating size-dependent minimum and maximum sizes through quantile regression. Transformed data can then be modeled using beta regression with widely available statistical tools. We demonstrate the utility of this approach using demographic data for a long-lived plant, gorgonians, and an epiphytic lichen. Specifically, we compare inferences of population parameters from discrete PPMs to those from IPMs that either assume normality or incorporate skew using beta regression or, alternatively, a skewed normal model.
    3. The beta and skewed normal distributions accurately capture the mean, variance, and skew of real growth distributions. Incorporating skewed growth into IPMs decreases population growth and estimated lifespan relative to IPMs that assume normally-distributed growth, and more closely approximates the parameters of PPMs that do not assume a particular growth distribution. A bounded distribution, such as the beta, also avoids the eviction problem caused by predicting some growth outside the modeled size range.
    4. Incorporating biologically relevant skew in growth data has important consequences for inference from IPMs. The approaches we outline here are flexible and easy to implement with existing statistical tools.
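    The conversion of sizes to a (0,1) interval mentioned in this description can be sketched as a simple rescaling. This is an illustration under stated assumptions, not the authors' code: the bounds are passed in as plain numbers (the description obtains them from quantile regression), and the function name and the small `eps` clamp that keeps values strictly inside the open interval are hypothetical additions.

```python
def to_unit_interval(size, min_size, max_size, eps=1e-6):
    """Rescale a size to the open interval (0, 1) given its lower and upper bounds.

    min_size and max_size stand in for the size-dependent bounds; eps is an
    assumed guard so the result never equals exactly 0 or 1.
    """
    if max_size <= min_size:
        raise ValueError("max_size must be greater than min_size")
    scaled = (size - min_size) / (max_size - min_size)
    return min(max(scaled, eps), 1 - eps)

# Hypothetical sizes and bounds (illustration only).
mid_value = to_unit_interval(5.0, 0.0, 10.0)
low_value = to_unit_interval(0.0, 0.0, 10.0)
high_value = to_unit_interval(12.0, 0.0, 10.0)
print(mid_value, low_value, high_value)
```

    A response rescaled this way is bounded, which is the property that lets a beta-type model avoid predicting growth outside the modeled size range.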
  4. Data from: Body temperature distributions of active diurnal lizards in three...

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    zip
    Updated Aug 4, 2018
    Cite
    Raymond B. Huey; Eric R. Pianka (2018). Body temperature distributions of active diurnal lizards in three deserts: skewed up or skewed down? [Dataset]. http://doi.org/10.5061/dryad.45g3s
    Available download formats: zip
    Dataset updated
    Aug 4, 2018
    Dataset provided by
    The University of Texas at Austin
    University of Washington
    Authors
    Raymond B. Huey; Eric R. Pianka
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Australia, Africa, North America
    Description
    1. The performance of ectotherms integrated over time depends in part on the position and shape of the distribution of body temperatures (Tb) experienced during activity. For several complementary reasons, physiological ecologists have long expected that Tb distributions during activity should have a long left tail (left-skewed); but only infrequently have they quantified the magnitude and direction of Tb skewness in nature.
    2. To evaluate whether left-skewed Tb distributions are general for diurnal desert lizards, we compiled and analyzed Tb (∑ = 9,023 temperatures) from our own prior studies of active desert lizards on three continents (25 species in Western Australia, 10 in the Kalahari Desert of Africa, and 10 species in western North America). We gathered these data over several decades, using standardized techniques.
    3. Many species showed significantly left-skewed Tb distributions, even when records were restricted to summer months. However, magnitudes of skewness were always small, such that mean Tb were never more than 1°C lower than median Tb. The significance of Tb skewness was sensitive to sample size, and power tests reinforced this sensitivity.
    4. The magnitude of skewness was not obviously related to phylogeny, desert, body size, or median body temperature. Moreover, formal phylogenetic analysis is inappropriate because geography and phylogeny are confounded (that is, are highly collinear).
    5. Skewness might be limited if lizards pre-warm inside retreats before emerging in the morning, emerge only when operative temperatures are high enough to speed warming to activity Tb, or if cold lizards are especially wary and difficult to spot or catch. Telemetry studies may help evaluate these possibilities.
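    The relationship between left skew and the gap between mean and median, discussed in point 3, can be illustrated with a short sketch. The temperatures below are hypothetical, and the moment-based skewness coefficient used here is an assumption for illustration; the description does not say which skewness estimator the authors used.

```python
import statistics

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5, with no small-sample correction."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical body temperatures in degrees C with a long left tail.
tb = [30.5, 33.0, 35.0, 36.0, 36.5, 37.0, 37.0, 37.5, 38.0]

skew = sample_skewness(tb)
mean_minus_median = statistics.fmean(tb) - statistics.median(tb)
print(f"skewness = {skew:.2f}, mean - median = {mean_minus_median:.2f} degrees C")
```

    A negative coefficient together with a mean below the median is the pattern the description calls left-skewed.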
  5. Data from: Selection on skewed characters and the paradox of stasis

    • zenodo.org
    • datadryad.org
    Updated May 31, 2022
    Cite
    Suzanne Bonamour; Céline Teplitsky; Anne Charmantier; Pierre-André Crochet; Luis-Miguel Chevin (2022). Data from: Selection on skewed characters and the paradox of stasis [Dataset]. http://doi.org/10.5061/dryad.pt07g
    Dataset updated
    May 31, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Suzanne Bonamour; Céline Teplitsky; Anne Charmantier; Pierre-André Crochet; Luis-Miguel Chevin
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Observed phenotypic responses to selection in the wild often differ from predictions based on measurements of selection and genetic variance. An overlooked hypothesis to explain this paradox of stasis is that a skewed phenotypic distribution affects natural selection and evolution. We show through mathematical modelling that, when a trait selected for an optimum phenotype has a skewed distribution, directional selection is detected even at evolutionary equilibrium, where it causes no change in the mean phenotype. When environmental effects are skewed, Lande and Arnold's (1983) directional gradient is in the direction opposite to the skew. In contrast, skewed breeding values can displace the mean phenotype from the optimum, causing directional selection in the direction of the skew. These effects can be partitioned out using alternative selection estimates based on average derivatives of individual relative fitness, or additive genetic covariances between relative fitness and trait (Robertson-Price identity). We assess the validity of these predictions using simulations of selection estimation under moderate sample sizes. Ecologically relevant traits may commonly have skewed distributions, as we here exemplify with avian laying date (repeatedly described as more evolutionarily stable than expected), so this skewness should be accounted for when investigating evolutionary dynamics in the wild.

  6. Annual peak-flow data and results of flood-frequency analysis for 76...

    • data.usgs.gov
    • catalog.data.gov
    Updated Sep 3, 2024
    + more versions
    Cite
    Daniel Wagner; Jon Voss; Roger D.; David Heimann (2024). Annual peak-flow data and results of flood-frequency analysis for 76 selected streamflow gaging stations operated by the U.S. Geological Survey in the upper White River basin, Missouri and Arkansas, computed using an updated generalized (regional) flood skew [Dataset]. http://doi.org/10.5066/P9C3L7IN
    Dataset updated
    Sep 3, 2024
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Authors
    Daniel Wagner; Jon Voss; Roger D.; David Heimann
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    1904 - 2020
    Area covered
    Arkansas, Missouri
    Description

    This dataset contains site information, basin characteristics, results of flood-frequency analysis, and a generalized (regional) flood skew for 76 selected streamgages operated by the U.S. Geological Survey (USGS) in the upper White River basin (4-digit hydrologic unit 1101) in southern Missouri and northern Arkansas. The Little Rock District U.S. Army Corps of Engineers (USACE) needed updated estimates of streamflows corresponding to selected annual exceedance probabilities (AEPs) and a basin-specific regional flood skew. USGS selected 111 candidate streamgages in the study area that had 20 or more years of gaged annual peak-flow data available through the 2020 water year. After screening for regulation, urbanization, redundant/nested basins, drainage areas greater than 2,500 square miles, and streamgage basins located in the Mississippi Alluvial Plain (8-digit hydrologic unit 11010013), 77 candidate streamgages remained. After conducting the initial flood-frequency analysis ...

  7. Data from: A broader flight season for Norway's Odonata across a century and...

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 6, 2023
    + more versions
    Cite
    Patten, Michael (2023). A broader flight season for Norway's Odonata across a century and a half [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7901564
    Dataset updated
    May 6, 2023
    Dataset provided by
    Benson, Brittany
    Patten, Michael
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Norway
    Description

    As global climate continues to change, so too will phenology of a wide range of insects. Changes in flight season usually are characterised as shifts to earlier dates or means, with attention less often paid to flight season breadth or whether seasons are now skewed. We amassed flight season data for the insect order Odonata, the dragonflies and damselflies, for Norway over the past century-and-a-half to examine the form of flight season change. By means of Bayesian analyses that incorporated uncertainty relative to annual variability in survey effort, we estimated shifts in flight season mean, breadth, and skew. We focussed on flight season breadth, positing that it will track documented growing season expansion. A specific mechanism explored was shifts in voltinism, the number of generations per year, which tends to increase with warming. We found strong evidence for an increase in flight season breadth but much less for a shift in mean, with any shift of the latter tending toward a later mean. Skew has become rightward for suborder Zygoptera, the damselflies, but not for Anisoptera, the dragonflies, or for the Odonata as a whole. We found weak support for voltinism as a predictor of broader flight season; instead, voltinism acted interactively with use of human-modified habitats, including decrease in shading (e.g., from timber extraction). Other potential mechanisms that link warming with broadening of flight season include protracted emergence and cohort splitting, both of which have been documented in the Odonata. It is likely that warming-induced broadening of flight seasons of these widespread insect predators will have wide-ranging consequences for freshwater ecosystems.

  8. Data from: Comparing measures of breeding inequality and opportunity for...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    txt
    Updated May 31, 2022
    Cite
    Alexandre M. Martin; Marco Festa-Bianchet; David W. Coltman; Fanie Pelletier (2022). Data from: Comparing measures of breeding inequality and opportunity for selection with sexual selection on a quantitative character in bighorn rams [Dataset]. http://doi.org/10.5061/dryad.vb73f
    Available download formats: txt
    Dataset updated
    May 31, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Alexandre M. Martin; Marco Festa-Bianchet; David W. Coltman; Fanie Pelletier
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The reliability and consistency of the many measures proposed to quantify sexual selection have been questioned for decades. Realized selection on quantitative characters measured by the selection differential i was approximated by metrics based on variance in breeding success, using either the opportunity for sexual selection Is or indices of inequality. There is no consensus about which metric best approximates realized selection on sexual characters. Recently, the opportunity for selection on character mean OSM was proposed to quantify the maximum potential selection on characters. Using 21 years of data on bighorn sheep (Ovis canadensis), we investigated the correlations between seven indices of inequality, Is, OSM and i on horn length of males. Bighorn sheep are ideal for this comparison because they are highly polygynous, sexually dimorphic, ram horn length is under strong sexual selection, and we have detailed knowledge of individual breeding success. Different metrics provided conflicting information, potentially leading to spurious conclusions about selection patterns. Iδ, an index of breeding inequality, and to a lesser extent Is, showed the highest correlation with i on horn length, suggesting that these indices document breeding inequality in a selection context. OSM on horn length was strongly correlated with i, Is, and indices of inequality. By integrating information on both realized sexual selection and breeding inequality, OSM appeared to be the best proxy of sexual selection and may be best suited to explore its ecological bases.

  9. CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 4, 2024
    + more versions
    Cite
    Mark Granroth-Wilding (2024). CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company Similarity Quantification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7957401
    Dataset updated
    Jun 4, 2024
    Dataset provided by
    Drew McCornack
    Lele Cao
    Dhiana Deva Cavacanti Rocha
    Vilhelm von Ehrenheim
    Mark Granroth-Wilding
    Richard Anselmo Stahl
    Armin Catovic
    Description

    CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.

    Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. Associated with each edge of a certain type, we calculate a real-numbered weight as an approximation of the similarity level of that type. It is important to note that the constructed edges do not represent an exhaustive list of all possible edges due to incomplete information. Consequently, this leads to a sparse and occasionally skewed distribution of edges for individual relation/edge types. Such characteristics pose additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.

    Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.

    Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated the following evaluation tasks:

    Similarity Prediction (SP). To assess the accuracy of pairwise company similarity, we constructed the SP evaluation set comprising 3,219 pairs of companies that are labeled either as positive (similar, denoted by "1") or negative (dissimilar, denoted by "0"). Of these pairs, 1,522 are positive and 1,697 are negative.

    Competitor Retrieval (CR). Each sample contains one target company and one of its direct competitors. The set contains 76 distinct target companies, each of which has 5.3 annotated competitors on average. For a given target company A with N direct competitors in this CR evaluation set, we expect a competent method to retrieve all N competitors when searching for companies similar to A.

    Similarity Ranking (SR) is designed to assess the ability of any method to rank candidate companies (numbered 0 and 1) based on their similarity to a query company. Paid human annotators, with backgrounds in engineering, science, and investment, were tasked with determining which candidate company is more similar to the query company. It resulted in an evaluation set comprising 1,856 rigorously labeled ranking questions. We retained 20% (368 samples) of this set as a validation set for model development.

    Edge Prediction (EP) evaluates a model's ability to predict future or missing relationships between companies, providing forward-looking insights for investment professionals. The EP dataset, derived (and sampled) from new edges collected between April 6, 2023, and May 25, 2024, includes 40,000 samples, with edges not present in the pre-existing CompanyKG (a snapshot up until April 5, 2023).

    Background and Motivation

    In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone to successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company, and the complex and diversified relationships among them.

    While there is no universally agreed definition of company similarity, researchers and practitioners in the PE industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.

    In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Then companies that are similar to the seed companies can be searched in the embedding space using distance metrics like cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. These models, such as ChatGPT, can be employed to answer questions related to similar company discovery and quantification in a Q&A format.
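    The embedding-space search described in the paragraph above can be sketched in a few lines. This is a toy illustration, not CompanyKG code: the company names, the three-dimensional vectors, and the helper names are all hypothetical, and real text embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(seed, embeddings, k=2):
    """Return the k (name, score) pairs whose vectors are closest to the seed."""
    scored = [(name, cosine_similarity(seed, vec)) for name, vec in embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical company embeddings (illustration only).
embeddings = {
    "company_a": [0.9, 0.1, 0.0],
    "company_b": [0.0, 1.0, 0.0],
    "company_c": [0.8, 0.2, 0.1],
}
seed = [1.0, 0.0, 0.0]  # embedding of a seed company

ranked = most_similar(seed, embeddings, k=2)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

    Ranking by cosine similarity ignores vector length and compares direction only, which is one reason it is a common choice for text embeddings.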

    However, a graph is still the most natural choice for representing and learning diverse company relations due to its ability to model complex relationships between a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Utilizing this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns to facilitate similar company analysis. While there are various company datasets (mostly commercial/proprietary and non-relational) and graph datasets available (mostly for single link/node/graph-level predictions), there is a scarcity of datasets and benchmarks that combine both to create a large-scale KG dataset expressing rich pairwise company relations.

    Source Code and Tutorial: https://github.com/llcresearch/CompanyKG2

    Paper: to be published

  10. Grain size analysis of ganga river at Varanasi

    • figshare.com
    • data.4tu.nl
    xlsx
    Updated May 13, 2022
    Cite
    Abhishek Pandey (2022). Grain size analysis of ganga river at Varanasi [Dataset]. http://doi.org/10.4121/19752577.v1
    Available download formats: xlsx
    Dataset updated
    May 13, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Abhishek Pandey
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Area covered
    Varanasi, Ganges
    Description

    The data show the station codes of all 20 sites, identified as K1 to K20. Values of Ø5, Ø16, Ø25, Ø50, Ø75, Ø84, Ø95 and Ø99 for all 20 stations are shown in the table, along with the values of statistical parameters (MEAN, STANDARD DEVIATION, SKEWNESS, KURTOSIS) for all 20 stations.
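    The percentiles listed in this description (Ø5 through Ø95) are the inputs of the Folk and Ward (1957) graphic measures, so a sketch of how the four statistics could be derived from them is given here. That the dataset actually used the Folk and Ward formulas is an assumption; the description does not name a method, and Ø99 is not used by these formulas. The percentile values below are hypothetical.

```python
def folk_ward_measures(phi):
    """Folk and Ward (1957) graphic measures.

    phi maps a percentile (5, 16, 25, 50, 75, 84, 95) to grain size in phi units.
    """
    p5, p16, p25, p50, p75, p84, p95 = (phi[k] for k in (5, 16, 25, 50, 75, 84, 95))
    mean = (p16 + p50 + p84) / 3                      # graphic mean
    sorting = (p84 - p16) / 4 + (p95 - p5) / 6.6      # inclusive graphic standard deviation
    skewness = ((p16 + p84 - 2 * p50) / (2 * (p84 - p16))
                + (p5 + p95 - 2 * p50) / (2 * (p95 - p5)))  # inclusive graphic skewness
    kurtosis = (p95 - p5) / (2.44 * (p75 - p25))      # graphic kurtosis
    return {"mean": mean, "sorting": sorting, "skewness": skewness, "kurtosis": kurtosis}

# Hypothetical percentiles of a symmetric, roughly normal sample centred on 2 phi.
sample = {5: 0.355, 16: 1.0055, 25: 1.3255, 50: 2.0, 75: 2.6745, 84: 2.9945, 95: 3.645}
measures = folk_ward_measures(sample)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

    For a symmetric sample such as this one, the skewness term is close to zero and the kurtosis is close to 1, which is a quick sanity check on the formulas.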

  11. Cumulative COVID-19 cases and deaths summary statistics.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 6, 2024
    Cite
    Teresa-Thuong Le; Xiyue Liao (2024). Cumulative COVID-19 cases and deaths summary statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0302324.t001
    Available download formats: xls
    Dataset updated
    Jun 6, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Teresa-Thuong Le; Xiyue Liao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cumulative COVID-19 cases and deaths summary statistics.

  12. A database of 100 years (1915-2014) of coastal flooding in the UK

    • edmed.seadatanet.org
    • bodc.ac.uk
    • +2more
    nc
    Updated Nov 21, 2024
    Cite
    University of Southampton School of Ocean and Earth Science (2024). A database of 100 years (1915-2014) of coastal flooding in the UK [Dataset]. https://edmed.seadatanet.org/report/6120/
    Available download formats: nc
    Dataset updated
    Nov 21, 2024
    Dataset authored and provided by
    University of Southampton School of Ocean and Earth Science
    License

    https://vocab.nerc.ac.uk/collection/L08/current/UN/

    Time period covered
    Jan 1, 1915 - Dec 31, 2014
    Area covered
    Description

    This database and the accompanying website, called 'SurgeWatch' (http://surgewatch.stg.rlp.io), provide a systematic UK-wide record of high sea level and coastal flood events over the last 100 years (1915-2014). Derived using records from the National Tide Gauge Network, a dataset of exceedence probabilities from the Environment Agency and meteorological fields from the 20th Century Reanalysis, the database captures information on 96 storm events that generated the highest sea levels around the UK since 1915. For each event, the database contains information about: (1) the storm that generated that event; (2) the sea levels recorded around the UK during the event; and (3) the occurrence and severity of coastal flooding as a consequence of the event. The data are presented to be easily accessible and understandable to a wide range of interested parties.

    The database contains 100 files: four CSV files and 96 PDF files. Two CSV files contain the meteorological and sea level data for each of the 96 events. A third file contains the list of the top 20 largest skew surges at each of the 40 study tide gauge sites. In the file containing the sea level and skew surge data, the tide gauge sites are numbered 1 to 40. A fourth accompanying CSV file lists, for reference, the site name and location (longitude and latitude). A description of the parameters in each of the four CSV files is given in the table below. There are also 96 separate PDF files containing the event commentaries. For each event these contain a concise narrative of the meteorological and sea level conditions experienced during the event, and a succinct description of the evidence available in support of coastal flooding, with a brief account of the recorded consequences to people and property. In addition, these contain a graphical representation of the storm track and the mean sea level pressure and wind fields at the time of maximum high water, the return period and skew surge magnitudes at sites around the UK, and a table of the date and time, offset return period, water level, predicted tide and skew surge for each site where the 1 in 5 year threshold was reached or exceeded for each event. A detailed description of how the database was created is given in Haigh et al. (2015).

    Coastal flooding caused by extreme sea levels can be devastating, with long-lasting and diverse consequences. The UK has a long history of severe coastal flooding. The recent 2013-14 winter in particular produced a sequence of some of the worst coastal flooding the UK has experienced in the last 100 years. At present 2.5 million properties and £150 billion of assets are potentially exposed to coastal flooding. Yet despite these concerns, there is no formal, national framework in the UK to record flood severity and consequences and thus benefit an understanding of coastal flooding mechanisms and consequences. Without a systematic record of flood events, assessment of coastal flooding around the UK coast is limited.

    The database was created at the School of Ocean and Earth Science, National Oceanography Centre, University of Southampton with help from the Faculty of Engineering and the Environment, University of Southampton, the National Oceanography Centre and the British Oceanographic Data Centre. Collation of the database and the development of the website was funded through a Natural Environment Research Council (NERC) impact acceleration grant. The database contributes to the objectives of the UK Engineering and Physical Sciences Research Council (EPSRC) consortium project FLOOD Memory (EP/K013513/1).
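    The skew surge values referred to in this description can be illustrated with a minimal sketch. It assumes the usual definition, which the description itself does not spell out: the highest observed water level within one tidal cycle minus the highest predicted tide in that cycle, regardless of when each peak occurs. The function name and the water levels are hypothetical.

```python
def skew_surge(observed_levels, predicted_tide):
    """Skew surge for one tidal cycle: peak observed level minus peak predicted tide.

    Both sequences are assumed to cover the same single tidal cycle, in metres;
    the two peaks do not need to occur at the same time step.
    """
    return max(observed_levels) - max(predicted_tide)

# Hypothetical hourly levels in metres for one tidal cycle (illustration only).
observed = [1.2, 2.9, 3.4, 2.8]
predicted = [1.0, 2.6, 2.4, 1.9]

surge = skew_surge(observed, predicted)
print(f"skew surge = {surge:.2f} m")
```

    In this example the observed peak falls one step later than the predicted high water, which is the timing offset that the skew surge measure is designed to ignore.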

  13. Grain-size distribution of sediments from DSDP Leg 65 Holes

    • search.dataone.org
    • doi.pangaea.de
    Updated Jan 6, 2018
    Cite
    Gutiérrez-Estrada, Mario (2018). Grain-size distribution of sediments from DSDP Leg 65 Holes [Dataset]. http://doi.org/10.1594/PANGAEA.818016
    Dataset updated
    Jan 6, 2018
    Dataset provided by
    PANGAEA Data Publisher for Earth and Environmental Science
    Authors
    Gutiérrez-Estrada, Mario
    Time period covered
    Jan 24, 1979 - Mar 5, 1979
    Area covered
    Description

    The grain-size distribution of 223 unconsolidated sediment samples from four DSDP sites at the mouth of the Gulf of California was determined using sieve and pipette techniques. Shepard's (1954) and Inman's (1952) classification schemes were used for all samples. Most of the sediments are hemipelagic with minor turbidites of terrigenous origin. Sediment texture ranges from silty sand to silty clay. On the basis of grain-size parameters, the sediments can be divided into the following groups: (1) poorly to very poorly sorted coarse and medium sand; and (2) poorly to very poorly sorted fine to very fine sand and clay.

  14. Data from: To kill a kangaroo: understanding the decision to pursue...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Jul 25, 2013
    Cite
    James Holland Jones; Rebecca Bliege Bird; Douglas W. Bird (2013). To kill a kangaroo: understanding the decision to pursue high-risk/high-gain resources [Dataset]. http://doi.org/10.5061/dryad.g1h6b
    Dataset updated
    Jul 25, 2013
    Dataset provided by
    Dryad
    Authors
    James Holland Jones; Rebecca Bliege Bird; Douglas W. Bird
    Time period covered
    2013
    Area covered
    Western Australia
    Description

    Hunting returns data: delimited plain text file (hunting.returns.txt) with the following columns: date, age_cat, sex, foraging_activity, total_foraging_time, total_kcal. These correspond to: (1) date of focal follow, (2) age category of focal forager, (3) sex of focal forager, (4) type of hunt, (5) total hunt time in min, (6) total energetic returns in kcal.
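
    As an illustration of the column layout described above, the following is a minimal Python sketch that reads such a file and derives a return rate in kcal per minute. The tab delimiter, the presence of a header row, the helper name `return_rates`, and the sample rows are all assumptions made for illustration; they are not taken from the dataset, so check the actual file before relying on them.

```python
import csv
import io


def return_rates(text, delimiter="\t"):
    """Parse delimited hunting-returns text and add a kcal-per-minute rate.

    Assumes a header row with the column names listed in the dataset
    description; the delimiter is a guess and may need changing.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for row in reader:
        minutes = float(row["total_foraging_time"])  # total hunt time in min
        kcal = float(row["total_kcal"])              # total energetic returns
        # Avoid dividing by zero for follows with no recorded hunt time.
        rate = kcal / minutes if minutes > 0 else None
        rows.append({**row, "kcal_per_min": rate})
    return rows


# Made-up rows for illustration only; these are NOT values from the dataset.
sample = (
    "date\tage_cat\tsex\tforaging_activity\ttotal_foraging_time\ttotal_kcal\n"
    "2001-01-01\tadult\tF\tsand_monitor\t120\t600\n"
    "2001-01-02\tadult\tM\tkangaroo\t300\t0\n"
)

rows = return_rates(sample)
for r in rows:
    print(r["foraging_activity"], r["kcal_per_min"])
```

    To use this on the real file, pass the contents of hunting.returns.txt as `text` and adjust `delimiter` to whatever the file actually uses.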

  15. Estimated confidence intervals and lengths for the common mean for Chloride...

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Li Yan (2023). Estimated confidence intervals and lengths for the common mean for Chloride concentration (in mg/litre) in water. [Dataset]. http://doi.org/10.1371/journal.pone.0269971.t006
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Li Yan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Estimated confidence intervals and lengths for the common mean for Chloride concentration (in mg/litre) in water.

  16. Mean skewness and kurtosis for simulated data scenarios.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Janelle R. Noel-MacDonnell; Joseph Usset; Ellen L. Goode; Brooke L. Fridley (2023). Mean skewness and kurtosis for simulated data scenarios. [Dataset]. http://doi.org/10.1371/journal.pone.0191758.t001
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Janelle R. Noel-MacDonnell; Joseph Usset; Ellen L. Goode; Brooke L. Fridley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mean skewness and kurtosis for simulated data scenarios.

  17. Variations and shifts from nominal dimensions.

    • plos.figshare.com
    xls
    Updated Dec 5, 2024
    Cite
    Moath Alatefi; Abdulrahman M. Al-Ahmari; Abdullah Yahia AlFaify; Mustafa Saleh (2024). Variations and shifts from nominal dimensions. [Dataset]. http://doi.org/10.1371/journal.pone.0308380.t006
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Moath Alatefi; Abdulrahman M. Al-Ahmari; Abdullah Yahia AlFaify; Mustafa Saleh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The rapid advancement of additive manufacturing (AM) requires researchers to keep up with these advancements by continually improving the AM processes. Improving manufacturing processes involves evaluating the process outputs and their conformity to the required specifications. Process capability indices, calculated using critical quality characteristics (QCs), have long been used in the evaluation process due to their proven effectiveness. AM processes typically involve multi-correlated critical QCs, indicating the need to develop a multivariate process capability index (MPCI) rather than a univariate capability index, which may lead to misleading results. In this regard, this study proposes a general methodological framework for evaluating AM processes using MPCI. The proposed framework starts by identifying the AM process and product design. Fused Deposition Modeling (FDM) is chosen for this investigation. Then, the specification limits associated with critical QCs are established. To ensure that the MPCI assumptions are met, the critical QCs data are examined for normality, stability, and correlation. Additionally, the MPCI is estimated by simulating a large sample using the properties of the collected QCs data and determining the percent of nonconforming (PNC). Furthermore, the FDM process and its capable tolerance limits are then assessed using the proposed MPCI. Finally, the study presents a sensitivity analysis of the FDM process and suggestions for improvement based on the analysis of assignable causes of variation. The results revealed that the considered process mean is shifted for all QCs, and the most variation is associated with part diameter data. Moreover, the process data are not normally distributed, and the proposed transformation algorithm performs well in reducing data skewness. Also, the performance of the FDM process according to different designations of specification limits was estimated. 
    The results showed that the FDM process is not capable across the different designs, except under very coarse specifications.

  18. Descriptive statistics of response styles measures and the overall mean,...

    • plos.figshare.com
    • figshare.com
    xls
    Updated May 30, 2023
    Cite
    Dirk Tempelaar; Bart Rienties; Quan Nguyen (2023). Descriptive statistics of response styles measures and the overall mean, median, reliability measures alpha and omega and skewness of all response styles. [Dataset]. http://doi.org/10.1371/journal.pone.0233977.t001
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Dirk Tempelaar; Bart Rienties; Quan Nguyen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Descriptive statistics of response styles measures and the overall mean, median, reliability measures alpha and omega and skewness of all response styles.

  19. Data from: Tangency portfolio weights under a skew-normal model in small and...

    • tandf.figshare.com
    pdf
    Updated May 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Farrukh Javed; Stepan Mazur; Erik Thorsén (2024). Tangency portfolio weights under a skew-normal model in small and large dimensions [Dataset]. http://doi.org/10.6084/m9.figshare.24093686.v1
    Explore at:
    pdfAvailable download formats
    Dataset updated
    May 15, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Farrukh Javed; Stepan Mazur; Erik Thorsén
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this paper, we investigate the distributional properties of the estimated tangency portfolio (TP) weights assuming that the asset returns follow a matrix variate closed skew-normal distribution. We establish a stochastic representation of the linear combination of the estimated TP weights that fully characterizes its distribution. Using the stochastic representation we derive the mean and variance of the estimated weights of TP which are of key importance in portfolio analysis. Furthermore, we provide the asymptotic distribution of the linear combination of the estimated TP weights under the high-dimensional asymptotic regime, i.e., the dimension of the portfolio p and the sample size n tend to infinity such that p/n→c∈(0,1). A good performance of the theoretical findings is documented in the simulation study. In an empirical study, we apply the theoretical results to real data of the stocks included in the S&P 500 index.

  20. Posterior mean odds ratios and 95% credible intervals of the regression...

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 20, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Joseph B. Sempa; Theresa M. Rossouw; Emmanuel Lesaffre; Martin Nieuwoudt (2023). Posterior mean odds ratios and 95% credible intervals of the regression coefficients for the binary longitudinal models with response: CD4 counts ≥500 cells/μL. [Dataset]. http://doi.org/10.1371/journal.pone.0224723.t003
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 20, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Joseph B. Sempa; Theresa M. Rossouw; Emmanuel Lesaffre; Martin Nieuwoudt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Posterior mean odds ratios and 95% credible intervals of the regression coefficients for the binary longitudinal models with response: CD4 counts ≥500 cells/μL.

