100+ datasets found
  1. Dataset for: "Quantification of Geometric Errors Made Simple: Application to Main-Group Molecular Structures"

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    Updated Jan 25, 2023
    Cite
    Stefan Vuckovic (2023). Dataset for: "Quantification of Geometric Errors Made Simple: Application to Main-Group Molecular Structures" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7566470
    Dataset updated
    Jan 25, 2023
    Dataset provided by
    Zenodo
    Authors
    Stefan Vuckovic
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains geometric energy offset (GEO') values for a set of density functional theory (DFT) methods for the B2se set of molecular structures. The data were generated as part of a research project aimed at quantifying geometric errors in main-group molecular structures. The dataset is an XLSX file created with MS Excel (version 16.69) and contains multiple worksheets with GEO' values for different basis sets and DFT methods. The worksheet headings, such as "AVQZ AVTZ AVDZ VQZ VTZ VDZ", denote Dunning-type basis sets, following the naming convention (A)VnZ = aug-cc-pVnZ. The data are organized in columns, with the first column giving the molecular ID and the names of the DFT methods specified in the first row of each worksheet. The molecular structures corresponding to these IDs can be found in Figure S1 of the supporting information of the underlying publication [https://pubs.acs.org/doi/suppl/10.1021/acs.jpca.1c10688/suppl_file/jp1c10688_si_001.pdf]. The data were generated from quantum-chemical calculations with the G16 and ORCA 5.0.0 packages; further computational details, methodology, and data validation strategies (e.g., comparisons with higher-level quantum-chemical calculations) are given in the underlying publication [J. Phys. Chem. A 2022, 126, 7, 1300–1311] and its supporting information [https://pubs.acs.org/doi/suppl/10.1021/acs.jpca.1c10688/suppl_file/jp1c10688_si_001.pdf]. All values are given in kcal/mol. The dataset is expected to be useful to researchers in computational chemistry and materials science. The data were generated by the authors of the underlying publication and are shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license; the authors assure the quality and re-usability of the data. The size of the data is 71 KB.
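
    A minimal sketch (not from the dataset authors) of how such a workbook could be loaded with pandas in Python; the file name "geo_values.xlsx" is an assumption, and the sheet layout follows the description above (one worksheet per basis set, molecular IDs in the first column, DFT method names in the first row):

    import pandas as pd

    # One DataFrame per worksheet/basis set; the first column (molecular ID) becomes the index.
    sheets = pd.read_excel("geo_values.xlsx", sheet_name=None, index_col=0)

    for basis_set, df in sheets.items():
        # Rows: molecular IDs; columns: DFT methods; values: GEO' in kcal/mol.
        print(basis_set, df.mean().sort_values().head(3))  # three methods with lowest mean GEO'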

  2. Error_Checking_Imported_Data

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jul 29, 2022
    Cite
    Timothy Ebert (2022). Error_Checking_Imported_Data [Dataset]. http://doi.org/10.7910/DVN/IYYU1P
    Available formats: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Timothy Ebert
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The SAS command file checks EPG data for errors. It will always run; however, you will only get correct output if there are no errors. The datasets "simple psyllid" and "simple aphid" have no errors. Errorchecker returns two tables. The first is a record of the waveforms; check that all waveforms are correct, as a number of common errors will show here. Please note that Np is a different waveform from NP, nP, or np, and that "NP" is different from " NP", "NP ", and " NP ". I wrote the code to be insensitive to these conditions, using the condense() function to eliminate spaces and the upcase() function to capitalize all letters. However, it is safer to correct the problem rather than rely on the program. The second table is a frequency table showing all transitions and transition probabilities; check that all transitions are possible. Your data are clean if you get these two tables and no problems are evident in them.
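
    For readers who do not use SAS, here is a hedged Python analogue (not the author's code) of the two checks described above: normalizing waveform labels the way condense() and upcase() do, then tabulating waveforms and transitions; the example waveform sequence is hypothetical:

    from collections import Counter

    records = [" NP ", "Np", "C", "E1", "E1", "np", "C"]  # hypothetical waveform column

    # Mirror condense()/upcase(): strip spaces and capitalize, so " NP " matches "np".
    waveforms = [w.replace(" ", "").upper() for w in records]
    print(Counter(waveforms))  # table 1: record of waveforms; label typos stand out here

    # Table 2: frequency of transitions between consecutive waveforms.
    transitions = Counter(zip(waveforms, waveforms[1:]))
    total = sum(transitions.values())
    for (a, b), n in transitions.items():
        print(f"{a} -> {b}: {n} ({n / total:.2f})")  # transition count and probability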

  3. Data from: MT@BZ translation corpus v1.0

    • clarin.eurac.edu
    Updated Jun 13, 2023
    Cite
    Flavia De Camillis; Elena Chiocchetti; Egon W. Stemle (2023). MT@BZ translation corpus v1.0 [Dataset]. https://clarin.eurac.edu/repository/xmlui/handle/20.500.12124/60
    Dataset updated
    Jun 13, 2023
    Authors
    Flavia De Camillis; Elena Chiocchetti; Egon W. Stemle
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    MT@BZ is a translation corpus consisting of 52 decrees published by the Autonomous Province of Bolzano (South Tyrol) aligned with their machine-translated versions. More precisely, it contains 26 decrees in German and the same 26 in Italian in their official versions, machine translated by the project team into Italian and into German, respectively. Ten of the decrees are COVID-19 related, while 16 are miscellaneous. Overall, they comprise around 130,000 words. The machine translation was carried out with a customized version of ModernMT. The corpus was then uploaded first into the annotation platform WebAnno and later transferred to INCEpTION. Four annotators annotated the translation errors made by the machine according to an ad hoc error taxonomy for quality assessment. Finally, the annotations were curated to create a gold-standard corpus.

  4. Data from: Learning from the past: a reverberation of past errors in the cerebellar climbing fiber signal

    • search.dataone.org
    • data.niaid.nih.gov
    • +3 more
    Updated Apr 19, 2025
    Cite
    Marc Junker; Dominik Endres; Zong Peng Sun; Peter W. Dicke; Martin Giese; Peter Thier (2025). Learning from the past: a reverberation of past errors in the cerebellar climbing fiber signal [Dataset]. http://doi.org/10.5061/dryad.p88b8v8
    Dataset updated
    Apr 19, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Marc Junker; Dominik Endres; Zong Peng Sun; Peter W. Dicke; Martin Giese; Peter Thier
    Time period covered
    Jan 1, 2019
    Description

    The cerebellum allows us to rapidly adjust motor behavior to the needs of the situation. It is commonly assumed that cerebellum-based motor learning is guided by the difference between the desired and the actual behavior, i.e., by error information. Not only immediate but also future behavior will benefit from an error because it induces lasting changes of parallel fiber synapses on Purkinje cells (PCs), whose output mediates the behavioral adjustments. Olivary climbing fibers, likewise connecting with PCs, are thought to transport information on instant errors needed for the synaptic modification, yet not to contribute to error memory. Here, we report work on monkeys tested in a saccadic learning paradigm that challenges this concept. We demonstrate not only a clear complex spike (CS) signature of the error at the time of its occurrence but also a reverberation of this signature much later, before a new manifestation of the behavior, suitable to improve it.

  5. Data from: Not Normal: the uncertainties of scientific measurements

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +2 more
    zip
    Updated Dec 1, 2016
    Cite
    David C. Bailey (2016). Not Normal: the uncertainties of scientific measurements [Dataset]. http://doi.org/10.5061/dryad.jb3mj
    Available download formats: zip
    Dataset updated
    Dec 1, 2016
    Dataset provided by
    University of Toronto
    Authors
    David C. Bailey
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Judging the significance and reproducibility of quantitative research requires a good understanding of relevant uncertainties, but it is often unclear how well these have been evaluated and what they imply. Reported scientific uncertainties were studied by analysing 41 000 measurements of 3200 quantities from medicine, nuclear and particle physics, and interlaboratory comparisons ranging from chemistry to toxicology. Outliers are common, with 5σ disagreements up to five orders of magnitude more frequent than naively expected. Uncertainty-normalized differences between multiple measurements of the same quantity are consistent with heavy-tailed Student’s t-distributions that are often almost Cauchy, far from a Gaussian Normal bell curve. Medical research uncertainties are generally as well evaluated as those in physics, but physics uncertainty improves more rapidly, making feasible simple significance criteria such as the 5σ discovery convention in particle physics. Contributions to measurement uncertainty from mistakes and unknown problems are not completely unpredictable. Such errors appear to have power-law distributions consistent with how designed complex systems fail, and how unknown systematic errors are constrained by researchers. This better understanding may help improve analysis and meta-analysis of data, and help scientists and the public have more realistic expectations of what scientific results imply.
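
    A minimal Python sketch (synthetic numbers, not the paper's data) of the central quantity: uncertainty-normalized differences z, whose tails can be compared against a fitted Student's t and against the naive Normal expectation:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    true_value = 10.0
    # Heavy-tailed "measurements" around the true value, each with reported uncertainty 1.
    x = true_value + stats.t.rvs(df=3, size=4000, random_state=rng)
    z = (x - true_value) / 1.0  # uncertainty-normalized differences

    nu, loc, scale = stats.t.fit(z)
    print(f"fitted Student's t: nu = {nu:.1f}")            # small nu => far from Gaussian
    print("P(|z| > 5), empirical:", np.mean(np.abs(z) > 5))
    print("P(|z| > 5), if Normal:", 2 * stats.norm.sf(5))  # the naive expectation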

  6. Replication data for: A Unified Approach To Measurement Error And Missing Data: Overview

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Nov 17, 2016
    Cite
    Matthew Blackwell; James Honaker; Gary King (2016). Replication data for: A Unified Approach To Measurement Error And Missing Data: Overview [Dataset]. http://doi.org/10.7910/DVN/29606
    Available formats: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Nov 17, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Matthew Blackwell; James Honaker; Gary King
    License

    Custom dataset license: https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.2/customlicense?persistentId=doi:10.7910/DVN/29606

    Description

    Although social scientists devote considerable effort to mitigating measurement error during data collection, they often ignore the issue during data analysis. And although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative without these problems; it generalizes the popular multiple imputation (MI) framework by treating missing data problems as a limiting special case of extreme measurement error, and corrects for both. Like MI, the proposed framework is a simple two-step procedure, so that in the second step researchers can use whatever statistical method they would have if there had been no problem in the first place. We also offer empirical illustrations, open source software that implements all the methods described herein, and a companion paper with technical details and extensions (Blackwell, Honaker, and King, 2014b). Notes: This is the first of two articles to appear in the same issue of the same journal by the same authors. The second is “A Unified Approach to Measurement Error and Missing Data: Details and Extensions.” See also: Missing Data
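
    A hedged sketch of the generic two-step logic the abstract describes, using plain multiple imputation with Rubin's rules for pooling; this illustrates the MI workflow only and is not the authors' software (their open-source package implements the generalized method):

    import numpy as np

    def pool_rubin(estimates, variances):
        # Rubin's rules: pooled estimate plus within/between/total variance.
        estimates = np.asarray(estimates)
        variances = np.asarray(variances)
        m = len(estimates)
        q_bar = estimates.mean()            # pooled point estimate
        w_bar = variances.mean()            # average within-imputation variance
        b = estimates.var(ddof=1)           # between-imputation variance
        total = w_bar + (1 + 1 / m) * b
        return q_bar, np.sqrt(total)

    # Step 1 (done by any MI routine): create m completed datasets.
    # Step 2: run the intended analysis on each completed dataset and pool.
    est = [0.52, 0.48, 0.55, 0.50, 0.47]    # hypothetical coefficient per imputation
    var = [0.010, 0.012, 0.011, 0.009, 0.010]
    print(pool_rubin(est, var))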

  7. Mean squared error and |Bias| (in brackets) values for different estimators for Population I without measurement error

    • plos.figshare.com
    xls
    Updated Jun 15, 2023
    Cite
    Erum Zahid; Javid Shabbir; Sat Gupta; Ronald Onyango; Sadia Saeed (2023). Mean squared error and |Bias| (in brackets) values for different estimators for Population I without measurement error. [Dataset]. http://doi.org/10.1371/journal.pone.0261561.t003
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Erum Zahid; Javid Shabbir; Sat Gupta; Ronald Onyango; Sadia Saeed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mean squared error and |Bias| (in brackets) values for different estimators for Population I without measurement error.

  8. EEG and EMG dataset for the detection of errors introduced by an active orthosis device (IJCAI Competition)

    • zenodo.org
    txt, zip
    Updated Dec 1, 2023
    + more versions
    Cite
    Niklas Kueper; Kartik Chari; Judith Bütefür; Julia Habenicht; Tobias Rossol; Su Kyoung Kim; Marc Tabie; Frank Kirchner; Elsa Andrea Kirchner (2023). EEG and EMG dataset for the detection of errors introduced by an active orthosis device (IJCAI Competition) [Dataset]. http://doi.org/10.5281/zenodo.7966275
    Available download formats: zip, txt
    Dataset updated
    Dec 1, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Niklas Kueper; Kartik Chari; Judith Bütefür; Julia Habenicht; Tobias Rossol; Su Kyoung Kim; Marc Tabie; Frank Kirchner; Elsa Andrea Kirchner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the IJCAI 2023 competition CC6: IntEr-HRI: Intrinsic Error Evaluation during Human-Robot Interaction (see the IJCAI'23 official website). This dataset repository is divided into two versions:

    • Version 1: Training data + Metadata
    • Version 2: Test data

    N.B.: After conducting a small survey to determine the willingness of the participating teams to travel to Macao, it became evident that a significant number of them preferred not to travel. With this in mind, we have decided to modify the initial plan for the online stage of the competition so that participating teams can take part from anywhere on Earth. We hope this motivates more teams to participate. For more detailed information, please visit our competition webpage.

    Although the registration for the offline stage is officially closed, if you still wish to participate, please reach out to us via the contact form available on our webpage.

    This dataset contains recordings of electroencephalogram (EEG) data from eight subjects who were assisted in moving their right arm by an active orthosis. This is only part of the complete dataset, which also contains electromyogram (EMG) data; the complete dataset will be made public after the end of the competition.

    The orthosis-supported movements were elbow joint movements, i.e., flexion and extension of the right arm. While the orthosis was actively moving the subject's arm, some errors were deliberately introduced for a short duration of time. During this time, the orthosis moved in the opposite direction. The errors are very simple and easy to detect. EEG and EMG data are provided. The recorded EEG data follows the BrainVision Core Data Format 1.0, consisting of a binary data file (.eeg), a header file (.vhdr), and a marker file (.vmrk) (https://www.brainproducts.com/support-resources/brainvision-core-data-format-1-0/). For ease of use, the data can be exported into the widely adopted BIDS format. Furthermore, for data analysis, processing, and classification, two popular options are available - MNE (Python) and EEGLAB (MATLAB).
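
    A minimal sketch (file path assumed) of loading one BrainVision recording with MNE-Python, as suggested above; reading the .vhdr header pulls in the matching .eeg and .vmrk files automatically:

    import mne

    raw = mne.io.read_raw_brainvision("subject01.vhdr", preload=True)  # hypothetical path
    print(raw.info)  # channels, sampling rate, etc.

    # The deliberately introduced orthosis errors arrive as markers/annotations.
    events, event_id = mne.events_from_annotations(raw)
    print(event_id, events[:5])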

    If you use our dataset, please cite our paper.

    arXiv-issued DOI: https://doi.org/10.48550/arXiv.2305.11996

    BibTeX citation:

    @misc{kueper2023eeg,
    title={EEG and EMG dataset for the detection of errors introduced by an active orthosis device},
    author={Niklas Kueper and Kartik Chari and Judith Bütefür and Julia Habenicht and Su Kyoung Kim and Tobias Rossol and Marc Tabie and Frank Kirchner and Elsa Andrea Kirchner},
    year={2023},
    eprint={2305.11996},
    archivePrefix={arXiv},
    primaryClass={cs.HC}
    }

  9. Rule-based Synthetic Data for Japanese GEC

    • live.european-language-grid.eu
    • data.niaid.nih.gov
    tsv
    Updated Oct 28, 2023
    Cite
    (2023). Rule-based Synthetic Data for Japanese GEC [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7679
    Available download formats: tsv
    Dataset updated
    Oct 28, 2023
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Title: Rule-based Synthetic Data for Japanese GEC. Dataset contents: this dataset contains two parallel corpora intended for training and evaluating models for the NLP (natural language processing) subtask of Japanese GEC (grammatical error correction). These are as follows: Synthetic Corpus (synthesized_data.tsv). This corpus file contains 2,179,130 parallel sentence pairs synthesized using the process described in [1]. Each line of the file consists of two sentences delimited by a tab; the first sentence is the erroneous sentence, while the second is the corresponding correction. These paired sentences are derived from data scraped from the keyword-lookup site
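
    A minimal Python sketch (file name taken from the description above) of reading the tab-delimited pairs, where each line holds an erroneous sentence and its correction:

    import csv

    with open("synthesized_data.tsv", encoding="utf-8", newline="") as f:
        for erroneous, corrected in csv.reader(f, delimiter="\t"):
            # e.g. feed (erroneous, corrected) to a GEC model as source/target
            print(erroneous, "=>", corrected)
            break  # show only the first pair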

  10. Data from: Not so weak-PICO: Leveraging weak supervision for Participants, Interventions, and Outcomes recognition for systematic review automation

    • data.niaid.nih.gov
    • search.dataone.org
    • +2 more
    zip
    Updated Dec 13, 2022
    Cite
    Anjani Dhrangadhariya; Henning Müller (2022). Not so weak-PICO: Leveraging weak supervision for Participants, Interventions, and Outcomes recognition for systematic review automation [Dataset]. http://doi.org/10.5061/dryad.ncjsxkszr
    Available download formats: zip
    Dataset updated
    Dec 13, 2022
    Dataset provided by
    University of Geneva
    Authors
    Anjani Dhrangadhariya; Henning Müller
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Objective: PICO (Participants, Interventions, Comparators, Outcomes) analysis is vital but time-consuming for conducting systematic reviews (SRs). Supervised machine learning can help fully automate it, but a lack of large annotated corpora limits the quality of automated PICO recognition systems. The largest currently available PICO corpus is manually annotated, an approach that is often too expensive for the scientific community to apply. Depending on the specific SR question, PICO criteria are extended to PICOC (C-Context), PICOT (T-Timeframe), and PIBOSO (B-Background, S-Study design, O-Other), meaning the static hand-labelled corpora need to undergo costly re-annotation as per the downstream requirements. We aim to test the feasibility of designing a weak supervision system to extract these entities without hand-labelled data. Methodology: We decompose PICO spans into their constituent entities and re-purpose multiple medical and non-medical ontologies and expert-generated rules to obtain multiple noisy labels for these entities. The labels obtained from these several sources are then aggregated using simple majority voting and generative modelling approaches (a minimal majority-voting sketch follows the file list below). The resulting programmatic labels are used as weak signals to train a weakly-supervised discriminative model, and we observe the performance changes. We also explore mistakes in the currently available PICO corpus that could have led to inaccurate evaluation of several automation methods. Results: We present Weak-PICO, a weakly-supervised PICO entity recognition approach using medical and non-medical ontologies, dictionaries and expert-generated rules. Our approach does not use hand-labelled data. Conclusion: Weak supervision using weak-PICO for PICO entity recognition shows encouraging results, and the approach can potentially extend readily to more clinical entities.

    Methods

    This upload contains four main zip files.

    ds_cto_dict.zip: This zip file contains the four distant supervision dictionaries (P: participant.txt; I: intervention.txt and intervetion_syn.txt; O: outcome.txt) generated from clinicaltrials.gov using the methodology described in Distant-CTO (https://aclanthology.org/2022.bionlp-1.34/). These dictionaries were used to create distant supervision labelling functions as described in the labelling sources subsection of the Methodology. The data was derived from https://clinicaltrials.gov/

    handcrafted_dictionaries.zip: This zip folder contains three files: 1) gender_sexuality.txt: a list of possible genders and sexual orientations found across the web (the list is not exhaustive); 2) endpoints_dict.txt: outcome names and the names of questionnaires used to measure outcomes, assembled from PROM questionnaires and PROMs; and 3) comparator_dict: a list of idiosyncratic comparator terms like sham, saline, placebo, etc., compiled from the literature search (also not exhaustive).

    test_ebm_correctedlabels.tsv: EBM-PICO is a widely used dataset with PICO annotations at two levels: span-level (coarse-grained) and entity-level (fine-grained). Span-level annotations encompass the full information about each class, while entity-level annotations cover more fine-grained information, with PICO classes further divided into fine-grained subclasses. For example, the coarse-grained Participant span is further divided into participant age, gender, condition and sample size in the randomised controlled trial. The dataset comes pre-divided into a training set (n=4,933) annotated through crowd-sourcing and an expert-annotated gold test set (n=191) for evaluation. The EBM-PICO annotation guidelines caution about variable annotation quality. Abaho et al. developed a framework to post-hoc correct EBM-PICO outcome annotation inconsistencies, and Lee et al. studied annotation span disagreements, suggesting variability across the annotators. Low annotation quality in the training set is excusable, but errors in the test set can lead to faulty evaluation of downstream ML methods. We evaluate 1% of the EBM-PICO training-set tokens to gauge the possible reasons for fine-grained labelling errors and use this exercise to conduct an error-focused PICO re-annotation of the EBM-PICO gold test set. The file 'test_ebm_correctedlabels.tsv' contains the error-corrected EBM-PICO gold test set, which could be used as a complementary evaluation set alongside the EBM-PICO test set.

    error_analysis.zip: This .zip file contains three .tsv files, one per PICO class, used to identify possible errors in about 1% (about 12,962 tokens) of the EBM-PICO training set.
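
    A hedged sketch of the majority-voting aggregation mentioned in the Methodology above: several noisy labelling sources vote per token and the most frequent non-abstain label wins. The label set and vote matrix are invented for illustration, and the authors' generative-modelling aggregation is not captured by this simple vote:

    import numpy as np

    ABSTAIN, P, I, O = -1, 0, 1, 2  # abstain / participant / intervention / outcome

    # Rows: labelling functions (ontologies, dictionaries, rules); columns: tokens.
    votes = np.array([
        [P,       ABSTAIN, I, O],
        [P,       P,       I, ABSTAIN],
        [ABSTAIN, P,       O, O],
    ])

    def majority_vote(column):
        valid = column[column != ABSTAIN]
        if valid.size == 0:
            return ABSTAIN  # no source fired on this token
        return int(np.bincount(valid).argmax())

    weak_labels = [majority_vote(votes[:, j]) for j in range(votes.shape[1])]
    print(weak_labels)  # -> [0, 0, 1, 2]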

  11. Trend-based calibration experiments with floodX data - Package - ERIC

    • opendata.eawag.ch
    Updated Nov 6, 2020
    + more versions
    Cite
    (2020). Trend-based calibration experiments with floodX data - Package - ERIC [Dataset]. https://opendata.eawag.ch/dataset/sww-trend-based-calibration-experiments-with-floodx-data
    Dataset updated
    Nov 6, 2020
    Description

    This archive contains the results from calibration experiments with trend-like data. A simple EPA SWMM flood model was calibrated with different combinations of conventional water level data and trend-like water level data. Three different measurement locations in the model were possible: the pond (s3), the outlet shaft (s5), and the basement (s6). Trend-like data of different qualities were considered: error-free data (codename "trend"), data with correlated errors ("sofi") and data with uncorrelated errors ("gaussiantrend").
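
    A minimal Python sketch (all parameters invented) contrasting the two synthetic error types named above: uncorrelated Gaussian noise ("gaussiantrend") versus correlated noise ("sofi"), the latter modelled here as an AR(1) process added to an error-free trend:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    trend = np.linspace(0.0, 1.0, n)             # error-free water-level trend ("trend")

    gaussian = trend + rng.normal(0.0, 0.05, n)  # uncorrelated errors

    phi, ar1 = 0.95, np.zeros(n)                 # correlated errors: e_t = phi*e_{t-1} + eps_t
    for t in range(1, n):
        ar1[t] = phi * ar1[t - 1] + rng.normal(0.0, 0.02)
    correlated = trend + ar1

    print(gaussian[:3], correlated[:3])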

  12. Data from: Prediction error induced motor contagions in human behaviors

    • search.dataone.org
    • datadryad.org
    • +1 more
    Updated Apr 1, 2025
    Cite
    Tsuyoshi Ikegami; Gowrishankar Ganesh; Tatsuya Takeuchi; Hiroki Nakamoto (2025). Prediction error induced motor contagions in human behaviors [Dataset]. http://doi.org/10.5061/dryad.3563k
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Tsuyoshi Ikegami; Gowrishankar Ganesh; Tatsuya Takeuchi; Hiroki Nakamoto
    Time period covered
    Jan 1, 2019
    Description

    Motor contagions refer to implicit effects on one's actions induced by observed actions. Motor contagions are believed to be induced simply by action observation and cause an observer's action to become similar to the action observed. In contrast, here we report a new motor contagion that is induced only when the observation is accompanied by prediction errors - differences between actions one observes and those he/she predicts or expects. In two experiments, one on whole-body baseball pitching and another on simple arm reaching, we show that the observation of the same action induces distinct motor contagions, depending on whether prediction errors are present or not. In the absence of prediction errors, as in previous reports, participants' actions changed to become similar to the observed action, while in the presence of prediction errors, their actions changed to diverge away from it, suggesting distinct effects of action observation and action prediction on human actions.

  13. Data from: Pacman profiling: a simple procedure to identify stratigraphic outliers in high-density deep-sea microfossil data

    • data-staging.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated Jul 8, 2011
    Cite
    David Lazarus; Manuel Weinkauf; Patrick Diver (2011). Pacman profiling: a simple procedure to identify stratigraphic outliers in high-density deep-sea microfossil data [Dataset]. http://doi.org/10.5061/dryad.2m7b0
    Available download formats: zip
    Dataset updated
    Jul 8, 2011
    Authors
    David Lazarus; Manuel Weinkauf; Patrick Diver
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Marine, Global
    Description

    The deep-sea microfossil record is characterized by an extraordinarily high density and abundance of fossil specimens, and by a very high degree of spatial and temporal continuity of sedimentation. This record provides a unique opportunity to study evolution at the species level for entire clades of organisms. Compilations of deep-sea microfossil species occurrences are, however, affected by reworking of material, age model errors, and taxonomic uncertainties, all of which combine to displace a small fraction of the recorded occurrence data both forward and backwards in time, extending total stratigraphic ranges for taxa. These data outliers introduce substantial errors into both biostratigraphic and evolutionary analyses of species occurrences over time. We propose a simple method—Pacman—to identify and remove outliers from such data, and to identify problematic samples or sections from which the outlier data have derived. The method consists of, for a large group of species, compiling species occurrences by time and marking as outliers calibrated fractions of the youngest and oldest occurrence data for each species. A subset of biostratigraphic marker species whose ranges have been previously documented is used to calibrate the fraction of occurrences to mark as outliers. These outlier occurrences are compiled for samples, and profiles of outlier frequency are made from the sections used to compile the data; the profiles can then identify samples and sections with problematic data caused, for example, by taxonomic errors, incorrect age models, or reworking of sediment. These samples/sections can then be targeted for re-study.
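
    A hedged Python sketch of the core trimming step described above: for each species, flag a calibrated fraction of its youngest and oldest occurrences as outliers; the fractions and ages below are invented for illustration:

    import numpy as np

    def pacman_trim(ages, young_frac=0.05, old_frac=0.05):
        # Return a boolean mask that is True where an occurrence is flagged.
        ages = np.asarray(ages, dtype=float)
        young_cut = np.quantile(ages, young_frac)      # youngest fraction = smallest ages
        old_cut = np.quantile(ages, 1.0 - old_frac)    # oldest fraction = largest ages
        return (ages < young_cut) | (ages > old_cut)

    occurrences_ma = np.array([1.2, 3.4, 3.5, 3.6, 4.0, 4.1, 4.2, 9.8])  # ages in Ma
    flagged = pacman_trim(occurrences_ma, young_frac=0.15, old_frac=0.15)
    print(occurrences_ma[flagged])  # candidates for reworking or age-model errors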

  14. VNL 2025 Player Data

    • kaggle.com
    Updated Jul 7, 2025
    Cite
    Joshua Li (2025). VNL 2025 Player Data [Dataset]. http://doi.org/10.34740/kaggle/dsv/12403262
    Available formats: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Jul 7, 2025
    Dataset provided by
    Kaggle
    Authors
    Joshua Li
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a compilation of data collected from the official VNL website.

    The data on Volleyball World was too fragmented to use directly, since it is categorized by position (Attackers, Blockers, Setters, etc.), which makes it inflexible and hard to use for statistical purposes. I manually copied and pasted the data into an Excel sheet, where I used some functions to clean and organize it. Some columns found on the official website (like efficiency or success rate) were dropped to keep the dataset simple and generalizable.

    Please see column descriptions below:

    • Name: Name of player
    • Team: First three letters of the team they represent
    • Attack Points: Points scored off spikes and tips
    • Attack Errors: Points lost on spikes or tips
    • Attack Attempts: Includes attack points, attack errors, and spikes/tips that did not lead to points for either team
    • Block Points: Points scored off of blocks
    • Block Errors: Points lost from blocks
    • Rebounds: Blocks that did not lead to points for either team
    • Serve Points: Service aces that directly led to a point
    • Serve Errors: Points lost directly from serves
    • Serve Attempts: Serves that did not directly lead to points for either team
    • Successful Sets: Sets that led to a successful attack
    • Set Errors: Points lost directly from a set
    • Set Attempts: Sets that did not directly lead to a point for either team
    • Spike Digs: Number of tips or spikes that a player dug
    • Dig Errors: An attempt to dig a tip or spike that lost the defending team a point
    • Successful Receives: A near-perfect or perfect receive, resulting in an easy-to-set ball for the setter
    • Receive Errors: An attempt at a serve receive that lost the defending team a point
    • Receive Attempts: A receive of a serve that got the ball up in a non-ideal spot
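
    A minimal pandas sketch (file name assumed; column names from the list above) showing how a dropped statistic such as attack efficiency can be recomputed from the raw counts, under the common convention efficiency = (points - errors) / attempts:

    import pandas as pd

    df = pd.read_csv("vnl_2025_players.csv")  # hypothetical export of this dataset

    df["Attack Efficiency"] = (df["Attack Points"] - df["Attack Errors"]) / df["Attack Attempts"]
    top = df.sort_values("Attack Efficiency", ascending=False)
    print(top[["Name", "Team", "Attack Efficiency"]].head(10))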

  15. Data from: Mean squared logarithmic error in daily mean streamflow predictions at GAGES-II reference streamgages

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Nov 19, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Mean squared logarithmic error in daily mean streamflow predictions at GAGES-II reference streamgages [Dataset]. https://catalog.data.gov/dataset/mean-squared-logarithmic-error-in-daily-mean-streamflow-predictions-at-gages-ii-reference-
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This data release contains daily mean squared logarithmic error (MSLE), as well as several decompositions of the MSLE, for three streamflow models: nearest-neighbor drainage area ratio (NNDAR), a simple statistical model that re-scales streamflow data from the nearest streamgage; the National Hydrologic Model Infrastructure application of the Precipitation-Runoff Modeling System (NHM-PRMS); and version 2.0 of the National Water Model (NWM). Error was determined by evaluating each model daily against streamflow observations from 1,021 ‘reference’ (minimally anthropogenically impacted [Falcone, 2011]) watersheds across the conterminous United States with at least 10 years of observations. References: Falcone, J.A., 2011, GAGES-II: geospatial attributes of gages for evaluating streamflow: U.S. Geological Survey data release, https://doi.org/10.3133/70046617.
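
    A hedged sketch of the headline metric: mean squared logarithmic error between predicted and observed daily flows. The small offset guarding against zero flows is an assumption for illustration; the data release documents its exact formulation and decompositions:

    import numpy as np

    def msle(observed, predicted, eps=0.01):
        observed = np.asarray(observed, dtype=float)
        predicted = np.asarray(predicted, dtype=float)
        return np.mean((np.log(predicted + eps) - np.log(observed + eps)) ** 2)

    obs = np.array([12.0, 30.5, 8.2, 0.0])   # hypothetical daily mean flows
    pred = np.array([10.0, 28.0, 9.1, 0.3])
    print(msle(obs, pred))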

  16. Demographic and Health Survey 1994 - Indonesia

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Jun 26, 2017
    + more versions
    Cite
    Central Bureau of Statistics (BPS) (2017). Demographic and Health Survey 1994 - Indonesia [Dataset]. https://microdata.worldbank.org/index.php/catalog/1400
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    Central Bureau of Statistics (BPS)
    State Ministry of Population/National Family Planning Coordinating Board (NFPCB)
    Ministry of Health
    Time period covered
    1994
    Area covered
    Indonesia
    Description

    Abstract

    The 1994 Indonesia Demographic and Health Survey (IDHS) is a follow-on project to the 1987 National Indonesia Contraceptive Prevalence Survey (NICPS) and to the 1991 IDHS. The 1994 IDHS was significantly expanded from prior surveys to include two new modules in the women's questionnaire, namely maternal mortality and awareness of AIDS. The survey also investigated the availability of family planning and health services, which provides an opportunity for linking women's fertility, family planning and child health care with the availability of services. The 1994 IDHS also included a household expenditure module, which provides a means of identifying the household's economic status.

    The 1994 IDHS was specifically designed to meet the following objectives:

    • Provide data concerning fertility, family planning, maternal and child health, maternal mortality and awareness of AIDS that can be used by program managers, policymakers, and researchers to evaluate and improve existing programs;
    • Provide data about availability of family planning and health services, thereby offering an opportunity for linking women's fertility, family planning and child-care behavior with the availability of services;
    • Provide data on household expenditures, which can be used to identify the household's economic status;
    • Provide data that can be used to analyze trends over time by examining many of the same fertility, mortality and health issues that were addressed in the earlier surveys (1987 NICPS and 1991 IDHS);
    • Measure changes in fertility and contraceptive prevalence rates and at the same time study factors that affect the changes, such as marriage patterns, urban/rural residence, education, breastfeeding habits, and the availability of contraception;
    • Measure the development and achievements of programs related to health policy, particularly those concerning the maternal and child health development program implemented through public health clinics in Indonesia.

    Geographic coverage

    National

    Analysis unit

    • Household
    • Children under five years
    • Women age 15-49
    • Men

    Kind of data

    Sample survey data

    Sampling procedure

    Indonesia is divided into 27 provinces. For the implementation of its family planning program, the National Family Planning Coordinating Board (BKKBN) has divided these provinces into three regions as follows:

    • Java-Bali: DKI Jakarta, West Java, Central Java, DI Yogyakarta, East Java, and Bali
    • Outer Java-Bali I: Dista Aceh, North Sumatra, West Sumatra, South Sumatra, Lampung, West Nusa Tenggara, West Kalimantan, South Kalimantan, North Sulawesi, and South Sulawesi
    • Outer Java-Bali II: Riau, Jambi, Bengkulu, East Nusa Tenggara, East Timor, Central Kalimantan, East Kalimantan, Central Sulawesi, Southeast Sulawesi, Maluku, and Irian Jaya

    The 1990 Population Census of Indonesia shows that Java-Bali accounts for 62 percent of the national population, Outer Java-Bali I accounts for 27 percent, and Outer Java-Bali II accounts for 11 percent. The sample for the 1994 IDHS was designed to produce reliable estimates of fertility, contraceptive prevalence and other important variables for each of the provinces and for urban and rural areas of the three regions.

    In order to meet this objective, between 1,650 and 2,050 households were selected in each of the provinces in Java-Bali, 1,250 to 1,500 households in the ten provinces in Outer Java-Bali I, and 1,000 to 1,250 households in each of the provinces in Outer Java-Bali II, for a total of 35,500 households. With an average of 0.8 ever-married women 15-49 per household, the sample was expected to yield approximately 28,000 women eligible for the individual interview.

    Note: See detailed description of sample design in APPENDIX A of the survey report.

    Mode of data collection

    Face-to-face

    Research instrument

    The 1994 IDHS used four questionnaires--three at the household level and one at the community level. The three questionnaires administered at the household level were the household questionnaire, an individual questionnaire for women, and the household expenditure questionnaire. The household and individual questionnaires were based on the DHS Model "A" Questionnaire, which is designed for use in countries with high contraceptive prevalence. A deviation from the standard DHS practice is the exclusion of the anthropometric measurement of young children and their mothers. Topics covered in the 1994 IDHS that were not included in the 1991 IDHS are knowledge of AIDS and maternal mortality. Additions and modifications to the model questionnaire were made in order to provide detailed information specific to Indonesia. Except for the household expenditure module, the questionnaires were developed mainly in English and were translated into Bahasa Indonesia. The household expenditure schedule was adapted from the core Susenas questionnaire model. Susenas is a national household survey carried out annually by BPS to collect data on various demographic and socioeconomic indicators of the population.

    Cleaning operations

    The first stage of data editing was carried out by the field editors who checked the completed questionnaires for thoroughness and accuracy. Field supervisors then further examined the questionnaires. In many instances, the teams sent the questionnaires to CBS through the regency/municipality statistics offices. In these cases, no checking was done by the PSO. At CBS, the questionnaires underwent another round of editing, primarily for completeness and coding of responses to open-ended questions.

    The data were processed using 16 microcomputers and the DHS computer program, ISSA (Integrated System for Survey Analysis). Data entry and office editing were initiated immediately after fieldwork began. Simple range and skip errors were corrected at the data entry stage. Data processing was completed by November 1994, and the preliminary report of the survey was published in January 1995.

    Response rate

    A total of 35,510 households were selected for the survey, of which 34,060 were found. Of the encountered households, 33,738 (99.1 percent) were successfully interviewed. In these households, 28,800 eligible women were identified and complete interviews were obtained from 28,168 women, or 97.8 percent of all eligible women. The generally high response rates for both household and individual interviews were due mainly to the strict enforcement of the rule to revisit the originally selected household if no one was at home initially. No substitution for the originally selected households was allowed. Interviewers were instructed to make at least three visits in an effort to contact the household or eligible woman.

    Note: See summarized response rates by place of residence in Table 1.2 of the survey report.

    Sampling error estimates

    The estimates from a sample survey are affected by two types of errors: (1) non-sampling errors, and (2) sampling errors. Non-sampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during implementation of the 1994 IDHS to minimize this type of error, non-sampling errors are impossible to avoid and difficult to evaluate statistically.

    Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 1994 IDHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.

    Sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.

    If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 1994 IDHS sample is the result of a two-stage stratified design, and, consequently, it was necessary to use more complex formulas. The computer software used to calculate sampling errors for the 1994 IDHS is the ISSA Sampling Error Module. This module uses the Taylor linearization method of variance estimation for survey estimates that are means or proportions. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
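
    A minimal Python sketch (toy, unweighted data) of the delete-one jackknife idea behind the variance estimation for complex statistics mentioned above; the actual ISSA module operates on weighted, replicated survey subsamples:

    import numpy as np

    def jackknife_se(values, statistic=np.mean):
        values = np.asarray(values, dtype=float)
        n = len(values)
        # Recompute the statistic with each observation left out in turn.
        reps = np.array([statistic(np.delete(values, i)) for i in range(n)])
        return np.sqrt((n - 1) / n * np.sum((reps - reps.mean()) ** 2))

    rates = np.array([2.9, 3.1, 2.8, 3.3, 3.0, 2.7])  # hypothetical cluster-level rates
    print(jackknife_se(rates))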

    Note: See detailed estimate of sampling error calculation in APPENDIX B of the survey report.

    Data appraisal

    Data Quality Tables:

    • Household age distribution
    • Age distribution of eligible and interviewed women
    • Completeness of reporting
    • Births by calendar years
    • Reporting of age at death in days
    • Reporting of age at death in months

    Note: See detailed tables in APPENDIX C of the report which is presented in this documentation.

  17. Mean squared error and |Bias| (in brackets) values for different estimators for Population IV without measurement error

    • figshare.com
    xls
    Updated Jun 15, 2023
    + more versions
    Cite
    Erum Zahid; Javid Shabbir; Sat Gupta; Ronald Onyango; Sadia Saeed (2023). Mean squared error and |Bias| (in brackets) values for different estimators for Population IV without measurement error. [Dataset]. http://doi.org/10.1371/journal.pone.0261561.t010
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Erum Zahid; Javid Shabbir; Sat Gupta; Ronald Onyango; Sadia Saeed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mean squared error and |Bias| (in brackets) values for different estimators for Population IV without measurement error.

  18. Mean squared error and |Bias| (in brackets) values for different estimators for Population III with measurement error

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    + more versions
    Cite
    Erum Zahid; Javid Shabbir; Sat Gupta; Ronald Onyango; Sadia Saeed (2023). Mean squared error and |Bias| (in brackets) values for different estimators for Population III with measurement error. [Dataset]. http://doi.org/10.1371/journal.pone.0261561.t006
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Erum Zahid; Javid Shabbir; Sat Gupta; Ronald Onyango; Sadia Saeed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mean squared error and |Bias| (in brackets) values for different estimators for Population III with measurement error.

  19. Mean squared error and |Bias| (in brackets) values for different estimators for Population II with measurement error

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Cite
    Erum Zahid; Javid Shabbir; Sat Gupta; Ronald Onyango; Sadia Saeed (2023). Mean squared error and |Bias| (in brackets) values for different estimators for Population II with measurement error. [Dataset]. http://doi.org/10.1371/journal.pone.0261561.t004
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Erum Zahid; Javid Shabbir; Sat Gupta; Ronald Onyango; Sadia Saeed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mean squared error and |Bias| (in brackets) values for different estimators for Population II with measurement error.

  20. Mean squared error and |Bias| (in brackets) values for different estimators for Population V without measurement error

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 15, 2023
    + more versions
    Cite
    Erum Zahid; Javid Shabbir; Sat Gupta; Ronald Onyango; Sadia Saeed (2023). Mean squared error and |Bias| (in brackets) values for different estimators for Population V without measurement error. [Dataset]. http://doi.org/10.1371/journal.pone.0261561.t012
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Erum Zahid; Javid Shabbir; Sat Gupta; Ronald Onyango; Sadia Saeed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mean squared error and |Bias| (in brackets) values for different estimators for Population V without measurement error.
