100+ datasets found
  1. Training, test data and model parameters.

    • figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Salvatore Cosentino; Mette Voldby Larsen; Frank Møller Aarestrup; Ole Lund (2023). Training, test data and model parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0077302.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Salvatore Cosentino; Mette Voldby Larsen; Frank Møller Aarestrup; Ole Lund
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Training, test data and model parameters. The last 3 columns show the MinORG, LT and HT parameters used to create the pathogenicity families and build the model for each of the 10 models. Zthr is a threshold value, calculated for each model at the cross-validation phase, which is used, given the final prediction score, to decide whether an input organism is predicted as pathogenic or non-pathogenic. The parameters for each model were chosen after 5-fold cross-validation tests.
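The Zthr decision rule described above amounts to comparing a model's final prediction score against a per-model threshold. A minimal sketch in Python (the function name, threshold value, and scores are illustrative, not taken from the dataset):

```python
def classify(prediction_score: float, z_thr: float) -> str:
    """Apply a per-model threshold Zthr: scores at or above it are
    called pathogenic, scores below it non-pathogenic."""
    return "pathogenic" if prediction_score >= z_thr else "non-pathogenic"

# Hypothetical scores for three input organisms under one model's Zthr.
z_thr = 0.65
scores = [0.80, 0.64, 0.91]
print([classify(s, z_thr) for s in scores])
# ['pathogenic', 'non-pathogenic', 'pathogenic']
```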

  2. Dog vs Cat

    • kaggle.com
    Updated Mar 30, 2022
    Cite
    Hamed Etezadi (2022). Dog vs Cat [Dataset]. https://www.kaggle.com/datasets/hamedetezadi/dog-vs-cat/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 30, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hamed Etezadi
    Description

    Dataset

    This dataset was created by Hamed Etezadi


  3. GRD-TRT-BUF-4I: Technical Validation Data

    • figshare.com
    application/csv
    Updated Mar 18, 2024
    + more versions
    Cite
    Nicholas Kunz; H. Oliver Gao (2024). GRD-TRT-BUF-4I: Technical Validation Data [Dataset]. http://doi.org/10.6084/m9.figshare.25224224.v5
    Explore at:
    application/csv (available download formats)
    Dataset updated
    Mar 18, 2024
    Dataset provided by
    figshare
    Authors
    Nicholas Kunz; H. Oliver Gao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the static test data from the study "Global Geolocated Realtime Data of Interfleet Urban Transit Bus Idling", collected by GRD-TRT-BUF-4I. Updated versions are available here.
    test-data-a.csv was collected from December 31, 2023 00:01:30 UTC to January 1, 2024 00:01:30 UTC.
    test-data-b.csv was collected from January 4, 2024 01:30:30 UTC to January 5, 2024 01:30:30 UTC.
    test-data-c.csv was collected from January 10, 2024 16:05:30 UTC to January 11, 2024 16:05:30 UTC.
    test-data-d.csv was collected from January 15, 2024 22:30:21 UTC to January 16, 2024 22:30:17 UTC.
    test-data-e.csv was collected from February 16, 2024 22:30:21 UTC to February 17, 2024 22:30:20 UTC.
    test-data-f.csv was collected from February 21, 2024 22:30:21 UTC to February 22, 2024 22:30:20 UTC.

  4. Training, Validation and Test data for "On the accuracy of posterior...

    • zenodo.org
    txt
    Updated Mar 17, 2025
    Cite
    Harry Bevins (2025). Training, Validation and Test data for "On the accuracy of posterior recovery with neural network emulators" [Dataset]. http://doi.org/10.5281/zenodo.15040279
    Explore at:
    txt (available download formats)
    Dataset updated
    Mar 17, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Harry Bevins
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The repo includes the training, test and validation data used in the paper "On the accuracy of posterior recovery with neural network emulators". Note that due to the convention employed by the emulator framework in the paper the test data is the data used for early stopping and the validation data is used to measure the accuracy of the emulator after training. This is the opposite convention to most machine learning literature.

    The corresponding code used in the paper is found at: https://github.com/htjb/validating_posteriors.

    `_data.txt` corresponds to the ARES parameters used to generate the signals in `_labels.txt`.

  5. Structure tensor validation - Dataset - data.govt.nz - discover and use data...

    • catalogue.data.govt.nz
    Updated Feb 1, 2001
    Cite
    (2001). Structure tensor validation - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/oai-figshare-com-article-25216145
    Explore at:
    Dataset updated
    Feb 1, 2001
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Structure tensor validation

    General information: This item contains test data to validate the structure tensor algorithms and a supplemental paper describing how the data was generated and used.

    Contents: The test_data.zip archive contains 101 slices of a cylinder (701x701 pixels) with two artificially created fibre orientations. The outer fibres are oriented longitudinally, and the inner fibres are oriented circumferentially, similar to the ones found in the rat uterus. The SupplementaryMaterials_rat_uterus_texture_validation.pdf file is a short supplemental paper describing the generation of the test data and the results after being processed with the structure tensor code.

  6. Data pipeline Validation And Load Testing using Multiple JSON Files

    • data.niaid.nih.gov
    Updated Mar 26, 2021
    Cite
    Afsana Khan (2021). Data pipeline Validation And Load Testing using Multiple JSON Files [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4636789
    Explore at:
    Dataset updated
    Mar 26, 2021
    Dataset provided by
    Mainak Adhikari
    Pelle Jakovits
    Afsana Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets were used to validate and test the data pipeline deployment following the RADON approach. The dataset contains temperature and humidity sensor readings of a particular day, which are synthetically generated using a data generator and are stored as JSON files to validate and test (performance/load testing) the data pipeline components.
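A day of synthetic sensor readings like those described can be generated with the standard library alone; a rough sketch (the field names, value ranges, and reading count are assumptions, not taken from the dataset):

```python
import json
import random

def generate_day(path: str, n_readings: int = 24, seed: int = 0):
    """Write one day of synthetic temperature/humidity readings to a JSON
    file. Field names and value ranges are illustrative only."""
    rng = random.Random(seed)
    readings = [
        {
            "reading": i,
            "temperature_c": round(rng.uniform(15.0, 30.0), 2),
            "humidity_pct": round(rng.uniform(30.0, 90.0), 2),
        }
        for i in range(n_readings)
    ]
    with open(path, "w") as f:
        json.dump(readings, f)
    return readings

readings = generate_day("sensor-day.json")
print(len(readings))  # 24
```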

  7. GeneLab amplicon validation test set

    • figshare.com
    application/x-gzip
    Updated Jul 13, 2022
    Cite
    Michael Lee (2022). GeneLab amplicon validation test set [Dataset]. http://doi.org/10.6084/m9.figshare.20294529.v3
    Explore at:
    application/x-gzip (available download formats)
    Dataset updated
    Jul 13, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Michael Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GeneLab test metagenomics output for testing validation programs

  8. Train, validation, test data sets and confusion matrices underlying...

    • 4tu.edu.hpc.n-helix.com
    zip
    Updated Sep 7, 2023
    Cite
    Louis Kuijpers; Nynke Dekker; Belen Solano Hermosilla; Edo van Veen (2023). Train, validation, test data sets and confusion matrices underlying publication: "Automated cell counting for Trypan blue stained cell cultures using machine learning" [Dataset]. http://doi.org/10.4121/21695819.v1
    Explore at:
    zip (available download formats)
    Dataset updated
    Sep 7, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Louis Kuijpers; Nynke Dekker; Belen Solano Hermosilla; Edo van Veen
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Annotated test and train data sets. Both images and annotations are provided separately.


    Validation data set for Hi5, Sf9 and HEK cells.


    Confusion matrices for the determination of performance parameters

  9. Gender Classification from an image

    • kaggle.com
    zip
    Updated Jul 6, 2021
    + more versions
    Cite
    Gerry (2021). Gender Classification from an image [Dataset]. https://www.kaggle.com/gpiosenka/gender-classification-from-an-image
    Explore at:
    zip (180858795 bytes; available download formats)
    Dataset updated
    Jul 6, 2021
    Authors
    Gerry
    Description

    Context

    I do a lot of work with image data sets. Often it is necessary to partition the images into male and female data sets. Doing this by hand can be a long and tedious task particularly on large data sets. So I decided to create a classifier that could do the task for me.

    Content

    I used the CELEBA aligned data set to provide the images. I went through and separated the images visually into 1747 female and 1747 male training images. I also created 100 male and 100 female test images, and 100 male and 100 female validation images. I wanted only the face to be in the image, so I developed an image cropping function using MTCNN to crop all the images. That function is included as one of the notebooks should anyone have a need for a good face cropping function. I also created an image duplicate detector to try to eliminate any of the training images from appearing in the test or validation images. I have developed a general purpose image classification function that works very well for most image classification tasks. It contains the option to select 1 of 7 models for use. For this application I used the MobileNet model because it is less computationally expensive and gives excellent results. On the test set, accuracy is near 100%.
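The duplicate-detection step mentioned above (keeping training images out of the test and validation sets) can be approximated by hashing file contents. This is a minimal sketch that only catches byte-identical copies, whereas the author's detector may compare image content more loosely:

```python
import hashlib

def digest(data: bytes) -> str:
    """Hash raw image bytes; byte-identical files share a digest."""
    return hashlib.sha256(data).hexdigest()

def find_leaks(train: dict, test: dict) -> set:
    """Return names of test files whose bytes also occur in the training set.
    Both arguments map filename -> raw bytes (in practice, read from disk)."""
    train_hashes = {digest(b) for b in train.values()}
    return {name for name, b in test.items() if digest(b) in train_hashes}

train = {"a.jpg": b"\x01\x02", "b.jpg": b"\x03\x04"}
test = {"c.jpg": b"\x01\x02", "d.jpg": b"\x05\x06"}
print(find_leaks(train, test))  # {'c.jpg'}
```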

    Acknowledgements

    The CELEBA aligned data set was used. This data set is very large and of good quality. To crop the images to only include the face, I developed a face cropping function using MTCNN. MTCNN is a very accurate program and is reasonably fast; however, it is not flawless, so after cropping the images you should always visually inspect the results.

    Inspiration

    I developed this data set to train a classifier to be able to distinguish the gender shown in an image. Why bother, you may ask, when you can just look at the image and tell? True, but let's say you have a data set of 50,000 images that you want to separate into male and female data sets. Doing that by hand would take forever. With the trained classifier at near 100% accuracy, you can use model.predict to do the job for you.

  10. Data from: Development of a Mobile Robot Test Platform and Methods for...

    • catalog.data.gov
    • data.nasa.gov
    • +1more
    Updated Dec 6, 2023
    Cite
    Dashlink (2023). Development of a Mobile Robot Test Platform and Methods for Validation of Prognostics-Enabled Decision Making Algorithms [Dataset]. https://catalog.data.gov/dataset/development-of-a-mobile-robot-test-platform-and-methods-for-validation-of-prognostics-enab
    Explore at:
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    Dashlink
    Description

    As fault diagnosis and prognosis systems in aerospace applications become more capable, the ability to utilize information supplied by them becomes increasingly important. While certain types of vehicle health data can be effectively processed and acted upon by crew or support personnel, others, due to their complexity or time constraints, require either automated or semi-automated reasoning. Prognostics-enabled Decision Making (PDM) is an emerging research area that aims to integrate prognostic health information and knowledge about the future operating conditions into the process of selecting subsequent actions for the system. The newly developed PDM algorithms require suitable software and hardware platforms for testing under realistic fault scenarios. The paper describes the development of such a platform, based on the K11 planetary rover prototype. A variety of injectable fault modes are being investigated for electrical, mechanical, and power subsystems of the testbed, along with methods for data collection and processing. In addition to the hardware platform, a software simulator with matching capabilities has been developed. The simulator allows for prototyping and initial validation of the algorithms prior to their deployment on the K11. The simulator is also available to the PDM algorithms to assist with the reasoning process. A reference set of diagnostic, prognostic, and decision making algorithms is also described, followed by an overview of the current test scenarios and the results of their execution on the simulator.

  11. Data from: Benchmark Datasets Incorporating Diverse Tasks, Sample Sizes,...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 6, 2021
    Cite
    Benchmark Datasets Incorporating Diverse Tasks, Sample Sizes, Material Systems, and Data Heterogeneity for Materials Informatics [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4903957
    Explore at:
    Dataset updated
    Jun 6, 2021
    Dataset provided by
    Kauwe, K. Steven
    Sparks, D. Taylor
    Henderson, N. Ashley
    Description

    This benchmark data is comprised of 50 different datasets for materials properties obtained from 16 previous publications. The data contains both experimental and computational data, data suited for regression as well as classification, sizes ranging from 12 to 6354 samples, and materials systems spanning the diversity of materials research. In addition to cleaning the data where necessary, each dataset was split into train, validation, and test splits.

    For datasets with more than 100 values, train-val-test splits were created with either a 5-fold or 10-fold cross-validation method, depending on what each respective paper did in its study. Datasets with fewer than 100 values had train-test splits created using the leave-one-out cross-validation method.
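The size-dependent splitting policy described above can be sketched as follows (5-fold is used for the k-fold case purely as an example; the actual fold count varied by paper):

```python
import random

def make_folds(n_samples: int, seed: int = 0):
    """Partition sample indices into CV folds: k-fold (here k=5) for
    datasets with 100+ samples, leave-one-out (k = n) for smaller ones."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    k = 5 if n_samples >= 100 else n_samples  # LOO = n folds of size 1
    return [idx[i::k] for i in range(k)]

print(len(make_folds(120)))  # 5 folds for a larger dataset
print(len(make_folds(12)))   # 12 singleton folds (leave-one-out)
```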

    For further information, as well as directions on how to access the data, please go to the corresponding GitHub repository: https://github.com/anhender/mse_ML_datasets/tree/v1.0

  12. NWEI Azura RTI 1/20th Model Validation Wave Tank Test Data

    • osti.gov
    • mhkdr.openei.org
    • +3more
    Updated May 8, 2017
    Cite
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Wind and Water Technologies Office (EE-4W) (2017). NWEI Azura RTI 1/20th Model Validation Wave Tank Test Data [Dataset]. http://doi.org/10.15473/1513773
    Explore at:
    Dataset updated
    May 8, 2017
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    Marine and Hydrokinetic Data Repository (MHKDR)
    Northwest Energy Innovations
    Description

    Data from the 1/20th-scale wave tank test of the RTI model. NWEI has licensed intellectual property from RTI, modified the PTO, and retested the 1/20th RTI model that was tested as part of the Wave Energy Prize. The goal of the test was to validate NWEI's simulation models of the device. The test occurred at the University of Maine in Orono.

  13. Comparison of classifier performance across two data sets.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Joseph Schlecht; Matthew E. Kaplan; Kobus Barnard; Tatiana Karafet; Michael F. Hammer; Nirav C. Merchant (2023). Comparison of classifier performance across two data sets. [Dataset]. http://doi.org/10.1371/journal.pcbi.1000093.t003
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Joseph Schlecht; Matthew E. Kaplan; Kobus Barnard; Tatiana Karafet; Michael F. Hammer; Nirav C. Merchant
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The top table shows the average classifier performance for cross-validation on the 9-locus public STR data. The bottom table is the performance for the same test, but on a 9-locus subset of our ground-truth training data. While overall performance is lower than the 15-locus cross-validation test on our ground-truth data (Table 1), the two data sets perform similarly here, indicating that increasing the number of markers in the data set can significantly improve performance.

  14. Training and validation dataset 2 of milling processes for time series...

    • service.tib.eu
    Updated Nov 28, 2024
    + more versions
    Cite
    (2024). Training and validation dataset 2 of milling processes for time series prediction - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1738
    Explore at:
    Dataset updated
    Nov 28, 2024
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Abstract: The aim of the dataset is to train and validate models for predicting time series for milling processes. For this purpose, processes were recorded at a sampling rate of 500 Hz by a Siemens Industrial Edge on a DMC 60H. The machine was upgraded in terms of control technology. Processes for model training and validation were recorded, suitable for both steel and aluminum machining. Several recordings were made with and without the workpiece (aircut) in order to cover as many cases as possible. This is the same series of experiments as in "Training and validation dataset of milling processes for time series prediction" with DOI 10.5445/IR/1000157789 and allows an investigation of the transferability of models between different machines.

    Technical remarks:
    Documents:
    • Design of Experiments: information on the paths and the technological values of the experiments
    • Recording information: information about the recordings, with comments
    • Data: all recorded datasets. The first level contains the folders for training and validation, both with and without the workpiece. The next level contains the individual test executions. Each recording is stored as a JSON file consisting of a header with all relevant information (such as the signal sources), followed by the entries of the recorded time series.
    • NC code: NC programs executed on the machine

    Experimental data:
    • Machine: retrofitted DMC 60H
    • Material: S235JR, 2007 T4
    • Tools:
    VHM-Fräser HPC, TiSi, ⌀ f8, DC: 5 mm
    VHM-Fräser HPC, TiSi, ⌀ f8, DC: 10 mm
    VHM-Fräser HPC, TiSi, ⌀ f8, DC: 20 mm
    Schaftfräser HSS-Co8, TiAlN, ⌀ k10, DC: 5 mm
    Schaftfräser HSS-Co8, TiAlN, ⌀ k10, DC: 10 mm
    Schaftfräser HSS-Co8, TiAlN, ⌀ k10, DC: 5 mm
    • Workpiece blank dimensions: 150x75x50 mm

    License: This work is licensed under a Creative Commons Attribution 4.0 International License. Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0).
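A recording file of the kind described (a JSON header naming the signal sources, followed by the time-series entries) could be read roughly as below. The key names are invented for illustration; consult the dataset's recording-information document for the real schema:

```python
import json

# Invented example mirroring the described layout: header first, then entries.
raw = json.dumps({
    "header": {"machine": "DMC 60H", "sample_rate_hz": 500,
               "signals": ["spindle_load", "x_pos"]},
    "entries": [
        {"t": 0.000, "spindle_load": 0.12, "x_pos": 10.0},
        {"t": 0.002, "spindle_load": 0.14, "x_pos": 10.1},
    ],
})

recording = json.loads(raw)
print(recording["header"]["sample_rate_hz"])  # 500
print(len(recording["entries"]))              # 2
```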

  15. ISIC2018-Task3-preprocessed data

    • kaggle.com
    zip
    Updated Jun 8, 2021
    Cite
    Teddy_55 (2021). ISIC2018-Task3-preprocessed data [Dataset]. https://www.kaggle.com/datasets/teddyziyyyu/isic2018task3preprocessed-data/code
    Explore at:
    zip (2771717873 bytes; available download formats)
    Dataset updated
    Jun 8, 2021
    Authors
    Teddy_55
    Description

    Dataset

    This dataset was created by Teddy_55


  16. Dead Tree Detection Validation Data from Sequoia and Kings Canyon national...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Dead Tree Detection Validation Data from Sequoia and Kings Canyon national parks [Dataset]. https://catalog.data.gov/dataset/dead-tree-detection-validation-data-from-sequoia-and-kings-canyon-national-parks
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    U.S. Geological Survey
    Description

    Most of these data were collected in order to create a database of tree locations for use in calibrating remote sensing tools and products, particularly dead tree detection tools and canopy species maps. Data include tree locations, species identification, and status (live, dead, and, if dead, sometimes includes information on foliage and twig retention). They are a collection of different sampling efforts performed over several years, starting in a period of severe drought mortality. One csv table is included that shows data and validation results for an additional dataset that was used to test the NAIP derived dead tree detection model that is associated with this data release. Locations are not included for that dataset.

  17. Training dataset for NABat Machine Learning V1.0

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Training dataset for NABat Machine Learning V1.0 [Dataset]. https://catalog.data.gov/dataset/training-dataset-for-nabat-machine-learning-v1-0
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    U.S. Geological Survey
    Description

    Bats play crucial ecological roles and provide valuable ecosystem services, yet many populations face serious threats from various ecological disturbances. The North American Bat Monitoring Program (NABat) aims to assess status and trends of bat populations while developing innovative and community-driven conservation solutions using its unique data and technology infrastructure. To support scalability and transparency in the NABat acoustic data pipeline, we developed a fully-automated machine-learning algorithm. This dataset includes audio files of bat echolocation calls that were considered to develop V1.0 of the NABat machine-learning algorithm, however the test set (i.e., holdout dataset) has been excluded from this release. These recordings were collected by various bat monitoring partners across North America using ultrasonic acoustic recorders for stationary acoustic and mobile acoustic surveys. For more information on how these surveys may be conducted, see Chapters 4 and 5 of “A Plan for the North American Bat Monitoring Program” (https://doi.org/10.2737/SRS-GTR-208). These data were then post-processed by bat monitoring partners to remove noise files (or those that do not contain recognizable bat calls) and apply a species label to each file. There is undoubtedly variation in the steps that monitoring partners take to apply a species label, but the steps documented in “A Guide to Processing Bat Acoustic Data for the North American Bat Monitoring Program” (https://doi.org/10.3133/ofr20181068) include first processing with an automated classifier and then manually reviewing to confirm or downgrade the suggested species label. Once a manual ID label was applied, audio files of bat acoustic recordings were submitted to the NABat database in Waveform Audio File format. From these available files in the NABat database, we considered files from 35 classes (34 species and a noise class). 
    Files for 4 species were excluded due to low sample size (Corynorhinus rafinesquii, N=3; Eumops floridanus, N=3; Lasiurus xanthinus, N=4; Nyctinomops femorosaccus, N=11). From this pool, files were randomly selected until files for each species/grid cell combination were exhausted or the number of recordings reached 1250. The dataset was then randomly split into training, validation, and test sets (i.e., holdout dataset). This data release includes all files considered for training and validation, including files that had been excluded from model development and testing due to low sample size for a given species or because the threshold for species/grid cell combinations had been met. The test set (i.e., holdout dataset) is not included. Audio files are grouped by species, as indicated by the four-letter species code in the name of each folder. Definitions for each four-letter code, including Family, Genus, Species, and Common name, are also included as a dataset in this release.
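The capped random selection described above (draw files until a species/grid-cell combination is exhausted or its running total reaches 1250) can be sketched as follows; the tuple schema is hypothetical, not the dataset's actual layout:

```python
import random
from collections import defaultdict

def sample_capped(files, cap=1250, seed=0):
    """files: iterable of (species, grid_cell, filename) tuples (illustrative
    schema). Draw in random order, keeping a file only while its
    species/grid-cell combination is below the cap."""
    pool = list(files)
    random.Random(seed).shuffle(pool)
    counts = defaultdict(int)
    selected = []
    for species, cell, name in pool:
        if counts[(species, cell)] < cap:
            counts[(species, cell)] += 1
            selected.append(name)
    return selected

files = [("LANO", "cell-1", f"rec{i}.wav") for i in range(10)]
print(len(sample_capped(files, cap=3)))  # 3
```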

  18. Data pipeline Validation And Load Testing using Multiple CSV Files

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Mar 26, 2021
    Cite
    Afsana Khan (2021). Data pipeline Validation And Load Testing using Multiple CSV Files [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4636797
    Explore at:
    Dataset updated
    Mar 26, 2021
    Dataset provided by
    Mainak Adhikari
    Pelle Jakovits
    Afsana Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets were used to validate and test the data pipeline deployment following the RADON approach. The dataset has a CSV file that contains around 32,000 Twitter tweets. 100 CSV files were created from the single CSV file, each containing 320 tweets. Those 100 CSV files are used to validate and test (performance/load testing) the data pipeline components.
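Splitting one large CSV into fixed-size pieces, as described, takes only the standard library. A sketch, with placeholder column names and the assumption that each chunk repeats the header row:

```python
import csv
import io

def split_csv(text: str, rows_per_chunk: int):
    """Split one CSV (header + data rows) into smaller CSV strings, each
    repeating the header and holding at most rows_per_chunk data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    chunks = []
    for i in range(0, len(data), rows_per_chunk):
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(header)
        writer.writerows(data[i:i + rows_per_chunk])
        chunks.append(buf.getvalue())
    return chunks

# 640 demo rows -> two 320-row chunks (the dataset's split is 32,000 -> 100 x 320).
demo = "id,text\n" + "".join(f"{i},tweet {i}\n" for i in range(640))
print(len(split_csv(demo, 320)))  # 2
```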

  19. Continuous norming of psychometric tests: A simulation study of parametric...

    • plos.figshare.com
    docx
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alexandra Lenhard; Wolfgang Lenhard; Sebastian Gary (2023). Continuous norming of psychometric tests: A simulation study of parametric and semi-parametric approaches [Dataset]. http://doi.org/10.1371/journal.pone.0222279
    Explore at:
    docx (available download formats)
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alexandra Lenhard; Wolfgang Lenhard; Sebastian Gary
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Continuous norming methods have seldom been subjected to scientific review. In this simulation study, we compared parametric with semi-parametric continuous norming methods in psychometric tests by constructing a fictitious population model within which a latent ability increases with age across seven age groups. We drew samples of different sizes (n = 50, 75, 100, 150, 250, 500 and 1,000 per age group) and simulated the results of an easy, medium, and difficult test scale based on Item Response Theory (IRT). We subjected the resulting data to different continuous norming methods and compared the data fit under the different test conditions with a representative cross-validation dataset of n = 10,000 per age group. The most significant differences were found in suboptimal (i.e., too easy or too difficult) test scales and in ability levels that were far from the population mean. We discuss the results with regard to the selection of the appropriate modeling techniques in psychometric test construction, the required sample sizes, and the requirement to report appropriate quantitative and qualitative test quality criteria for continuous norming methods in test manuals.

  20. Bunting Vehicle Test 02 - Radar Validation

    • datacommons.cyverse.org
    • ckan.cyverse.rocks
    Updated 2022
    + more versions
    Cite
    Matt Bunting (2022). Bunting Vehicle Test 02 - Radar Validation [Dataset]. http://doi.org/10.25739/w0tg-e838
    Explore at:
    Dataset updated
    2022
    Dataset provided by
    CyVerse Data Commons
    Authors
    Matt Bunting
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    CAN and GPS data from a controlled platoon experiment to verify radar data
