24 datasets found
  1. Python functions -- cross-validation methods from a data-driven perspective

    • zenodo.org
    bin, txt, zip
    Updated Jul 24, 2024
    + more versions
    Cite
    Yanwen Wang; Yanwen Wang (2024). Python functions -- cross-validation methods from a data-driven perspective [Dataset]. http://doi.org/10.5281/zenodo.12804224
    Explore at:
    Available download formats: bin, zip, txt
    Dataset updated
    Jul 24, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yanwen Wang; Yanwen Wang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 28, 2024
    Description

    These are the organized Python functions for the methods proposed in Yanwen Wang's PhD research. Researchers can use these functions directly to conduct spatial+ cross-validation (SP-CV), the adversarial-validation-based dissimilarity quantification method (AVD), and dissimilarity-adaptive cross-validation (DA-CV). Instructions for running the code are in Readme.txt; the functions themselves are described in functions.docx.
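The repository's own function names and signatures are documented in functions.docx and are not reproduced here. As a rough, hedged illustration of the core idea behind spatially blocked cross-validation (holding out whole spatial blocks instead of random rows), here is a sketch using scikit-learn's GroupKFold as a stand-in, not the author's code:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data: 12 samples falling in 4 spatial blocks (groups).
X = np.arange(24, dtype=float).reshape(12, 2)
y = np.array([0, 1] * 6)
blocks = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])

def spatial_block_splits(X, y, groups, n_splits=4):
    """Return (train_idx, test_idx) pairs where whole spatial blocks are held out."""
    return list(GroupKFold(n_splits=n_splits).split(X, y, groups=groups))

splits = spatial_block_splits(X, y, blocks)
```

Because each test fold contains complete blocks only, no location contributes to both training and testing, which is the property spatial cross-validation methods enforce.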

  2. AdversarialQA Dataset

    • paperswithcode.com
    • opendatalab.com
    • +1 more
    Updated Jan 9, 2024
    Cite
    Max Bartolo; Alastair Roberts; Johannes Welbl; Sebastian Riedel; Pontus Stenetorp (2024). AdversarialQA Dataset [Dataset]. https://paperswithcode.com/dataset/adversarialqa
    Explore at:
    Dataset updated
    Jan 9, 2024
    Authors
    Max Bartolo; Alastair Roberts; Johannes Welbl; Sebastian Riedel; Pontus Stenetorp
    Description

    We have created three new Reading Comprehension datasets constructed using an adversarial model-in-the-loop.

    We use three different models, BiDAF (Seo et al., 2016), BERT-Large (Devlin et al., 2018), and RoBERTa-Large (Liu et al., 2019), in the annotation loop and construct three datasets, D(BiDAF), D(BERT), and D(RoBERTa), each with 10,000 training examples, 1,000 validation examples, and 1,000 test examples.

    The adversarial human annotation paradigm ensures that these datasets consist of questions that current state-of-the-art models (at least the ones used as adversaries in the annotation loop) find challenging. The three AdversarialQA round 1 datasets provide a training and evaluation resource for such methods.
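As a small sanity check of the split sizes stated above (36,000 examples in total across the three round-1 datasets); the Hugging Face hub id and config names in the comment are my assumption, not stated in this listing:

```python
# Split sizes stated in the description, per round-1 dataset.
ROUNDS = ("D(BiDAF)", "D(BERT)", "D(RoBERTa)")
SPLIT_SIZES = {"train": 10_000, "validation": 1_000, "test": 1_000}

def total_examples():
    """Total examples across all three datasets and all splits."""
    return len(ROUNDS) * sum(SPLIT_SIZES.values())

# Loading sketch (needs network; the hub id and config name are assumptions):
#   from datasets import load_dataset
#   ds = load_dataset("adversarial_qa", "dbidaf")
```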

  3. TCAB: Text Classification Attack Benchmark Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 21, 2022
    + more versions
    Cite
    Brophy, Jonathan (2022). TCAB: Text Classification Attack Benchmark Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6615385
    Explore at:
    Dataset updated
    Oct 21, 2022
    Dataset provided by
    Noack, Adam
    You, Wencong
    Xie, Zhouhang
    Singh, Sameer
    Lowd, Daniel
    Brophy, Jonathan
    Asthana, Kalyani
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TCAB is a large collection of successful adversarial attacks on state-of-the-art text classification models trained on multiple sentiment and abuse domain datasets.

    The dataset is broken up into 2 files: train.csv and val.csv. The training set contains 1,448,751 instances (552,364 are "clean" unperturbed instances) and the validation set contains 482,914 instances (178,607 are "clean"). Each instance contains the following attributes:

    scenario: Domain, either abuse or sentiment.

    target_model_dataset: Dataset being attacked.

    target_model_train_dataset: Dataset the target model trained on.

    target_model: Type of victim model (e.g., bert, roberta, xlnet).

    attack_toolchain: Open-source attack toolchain, either TextAttack or OpenAttack.

    attack_name: Name of the attack method.

    original_text: Original input text.

    original_output: Prediction probabilities of the target model on the original text.

    ground_truth: Encoded label for the original task of the domain dataset. For abuse datasets, 1 and 0 mean toxic and non-toxic, respectively. For sentiment datasets, 1 and 0 mean positive and negative sentiment; if there is a neutral sentiment, then 2, 1, and 0 mean positive, neutral, and negative sentiment.

    status: Unperturbed example if "clean"; successful adversarial attack if "success".

    perturbed_text: Text after it has been perturbed by an attack.

    perturbed_output: Prediction probabilities of the target model on the perturbed text.

    attack_time: Time taken to execute the attack.

    num_queries: Number of queries performed while attacking.

    frac_words_changed: Fraction of words changed due to an attack.

    test_index: Index of each unique source example (original instance) (LEGACY - necessary for backwards compatibility).

    original_text_identifier: Index of each unique source example (original instance).

    unique_src_instance_identifier: Primary key to uniquely identify every source instance; comprised of (target_model_dataset, test_index, original_text_identifier).

    pk: Primary key to uniquely identify every attack instance; comprised of (attack_name, attack_toolchain, original_text_identifier, scenario, target_model, target_model_dataset, test_index).
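The two key fields can be illustrated with a tiny mock of the schema; the row values below are invented for illustration, only the attribute names come from the list above:

```python
# Two mock TCAB rows: one clean source instance and one successful attack on it.
rows = [
    {"scenario": "abuse", "target_model_dataset": "wikipedia", "target_model": "bert",
     "attack_toolchain": "none", "attack_name": "clean", "status": "clean",
     "test_index": 7, "original_text_identifier": 7},
    {"scenario": "abuse", "target_model_dataset": "wikipedia", "target_model": "bert",
     "attack_toolchain": "textattack", "attack_name": "deepwordbug", "status": "success",
     "test_index": 7, "original_text_identifier": 7},
]

def src_key(row):
    """unique_src_instance_identifier: identifies the source example."""
    return (row["target_model_dataset"], row["test_index"],
            row["original_text_identifier"])

def pk(row):
    """pk: identifies the attack instance."""
    return (row["attack_name"], row["attack_toolchain"],
            row["original_text_identifier"], row["scenario"],
            row["target_model"], row["target_model_dataset"], row["test_index"])

clean_fraction = sum(r["status"] == "clean" for r in rows) / len(rows)
```

Both rows share a `src_key` (same source text) but have distinct `pk` values, matching the stated roles of the two identifiers.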

  4. Pearson correlation coefficient values of Davies–Bouldin Metric (DBM) and...

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Shashank Kotyan; Moe Matsuki; Danilo Vasconcellos Vargas (2023). Pearson correlation coefficient values of Davies–Bouldin Metric (DBM) and Amalgam Metric (AM) with Mean L2 Score of adversarial attacks for each vanilla classifier and attack pair. [Dataset]. http://doi.org/10.1371/journal.pone.0266060.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Shashank Kotyan; Moe Matsuki; Danilo Vasconcellos Vargas
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pearson correlation coefficient values of Davies–Bouldin Metric (DBM) and Amalgam Metric (AM) with Mean L2 Score of adversarial attacks for each vanilla classifier and attack pair.

  5. Average training times of DSAN-MMD on CSI image dataset across all training...

    • figshare.com
    xls
    Updated Apr 18, 2024
    + more versions
    Cite
    Muhammad Hassan; Tom Kelsey; Fahrurrozi Rahman (2024). Average training times of DSAN-MMD on CSI image dataset across all training percentages. [Dataset]. http://doi.org/10.1371/journal.pone.0298888.t011
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Muhammad Hassan; Tom Kelsey; Fahrurrozi Rahman
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Average training times of DSAN-MMD on CSI image dataset across all training percentages.

  6. Batik Nitik 960

    • data.mendeley.com
    Updated Jan 23, 2023
    + more versions
    Cite
    Agus Eko Minarno (2023). Batik Nitik 960 [Dataset]. http://doi.org/10.17632/sgh484jxzy.3
    Explore at:
    Dataset updated
    Jan 23, 2023
    Authors
    Agus Eko Minarno
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Batik is an Indonesian cultural heritage whose every motif carries noble cultural and philosophical meaning. This article introduces a dataset from Yogyakarta, Indonesia: "Batik Nitik 960". The dataset was captured from a piece of fabric consisting of 60 Nitik motifs, provided by Paguyuban Pecinta Batik Indonesia (PPBI) Sekar Jagad Yogyakarta from the Winotosasto Batik collection, with the data taken in the APIPS Gallery. The dataset is divided into 60 categories with a total of 960 images; each class has 16 photos. The images were captured with a Sony Alpha a6400 under Godox SK II 400 lighting and saved in jpg format. Each category has four motifs, augmented by rotations of 90, 180, and 270 degrees. Each class has a philosophical meaning describing the history of the motif. This dataset enables the training and validation of machine learning models to classify, retrieve, or generate new batik motifs using a generative adversarial network. To our knowledge, this is the first publicly available Batik Nitik dataset with philosophical meaning. The data are provided through a collaboration of PPBI Sekar Jagad Yogyakarta, Universitas Muhammadiyah Malang, and Universitas Gadjah Mada. We hope "Batik Nitik 960" can support batik research, and we are open to research collaboration.
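The four-orientation augmentation described above (each of the 4 base motifs per class rotated by 0, 90, 180, and 270 degrees, yielding 16 images per class) can be sketched with NumPy; the array shapes are placeholders, not the dataset's actual resolution:

```python
import numpy as np

def augment_with_rotations(images):
    """Return each image in its 0-, 90-, 180-, and 270-degree orientations."""
    return [np.rot90(img, k) for img in images for k in range(4)]

base = [np.zeros((64, 64, 3)) for _ in range(4)]  # 4 base motif photos per class
augmented = augment_with_rotations(base)          # 16 images per class
```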

  7. Statistical results of the accuracy rate.

    • plos.figshare.com
    xlsx
    Updated Jan 26, 2024
    + more versions
    Cite
    Yansong Liu; Shuang Wang; He Sui; Li Zhu (2024). Statistical results of the accuracy rate. [Dataset]. http://doi.org/10.1371/journal.pone.0292140.s010
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jan 26, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Yansong Liu; Shuang Wang; He Sui; Li Zhu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A challenge in many real-world data streams is class imbalance combined with concept drift, which is one of the most critical problems in anomaly detection. Learning nonstationary data streams for anomaly detection has been well studied in recent years. However, most research assumes that the classes of the data stream are relatively balanced, and only a few approaches tackle the joint issue of imbalance and concept drift. To overcome this joint issue, we propose an ensemble learning method with generative adversarial network-based sampling and consistency check (EGSCC). First, we design a comprehensive anomaly detection framework that includes an oversampling module based on a generative adversarial network, an ensemble classifier, and a consistency check module. Next, we introduce double encoders into the GAN to better capture the distribution characteristics of imbalanced data for oversampling. Then, we apply stacking ensemble learning to deal with concept drift: four base classifiers (SVM, KNN, DT, and RF) are used in the first layer, and LR is used as the meta-classifier in the second layer. Last but not least, we perform a consistency check between the incremental instance and the check set to determine whether it is anomalous by statistical learning, instead of a threshold-based method, and the validation set is dynamically updated according to the consistency check result. Finally, three artificial data sets obtained from the Massive Online Analysis platform and two real data sets are used to verify the performance of the proposed method from four aspects: detection performance, parameter sensitivity, algorithm cost, and anti-noise ability. Experimental results show that the proposed method has significant advantages in anomaly detection for imbalanced data streams with concept drift.
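The stacking stage described above (SVM, KNN, DT, and RF base classifiers with an LR meta-classifier) can be sketched with scikit-learn; this covers only that stage, not the GAN oversampling or consistency-check modules, and the imbalanced data here are synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced stream snapshot (90% majority / 10% minority).
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("knn", KNeighborsClassifier()),
                ("dt", DecisionTreeClassifier(random_state=0)),
                ("rf", RandomForestClassifier(n_estimators=50, random_state=0))],
    final_estimator=LogisticRegression(),  # LR meta-classifier, second layer
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```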

  8. anli_r3_en

    • huggingface.co
    Cite
    anli_r3_en [Dataset]. https://huggingface.co/datasets/vietgpt/anli_r3_en
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset authored and provided by
    VietGPT
    Description

    The Adversarial Natural Language Inference (ANLI) dataset.

    Source: https://huggingface.co/datasets/anli. Num examples: 100,459 (train), 1,200 (validation), 1,200 (test).

    Language: English

    from datasets import load_dataset

    load_dataset("vietgpt/anli_r3_en")

    Format for NLI task

    def preprocess(sample):
        premise = sample['premise']
        hypothesis = sample['hypothesis']
        label = sample['label']

        if label == 0:
            label = "entailment"
        elif label == 1:
            label =… See the full description on the dataset page: https://huggingface.co/datasets/vietgpt/anli_r3_en.
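The mapping is cut off above. A hedged completion, assuming the usual three-way NLI label convention (0 = entailment, 1 = neutral, 2 = contradiction); the dataset page itself may word the labels differently:

```python
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}  # assumed convention

def preprocess(sample):
    """Replace the integer NLI label with its string name."""
    return {
        "premise": sample["premise"],
        "hypothesis": sample["hypothesis"],
        "label": LABEL_NAMES[sample["label"]],
    }

example = {"premise": "A dog runs.", "hypothesis": "An animal moves.", "label": 0}
```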
    
  9. Effect of confidence level on detection accuracy rate (HDFS).

    • plos.figshare.com
    bin
    Updated Jan 26, 2024
    + more versions
    Cite
    Yansong Liu; Shuang Wang; He Sui; Li Zhu (2024). Effect of confidence level on detection accuracy rate (HDFS). [Dataset]. http://doi.org/10.1371/journal.pone.0292140.s005
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 26, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Yansong Liu; Shuang Wang; He Sui; Li Zhu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Effect of confidence level on detection accuracy rate (HDFS).

  10. Average micro-F1 scores of DASAN-LMMD on CSI image dataset across all...

    • plos.figshare.com
    xls
    Updated Apr 18, 2024
    + more versions
    Cite
    Muhammad Hassan; Tom Kelsey; Fahrurrozi Rahman (2024). Average micro-F1 scores of DASAN-LMMD on CSI image dataset across all training percentages. [Dataset]. http://doi.org/10.1371/journal.pone.0298888.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Muhammad Hassan; Tom Kelsey; Fahrurrozi Rahman
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Average micro-F1 scores of DASAN-LMMD on CSI image dataset across all training percentages.

  11. Average macro-F1 scores of DASAN-MMD on CSI image dataset across all...

    • plos.figshare.com
    xls
    Updated Apr 18, 2024
    + more versions
    Cite
    Muhammad Hassan; Tom Kelsey; Fahrurrozi Rahman (2024). Average macro-F1 scores of DASAN-MMD on CSI image dataset across all training percentages. [Dataset]. http://doi.org/10.1371/journal.pone.0298888.t009
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Muhammad Hassan; Tom Kelsey; Fahrurrozi Rahman
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Average macro-F1 scores of DASAN-MMD on CSI image dataset across all training percentages.

  12. Table_4_MADGAN: A microbe-disease association prediction model based on...

    • figshare.com
    xlsx
    Updated Jun 21, 2023
    + more versions
    Cite
    Weixin Hu; Xiaoyu Yang; Lei Wang; Xianyou Zhu (2023). Table_4_MADGAN:A microbe-disease association prediction model based on generative adversarial networks.XLSX [Dataset]. http://doi.org/10.3389/fmicb.2023.1159076.s004
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Frontiers
    Authors
    Weixin Hu; Xiaoyu Yang; Lei Wang; Xianyou Zhu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Research has demonstrated that microorganisms are indispensable for nutrient transportation and the growth and development of human bodies, and that disorder and imbalance of the microbiota may lead to the occurrence of diseases. Therefore, it is crucial to study relationships between microbes and diseases. In this manuscript, we propose a novel prediction model named MADGAN to infer potential microbe-disease associations by combining biological information of microbes and diseases with generative adversarial networks. To our knowledge, it is the first attempt to use a generative adversarial network for this important task. In MADGAN, we first construct different features for microbes and diseases based on multiple similarity metrics, and then adopt graph convolutional networks (GCNs) to derive features for microbes and diseases automatically. Finally, we train MADGAN to identify latent microbe-disease associations through games between the generation network and the decision network. In particular, to prevent over-smoothing during model training, we introduce a cross-level weight distribution structure to increase network depth, based on the idea of residual networks. Moreover, to validate the performance of MADGAN, we conducted comprehensive experiments and case studies on the HMDAD and Disbiome databases, and experimental results demonstrate that MADGAN not only achieves satisfactory prediction performance but also outperforms existing state-of-the-art prediction models.

  13. Accuracy rate under different noise rates on RBF and Waveform datasets.

    • plos.figshare.com
    bin
    Updated Jan 26, 2024
    Cite
    Yansong Liu; Shuang Wang; He Sui; Li Zhu (2024). Accuracy rate under different noise rates on RBF and Waveform datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0292140.s009
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 26, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Yansong Liu; Shuang Wang; He Sui; Li Zhu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accuracy rate under different noise rates on RBF and Waveform datasets.

  14. Accuracy rate with the number of instances processed.

    • plos.figshare.com
    bin
    Updated Jan 26, 2024
    Cite
    Yansong Liu; Shuang Wang; He Sui; Li Zhu (2024). Accuracy rate with the number of instances processed. [Dataset]. http://doi.org/10.1371/journal.pone.0292140.s001
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 26, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yansong Liu; Shuang Wang; He Sui; Li Zhu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accuracy rate with the number of instances processed.

  15. Parameters used in training CNN models.

    • plos.figshare.com
    xls
    Updated Apr 16, 2024
    Cite
    Hazem Zein; Samer Chantaf; Régis Fournier; Amine Nait-Ali (2024). Parameters used in training CNN models. [Dataset]. http://doi.org/10.1371/journal.pone.0297958.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 16, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Hazem Zein; Samer Chantaf; Régis Fournier; Amine Nait-Ali
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A model checkpoint was used to save the model at each iteration whenever validation accuracy improved.
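The description does not name the framework; if the CNN models were trained with Keras, the save-on-validation-improvement behavior corresponds to a ModelCheckpoint callback, sketched here as an unverified config fragment:

```python
# Hedged config fragment, assuming Keras; filename is a placeholder.
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(
    "best_model.keras",
    monitor="val_accuracy",  # watch validation accuracy
    save_best_only=True,     # overwrite only when the monitored value improves
    mode="max",
)
# model.fit(..., callbacks=[checkpoint])
```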

  16. Comparison of running time consumption of each algorithm (seconds).

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jan 26, 2024
    Cite
    Yansong Liu; Shuang Wang; He Sui; Li Zhu (2024). Comparison of running time consumption of each algorithm (seconds). [Dataset]. http://doi.org/10.1371/journal.pone.0292140.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jan 26, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Yansong Liu; Shuang Wang; He Sui; Li Zhu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of running time consumption of each algorithm (seconds).

  17. Data_Sheet_1_Site effects how-to and when: An overview of retrospective...

    • frontiersin.figshare.com
    pdf
    Updated Jun 13, 2023
    + more versions
    Cite
    Johanna M. M. Bayer; Paul M. Thompson; Christopher R. K. Ching; Mengting Liu; Andrew Chen; Alana C. Panzenhagen; Neda Jahanshad; Andre Marquand; Lianne Schmaal; Philipp G. Sämann (2023). Data_Sheet_1_Site effects how-to and when: An overview of retrospective techniques to accommodate site effects in multi-site neuroimaging analyses.PDF [Dataset]. http://doi.org/10.3389/fneur.2022.923988.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Frontiers
    Authors
    Johanna M. M. Bayer; Paul M. Thompson; Christopher R. K. Ching; Mengting Liu; Andrew Chen; Alana C. Panzenhagen; Neda Jahanshad; Andre Marquand; Lianne Schmaal; Philipp G. Sämann
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Site differences, or systematic differences in feature distributions across multiple data-acquisition sites, are a known source of heterogeneity that may adversely affect large-scale meta- and mega-analyses of independently collected neuroimaging data. They influence nearly all multi-site imaging modalities and biomarkers, and methods to compensate for them can improve reliability and generalizability in the analysis of genetics, omics, and clinical data. The origins of statistical site effects are complex and involve both technical differences (scanner vendor, head coil, acquisition parameters, imaging processing) and differences in sample characteristics (inclusion/exclusion criteria, sample size, ancestry) between sites. In an age of expanding international consortium research, there is a growing need to disentangle technical site effects from sample characteristics of interest. Numerous statistical and machine learning methods have been developed to control for, model, or attenuate site effects – yet to date, no comprehensive review has discussed the benefits and drawbacks of each for different use cases. Here, we provide an overview of the different existing statistical and machine learning methods developed to remove unwanted site effects from independently collected neuroimaging samples. We focus on linear mixed effect models, the ComBat technique and its variants, adjustments based on image quality metrics, normative modeling, and deep learning approaches such as generative adversarial networks. For each method, we outline the statistical foundation and summarize strengths and weaknesses, including their assumptions and conditions of use. We provide information on software availability and comment on the ease of use and the applicability of these methods to different types of data. We discuss validation and comparative reports, mention caveats and provide guidance on when to use each method, depending on context and specific research questions.

  18. Contingency table of human image classification.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Krish Suresh; Michael S. Cohen; Christopher J. Hartnick; Ryan A. Bartholomew; Daniel J. Lee; Matthew G. Crowson (2023). Contingency table of human image classification. [Dataset]. http://doi.org/10.1371/journal.pdig.0000202.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS Digital Health
    Authors
    Krish Suresh; Michael S. Cohen; Christopher J. Hartnick; Ryan A. Bartholomew; Daniel J. Lee; Matthew G. Crowson
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetic clinical images could augment real medical image datasets, a novel approach in otolaryngology–head and neck surgery (OHNS). Our objective was to develop a generative adversarial network (GAN) for tympanic membrane images and to validate the quality of synthetic images with human reviewers. Our model was developed using a state-of-the-art GAN architecture, StyleGAN2-ADA. The network was trained on intraoperative high-definition (HD) endoscopic images of tympanic membranes collected from pediatric patients undergoing myringotomy with possible tympanostomy tube placement. A human validation survey was administered to a cohort of OHNS and pediatrics trainees at our institution. The primary measure of model quality was the Frechet Inception Distance (FID), a metric comparing the distribution of generated images with the distribution of real images. The measures used for human reviewer validation were the sensitivity, specificity, and area under the curve (AUC) for humans’ ability to discern synthetic from real images. Our dataset comprised 202 images. The best GAN was trained at 512x512 image resolution with a FID of 47.0. The progression of images through training showed stepwise “learning” of the anatomic features of a tympanic membrane. The validation survey was taken by 65 persons who reviewed 925 images. Human reviewers demonstrated a sensitivity of 66%, specificity of 73%, and AUC of 0.69 for the detection of synthetic images. In summary, we successfully developed a GAN to produce synthetic tympanic membrane images and validated this with human reviewers. These images could be used to bolster real datasets with various pathologies and develop more robust deep learning models such as those used for diagnostic predictions from otoscopic images. However, caution should be exercised with the use of synthetic data given issues regarding data diversity and performance validation. 
Any model trained using synthetic data will require robust external validation to ensure validity and generalizability.
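The reviewer metrics reported above follow the standard confusion-matrix definitions, with synthetic images treated as the positive class; the counts below are hypothetical, chosen only so the ratios match the reported 66%/73%:

```python
def sensitivity(tp, fn):
    """Fraction of synthetic (positive) images correctly flagged as synthetic."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of real (negative) images correctly judged real."""
    return tn / (tn + fp)

tp, fn, tn, fp = 33, 17, 365, 135  # hypothetical review counts, not the study's data
sens = sensitivity(tp, fn)         # 0.66
spec = specificity(tn, fp)         # 0.73
```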

  19. Data from: BEGAN: Boltzmann-Reweighted Data Augmentation for Enhanced...

    • figshare.com
    • acs.figshare.com
    xlsx
    Updated Nov 21, 2024
    Cite
    Jialei Dai; Yutong Zhang; Chen Shi; Yang Liu; Peng Xiu; Yong Wang (2024). BEGAN: Boltzmann-Reweighted Data Augmentation for Enhanced GAN-Based Molecule Design in Insect Pheromone Receptors [Dataset]. http://doi.org/10.1021/acs.jpcb.4c06729.s003
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    ACS Publications
    Authors
    Jialei Dai; Yutong Zhang; Chen Shi; Yang Liu; Peng Xiu; Yong Wang
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Identifying small molecules that bind strongly to target proteins in rational molecular design is crucial. Machine learning techniques, such as generative adversarial networks (GAN), are now essential tools for generating such molecules. In this study, we present an enhanced method for molecule generation using objective-reinforced GANs. Specifically, we introduce BEGAN (Boltzmann-enhanced GAN), a novel approach that adjusts molecule occurrence frequencies during training based on the Boltzmann distribution exp(−ΔU/τ), where ΔU represents the estimated binding free energy derived from docking algorithms and τ is a temperature-related scaling hyperparameter. This Boltzmann reweighting process shifts the generation process toward molecules with higher binding affinities, allowing the GAN to explore molecular spaces with superior binding properties. The reweighting process can also be refined through multiple iterations without altering the overall distribution shape. To validate our approach, we apply it to the design of sex pheromone analogs targeting Spodoptera frugiperda pheromone receptor SfruOR16, illustrating that the Boltzmann reweighting significantly increases the likelihood of generating promising sex pheromone analogs with improved binding affinities to SfruOR16, further supported by atomistic molecular dynamics simulations. Furthermore, we conduct a comprehensive investigation into parameter dependencies and propose a reasonable range for the hyperparameter τ. Our method offers a promising approach for optimizing molecular generation for enhanced protein binding, potentially increasing the efficiency of molecular discovery pipelines.
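The reweighting rule exp(-dU/tau) described above can be sketched directly; the dU values and tau below are hypothetical, not taken from the paper:

```python
import math

def boltzmann_weights(delta_u, tau):
    """Normalized sampling weights proportional to exp(-dU / tau)."""
    w = [math.exp(-u / tau) for u in delta_u]
    z = sum(w)
    return [x / z for x in w]

# Hypothetical docking-estimated binding free energies; more negative = tighter binding.
dU = [-9.0, -7.5, -6.0]
probs = boltzmann_weights(dU, tau=1.0)
```

Molecules with lower (more favorable) estimated binding free energy receive higher sampling weight, which is the shift toward stronger binders the method relies on.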

  20. Table_1_An Adversarial Deep-Learning-Based Model for Cervical Cancer CTV...

    • frontiersin.figshare.com
    docx
    Updated May 30, 2023
    Cite
    Zhikai Liu; Wanqi Chen; Hui Guan; Hongnan Zhen; Jing Shen; Xia Liu; An Liu; Richard Li; Jianhao Geng; Jing You; Weihu Wang; Zhouyu Li; Yongfeng Zhang; Yuanyuan Chen; Junjie Du; Qi Chen; Yu Chen; Shaobin Wang; Fuquan Zhang; Jie Qiu (2023). Table_1_An Adversarial Deep-Learning-Based Model for Cervical Cancer CTV Segmentation With Multicenter Blinded Randomized Controlled Validation.docx [Dataset]. http://doi.org/10.3389/fonc.2021.702270.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhikai Liu; Wanqi Chen; Hui Guan; Hongnan Zhen; Jing Shen; Xia Liu; An Liu; Richard Li; Jianhao Geng; Jing You; Weihu Wang; Zhouyu Li; Yongfeng Zhang; Yuanyuan Chen; Junjie Du; Qi Chen; Yu Chen; Shaobin Wang; Fuquan Zhang; Jie Qiu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose: To propose a novel deep-learning-based auto-segmentation model for CTV delineation in cervical cancer and to evaluate whether it can perform comparably well to manual delineation within a three-stage multicenter evaluation framework.

    Methods: An adversarial deep-learning-based auto-segmentation model was trained and configured for cervical cancer CTV contouring using CT data from 237 patients. CT scans of an additional 20 consecutive patients with locally advanced cervical cancer were then collected for a three-stage multicenter randomized controlled evaluation involving nine oncologists from six medical centers. This evaluation system combines objective performance metrics, radiation oncologist assessment, and finally a head-to-head Turing imitation test. Accuracy and effectiveness were evaluated step by step, and the intra-observer consistency of each oncologist was also tested.

    Results: In the stage-1 evaluation, the mean DSC and 95HD values of the proposed model were 0.88 and 3.46 mm, respectively. In stage 2, the oncologist grading evaluation showed that the majority of AI contours were comparable to the GT contours; the average CTV scores for AI and GT were 2.68 vs. 2.71 in week 0 (P = .206) and 2.62 vs. 2.63 in week 2 (P = .552), with no statistically significant differences. In stage 3, the Turing imitation test showed that the percentage of AI contours judged better than GT contours by ≥5 oncologists was 60.0% in week 0 and 42.5% in week 2. Most oncologists demonstrated good consistency between the 2 weeks (P > 0.05).

    Conclusions: The tested AI model was demonstrated to be accurate and comparable to manual CTV segmentation in cervical cancer patients when assessed by our three-stage evaluation framework.
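The DSC reported in the results is the Dice similarity coefficient, 2*|A∩B| / (|A| + |B|) over the voxel sets of the two contours; a minimal sketch with hypothetical voxel sets:

```python
def dice(a, b):
    """Dice similarity coefficient between two voxel-index sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty contours are defined as identical here
    return 2 * len(a & b) / (len(a) + len(b))

auto = {(0, 0), (0, 1), (1, 0), (1, 1)}    # hypothetical AI contour voxels
manual = {(0, 1), (1, 0), (1, 1), (2, 1)}  # hypothetical ground-truth voxels
score = dice(auto, manual)                 # 2*3 / (4+4) = 0.75
```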
