Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the organized set of Python functions for the methods proposed in Yanwen Wang's PhD research. Researchers can use these functions directly to conduct spatial+ cross-validation (SP-CV), the adversarial-validation-based dissimilarity quantification method (AVD), and dissimilarity-adaptive cross-validation (DA-CV). Instructions for running the code are in Readme.txt, and descriptions of the functions are in functions.docx.
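For orientation, the AVD method builds on the general idea of adversarial validation: train a classifier to distinguish training from test samples and read its AUC as a dissimilarity score (0.5 means the two sets are indistinguishable). A minimal scikit-learn sketch of that generic idea, using hypothetical feature arrays X_train and X_test; this is not the AVD implementation shipped with these functions:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def adversarial_validation_auc(X_train, X_test, n_splits=5, seed=0):
    """AUC of a classifier trained to tell training rows from test rows.

    An AUC near 0.5 suggests the two sets come from similar distributions;
    an AUC near 1.0 signals strong train/test dissimilarity.
    """
    X = np.vstack([X_train, X_test])
    y = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    scores = cross_val_score(clf, X, y, cv=n_splits, scoring="roc_auc")
    return scores.mean()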
We have created three new Reading Comprehension datasets constructed using an adversarial model-in-the-loop procedure.
We use three different models, BiDAF (Seo et al., 2016), BERT-Large (Devlin et al., 2018), and RoBERTa-Large (Liu et al., 2019), in the annotation loop and construct three datasets, D(BiDAF), D(BERT), and D(RoBERTa), each with 10,000 training, 1,000 validation, and 1,000 test examples.
The adversarial human annotation paradigm ensures that these datasets consist of questions that current state-of-the-art models (at least those used as adversaries in the annotation loop) find challenging. The three AdversarialQA round 1 datasets provide a training and evaluation resource for reading comprehension methods.
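A minimal loading sketch with the Hugging Face datasets library, assuming these rounds are published on the Hub under the adversarial_qa name with config names dbidaf, dbert, and droberta (an assumption to verify against the official dataset card):

from datasets import load_dataset

# Assumed Hub name and config names; verify against the dataset card.
d_bidaf = load_dataset("adversarial_qa", "dbidaf")
print(d_bidaf)                        # expected splits: train / validation / test
print(d_bidaf["train"][0]["question"])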
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TCAB is a large collection of successful adversarial attacks on state-of-the-art text classification models trained on multiple sentiment and abuse domain datasets.
The dataset is broken up into 2 files: train.csv and val.csv. The training set contains 1,448,751 instances (552,364 are "clean" unperturbed instances) and the validation set contains 482,914 instances (178,607 are "clean"). Each instance contains the following attributes:
scenario: Domain, either abuse or sentiment.
target_model_dataset: Dataset being attacked.
target_model_train_dataset: Dataset the target model trained on.
target_model: Type of victim model (e.g., bert, roberta, xlnet).
attack_toolchain: Open-source attack toolchain, either TextAttack or OpenAttack.
attack_name: Name of the attack method.
original_text: Original input text.
original_output: Prediction probabilities of the target model on the original text.
ground_truth: Encoded label for the original task of the domain dataset. For abuse datasets, 1 and 0 mean toxic and non-toxic, respectively. For sentiment datasets, 1 and 0 mean positive and negative sentiment; if there is a neutral sentiment, then 2, 1, and 0 mean positive, neutral, and negative sentiment.
status: Unperturbed example if "clean"; successful adversarial attack if "success".
perturbed_text: Text after it has been perturbed by an attack.
perturbed_output: Prediction probabilities of the target model on the perturbed text.
attack_time: Time taken to execute the attack.
num_queries: Number of queries performed while attacking.
frac_words_changed: Fraction of words changed due to an attack.
test_index: Index of each unique source example (original instance) (LEGACY - necessary for backwards compatibility).
original_text_identifier: Index of each unique source example (original instance).
unique_src_instance_identifier: Primary key that uniquely identifies every source instance; composed of (target_model_dataset, test_index, original_text_identifier).
pk: Primary key that uniquely identifies every attack instance; composed of (attack_name, attack_toolchain, original_text_identifier, scenario, target_model, target_model_dataset, test_index).
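As a quick orientation to the schema above, here is a minimal pandas sketch that loads the two CSV files and inspects a few of the listed columns (file names as given in the description; paths are placeholders):

import pandas as pd

# train.csv and val.csv as described above.
train = pd.read_csv("train.csv")
val = pd.read_csv("val.csv")

# Clean vs. successfully attacked instances per split.
print(train["status"].value_counts())
print(val["status"].value_counts())

# Restrict to successful attacks and compare original vs. perturbed text.
attacks = train[train["status"] == "success"]
print(attacks[["attack_name", "original_text", "perturbed_text",
               "frac_words_changed"]].head())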
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pearson correlation coefficient values of Davies–Bouldin Metric (DBM) and Amalgam Metric (AM) with Mean L2 Score of adversarial attacks for each vanilla classifier and attack pair.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average training times of DSAN-MMD on CSI image dataset across all training percentages.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Batik is one of the Indonesian cultural heritages, and every motif carries a noble cultural and philosophical meaning. This article introduces a dataset from Yogyakarta, Indonesia, "Batik Nitik 960". The dataset was captured from a piece of fabric consisting of 60 Nitik motifs. The fabric was provided by Paguyuban Pecinta Batik Indonesia (PPBI) Sekar Jagad Yogyakarta from the collection of Winotosasto Batik, and the data were captured at the APIPS Gallery. The dataset is divided into 60 categories with a total of 960 images; each class has 16 photos. The images were captured with a Sony Alpha a6400 under Godox SK II 400 lighting and saved in JPG format. Each category has four motifs and is augmented by rotation at 90, 180, and 270 degrees. Each class has a philosophical meaning that describes the history of the motif. This dataset enables the training and validation of machine learning models to classify, retrieve, or generate new batik motifs using a generative adversarial network. To our knowledge, this is the first publicly available Batik Nitik dataset with philosophical meanings. The data are provided by a collaboration of PPBI Sekar Jagad Yogyakarta, Universitas Muhammadiyah Malang, and Universitas Gadjah Mada. We hope that "Batik Nitik 960" can support batik research, and we are open to research collaboration.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Imbalance combined with concept drift is a challenge in many real-world data streams and one of the most critical problems in anomaly detection. Learning from nonstationary data streams for anomaly detection has been well studied in recent years, but most of this research assumes that the classes in a data stream are relatively balanced; only a few approaches tackle the joint issue of imbalance and concept drift. To overcome this joint issue, we propose an ensemble learning method with generative adversarial network-based sampling and consistency check (EGSCC) in this paper. First, we design a comprehensive anomaly detection framework that includes an oversampling module based on a generative adversarial network, an ensemble classifier, and a consistency check module. Next, we introduce double encoders into the GAN to better capture the distribution characteristics of imbalanced data for oversampling. Then, we apply stacking ensemble learning to deal with concept drift: four base classifiers (SVM, KNN, DT, and RF) are used in the first layer, and LR is used as the meta classifier in the second layer. Last but not least, we perform a consistency check between each incremental instance and a check set to determine whether it is anomalous, using statistical learning instead of a threshold-based method, and the validation set is dynamically updated according to the consistency check result. Finally, three artificial data sets obtained from the Massive Online Analysis platform and two real data sets are used to verify the performance of the proposed method from four aspects: detection performance, parameter sensitivity, algorithm cost, and anti-noise ability. Experimental results show that the proposed method has significant advantages for anomaly detection in imbalanced data streams with concept drift.
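The stacking layer described in the abstract (SVM, KNN, DT, and RF as base learners, LR as the meta learner) can be illustrated with a short scikit-learn sketch; this is a generic stacking setup under those assumptions, not the EGSCC code itself:

from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

# First layer: the four base classifiers named in the abstract.
base_learners = [
    ("svm", SVC(probability=True)),
    ("knn", KNeighborsClassifier()),
    ("dt", DecisionTreeClassifier()),
    ("rf", RandomForestClassifier()),
]

# Second layer: logistic regression as the meta classifier.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
# stack.fit(X_train, y_train); stack.predict(X_new)   # hypothetical data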
The Adversarial Natural Language Inference (ANLI) dataset (round 3, English).
Source: https://huggingface.co/datasets/anli
Num examples: 100,459 (train), 1,200 (validation), 1,200 (test)
Language: English
from datasets import load_dataset
load_dataset("vietgpt/anli_r3_en")
Format for NLI task
def preprocess(sample):
    premise = sample['premise']
    hypothesis = sample['hypothesis']
    label = sample['label']
    if label == 0:
        label = "entailment"
    elif label == 1:
        label = …

See the full description on the dataset page: https://huggingface.co/datasets/vietgpt/anli_r3_en.
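The snippet above is truncated by the catalog preview. A completed sketch of the same mapping, assuming the standard ANLI label convention (0 = entailment, 1 = neutral, 2 = contradiction), could look like this:

def preprocess(sample):
    premise = sample['premise']
    hypothesis = sample['hypothesis']
    label = sample['label']
    # Assumed mapping: 0 -> entailment, 1 -> neutral, 2 -> contradiction.
    if label == 0:
        label = "entailment"
    elif label == 1:
        label = "neutral"
    elif label == 2:
        label = "contradiction"
    return {"premise": premise, "hypothesis": hypothesis, "label": label}

# e.g. dataset = load_dataset("vietgpt/anli_r3_en").map(preprocess)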
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Effect of confidence level on detection accuracy rate (HDFS).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average micro-F1 scores of DASAN-LMMD on CSI image dataset across all training percentages.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average macro-F1 scores of DASAN-MMD on CSI image dataset across all training percentages.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Research has demonstrated that microorganisms are indispensable for nutrient transport, growth, and development in the human body, and that disorder and imbalance of the microbiota may lead to the occurrence of diseases. Therefore, it is crucial to study the relationships between microbes and diseases. In this manuscript, we propose a novel prediction model named MADGAN to infer potential microbe-disease associations by combining biological information of microbes and diseases with generative adversarial networks. To our knowledge, it is the first attempt to use a generative adversarial network for this important task. In MADGAN, we first constructed different features for microbes and diseases based on multiple similarity metrics, and we then adopted graph convolutional networks (GCN) to derive features for microbes and diseases automatically. Finally, we trained MADGAN to identify latent microbe-disease associations through games between the generation network and the decision network. In particular, to prevent over-smoothing during model training, we introduced a cross-level weight distribution structure to increase the depth of the network, based on the idea of residual networks. Moreover, to validate the performance of MADGAN, we conducted comprehensive experiments and case studies on the HMDAD and Disbiome databases, and the experimental results demonstrated that MADGAN not only achieved satisfactory prediction performance but also outperformed existing state-of-the-art prediction models.
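The GCN step mentioned above follows the standard graph-convolution propagation rule H' = sigma(D^{-1/2}(A + I)D^{-1/2} H W). A minimal NumPy sketch of one such layer (a generic illustration, not the MADGAN implementation):

import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (n, n) adjacency matrix, H: (n, d_in) node features,
    W: (d_in, d_out) weight matrix (a plain array in this sketch).
    """
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)                    # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # D^-1/2
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ H @ W, 0.0)   # ReLU activation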
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accuracy rate under different noise rates on RBF and Waveform datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accuracy rate with the number of instances processed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A model checkpoint was used to save the model at each iteration in which the validation accuracy improved.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of running time consumption of each algorithm (seconds).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Site differences, or systematic differences in feature distributions across multiple data-acquisition sites, are a known source of heterogeneity that may adversely affect large-scale meta- and mega-analyses of independently collected neuroimaging data. They influence nearly all multi-site imaging modalities and biomarkers, and methods to compensate for them can improve reliability and generalizability in the analysis of genetics, omics, and clinical data. The origins of statistical site effects are complex and involve both technical differences (scanner vendor, head coil, acquisition parameters, image processing) and differences in sample characteristics (inclusion/exclusion criteria, sample size, ancestry) between sites. In an age of expanding international consortium research, there is a growing need to disentangle technical site effects from sample characteristics of interest. Numerous statistical and machine learning methods have been developed to control for, model, or attenuate site effects; yet to date, no comprehensive review has discussed the benefits and drawbacks of each for different use cases. Here, we provide an overview of the different existing statistical and machine learning methods developed to remove unwanted site effects from independently collected neuroimaging samples. We focus on linear mixed effect models, the ComBat technique and its variants, adjustments based on image quality metrics, normative modeling, and deep learning approaches such as generative adversarial networks. For each method, we outline the statistical foundation and summarize strengths and weaknesses, including their assumptions and conditions of use. We provide information on software availability and comment on the ease of use and the applicability of these methods to different types of data. We discuss validation and comparative reports, mention caveats, and provide guidance on when to use each method, depending on context and specific research questions.
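As one concrete example of the first family mentioned above, a linear mixed effects model can absorb site as a random effect while keeping covariates of interest as fixed effects. A minimal statsmodels sketch with hypothetical column names (thickness, age, diagnosis, site) and a placeholder file path:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical table with one row per subject.
df = pd.read_csv("neuroimaging_features.csv")   # placeholder path

# Fixed effects: age and diagnosis; random intercept per acquisition site.
model = smf.mixedlm("thickness ~ age + diagnosis", df, groups=df["site"])
result = model.fit()
print(result.summary())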
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Synthetic clinical images could augment real medical image datasets, a novel approach in otolaryngology–head and neck surgery (OHNS). Our objective was to develop a generative adversarial network (GAN) for tympanic membrane images and to validate the quality of synthetic images with human reviewers. Our model was developed using a state-of-the-art GAN architecture, StyleGAN2-ADA. The network was trained on intraoperative high-definition (HD) endoscopic images of tympanic membranes collected from pediatric patients undergoing myringotomy with possible tympanostomy tube placement. A human validation survey was administered to a cohort of OHNS and pediatrics trainees at our institution. The primary measure of model quality was the Frechet Inception Distance (FID), a metric comparing the distribution of generated images with the distribution of real images. The measures used for human reviewer validation were the sensitivity, specificity, and area under the curve (AUC) for humans’ ability to discern synthetic from real images. Our dataset comprised 202 images. The best GAN was trained at 512x512 image resolution with a FID of 47.0. The progression of images through training showed stepwise “learning” of the anatomic features of a tympanic membrane. The validation survey was taken by 65 persons who reviewed 925 images. Human reviewers demonstrated a sensitivity of 66%, specificity of 73%, and AUC of 0.69 for the detection of synthetic images. In summary, we successfully developed a GAN to produce synthetic tympanic membrane images and validated this with human reviewers. These images could be used to bolster real datasets with various pathologies and develop more robust deep learning models such as those used for diagnostic predictions from otoscopic images. However, caution should be exercised with the use of synthetic data given issues regarding data diversity and performance validation. Any model trained using synthetic data will require robust external validation to ensure validity and generalizability.
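For reference, the FID reported above compares Gaussian fits to real and generated image embeddings: FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2(Sigma_r Sigma_g)^{1/2}). A small NumPy/SciPy sketch of that formula applied to hypothetical Inception feature arrays, not the study's evaluation pipeline:

import numpy as np
from scipy.linalg import sqrtm

def frechet_inception_distance(feats_real, feats_gen):
    """FID between two feature sets of shape (n_samples, n_features)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):             # drop tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean)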
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Identifying small molecules that bind strongly to target proteins in rational molecular design is crucial. Machine learning techniques, such as generative adversarial networks (GAN), are now essential tools for generating such molecules. In this study, we present an enhanced method for molecule generation using objective-reinforced GANs. Specifically, we introduce BEGAN (Boltzmann-enhanced GAN), a novel approach that adjusts molecule occurrence frequencies during training based on the Boltzmann distribution exp(−ΔU/τ), where ΔU represents the estimated binding free energy derived from docking algorithms and τ is a temperature-related scaling hyperparameter. This Boltzmann reweighting process shifts the generation process toward molecules with higher binding affinities, allowing the GAN to explore molecular spaces with superior binding properties. The reweighting process can also be refined through multiple iterations without altering the overall distribution shape. To validate our approach, we apply it to the design of sex pheromone analogs targeting Spodoptera frugiperda pheromone receptor SfruOR16, illustrating that the Boltzmann reweighting significantly increases the likelihood of generating promising sex pheromone analogs with improved binding affinities to SfruOR16, further supported by atomistic molecular dynamics simulations. Furthermore, we conduct a comprehensive investigation into parameter dependencies and propose a reasonable range for the hyperparameter τ. Our method offers a promising approach for optimizing molecular generation for enhanced protein binding, potentially increasing the efficiency of molecular discovery pipelines.
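The reweighting described above amounts to assigning each candidate molecule a sampling weight proportional to exp(−ΔU/τ) and renormalizing. A minimal NumPy sketch under that reading (a generic illustration, not the BEGAN training code):

import numpy as np

def boltzmann_weights(delta_u, tau):
    """Sampling weights proportional to exp(-dU / tau).

    delta_u: estimated binding free energies from docking (array-like).
    tau: temperature-related scaling hyperparameter (> 0).
    """
    logits = -np.asarray(delta_u, dtype=float) / tau
    logits -= logits.max()                   # stabilize the exponentials
    w = np.exp(logits)
    return w / w.sum()

# Lower (more negative) binding free energy -> higher occurrence frequency.
print(boltzmann_weights([-9.1, -7.4, -5.0], tau=1.0))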
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Purpose: To propose a novel deep-learning-based auto-segmentation model for CTV delineation in cervical cancer and to evaluate whether it can perform comparably to manual delineation using a three-stage multicenter evaluation framework.
Methods: An adversarial deep-learning-based auto-segmentation model was trained and configured for cervical cancer CTV contouring using CT data from 237 patients. CT scans of an additional 20 consecutive patients with locally advanced cervical cancer were then collected for a three-stage multicenter randomized controlled evaluation involving nine oncologists from six medical centers. This evaluation system combines objective performance metrics, radiation oncologist assessment, and finally a head-to-head Turing imitation test. Accuracy and effectiveness were evaluated step by step, and the intra-observer consistency of each oncologist was also tested.
Results: In the stage-1 evaluation, the mean DSC and 95HD of the proposed model were 0.88 and 3.46 mm, respectively. In stage 2, the oncologist grading evaluation showed that the majority of AI contours were comparable to the GT contours. The average CTV scores for AI and GT were 2.68 vs. 2.71 in week 0 (P = .206) and 2.62 vs. 2.63 in week 2 (P = .552), with no statistically significant differences. In stage 3, the Turing imitation test showed that the percentage of AI contours judged to be better than GT contours by ≥5 oncologists was 60.0% in week 0 and 42.5% in week 2. Most oncologists demonstrated good consistency between the 2 weeks (P > 0.05).
Conclusions: The tested AI model was demonstrated to be accurate and comparable to manual CTV segmentation in cervical cancer patients when assessed by our three-stage evaluation framework.
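For reference, the DSC reported in stage 1 is the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|). A minimal NumPy sketch for binary segmentation masks (illustrative only, not the study's evaluation code):

import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    intersection = np.logical_and(a, b).sum()
    return 2.0 * intersection / (a.sum() + b.sum())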