6 datasets found
  1. Data from: TrueFace: a Dataset for the Detection of Synthetic Face Images...

    • zenodo.org
    • data.niaid.nih.gov
    xz
    Updated Oct 13, 2022
    Cite
    Giulia Boato; Cecilia Pasquini; Antonio Luigi Stefani; Sebastiano Verde; Daniele Miorandi (2022). TrueFace: a Dataset for the Detection of Synthetic Face Images from Social Networks [Dataset]. http://doi.org/10.5281/zenodo.7065064
    Explore at:
    Available download formats: xz
    Dataset updated
    Oct 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giulia Boato; Cecilia Pasquini; Antonio Luigi Stefani; Sebastiano Verde; Daniele Miorandi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TrueFace is the first dataset of social-media-processed real and synthetic faces, generated with the well-established StyleGAN models and shared on Facebook, Twitter and Telegram.

    Images have historically been a universal and cross-cultural communication medium, capable of reaching people of any social background, status or education. Unsurprisingly, though, their social impact has often been exploited for malicious purposes, such as spreading misinformation and manipulating public opinion. With today's technologies, the ability to generate highly realistic fakes is within everyone's reach. A major threat derives in particular from synthetically generated faces, which can deceive even the most experienced observer. To counter this fake-news phenomenon, researchers have employed artificial intelligence to detect synthetic images by analysing patterns and artifacts introduced by the generative models. However, most online images are subject to repeated sharing operations on social media platforms. These platforms process uploaded images by applying operations (such as compression) that progressively degrade the useful forensic traces, compromising the effectiveness of the developed detectors. To solve the synthetic-vs-real problem "in the wild", more realistic image databases, like TrueFace, are needed to train specialised detectors.
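The degradation described above can be illustrated with a small sketch: the function below repeatedly JPEG-compresses an image with Pillow, a rough stand-in for the recompression a platform applies on each upload (the quality setting and number of shares are illustrative assumptions, not the platforms' actual parameters):

```python
from io import BytesIO

from PIL import Image  # Pillow


def simulate_shares(img: Image.Image, n_shares: int = 3, quality: int = 75) -> Image.Image:
    """Re-encode an image `n_shares` times as JPEG, roughly mimicking
    repeated upload/download cycles that erode GAN fingerprints."""
    for _ in range(n_shares):
        buf = BytesIO()
        img.convert("RGB").save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        img = Image.open(buf)
        img.load()  # force decode before the buffer is discarded
    return img
```

A detector trained only on pristine generator output would typically be evaluated on both the original images and the `simulate_shares` output to quantify the drop in accuracy caused by sharing.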

  2. Synthetic Skin Cancer Dataset/Only Synthetic

    • kaggle.com
    Updated Aug 7, 2024
    Cite
    DevDope (2024). Synthetic Skin Cancer Dataset/Only Synthetic [Dataset]. https://www.kaggle.com/datasets/devdope/synthetic-skin-disease-datasetonly-synthetic/data
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 7, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    DevDope
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Dataset Title

    Skin Disease GAN-Generated Lightweight Dataset

    General Description

    This dataset is a collection of skin disease images generated using a Generative Adversarial Network (GAN) approach. Specifically, a GAN was utilized with Stable Diffusion as the generator and a transformer-based discriminator to create realistic images of various skin diseases. The GAN approach enhances the accuracy and realism of the generated images, making this dataset a valuable resource for machine learning and computer vision applications in dermatology.

    Creation Process

    To create this dataset, a series of Low-Rank Adaptations (LoRAs) were generated for each disease category. These LoRAs were trained on the base dataset with 60 epochs and 30,000 steps using OneTrainer. Images were then generated for the following disease categories:

    • Herpes
    • Measles
    • Chickenpox
    • Monkeypox

    Due to the availability of ample public images, Melanoma was excluded from the generation process. The Fooocus API served as the generator within the GAN framework, creating images based on the LoRAs.

    To ensure quality and accuracy, a transformer-based discriminator was employed to verify the generated images, classifying them into the correct disease categories.
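This verification step can be sketched as a simple filter: keep a generated image only if a classifier assigns it to the intended category. The function below is a hypothetical illustration; the `classify` callable stands in for the transformer-based discriminator and is not the dataset authors' code:

```python
from typing import Callable, Iterable, List, Tuple

CATEGORIES = ["Herpes", "Measles", "Chickenpox", "Monkeypox"]


def verify_generated(images: Iterable, intended: str,
                     classify: Callable) -> Tuple[List, float]:
    """Keep only images the classifier places in the intended category;
    also report the acceptance rate for the batch."""
    if intended not in CATEGORIES:
        raise ValueError(f"unknown category: {intended!r}")
    batch = list(images)
    accepted = [img for img in batch if classify(img) == intended]
    rate = len(accepted) / len(batch) if batch else 0.0
    return accepted, rate
```

Images rejected by the discriminator would simply be discarded and regenerated, so only classifier-confirmed samples enter the published dataset.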

    Sources

    The original base dataset used to create this GAN-based dataset includes reputable sources such as:

    • 2019 HAM10000 Challenge
    • Kaggle
    • Google Images
    • Dermnet NZ
    • Bing Images
    • Yandex
    • Hellenic Atlas
    • Dermatological Atlas

    The LoRAs and their recommended weights for generating images are available for download on our CivitAi profile. You can refer to this profile for detailed instructions and access to the LoRAs used in this dataset.

    Dataset Contents

    Generated Images: High-quality images of skin diseases generated via GAN with Stable Diffusion, using transformer-based discrimination for accurate classification.

    Categories

    • Herpes
    • Measles
    • Chickenpox
    • Monkeypox

    Each image corresponds to one of these four categories, providing a reliable set of generated data for training and evaluation. Melanoma was excluded from generation due to the abundance of public data.

    Suggested Use Cases

    This dataset is suitable for:

    • Image Classification and Augmentation Tasks: Training and evaluating models in skin disease classification, with additional augmentation from generated images.
    • Research in Dermatology and GAN Techniques: Investigating the effectiveness of GANs for generating medical images, as well as exploring the use of transformer-based discrimination.
    • Educational Projects in AI and Medicine: Offering insights into image generation for diagnostic purposes, combining GANs and Stable Diffusion with transformers for medical datasets.

    Citation

    When using this dataset, please cite the following reference: Espinosa, E.G., Castilla, J.S.R., Lamont, F.G. (2025). Skin Disease Pre-diagnosis with Novel Visual Transformers. In: Figueroa-García, J.C., Hernández, G., Suero Pérez, D.F., Gaona García, E.E. (eds) Applied Computer Sciences in Engineering. WEA 2024. Communications in Computer and Information Science, vol 2222. Springer, Cham. https://doi.org/10.1007/978-3-031-74595-9_10

  3. GAN-based Synthetic VIIRS-like Image Generation over India

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 25, 2023
    Cite
    S. K. Srivastav (2023). GAN-based Synthetic VIIRS-like Image Generation over India [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7854533
    Explore at:
    Dataset updated
    May 25, 2023
    Dataset provided by
    S. K. Srivastav
    Mehak Jindal
    Prasun Kumar Gupta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    Monthly nighttime lights (NTL) can clearly depict an area's prevailing intra-year socio-economic dynamics. The Earth Observation Group at the Colorado School of Mines provides monthly NTL products from the Day Night Band (DNB) sensor on board the Visible Infrared Imaging Radiometer Suite (VIIRS) satellite (April 2012 onwards) and from the Operational Linescan System (OLS) sensor on board the Defense Meteorological Satellite Program (DMSP) satellites (April 1992 onwards). In the current study, an attempt has been made to generate synthetic monthly VIIRS-like products for 1992-2012 using a deep learning-based image translation network. Initially, the defects of the 216 monthly DMSP images (1992-2013) were corrected to remove geometric errors, background noise, and radiometric errors. Monthly VIIRS imagery was corrected to remove background noise and ephemeral lights using low and high thresholds. The improved DMSP and corrected VIIRS images from April 2012 to December 2013 were used in a conditional generative adversarial network (cGAN), with Land Use Land Cover as auxiliary input, to generate VIIRS-like imagery for 1992-2012. The modelled imagery was aggregated annually and showed an R2 of 0.94 against other annual-scale VIIRS-like imagery products for India, an R2 of 0.85 with respect to GDP, and an R2 of 0.69 with respect to population. Regression analysis of the generated VIIRS-like products against the actual VIIRS images for 2012 and 2013 over India indicated a good approximation, with R2 values of 0.64 and 0.67 respectively, while the spatial density relation revealed an under-estimation of brightness by the model at extremely high radiance values, with R2 values of 0.56 and 0.53 respectively. Qualitative analysis was also performed at both national and state scales. Visual analysis over 1992-2013 confirms a gradual increase in the brightness of the lights, indicating that the cGAN-modelled images closely follow the actual pattern of the nighttime lights.

    Finally, a synthetically generated monthly VIIRS-like product is delivered to the research community, which will be useful for studying changes in socio-economic dynamics over time.
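The evaluation step described above can be reproduced in outline: aggregate a monthly stack to an annual composite and compare it against a reference image with the coefficient of determination. This is a minimal sketch of the metric, not the authors' processing chain:

```python
import numpy as np


def annual_composite(monthly: np.ndarray) -> np.ndarray:
    """Mean of a (12, H, W) stack of monthly NTL images -> (H, W) annual image."""
    return monthly.mean(axis=0)


def r2_score(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Coefficient of determination (R^2) between two radiance maps."""
    actual, predicted = np.ravel(actual), np.ravel(predicted)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Running `r2_score` on the annual composites of the modelled and actual VIIRS imagery is the kind of comparison that yields the reported R2 values of 0.64 and 0.67 for 2012 and 2013.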

  4. Table_1_Deep learning strategies for addressing issues with small datasets...

    • figshare.com
    xlsx
    Updated Jun 21, 2023
    Cite
    Cody Allen; Shiva Aryal; Tuyen Do; Rishav Gautum; Md Mahmudul Hasan; Bharat K. Jasthi; Etienne Gnimpieba; Venkataramana Gadhamshetty (2023). Table_1_Deep learning strategies for addressing issues with small datasets in 2D materials research: Microbial Corrosion.XLSX [Dataset]. http://doi.org/10.3389/fmicb.2022.1059123.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Frontiers
    Authors
    Cody Allen; Shiva Aryal; Tuyen Do; Rishav Gautum; Md Mahmudul Hasan; Bharat K. Jasthi; Etienne Gnimpieba; Venkataramana Gadhamshetty
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Protective coatings based on two-dimensional materials such as graphene have gained traction for diverse applications. Their impermeability, inertness, excellent bonding with metals, and amenability to functionalization render them promising coatings against both abiotic and microbiologically influenced corrosion (MIC). Owing to the success of graphene coatings, the whole family of 2D materials, including hexagonal boron nitride and molybdenum disulphide, is being screened to obtain other promising coatings. AI-based data-driven models can accelerate virtual screening of 2D coatings with desirable physical and chemical properties. However, the lack of large experimental datasets makes training classifiers difficult and often results in over-fitting. Generating large datasets for MIC resistance of 2D coatings is both complex and laborious. Deep learning data augmentation methods can alleviate this issue by generating synthetic electrochemical data that resembles the training data classes. Here, we investigated two different deep generative models, namely the variational autoencoder (VAE) and the generative adversarial network (GAN), for generating synthetic data to expand small experimental datasets. Our model experimental system comprised few-layer graphene over copper surfaces. Synthetic data generated using the GAN yielded greater neural network performance (83-85% accuracy) than VAE-generated synthetic data (78-80% accuracy). However, VAE data performed better (90% accuracy) than GAN data (84-85% accuracy) when using XGBoost. Finally, we show that synthetic data based on VAE and GAN models can drive machine learning models for developing MIC-resistant 2D coatings.
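The augmentation workflow can be sketched with a toy generator: below, a per-class Gaussian fit stands in for the VAE/GAN generators (a deliberate simplification, since the study's actual generators are deep networks), and the synthetic rows are concatenated with the small real dataset before classifier training:

```python
import numpy as np


def augment_with_synthetic(X: np.ndarray, y: np.ndarray, n_per_class: int,
                           rng=None):
    """Expand a small labelled dataset with synthetic rows drawn from a
    per-class Gaussian fit (stand-in for a trained VAE/GAN generator)."""
    rng = np.random.default_rng(rng)
    X_new, y_new = [X], [y]
    for c in np.unique(y):
        Xc = X[y == c]
        mu, sigma = Xc.mean(axis=0), Xc.std(axis=0) + 1e-8
        X_new.append(rng.normal(mu, sigma, size=(n_per_class, X.shape[1])))
        y_new.append(np.full(n_per_class, c))
    return np.concatenate(X_new), np.concatenate(y_new)
```

A classifier (e.g. a neural network or XGBoost, as compared in the study) would then be trained on the augmented arrays and evaluated on held-out real data only, so the synthetic rows never leak into the test set.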

  5. Real data, balanced.

    • plos.figshare.com
    xls
    Updated Mar 25, 2025
    Cite
    Edmond Adib; Fatemeh Afghah; John J Prevost (2025). Real data, balanced. [Dataset]. http://doi.org/10.1371/journal.pone.0313772.t013
    Explore at:
    Available download formats: xls
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Edmond Adib; Fatemeh Afghah; John J Prevost
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Electrocardiogram (ECG) datasets tend to be highly imbalanced due to the scarcity of abnormal cases. Additionally, the use of real patients’ ECGs is highly regulated due to privacy concerns. Therefore, there is always a need for more ECG data, especially for training automatic diagnosis machine learning models, which perform better when trained on a balanced dataset. We studied the synthetic ECG generation capability of five different models from the generative adversarial network (GAN) family and compared their performance, focusing only on Normal cardiac cycles. Dynamic Time Warping (DTW), Fréchet, and Euclidean distance functions were employed to quantitatively measure performance. Five different methods for evaluating generated beats were proposed and applied. We also proposed three new concepts (threshold, accepted beat, and productivity rate) and employed them, along with the aforementioned methods, as a systematic way of comparing models. The results show that all the tested models can, to an extent, successfully mass-generate acceptable heartbeats with high similarity in morphological features, and potentially all of them can be used to augment imbalanced datasets. However, visual inspection of generated beats favors BiLSTM-DC GAN and WGAN, as they produce statistically more acceptable beats. With regard to productivity rate, the classic GAN is superior, with a 72% productivity rate. We also designed a simple experiment with a state-of-the-art classifier (ECGResNet34) to show empirically that augmenting the imbalanced dataset with synthetic ECG signals can significantly improve classification performance.
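Two of the ingredients above are easy to sketch: the DTW distance used to compare a generated beat against a reference, and a productivity-style rate. The rate formula here (fraction of generated beats whose distance falls below the acceptance threshold) is an assumed interpretation of the paper's concept, not its exact definition:

```python
import numpy as np


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-time-warping distance
    between two 1-D sequences, with absolute-difference cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def productivity_rate(distances, threshold):
    """Assumed interpretation: fraction of generated beats whose distance
    to the reference beat is below the acceptance threshold."""
    d = np.asarray(distances, dtype=float)
    return float((d < threshold).mean()) if d.size else 0.0
```

In use, each mass-generated beat would be scored with `dtw_distance` (or a Fréchet/Euclidean variant) against a reference Normal cycle, and models would be ranked by their resulting rate.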

  6. Myctobase, a circumpolar database of mesopelagic fishes for new insights...

    • gbif.org
    • obis.org
    Updated May 4, 2023
    Cite
    Briannyn Woods; Rowan Trebilco; Andrea Walters; Mark Hindell; Guy Duhamel; Hauke Flores; Masato Moteki; Patrice Pruvost; Christian Reiss; Ryan Saunders; Caroline Sutton; Yi-Ming Gan; Anton Van de Putte (2023). Myctobase, a circumpolar database of mesopelagic fishes for new insights into deep pelagic prey fields - data [Dataset]. http://doi.org/10.15468/u25dy5
    Explore at:
    Dataset updated
    May 4, 2023
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    SCAR - AntOBIS
    Authors
    Briannyn Woods; Rowan Trebilco; Andrea Walters; Mark Hindell; Guy Duhamel; Hauke Flores; Masato Moteki; Patrice Pruvost; Christian Reiss; Ryan Saunders; Caroline Sutton; Yi-Ming Gan; Anton Van de Putte
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 13, 1991 - Jan 18, 2019
    Area covered
    Description

    The global importance of mesopelagic fish is increasingly recognised, but they remain relatively poorly studied. This is particularly true in the Southern Ocean, where mesopelagic fishes are both key predators and prey, but where the remote environment makes sampling challenging. Despite this, multiple national Antarctic research programs have undertaken regional sampling of mesopelagic fish over several decades. However, the data are dispersed, and sampling methodologies often differ, precluding comparisons and limiting synthetic analyses. We identified potential data holders by compiling a metadata catalogue of existing survey data for Southern Ocean mesopelagic fishes. Data holders contributed 17,491 occurrence and 11,190 abundance records from 4,780 net hauls across 72 different research cruises. The data span the years 1991 to 2019 and include trait-based information (length, weight, maturity). The final dataset underwent quality-control processes, and detailed metadata was provided for each sampling event. The dataset can be accessed through the Antarctic Biodiversity Portal. Myctobase will enhance research capacity by providing the broadscale baseline data necessary for observing and modelling mesopelagic fishes. The paper to which this dataset refers is available at https://doi.org/10.1038/s41597-022-01496-y. The original dataset as submitted for the paper is also available on Zenodo at https://doi.org/10.5281/zenodo.6131579.

    This dataset is published by SCAR-AntOBIS under the license CC-BY 4.0. We would appreciate it if you could follow the guidelines of the SCAR and IPY Data Policies (https://www.scar.org/excom-meetings/xxxi-scar-delegates-2010-buenos-aires-argentina/4563-scar-xxxi-ip04b-scar-data-policy/file/) when using the data. If you have any questions regarding this dataset, please contact us via the contact information provided in the metadata or via data-biodiversity-aq@naturalsciences.be. Issues with the dataset can be reported at https://github.com/biodiversity-aq/data-publication/. This dataset is part of the Belgian contribution to the EU LifeWatch project funded by the Belgian Science Policy Office (BELSPO, contract n°FR/36/AN1/AntaBIS).
