44 datasets found

paperswithcode
huggingface.co
Updated Jul 15, 2023
Cite
Jonas Wilinski (2023). paperswithcode [Dataset]. https://huggingface.co/datasets/J0nasW/paperswithcode
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jul 15, 2023
Authors
Jonas Wilinski
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
A cleaned dataset from paperswithcode.com

Last dataset update: July 2023. This is a cleaned-up dataset obtained from paperswithcode.com through their API service. It comprises around 56K papers carefully categorized into 3K tasks and 16 areas. The papers carry arXiv and NIPS IDs as well as title, abstract, and other meta information. It can be used for training text classifiers that concentrate on the use of specific AI and ML methods and frameworks.

Contents… See the full description on the dataset page: https://huggingface.co/datasets/J0nasW/paperswithcode.
Machine learning techniques with code
zenodo.org
json
Updated Jul 2, 2022
Cite
Paperswithcode (2022). Machine learning techniques with code [Dataset]. http://doi.org/10.5281/zenodo.6788250
Explore at:
Available download formats: json
Unique identifier
https://doi.org/10.5281/zenodo.6788250
Dataset updated
Jul 2, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Paperswithcode
License
Attribution-ShareAlike 1.0 (CC BY-SA 1.0) (https://creativecommons.org/licenses/by-sa/1.0/)
License information was derived automatically
Description
This dataset contains data from Paperswithcode.com obtained in January 2022. The first file, 'Papers_with_abstracts', includes information about research papers such as their title, abstract, and authors. 'Links_between_papers_and_code' links each paper to its corresponding GitHub repository. Finally, 'Methods' contains a categorization of these papers into different areas, made by the Paperswithcode community. These files were used in the Master's thesis "Topic Modeling for Research Software" by María Ayuso at Universidad Politécnica de Madrid.
CIFAKE: Real and AI-Generated Synthetic Images
kaggle.com
Updated Mar 28, 2023
Cite
Jordan J. Bird (2023). CIFAKE: Real and AI-Generated Synthetic Images [Dataset]. https://www.kaggle.com/datasets/birdy654/cifake-real-and-ai-generated-synthetic-images/discussion
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Mar 28, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Jordan J. Bird
Description
CIFAKE: Real and AI-Generated Synthetic Images

The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness.

CIFAKE is a dataset that contains 60,000 synthetically-generated images and 60,000 real images (collected from CIFAR-10). Can computer vision techniques be used to detect when an image is real or has been generated by AI?

Further information on this dataset can be found here: Bird, J.J. and Lotfi, A., 2024. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access.

Dataset details

The dataset contains two classes - REAL and FAKE.

For REAL, we collected the images from Krizhevsky & Hinton's CIFAR-10 dataset

For the FAKE images, we generated the equivalent of CIFAR-10 with Stable Diffusion version 1.4

There are 100,000 images for training (50k per class) and 20,000 for testing (10k per class)
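
The split sizes quoted above can be cross-checked against the per-class totals given earlier (60,000 real and 60,000 synthetic images). The following is a minimal sketch in Python; the dictionary layout and function names are illustrative assumptions and are not part of the dataset's own tooling.

```python
# Split sizes as stated in the dataset description (images per class).
SPLITS = {
    "train": {"REAL": 50_000, "FAKE": 50_000},
    "test": {"REAL": 10_000, "FAKE": 10_000},
}


def split_totals(splits):
    """Return the total number of images in each split."""
    return {name: sum(per_class.values()) for name, per_class in splits.items()}


def class_totals(splits):
    """Return the total number of images per class across all splits."""
    totals = {}
    for per_class in splits.values():
        for label, count in per_class.items():
            totals[label] = totals.get(label, 0) + count
    return totals


if __name__ == "__main__":
    print(split_totals(SPLITS))
    print(class_totals(SPLITS))
```

The per-class sums should agree with the 60,000 + 60,000 figure in the description, which is a quick sanity check after downloading and counting files.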

Papers with Code

The dataset and all studies using it are linked using Papers with Code https://paperswithcode.com/dataset/cifake-real-and-ai-generated-synthetic-images

References

If you use this dataset, you must cite the following sources

Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images.

Bird, J.J. and Lotfi, A., 2024. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access.

Real images are from Krizhevsky & Hinton (2009), fake images are from Bird & Lotfi (2024). The Bird & Lotfi study is available here.

Notes

The updates to the dataset on the 28th of March 2023 did not change anything; the file formats ".jpeg" were renamed ".jpg" and the root folder was uploaded to meet Kaggle's usability requirements.

License

This dataset is published under the same MIT license as CIFAR-10:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
WikiText-103 Dataset
paperswithcode.com
opendatalab.com
+ more versions
Cite
Stephen Merity; Caiming Xiong; James Bradbury; Richard Socher, WikiText-103 Dataset [Dataset]. https://paperswithcode.com/dataset/wikitext-103
Explore at:
Authors
Stephen Merity; Caiming Xiong; James Bradbury; Richard Socher
Description
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.

Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger vocabulary and retains the original case, punctuation and numbers - all of which are removed in PTB. As it is composed of full articles, the dataset is well suited for models that can take advantage of long term dependencies.
deepnets1m
huggingface.co
opendatalab.com
+1 more
Updated Apr 6, 2024
Cite
Samsung SAIT AI Lab, Montreal (2024). deepnets1m [Dataset]. https://huggingface.co/datasets/SamsungSAILMontreal/deepnets1m
Explore at:
Dataset updated
Apr 6, 2024
Dataset authored and provided by
Samsung SAIT AI Lab, Montreal
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This is a copy of the DeepNets-1M dataset originally released at https://github.com/facebookresearch/ppuda under the MIT license. The dataset presents diverse computational graphs (1M training and 1402 evaluation) of neural network architectures used in image classification. See detailed description at https://paperswithcode.com/dataset/deepnets-1m and in the Parameter Prediction for Unseen Deep Architectures paper. There are four files in this dataset:

deepnets1m_eval.hdf5; # 16 MB (md5:… See the full description on the dataset page: https://huggingface.co/datasets/SamsungSAILMontreal/deepnets1m.
UniToBrain Dataset
zenodo.org
ieee-dataport.org
+2 more
bin, csv, pdf
Updated Jul 19, 2024
Cite
Umberto Gava; Federico D'Agata; Edwin Bennink; Enzo Tartaglione; Annamaria Vernone; Francesca Bertolino; Eleonora Ficiarà; Alessandro Cicerale; Fabrizio Pizzagalli; Caterina Guiot; Marco Grangetto; Mauro Bergui (2024). UniToBrain Dataset [Dataset]. http://doi.org/10.5281/zenodo.4817605
Explore at:
Available download formats: pdf, csv, bin
Unique identifier
https://doi.org/10.5281/zenodo.4817605
Dataset updated
Jul 19, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Umberto Gava; Federico D'Agata; Edwin Bennink; Enzo Tartaglione; Annamaria Vernone; Francesca Bertolino; Eleonora Ficiarà; Alessandro Cicerale; Fabrizio Pizzagalli; Caterina Guiot; Marco Grangetto; Mauro Bergui
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The University of Turin (UniTO) released the open-access dataset UniToBrain, collected for the homonymous Use Case 3 in the DeepHealth project (https://deephealth-project.eu/). UniToBrain is a dataset of Computed Tomography (CT) perfusion images (CTP). The dataset includes 100 training subjects and 15 testing subjects used in a submitted publication for the training and testing of a Convolutional Neural Network (CNN; for details see https://arxiv.org/abs/2101.05992, https://paperswithcode.com/paper/neural-network-derived-perfusion-maps-a-model, https://www.medrxiv.org/content/10.1101/2021.01.13.21249757v1). At this stage, the UniTO team has released this dataset privately, but it will soon be public. This is a subsample of a larger dataset of 258 subjects that will soon be available for download at https://ieee-dataport.org/.
CTP data from 258 consecutive patients were retrospectively obtained from the hospital PACS of Città della Salute e della Scienza di Torino (Molinette). CTP acquisition parameters were as follows: Scanner GE, 64 slices, 80 kV, 150 mAs, 44.5 sec duration, 89 volumes (40 mm axial coverage), injection of 40 ml of Iodine contrast agent (300 mg/ml) at 4 ml/s speed.
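
A few derived quantities follow directly from the acquisition parameters above. The sketch below is illustrative only: it assumes the 89 volumes are evenly spaced over the 44.5 s acquisition and that 300 mg/ml is the concentration of the injected agent; neither assumption is stated explicitly in the description, and the constant and function names are hypothetical.

```python
# Acquisition parameters as quoted in the dataset description.
DURATION_S = 44.5                  # total acquisition time
N_VOLUMES = 89                     # number of volumes acquired
CONTRAST_VOLUME_ML = 40.0          # injected contrast agent volume
CONCENTRATION_MG_PER_ML = 300.0    # assumed to be the agent concentration
INJECTION_RATE_ML_PER_S = 4.0      # injection speed


def derived_quantities():
    """Compute simple quantities implied by the stated parameters."""
    return {
        # Assumes evenly spaced volumes (not stated in the source).
        "seconds_per_volume": DURATION_S / N_VOLUMES,
        "injection_duration_s": CONTRAST_VOLUME_ML / INJECTION_RATE_ML_PER_S,
        "contrast_dose_g": CONTRAST_VOLUME_ML * CONCENTRATION_MG_PER_ML / 1000.0,
    }


if __name__ == "__main__":
    for name, value in derived_quantities().items():
        print(f"{name}: {value}")
```

Under these assumptions the temporal sampling is one volume every half second, which is the kind of figure needed when converting volume indices to time for perfusion curves.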
Stanford_car Dataset
universe.roboflow.com
zip
Updated Aug 1, 2024
Cite
Openglpro (2024). Stanford_car Dataset [Dataset]. https://universe.roboflow.com/openglpro/stanford_car/model/3
Explore at:
Available download formats: zip
Dataset updated
Aug 1, 2024
Dataset authored and provided by
Openglpro
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
Labeled All The Cars Bounding Boxes
Description
This dataset is a copy of a subset of the full Stanford Cars dataset

Stanford Cars Dataset, Stanford AI

Stanford Cars Dataset, Papers with Code

The original dataset contained 16,185 images of 196 classes of cars.

The classes are typically at the level of Make, Model, Year, e.g. 2012 Tesla Model S or 2012 BMW M3 coupe in the original dataset, and in this subset of the full dataset (v3, TestData and v4, original_raw-images).

v4 (original_raw-images) contains a generated version of the original, raw images, without any modified classes

v8 (classes-Modified_raw-images) contains a generated version of the raw images, with the Modify Classes preprocessing feature used to remap or omit the following classes:
1. bike, moped --remapped to--> motorbike
2. cng, leguna, easybike, smart fortwo Convertible 2012, and all other specific car makes with named classes (such as Acura TL Type-S 2008) --remapped to--> vehicle
3. rickshaw, boat, bicycle --> omitted

v9 (FAST-model_mergedAllClasses-augmented_by3x) contains a generated version of the raw images, with the Modify Classes preprocessing feature used to remap or omit the following classes:
1. bike, moped --remapped to--> motorbike
2. cng, leguna, easybike, smart fortwo Convertible 2012, and all other specific car makes with named classes (such as Acura TL Type-S 2008) --remapped to--> vehicle
3. rickshaw, boat, bicycle --> omitted

v10 (ACCURATE-model_mergedAllClasses-augmented_by3x) contains a generated version of the raw images, with the Modify Classes preprocessing feature used to remap or omit the following classes:
1. bike, moped --remapped to--> motorbike
2. cng, leguna, easybike, smart fortwo Convertible 2012, and all other specific car makes with named classes (such as Acura TL Type-S 2008) --remapped to--> vehicle
3. rickshaw, boat, bicycle --> omitted
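
The class remapping described for v8 through v10 can be pictured as a small lookup. This is a sketch under assumptions, not Roboflow's Modify Classes implementation: the function names are hypothetical, and treating every class not explicitly listed as a "specific car make" that maps to `vehicle` is a simplification of the description.

```python
# Explicit remaps and omissions taken from the description.
REMAP = {"bike": "motorbike", "moped": "motorbike"}
OMIT = {"rickshaw", "boat", "bicycle"}


def remap_class(name):
    """Return the remapped class name, or None if the class is omitted."""
    if name in OMIT:
        return None
    if name in REMAP:
        return REMAP[name]
    # Assumption: cng, leguna, easybike and every named car make
    # (e.g. "Acura TL Type-S 2008") fall through to "vehicle".
    return "vehicle"


def remap_labels(labels):
    """Apply remap_class to a list of labels, dropping omitted classes."""
    remapped = (remap_class(label) for label in labels)
    return [label for label in remapped if label is not None]


if __name__ == "__main__":
    print(remap_labels(["bike", "boat", "cng", "Acura TL Type-S 2008"]))
```

Returning `None` for omitted classes keeps the omission decision separate from the filtering step, so the same function can be reused to report which annotations were dropped.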

Citation:

Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei. 3D Object Representations for Fine-Grained Categorization. 4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013.
DEEP-VOICE: DeepFake Voice Recognition
kaggle.com
Updated Aug 24, 2023
Cite
Jordan J. Bird (2023). DEEP-VOICE: DeepFake Voice Recognition [Dataset]. https://www.kaggle.com/datasets/birdy654/deep-voice-deepfake-voice-recognition
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Aug 24, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Jordan J. Bird
Description
DEEP-VOICE: Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion

This dataset contains examples of real human speech, and DeepFake versions of those speeches by using Retrieval-based Voice Conversion.

Can machine learning be used to detect when speech is AI-generated?

Introduction

There are growing implications surrounding generative AI in the speech domain that enable voice cloning and real-time voice conversion from one individual to another. This technology poses a significant ethical threat and could lead to breaches of privacy and misrepresentation, thus there is an urgent need for real-time detection of AI-generated speech for DeepFake Voice Conversion.

To address the above emerging issues, we are introducing the DEEP-VOICE dataset. DEEP-VOICE is comprised of real human speech from eight well-known figures and their speech converted to one another using Retrieval-based Voice Conversion.

For each speech, the accompaniment ("background noise") was removed before conversion using RVC. The original accompaniment is then added back to the DeepFake speech:

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2039603%2F921dc2241837cd784329955d570f7802%2Fdfcover.png?generation=1692897655324630&alt=media

(Above: Overview of the Retrieval-based Voice Conversion process to generate DeepFake speech with Ryan Gosling's speech converted to Margot Robbie. Conversion is run on the extracted vocals before being layered on the original background ambience.)

Dataset

The dataset is available in two forms.

First, the raw audio can be found in the "AUDIO" directory. They are arranged within "REAL" and "FAKE" class directories. The audio filenames note which speakers provided the real speech, and which voices they were converted to. For example "Obama-to-Biden" denotes that Barack Obama's speech has been converted to Joe Biden's voice.
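
The filename convention described above can be parsed with a few lines of Python. This is a hedged sketch: the `-to-` separator is inferred from the single "Obama-to-Biden" example, the `.wav` extension in the usage is an assumption, and `parse_conversion` is a hypothetical helper rather than anything shipped with the dataset.

```python
from pathlib import PurePath


def parse_conversion(filename):
    """Split a name like 'Obama-to-Biden.wav' into (source, target).

    Returns None when the name does not follow the '<source>-to-<target>'
    pattern, which is assumed here to mean it is not a converted clip.
    """
    stem = PurePath(filename).stem  # drop directory and extension
    source, separator, target = stem.partition("-to-")
    if not separator or not source or not target:
        return None
    return source, target


if __name__ == "__main__":
    print(parse_conversion("FAKE/Obama-to-Biden.wav"))
    print(parse_conversion("REAL/some-original-clip.wav"))
```

Filenames in the "REAL" directory may follow a different pattern, so the `None` return should be treated as "unknown" rather than as proof that a clip is genuine.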

Second, the extracted features can be found in the "DATASET-balanced.csv" file. This is the data that was used in the study below. Each feature is extracted from one-second windows of audio, and the data are balanced through random sampling.
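
One common way to balance classes through random sampling is random undersampling of the larger class. The sketch below illustrates that idea only; the description does not say which sampling procedure the study used, and the `LABEL` key and function name are hypothetical.

```python
import random


def balance_by_undersampling(rows, label_key="LABEL", seed=0):
    """Randomly keep the same number of rows for every class.

    `rows` is a list of dicts, one per one-second window. The column
    name `label_key` is an assumption, not taken from the CSV file.
    """
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    if not by_label:
        return []
    # Every class is cut down to the size of the smallest class.
    target = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    windows = [{"LABEL": "REAL"}] * 5 + [{"LABEL": "FAKE"}] * 2
    print(len(balance_by_undersampling(windows)))
```

Fixing the seed makes the sampled subset reproducible, which matters when comparing models trained on the balanced data.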

Note: All experimental data is found within the "KAGGLE" directory. The "DEMONSTRATION" directory is used for playing cropped and compressed demos in notebooks due to Kaggle's limitations on file size.

A successful system could be used as follows:

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2039603%2F7ae536243464f0dbb48f3566765f6b50%2Fdfcover.png?generation=1692897790677119&alt=media

(Above: Usage of the real-time system. The end user is notified when the machine learning model has processed the speech audio (e.g. a phone or conference call) and predicted that audio chunks contain AI-generated speech.)

Papers with Code

The dataset and all studies using it are linked using Papers with Code

The Papers with Code page can be found by clicking here: Papers with Code

Attribution

This dataset was produced from the study "Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion"

Bird, J.J. and Lotfi, A., 2023. Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion. arXiv preprint arXiv:2308.12734.

The preprint can be found on ArXiv by clicking here: Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion

License

This dataset is provided under the MIT License:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT H...
GEMBench
huggingface.co
Updated Feb 10, 2025
Cite
Ricardo Garcia-Pinel (2025). GEMBench [Dataset]. https://huggingface.co/datasets/rjgpinel/GEMBench
Explore at:
Dataset updated
Feb 10, 2025
Authors
Ricardo Garcia-Pinel
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset Card for GEMBench dataset

💎 GEneralizable vision-language robotic Manipulation Benchmark Dataset A benchmark to systematically evaluate generalization capabilities of vision-and-language robotic manipulation policies. Built upon the RLBench simulator.

💻 GEMBench Project Webpage: https://www.di.ens.fr/willow/research/gembench/
📈 Leaderboard: https://paperswithcode.com/sota/robot-manipulation-generalization-on-gembench

Dataset Structure

Dataset structure is as… See the full description on the dataset page: https://huggingface.co/datasets/rjgpinel/GEMBench.
Data from: Tree Trunks Dataset
universe.roboflow.com
zip
Updated Mar 4, 2025
Cite
Jacob Carlsson (2025). Tree Trunks Dataset [Dataset]. https://universe.roboflow.com/jacob-carlsson-ruecl/tree-trunks/model/4
Explore at:
Available download formats: zip
Dataset updated
Mar 4, 2025
Dataset authored and provided by
Jacob Carlsson
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Variables measured
Trees Bounding Boxes
Description
Dataset to classify trees based on an image of the trunk.

Images are of Swedish trees, spread out over the year.

Labels currently use Swedish tree names (without å, ö, ä).

Collection

Captured with an iPhone 12 mini and converted from HEIC to JPG.

First come one or more images for context, e.g. parts of the tree other than the trunk. This could be flowers, leaves, ... or some kind of hand gesture that I know.

Then images of one or more trees with the same class.

Previous papers:

https://paperswithcode.com/dataset/barknet-1-0
Multi30k_train-ca
zenodo.org
txt
Updated Feb 29, 2024
Cite
Zenodo (2024). Multi30k_train-ca [Dataset]. http://doi.org/10.5281/zenodo.10728674
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.5281/zenodo.10728674
Dataset updated
Feb 29, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
California
Description
Multi30k_train-ca dataset is a professional translation of the train.en.multi30k dataset into Catalan, commissioned by BSC LangTech Unit.

The Flickr30k is a dataset for sentence-based image description. It includes 31,000 images collected from Flickr, together with 5 reference captions provided by human annotators (https://paperswithcode.com/dataset/flickr30k).

This work was funded by the Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya within the framework of Projecte AINA.
MEDQA-USMLE QA JSON Only
kaggle.com
Updated Oct 24, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nithin Dhananjayan (2023). MEDQA-USMLE QA JSON Only [Dataset]. https://www.kaggle.com/datasets/evidence/medqa-usmle-qa-json-only
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Oct 24, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nithin Dhananjayan
Description
The current dataset is a subset and reformatting of a rawer dataset. The focus here is only on US questions and answers, split into dev, train, and test sets in separate JSON files. This format ought to be easier to use. This notebook captures how the conversion was done.

The rawer dataset was pulled from paperswithcode, which in turn pulled it from "A Large-scale Open Domain Question Answering Dataset from Medical Exams".

The dataset is collected from the professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively.

This is under the MIT License

MIT License (As given on github)

Copyright (c) 2022 Di Jin

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Written with StackEdit.
Thai Food Detection Dataset
universe.roboflow.com
zip
Updated May 19, 2025
Cite
Thesis (2025). Thai Food Detection Dataset [Dataset]. https://universe.roboflow.com/thesis-shtjy/thai-food-detection/model/2
Explore at:
Available download formats: zip
Dataset updated
May 19, 2025
Dataset authored and provided by
Thesis
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
Food Bounding Boxes
Description
This project uses data from the THFOOD-50 dataset, originally created by Chakkrit Termritthikun, Paisarn Muneesawang, and Surachet Kanprachar. The dataset was further manually labeled and curated by Nonglak Changnoiaumphai for project-specific classification tasks.

Citations

Please cite the original authors’ work as follows:

@article{termritthikun2017nu,
  title="{NU-InNet: Thai food image recognition using convolutional neural networks on smartphone}",
  author={Termritthikun, Chakkrit and Muneesawang, Paisarn and Kanprachar, Surachet},
  journal={Journal of Telecommunication, Electronic and Computer Engineering (JTEC)},
  volume={9},
  number={2-6},
  pages={63--67},
  year={2017}
}

@inproceedings{termritthikun2017accuracy,
  title="{Accuracy improvement of Thai food image recognition using deep convolutional neural networks}",
  author={Termritthikun, Chakkrit and Kanprachar, Surachet},
  booktitle={2017 international electrical engineering congress (IEECON)},
  pages={1--4},
  year={2017},
  organization={IEEE}
}

@article{termritthikun2018nu,
  title="{Nu-ResNet: Deep residual networks for Thai food image recognition}",
  author={Termritthikun, Chakkrit and Kanprachar, Surachet},
  journal={Journal of Telecommunication, Electronic and Computer Engineering (JTEC)},
  volume={10},
  number={1-4},
  pages={29--33},
  year={2018}
}
CTtttttttttttttttttttttttttttttttttt
kaggle.com
Updated Jan 26, 2025
Cite
amira bouamrane (2025). CTtttttttttttttttttttttttttttttttttt [Dataset]. https://www.kaggle.com/datasets/amirabouamrane/xdna-datasets
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jan 26, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
amira bouamrane
Description
This dataset is a collection of several CT scan datasets for lung cancer diagnosis. It contains only images of benign or malignant nodules from various sources, aimed at developing lung cancer CADx systems:
1. LIDC-IDRI dataset at: https://paperswithcode.com/dataset/lidc-idri
2. CT Scan Images for Lung Cancer at: https://www.kaggle.com/datasets/dishantrathi20/ct-scan-images-for-lung-cancer
3. Lung Cancer Dataset at: https://www.kaggle.com/datasets/jayaprakashpondy/lung-cancer-dataset
4. Chest CT-Scan images Dataset at: https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images
5. DLCTLUNGDetectNet-Lung Tumor Dataset at: https://www.kaggle.com/datasets/harshaldharpure/dlctlungdetectnet-lung-tumor-dataset
6. IQ-OTH/NCCD dataset at: https://www.kaggle.com/datasets/hamdallak/the-iqothnccd-lung-cancer-dataset
in-the-groove
huggingface.co
Updated Oct 24, 2023
Cite
David Vaillant (2023). in-the-groove [Dataset]. https://huggingface.co/datasets/origami-digital/in-the-groove
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Oct 24, 2023
Authors
David Vaillant
License
https://choosealicense.com/licenses/unknown/
Description
Compiled from several different sets of songs:

(ITG) In the Groove
(ITG) In the Groove 2

Songs were downloaded from https://search.stepmaniaonline.net/packs/in+the+groove and are stored here for persistence. In The Groove/ITG typically refers to DDR beatmaps done with an eye towards pad play. Dataset info: https://paperswithcode.com/dataset/itg
wildfires-cems
huggingface.co
Updated Aug 11, 2023
Cite
LINKS - AI, Data & Space (2023). wildfires-cems [Dataset]. http://doi.org/10.57967/hf/2047
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57967/hf/2047
Dataset updated
Aug 11, 2023
Dataset authored and provided by
LINKS - AI, Data & Space
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Wildfires - CEMS

The dataset includes annotations for burned area delineation and land cover segmentation, with a focus on European soil. The dataset is curated from various sources, including the Copernicus European Monitoring System (EMS) and Sentinel-2 feeds.

Repository: https://github.com/links-ads/burned-area-seg
Paper: https://paperswithcode.com/paper/robust-burned-area-delineation-through

Dataset Preparation

The dataset has been compressed into segmentented… See the full description on the dataset page: https://huggingface.co/datasets/links-ads/wildfires-cems.
Data from: Signature Detection Dataset
universe.roboflow.com
zip
Updated Jan 18, 2025
Cite
tech (2025). Signature Detection Dataset [Dataset]. https://universe.roboflow.com/tech-ysdkk/signature-detection-hlx8j/model/1
Explore at:
Available download formats: zip
Dataset updated
Jan 18, 2025
Dataset authored and provided by
tech
Variables measured
Signature Bounding Boxes
Description
This dataset combines the Tobacco800 dataset and signatures-xc8up to create a comprehensive collection for training signature detection models. It contains annotated images for detecting handwritten signatures in various types of documents.

Dataset Components

Tobacco800:

A subset of the Complex Document Image Processing (CDIP) Test Collection.

Contains scanned images of documents related to the tobacco industry, created by the Illinois Institute of Technology.

signatures-xc8up:

Part of Roboflow 100, an Intel initiative.

Includes 368 annotated images for handwritten signature detection.

The two were merged to provide a robust and diverse base for object detection tasks.

Dataset Details

Dataset Split:

Training: 1,980 images (70%)

Validation: 420 images (15%)

Test: 419 images (15%)

Resolution:

All images were resized to 640x640 pixels.

Format: COCO JSON

License: Apache 2.0

Preprocessing and Augmentations

Preprocessing:

Auto-Orient: Applied

Resize: 640x640 pixels

Augmentations Applied:

90° Rotation: Clockwise, counter-clockwise, and upside down

Rotation: Between -10° and +10°

Shear: ±4° horizontal, ±3° vertical

Brightness: Between -8% and +8%

Exposure: Between -13% and +13%

Blur: Up to 1.1 pixels

Noise: Up to 0.97% of pixels

These steps were implemented to increase the model's robustness and its ability to generalize.
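
The split counts listed above (1,980 training, 420 validation, and 419 test images) can be checked against the rounded 70/15/15 percentages with a short calculation. This is an illustrative Python sketch; the names are hypothetical and the counts are copied from the description.

```python
# Image counts per split, as listed in the dataset description.
SPLIT_COUNTS = {"train": 1980, "valid": 420, "test": 419}


def split_shares(counts):
    """Return each split's share of the total, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    print("total images:", sum(SPLIT_COUNTS.values()))
    for name, share in split_shares(SPLIT_COUNTS).items():
        print(f"{name}: {share:.1f}%")
```

The exact shares differ slightly from the rounded figures because the total number of images is not a multiple of 20.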
Handwritten Signature Datasets
kaggle.com
zip
Updated Jun 22, 2021
Cite
Ishani Kathuria (2021). Handwritten Signature Datasets [Dataset]. https://www.kaggle.com/ishanikathuria/handwritten-signature-datasets
Explore at:
Available download formats: zip (304495326 bytes)
Dataset updated
Jun 22, 2021
Authors
Ishani Kathuria
Description
Context

Three benchmark datasets containing offline handwritten genuine and forged signatures.

Content

CEDAR

This dataset contains signatures by 55 people written in Latin script. Each person has 24 genuine and 24 forged signatures.

BHSig260 Bengali

This dataset contains signatures by 100 people written in Bengali script. Each person has 24 genuine and 30 forged signatures.

BHSig260 Hindi

This dataset contains signatures by 160 people written in Hindi script. Each person has 24 genuine and 30 forged signatures.

The file structure is as follows:
DATASET
|---1
|---|---images
|---2
|---|---images
. . .
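
The per-person figures given for the three benchmarks imply the following signature totals. The sketch below simply multiplies the numbers quoted in the description; it assumes every person contributes exactly the stated number of genuine and forged signatures, and the names are hypothetical.

```python
# Writers and signatures per writer, as stated in the description.
DATASETS = {
    "CEDAR": {"writers": 55, "genuine": 24, "forged": 24},
    "BHSig260 Bengali": {"writers": 100, "genuine": 24, "forged": 30},
    "BHSig260 Hindi": {"writers": 160, "genuine": 24, "forged": 30},
}


def signature_counts(spec):
    """Return genuine, forged, and total signature counts for one dataset."""
    genuine = spec["writers"] * spec["genuine"]
    forged = spec["writers"] * spec["forged"]
    return {"genuine": genuine, "forged": forged, "total": genuine + forged}


if __name__ == "__main__":
    for name, spec in DATASETS.items():
        print(name, signature_counts(spec))
```

Comparing these totals with the number of image files found under each numbered directory is a quick way to confirm that an extracted copy of the archive is complete.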

Acknowledgements

CEDAR

BHSig260
Papers with Code | Computers Electronics & Technology Data | Technology Data...
datastore.forage.ai
Updated Sep 22, 2024
Cite
(2024). Papers with Code | Computers Electronics & Technology Data | Technology Data [Dataset]. https://datastore.forage.ai/searchresults/?resource_keyword=Artificial%20Intelligence%20(AI)%20and%20Machine%20Learning%20(ML)
Explore at:
Dataset updated
Sep 22, 2024
Description
Papers with Code is an organization that aggregates and indexes academic papers in the field of computer science, with a focus on artificial intelligence, machine learning, and data science. The organization provides a platform for researchers and developers to access and engage with cutting-edge research, including technical reports, pre-print papers, and datasets. Papers with Code curates and annotates papers with metadata, including categories, keywords, and task types, making it easier for users to discover and explore relevant research.

The platform aims to democratize access to research by providing a comprehensive and user-friendly interface for searching, filtering, and exploring papers. Papers with Code features a diverse range of papers, including those published in top-tier conferences and journals, as well as pre-print papers and technical reports. The organization's metadata annotation enables researchers to quickly identify relevant papers and explore related research, insights, and methodologies. With Papers with Code, researchers can stay up-to-date with the latest advancements in AI, ML, and data science, and collaborate with others to advance the field.
conll2012_ontonotesv5
huggingface.co
+ more versions
Cite
Speech and Language Technology, DFKI, conll2012_ontonotesv5 [Dataset]. https://huggingface.co/datasets/DFKI-SLT/conll2012_ontonotesv5
Explore at:
Dataset authored and provided by
Speech and Language Technology, DFKI
Description
OntoNotes v5.0 is the final version of the OntoNotes corpus, a large-scale, multi-genre, multilingual corpus manually annotated with syntactic, semantic and discourse information.

This dataset is the extended version of OntoNotes v5.0 used in the CoNLL-2012 shared task. It includes v4 train/dev and v9 test data for English/Chinese/Arabic and the corrected v12 train/dev/test data (English only).

The source of the data is the Mendeley Data repo ontonotes-conll2012, which appears to be the same as the official data, but users should use this dataset at their own risk.

See also the summaries from paperswithcode: OntoNotes 5.0 and CoNLL-2012.

For more detail on the dataset, such as the annotation and tag set, refer to the documents in the Mendeley repo mentioned above.

