44 datasets found
  1. paperswithcode

    • huggingface.co
    Updated Jul 15, 2023
    Cite
    Jonas Wilinski (2023). paperswithcode [Dataset]. https://huggingface.co/datasets/J0nasW/paperswithcode
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 15, 2023
    Authors
    Jonas Wilinski
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    A cleaned dataset from paperswithcode.com

    Last dataset update: July 2023. This is a cleaned-up dataset obtained from paperswithcode.com through their API service. It comprises around 56K papers carefully categorized into 3K tasks and 16 areas. The papers carry arXiv and NIPS IDs as well as title, abstract and other meta information. It can be used for training text classifiers that concentrate on the use of specific AI and ML methods and frameworks.

      Contents… See the full description on the dataset page: https://huggingface.co/datasets/J0nasW/paperswithcode.
    
  2. Machine learning techniques with code

    • zenodo.org
    json
    Updated Jul 2, 2022
    Cite
    Paperswithcode (2022). Machine learning techniques with code [Dataset]. http://doi.org/10.5281/zenodo.6788250
    Available download formats: json
    Dataset updated
    Jul 2, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Paperswithcode
    License

    Attribution-ShareAlike 1.0 (CC BY-SA 1.0) (https://creativecommons.org/licenses/by-sa/1.0/)
    License information was derived automatically

    Description

    This dataset contains data from Paperswithcode.com obtained in January 2022. The first file, 'Papers_with_abstracts', includes information about research papers such as their title, abstract, and authors. 'Links_between_papers_and_code' links each paper to its corresponding GitHub repository. Finally, 'Methods' contains a categorization of these papers into different areas, done by the Paperswithcode community. These files were used in the Master's thesis "Topic Modeling for Research Software" by María Ayuso at Universidad Politécnica de Madrid.

  3. CIFAKE: Real and AI-Generated Synthetic Images

    • kaggle.com
    Updated Mar 28, 2023
    Cite
    Jordan J. Bird (2023). CIFAKE: Real and AI-Generated Synthetic Images [Dataset]. https://www.kaggle.com/datasets/birdy654/cifake-real-and-ai-generated-synthetic-images/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 28, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jordan J. Bird
    Description

    CIFAKE: Real and AI-Generated Synthetic Images

    The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness.

    CIFAKE is a dataset that contains 60,000 synthetically-generated images and 60,000 real images (collected from CIFAR-10). Can computer vision techniques be used to detect when an image is real or has been generated by AI?

    Further information on this dataset can be found here: Bird, J.J. and Lotfi, A., 2024. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access.

    Dataset details

    The dataset contains two classes - REAL and FAKE.

    For REAL, we collected the images from Krizhevsky & Hinton's CIFAR-10 dataset

    For the FAKE images, we generated the equivalent of CIFAR-10 with Stable Diffusion version 1.4

    There are 100,000 images for training (50k per class) and 20,000 for testing (10k per class)
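
    As a quick consistency check on the split sizes above, the following minimal sketch recomputes the totals from the per-class counts. The variable and function names are illustrative and not part of the dataset; only the counts come from the description.

```python
# Per-class split sizes as stated in the dataset description.
PER_CLASS = {"train": 50_000, "test": 10_000}
CLASSES = ("REAL", "FAKE")


def split_totals(per_class, classes):
    """Return the total image count of each split, summed over all classes."""
    return {split: n * len(classes) for split, n in per_class.items()}


totals = split_totals(PER_CLASS, CLASSES)
# Images available per class across both splits.
images_per_class = sum(PER_CLASS.values())
print(totals, images_per_class)
```

    The per-class total should match the 60,000 real and 60,000 synthetic images quoted earlier in the description.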

    Papers with Code

    The dataset and all studies using it are linked using Papers with Code https://paperswithcode.com/dataset/cifake-real-and-ai-generated-synthetic-images

    References

    If you use this dataset, you must cite the following sources

    Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images.

    Bird, J.J. and Lotfi, A., 2024. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access.

    Real images are from Krizhevsky & Hinton (2009), fake images are from Bird & Lotfi (2024). The Bird & Lotfi study is available here.

    Notes

    The updates to the dataset on the 28th of March 2023 did not change anything; the file formats ".jpeg" were renamed ".jpg" and the root folder was uploaded to meet Kaggle's usability requirements.

    License

    This dataset is published under the same MIT license as CIFAR-10:

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

  4. WikiText-103 Dataset

    • paperswithcode.com
    • opendatalab.com
    + more versions
    Cite
    Stephen Merity; Caiming Xiong; James Bradbury; Richard Socher, WikiText-103 Dataset [Dataset]. https://paperswithcode.com/dataset/wikitext-103
    Authors
    Stephen Merity; Caiming Xiong; James Bradbury; Richard Socher
    Description

    The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.

    Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger vocabulary and retains the original case, punctuation and numbers - all of which are removed in PTB. As it is composed of full articles, the dataset is well suited for models that can take advantage of long term dependencies.

  5. deepnets1m

    • huggingface.co
    • opendatalab.com
    • +1more
    Updated Apr 6, 2024
    Cite
    Samsung SAIT AI Lab, Montreal (2024). deepnets1m [Dataset]. https://huggingface.co/datasets/SamsungSAILMontreal/deepnets1m
    Dataset updated
    Apr 6, 2024
    Dataset authored and provided by
    Samsung SAIT AI Lab, Montreal
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This is a copy of the DeepNets-1M dataset originally released at https://github.com/facebookresearch/ppuda under the MIT license. The dataset presents diverse computational graphs (1M training and 1402 evaluation) of neural network architectures used in image classification. See detailed description at https://paperswithcode.com/dataset/deepnets-1m and in the Parameter Prediction for Unseen Deep Architectures paper. There are four files in this dataset:

    deepnets1m_eval.hdf5; # 16 MB (md5:… See the full description on the dataset page: https://huggingface.co/datasets/SamsungSAILMontreal/deepnets1m.

  6. UniToBrain Dataset

    • zenodo.org
    • ieee-dataport.org
    • +2more
    bin, csv, pdf
    Updated Jul 19, 2024
    Cite
    Umberto Gava; Federico D'Agata; Edwin Bennink; Enzo Tartaglione; Annamaria Vernone; Francesca Bertolino; Eleonora Ficiarà; Alessandro Cicerale; Fabrizio Pizzagalli; Caterina Guiot; Marco Grangetto; Mauro Bergui (2024). UniToBrain Dataset [Dataset]. http://doi.org/10.5281/zenodo.4817605
    Available download formats: pdf, csv, bin
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Umberto Gava; Federico D'Agata; Edwin Bennink; Enzo Tartaglione; Annamaria Vernone; Francesca Bertolino; Eleonora Ficiarà; Alessandro Cicerale; Fabrizio Pizzagalli; Caterina Guiot; Marco Grangetto; Mauro Bergui
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The University of Turin (UniTO) released the open-access dataset UniToBrain, collected for the homonymous Use Case 3 in the DeepHealth project (https://deephealth-project.eu/). UniToBrain is a dataset of Computed Tomography (CT) perfusion images (CTP). The dataset includes 100 training subjects and 15 testing subjects used in a submitted publication for the training and the testing of a Convolutional Neural Network (CNN; for details see: https://arxiv.org/abs/2101.05992, https://paperswithcode.com/paper/neural-network-derived-perfusion-maps-a-model, https://www.medrxiv.org/content/10.1101/2021.01.13.21249757v1). At this stage, the UniTO team has released this dataset privately, but it will soon be public. This is a subsample of a larger dataset of 258 subjects that will soon be available for download at https://ieee-dataport.org/.
    CTP data from 258 consecutive patients were retrospectively obtained from the hospital PACS of Città della Salute e della Scienza di Torino (Molinette). CTP acquisition parameters were as follows: Scanner GE, 64 slices, 80 kV, 150 mAs, 44.5 sec duration, 89 volumes (40 mm axial coverage), injection of 40 ml of Iodine contrast agent (300 mg/ml) at 4 ml/s speed.
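
    The acquisition parameters above imply a few derived quantities. The sketch below works them out under two assumptions that the description does not state: that the 89 volumes are evenly spaced over the 44.5 s scan, and that the 300 mg/ml figure refers to iodine content. All names are illustrative.

```python
# Acquisition parameters quoted in the description.
SCAN_DURATION_S = 44.5
N_VOLUMES = 89
CONTRAST_VOLUME_ML = 40
CONTRAST_CONCENTRATION_MG_PER_ML = 300
INJECTION_RATE_ML_PER_S = 4

# Assumption: volumes are evenly spaced over the scan duration.
seconds_per_volume = SCAN_DURATION_S / N_VOLUMES

# Time needed to inject the full contrast bolus at the stated rate.
injection_time_s = CONTRAST_VOLUME_ML / INJECTION_RATE_ML_PER_S

# Assumption: 300 mg/ml is the iodine concentration of the agent.
iodine_dose_g = CONTRAST_VOLUME_ML * CONTRAST_CONCENTRATION_MG_PER_ML / 1000

print(seconds_per_volume, injection_time_s, iodine_dose_g)
```

    Under these assumptions the scan samples one volume every half second, the injection lasts ten seconds, and the bolus carries twelve grams of iodine.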

  7. Stanford_car Dataset

    • universe.roboflow.com
    zip
    Updated Aug 1, 2024
    Cite
    Openglpro (2024). Stanford_car Dataset [Dataset]. https://universe.roboflow.com/openglpro/stanford_car/model/3
    Available download formats: zip
    Dataset updated
    Aug 1, 2024
    Dataset authored and provided by
    Openglpro
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Labeled All The Cars Bounding Boxes
    Description

    This dataset is a copy of a subset of the full Stanford Cars dataset

    The original dataset contained 16,185 images of 196 classes of cars.

    The classes are typically at the level of Make, Model, Year, e.g. 2012 Tesla Model S or 2012 BMW M3 coupe in the original dataset, and in this subset of the full dataset (v3, TestData and v4, original_raw-images).

    v4 (original_raw-images) contains a generated version of the original, raw images, without any modified classes

    v8 (classes-Modified_raw-images) contains a generated version of the raw images, with the Modify Classes preprocessing feature used to remap or omit the following classes: 1. bike, moped --remapped to--> motorbike 2. cng, leguna, easybike, smart fortwo Convertible 2012, and all other specific car makes with named classes (such as Acura TL Type-S 2008) --remapped to--> vehicle 3. rickshaw, boat, bicycle --> omitted

    v9 (FAST-model_mergedAllClasses-augmented_by3x) contains a generated version of the raw images, with the Modify Classes preprocessing feature used to remap or omit the following classes: 1. bike, moped --remapped to--> motorbike 2. cng, leguna, easybike, smart fortwo Convertible 2012, and all other specific car makes with named classes (such as Acura TL Type-S 2008) --remapped to--> vehicle 3. rickshaw, boat, bicycle --> omitted

    v10 (ACCURATE-model_mergedAllClasses-augmented_by3x) contains a generated version of the raw images, with the Modify Classes preprocessing feature used to remap or omit the following classes: 1. bike, moped --remapped to--> motorbike 2. cng, leguna, easybike, smart fortwo Convertible 2012, and all other specific car makes with named classes (such as Acura TL Type-S 2008) --remapped to--> vehicle 3. rickshaw, boat, bicycle --> omitted
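
    The remap-or-omit rules listed for v8, v9 and v10 can be sketched as a small lookup function. This is not Roboflow's Modify Classes implementation, only an illustration of the mapping; the function name is hypothetical, and treating every class that is not explicitly listed as a named car make (and so as "vehicle") is an assumption.

```python
from typing import Optional

# Classes the description says are remapped to "motorbike".
TO_MOTORBIKE = {"bike", "moped"}
# Classes the description says are omitted entirely.
OMITTED = {"rickshaw", "boat", "bicycle"}


def remap_class(name: str) -> Optional[str]:
    """Return the remapped class name, or None if the class is omitted.

    Assumption: any class not listed above (cng, leguna, easybike, and
    all named car makes) is remapped to "vehicle".
    """
    key = name.strip().lower()
    if key in OMITTED:
        return None
    if key in TO_MOTORBIKE:
        return "motorbike"
    return "vehicle"


labels = ["moped", "boat", "Acura TL Type-S 2008", "cng"]
remapped = [remap_class(label) for label in labels]
print(remapped)
```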

    Citation:

    3D Object Representations for Fine-Grained Categorization Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei 4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013. pdf BibTex slides

  8. DEEP-VOICE: DeepFake Voice Recognition

    • kaggle.com
    Updated Aug 24, 2023
    Cite
    Jordan J. Bird (2023). DEEP-VOICE: DeepFake Voice Recognition [Dataset]. https://www.kaggle.com/datasets/birdy654/deep-voice-deepfake-voice-recognition
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 24, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jordan J. Bird
    Description

    DEEP-VOICE: Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion

    This dataset contains examples of real human speech, and DeepFake versions of those speeches by using Retrieval-based Voice Conversion.

    Can machine learning be used to detect when speech is AI-generated?

    Introduction

    There are growing implications surrounding generative AI in the speech domain that enable voice cloning and real-time voice conversion from one individual to another. This technology poses a significant ethical threat and could lead to breaches of privacy and misrepresentation, thus there is an urgent need for real-time detection of AI-generated speech for DeepFake Voice Conversion.

    To address the above emerging issues, we are introducing the DEEP-VOICE dataset. DEEP-VOICE is comprised of real human speech from eight well-known figures and their speech converted to one another using Retrieval-based Voice Conversion.

    For each speech, the accompaniment ("background noise") was removed before conversion using RVC. The original accompaniment is then added back to the DeepFake speech:

    (Figure: Overview of the Retrieval-based Voice Conversion process to generate DeepFake speech with Ryan Gosling's speech converted to Margot Robbie. Conversion is run on the extracted vocals before being layered on the original background ambience.)

    Dataset

    There are two forms to the dataset that are made available.

    First, the raw audio can be found in the "AUDIO" directory. They are arranged within "REAL" and "FAKE" class directories. The audio filenames note which speakers provided the real speech, and which voices they were converted to. For example "Obama-to-Biden" denotes that Barack Obama's speech has been converted to Joe Biden's voice.
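
    The naming convention described above can be parsed with a few lines of code. This is a minimal sketch that assumes the pattern is exactly `<source>-to-<target>` with an optional file extension; the function name and the `.wav` extension in the example are hypothetical, and actual filenames in the dataset may differ.

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_conversion(filename: str) -> Optional[Tuple[str, str]]:
    """Split a name like 'Obama-to-Biden.wav' into (source, target).

    Returns None when the name does not follow the assumed
    '<source>-to-<target>' pattern (for example, unconverted speech).
    """
    stem = Path(filename).stem
    source, sep, target = stem.partition("-to-")
    if not sep or not source or not target:
        return None
    return source, target


print(parse_conversion("Obama-to-Biden.wav"))
```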

    Second, the extracted features can be found in the "DATASET-balanced.csv" file. This is the data that was used in the study cited below. Each feature is extracted from one-second windows of audio, and the classes are balanced through random sampling.
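
    To illustrate the two ideas in this paragraph, the sketch below splits a sample sequence into one-second windows and balances classes by random sampling. It is not the authors' pipeline: non-overlapping windows, dropping the trailing partial window, and undersampling the larger class are all assumptions, and the function names are hypothetical.

```python
import random


def one_second_windows(samples, sample_rate):
    """Split samples into non-overlapping one-second windows.

    Assumption: any trailing partial window is dropped.
    """
    n_windows = len(samples) // sample_rate
    return [samples[i * sample_rate:(i + 1) * sample_rate]
            for i in range(n_windows)]


def balance_by_undersampling(rows, label_of, seed=0):
    """Randomly sample every class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(label_of(row), []).append(row)
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


windows = one_second_windows(list(range(10)), sample_rate=4)
rows = [("REAL", i) for i in range(5)] + [("FAKE", i) for i in range(2)]
balanced = balance_by_undersampling(rows, label_of=lambda row: row[0])
```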

    Note: All experimental data is found within the "KAGGLE" directory. The "DEMONSTRATION" directory is used for playing cropped and compressed demos in notebooks due to Kaggle's limitations on file size.

    A successful detection system could be used as follows:

    (Figure: Usage of the real-time system. The end user is notified when the machine learning model has processed the speech audio (e.g. a phone or conference call) and predicted that audio chunks contain AI-generated speech.)

    Papers with Code

    The dataset and all studies using it are linked using Papers with Code

    The Papers with Code page can be found by clicking here: Papers with Code

    Attribution

    This dataset was produced from the study "Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion"

    Bird, J.J. and Lotfi, A., 2023. Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion. arXiv preprint arXiv:2308.12734.

    The preprint can be found on ArXiv by clicking here: Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion

    License

    This dataset is provided under the MIT License:

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT H...

  9. GEMBench

    • huggingface.co
    Updated Feb 10, 2025
    Cite
    Ricardo Garcia-Pinel (2025). GEMBench [Dataset]. https://huggingface.co/datasets/rjgpinel/GEMBench
    Dataset updated
    Feb 10, 2025
    Authors
    Ricardo Garcia-Pinel
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for GEMBench dataset

    💎 GEneralizable vision-language robotic Manipulation Benchmark Dataset A benchmark to systematically evaluate generalization capabilities of vision-and-language robotic manipulation policies. Built upon the RLBench simulator.

    💻 GEMBench Project Webpage: https://www.di.ens.fr/willow/research/gembench/ 📈 Leaderboard: https://paperswithcode.com/sota/robot-manipulation-generalization-on-gembench

      Dataset Structure
    

    Dataset structure is as… See the full description on the dataset page: https://huggingface.co/datasets/rjgpinel/GEMBench.

  10. Data from: Tree Trunks Dataset

    • universe.roboflow.com
    zip
    Updated Mar 4, 2025
    Cite
    Jacob Carlsson (2025). Tree Trunks Dataset [Dataset]. https://universe.roboflow.com/jacob-carlsson-ruecl/tree-trunks/model/4
    Available download formats: zip
    Dataset updated
    Mar 4, 2025
    Dataset authored and provided by
    Jacob Carlsson
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Variables measured
    Trees Bounding Boxes
    Description

    Dataset to classify trees based on an image of the trunk.

    Images are of Swedish trees, spread out over the year.

    Labels currently use Swedish tree names (without å, ö, ä).

    Collection

    Captured with an iPhone 12 mini and converted from HEIC file format to JPG.

    1. First, one or more images for context, e.g. parts of the tree other than the trunk. This could be flowers, leaves, ... or some kind of hand gesture that I know.
    2. Then images of one or more trees with the same class.

    Previous papers:

  11. Multi30k_train-ca

    • zenodo.org
    txt
    Updated Feb 29, 2024
    Cite
    Zenodo (2024). Multi30k_train-ca [Dataset]. http://doi.org/10.5281/zenodo.10728674
    Available download formats: txt
    Dataset updated
    Feb 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    California
    Description

    Multi30k_train-ca dataset is a professional translation of the train.en.multi30k dataset into Catalan, commissioned by BSC LangTech Unit.

    The Flickr30k is a dataset for sentence-based image description. It includes 31,000 images collected from Flickr, together with 5 reference captions provided by human annotators (https://paperswithcode.com/dataset/flickr30k).

    This work was funded by the Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya within the framework of Projecte AINA.

  12. MEDQA-USMLE QA JSON Only

    • kaggle.com
    Updated Oct 24, 2023
    Cite
    Nithin Dhananjayan (2023). MEDQA-USMLE QA JSON Only [Dataset]. https://www.kaggle.com/datasets/evidence/medqa-usmle-qa-json-only
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 24, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nithin Dhananjayan
    Description

    This dataset is a subset and reformatting of a rawer dataset. It covers only the US questions and answers, split into dev, train, and test sets in separate JSON files. This format ought to be easier to use. This notebook captures how the conversion was done.

    The rawer dataset was pulled from paperswithcode, which in turn sourced it from "A Large-scale Open Domain Question Answering Dataset from Medical Exams".

    The dataset is collected from the professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively.

    This is under the MIT License

    MIT License (As given on github)

    Copyright (c) 2022 Di Jin

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

    Written with StackEdit.

  13. Thai Food Detection Dataset

    • universe.roboflow.com
    zip
    Updated May 19, 2025
    Cite
    Thesis (2025). Thai Food Detection Dataset [Dataset]. https://universe.roboflow.com/thesis-shtjy/thai-food-detection/model/2
    Available download formats: zip
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    Thesis
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Food Bounding Boxes
    Description

    This project uses data from the THFOOD-50 dataset, originally created by Chakkrit Termritthikun, Paisarn Muneesawang, and Surachet Kanprachar. The dataset was further manually labeled and curated by Nonglak Changnoiaumphai for project-specific classification tasks.

    Citations

    Please cite the original authors’ work as follows:

    @article{termritthikun2017nu,
      title="{NU-InNet: Thai food image recognition using convolutional neural networks on smartphone}",
      author={Termritthikun, Chakkrit and Muneesawang, Paisarn and Kanprachar, Surachet},
      journal={Journal of Telecommunication, Electronic and Computer Engineering (JTEC)},
      volume={9},
      number={2-6},
      pages={63--67},
      year={2017}
    }

    @inproceedings{termritthikun2017accuracy,
      title="{Accuracy improvement of Thai food image recognition using deep convolutional neural networks}",
      author={Termritthikun, Chakkrit and Kanprachar, Surachet},
      booktitle={2017 international electrical engineering congress (IEECON)},
      pages={1--4},
      year={2017},
      organization={IEEE}
    }

    @article{termritthikun2018nu,
      title="{Nu-ResNet: Deep residual networks for Thai food image recognition}",
      author={Termritthikun, Chakkrit and Kanprachar, Surachet},
      journal={Journal of Telecommunication, Electronic and Computer Engineering (JTEC)},
      volume={10},
      number={1-4},
      pages={29--33},
      year={2018}
    }

  14. CTtttttttttttttttttttttttttttttttttt

    • kaggle.com
    Updated Jan 26, 2025
    Cite
    amira bouamrane (2025). CTtttttttttttttttttttttttttttttttttt [Dataset]. https://www.kaggle.com/datasets/amirabouamrane/xdna-datasets
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 26, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    amira bouamrane
    Description

    This dataset is a collection of several CT scan datasets for lung cancer diagnosis. It contains only images of benign or malignant nodules from various sources, aimed at developing lung cancer CADx systems:

    1. LIDC-IDRI dataset at: https://paperswithcode.com/dataset/lidc-idri
    2. CT Scan Images for Lung Cancer at: https://www.kaggle.com/datasets/dishantrathi20/ct-scan-images-for-lung-cancer
    3. Lung Cancer Dataset at: https://www.kaggle.com/datasets/jayaprakashpondy/lung-cancer-dataset
    4. Chest CT-Scan images Dataset at: https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images
    5. DLCTLUNGDetectNet-Lung Tumor Dataset at: https://www.kaggle.com/datasets/harshaldharpure/dlctlungdetectnet-lung-tumor-dataset
    6. IQ-OTH/NCCD dataset at: https://www.kaggle.com/datasets/hamdallak/the-iqothnccd-lung-cancer-dataset

  15. in-the-groove

    • huggingface.co
    Updated Oct 24, 2023
    Cite
    David Vaillant (2023). in-the-groove [Dataset]. https://huggingface.co/datasets/origami-digital/in-the-groove
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 24, 2023
    Authors
    David Vaillant
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Compiled from several different sets of songs:

    (ITG) In the Groove
    (ITG) In the Groove 2

    Songs were downloaded from https://search.stepmaniaonline.net/packs/in+the+groove and are stored here for persistence. In The Groove/ITG typically refers to DDR beatmaps done with an eye towards pad play. Dataset info: https://paperswithcode.com/dataset/itg

  16. wildfires-cems

    • huggingface.co
    Updated Aug 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    LINKS - AI, Data & Space (2023). wildfires-cems [Dataset]. http://doi.org/10.57967/hf/2047
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 11, 2023
    Dataset authored and provided by
    LINKS - AI, Data & Space
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Wildfires - CEMS

    The dataset includes annotations for burned area delineation and land cover segmentation, with a focus on European soil. The dataset is curated from various sources, including the Copernicus Emergency Management Service (EMS) and Sentinel-2 feeds.

    Repository: https://github.com/links-ads/burned-area-seg Paper: https://paperswithcode.com/paper/robust-burned-area-delineation-through

      Dataset Preparation
    

    The dataset has been compressed into segmentented… See the full description on the dataset page: https://huggingface.co/datasets/links-ads/wildfires-cems.

  17. Data from: Signature Detection Dataset

    • universe.roboflow.com
    zip
    Updated Jan 18, 2025
    Cite
    tech (2025). Signature Detection Dataset [Dataset]. https://universe.roboflow.com/tech-ysdkk/signature-detection-hlx8j/model/1
    Available download formats: zip
    Dataset updated
    Jan 18, 2025
    Dataset authored and provided by
    tech
    Variables measured
    Signature Bounding Boxes
    Description

    This dataset combines the Tobacco800 dataset and signatures-xc8up to create a comprehensive collection for training signature detection models. It contains annotated images for detecting handwritten signatures in various document types.

    Dataset Components

    1. Tobacco800:

      • Subset of the Complex Document Image Processing (CDIP) Test Collection.
      • Contains scanned images of documents related to the tobacco industry, created by the Illinois Institute of Technology.
    2. signatures-xc8up:

      • Part of Roboflow 100, an Intel initiative.
      • Includes 368 annotated images for handwritten signature detection.

    The two were merged to provide a robust and diverse base for object detection tasks.

    Dataset Details

    • Dataset split:

      • Training: 1,980 images (70%)
      • Validation: 420 images (15%)
      • Test: 419 images (15%)
    • Resolution:

      • All images were resized to 640x640 pixels.
    • Format: COCO JSON

    • License: Apache 2.0
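
    The percentages in the split listed above are rounded. A quick check of the listed counts, as a minimal sketch (the variable names are illustrative and not part of the dataset):

```python
# Split sizes as stated in the dataset details above.
splits = {"train": 1980, "validation": 420, "test": 419}

total = sum(splits.values())

# Share of each split as a percentage, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in splits.items()}

print(total)
print(shares)
```

    The three counts sum to 2,819 images, so the listed 70% / 15% / 15% figures are approximations of the actual shares.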

    Preprocessing and Augmentations

    • Preprocessing:

      • Auto-orient: Applied
      • Resize: 640x640 pixels
    • Augmentations applied:

      • 90° rotation: clockwise, counter-clockwise, and upside down
      • Rotation: between -10° and +10°
      • Shear: ±4° horizontal, ±3° vertical
      • Brightness: between -8% and +8%
      • Exposure: between -13% and +13%
      • Blur: up to 1.1 pixels
      • Noise: up to 0.97% of pixels

    These steps were applied to make the model more robust and improve its ability to generalize.
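
    To make the augmentation ranges above concrete, the sketch below draws one random parameter set within them. It is only an illustration: the dictionary layout, the names, and the uniform sampling are assumptions, not Roboflow's export format or its actual sampling procedure, and no image is transformed here.

```python
import random

# Ranges copied from the augmentation list above. The key names and this
# dictionary layout are hypothetical, chosen for illustration only.
AUGMENTATION_RANGES = {
    "rotation_deg": (-10.0, 10.0),
    "shear_horizontal_deg": (-4.0, 4.0),
    "shear_vertical_deg": (-3.0, 3.0),
    "brightness_pct": (-8.0, 8.0),
    "exposure_pct": (-13.0, 13.0),
    "blur_px": (0.0, 1.1),
    "noise_pct_of_pixels": (0.0, 0.97),
}

# The 90-degree rotation picks one of three fixed turns
# (clockwise, counter-clockwise, upside down).
RIGHT_ANGLE_TURNS = (90, -90, 180)


def sample_augmentation(rng: random.Random) -> dict:
    """Draw one parameter set, assuming uniform sampling within each range."""
    params = {
        name: rng.uniform(low, high)
        for name, (low, high) in AUGMENTATION_RANGES.items()
    }
    params["right_angle_turn_deg"] = rng.choice(RIGHT_ANGLE_TURNS)
    return params


params = sample_augmentation(random.Random(0))
print(params)
```

    Passing a seeded `random.Random` keeps the draw reproducible, which helps when comparing training runs.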

  18. Handwritten Signature Datasets

    • kaggle.com
    zip
    Updated Jun 22, 2021
    Cite
    Ishani Kathuria (2021). Handwritten Signature Datasets [Dataset]. https://www.kaggle.com/ishanikathuria/handwritten-signature-datasets
    Explore at:
    Available download formats: zip (304495326 bytes)
    Dataset updated
    Jun 22, 2021
    Authors
    Ishani Kathuria
    Description

    Context

    Three benchmark datasets containing offline handwritten genuine and forged signatures.

    Content

    CEDAR

    This dataset contains signatures by 55 people written in Latin script. Each person has 24 genuine and 24 forged signatures.

    BHSig260 Bengali

    This dataset contains signatures by 100 people written in Bengali script. Each person has 24 genuine and 30 forged signatures.

    BHSig260 Hindi

    This dataset contains signatures by 160 people written in Hindi script. Each person has 24 genuine and 30 forged signatures.
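
    The per-signer figures above imply the total number of signature images in each dataset. A small sketch of that arithmetic (the variable names are illustrative; the totals are derived from the stated counts, not read from the files):

```python
# (signers, genuine per signer, forged per signer), as stated above.
datasets = {
    "CEDAR": (55, 24, 24),
    "BHSig260 Bengali": (100, 24, 30),
    "BHSig260 Hindi": (160, 24, 30),
}

# Total signatures per dataset = signers * (genuine + forged).
totals = {
    name: signers * (genuine + forged)
    for name, (signers, genuine, forged) in datasets.items()
}

print(totals)
print(sum(totals.values()))
```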

    The file structure is as follows:

    DATASET
    |---1
    |---|---images
    |---2
    |---|---images
    . . .
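
    Given the layout described above, a loader can walk one numbered folder per signer. The sketch below assumes that each signer folder holds its image files directly and that `root` points at the extracted DATASET directory; the function name is hypothetical, and the real archive may nest files differently.

```python
from pathlib import Path


def images_per_signer(root: Path) -> dict:
    """Count the files directly under each `<root>/<signer id>/` folder."""
    counts = {}
    for signer_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[signer_dir.name] = sum(1 for f in signer_dir.iterdir() if f.is_file())
    return counts
```

    Calling `images_per_signer(Path("DATASET"))` would return a mapping from signer id to file count, which can be compared against the per-signer totals stated in the description.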

    Acknowledgements

    CEDAR

    BHSig260

  19. Papers with Code | Computers Electronics & Technology Data | Technology Data...

    • datastore.forage.ai
    Updated Sep 22, 2024
    Cite
    (2024). Papers with Code | Computers Electronics & Technology Data | Technology Data [Dataset]. https://datastore.forage.ai/searchresults/?resource_keyword=Artificial%20Intelligence%20(AI)%20and%20Machine%20Learning%20(ML)
    Explore at:
    Dataset updated
    Sep 22, 2024
    Description

    Papers with Code is an organization that aggregates and indexes academic papers in the field of computer science, with a focus on artificial intelligence, machine learning, and data science. The organization provides a platform for researchers and developers to access and engage with cutting-edge research, including technical reports, pre-print papers, and datasets. Papers with Code curates and annotates papers with metadata, including categories, keywords, and task types, making it easier for users to discover and explore relevant research.

    The platform aims to democratize access to research by providing a comprehensive and user-friendly interface for searching, filtering, and exploring papers. Papers with Code features a diverse range of papers, including those published in top-tier conferences and journals, as well as pre-print papers and technical reports. The organization's metadata annotation enables researchers to quickly identify relevant papers and explore related research, insights, and methodologies. With Papers with Code, researchers can stay up-to-date with the latest advancements in AI, ML, and data science, and collaborate with others to advance the field.

  20. conll2012_ontonotesv5

    • huggingface.co
    + more versions
    Cite
    Speech and Language Technology, DFKI, conll2012_ontonotesv5 [Dataset]. https://huggingface.co/datasets/DFKI-SLT/conll2012_ontonotesv5
    Explore at:
    Dataset authored and provided by
    Speech and Language Technology, DFKI
    Description

    OntoNotes v5.0 is the final version of the OntoNotes corpus, a large-scale, multi-genre, multilingual corpus manually annotated with syntactic, semantic and discourse information.

    This dataset is the extended version of OntoNotes v5.0 used in the CoNLL-2012 shared task. It includes v4 train/dev and v9 test data for English/Chinese/Arabic and the corrected version v12 train/dev/test data (English only).

    The data comes from the Mendeley Data repo ontonotes-conll2012, which appears to be the same as the official data, but users should use this dataset at their own risk.

    See also the summaries on paperswithcode: OntoNotes 5.0 and CoNLL-2012.

    For more detail on the dataset, such as the annotation scheme and tag set, refer to the documents in the Mendeley repo mentioned above.
