100+ datasets found

Data from: Coleção de Nematoda do Museu Nacional - UFRJ
portal.obis.org
gbif.org
+1 more
zip
Updated Dec 11, 2023
Cite
Universidade Federal do Rio de Janeiro (2023). Coleção de Nematoda do Museu Nacional - UFRJ [Dataset]. https://portal.obis.org/dataset/13df6cd5-b38e-46b3-96c3-f86f8c7d378e
Available download formats: zip
Dataset updated
Dec 11, 2023
Dataset provided by
Federal University of Rio de Janeiro (https://ufrj.br/)
Description
Nematoda Collection of the Museu Nacional - UFRJ
Pfkit Dataset
universe.roboflow.com
zip
Updated Jul 3, 2025
Cite
ibm (2025). Pfkit Dataset [Dataset]. https://universe.roboflow.com/ibm-pdnwf/pfkit/model/1
Available download formats: zip
Dataset updated
Jul 3, 2025
Dataset provided by
IBM (http://ibm.com/)
Authors
ibm
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Ppe Fire Fire And Smoke Bounding Boxes
Description
Pfkit

## Overview

Pfkit is a dataset for object detection tasks. It contains Ppe Fire Fire And Smoke annotations for 8,828 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
avatar
huggingface.co
Updated Apr 22, 2024
Cite
IBM Research - University of Illinois Urbana Champaign Discovery Accelerator Institute (2024). avatar [Dataset]. https://huggingface.co/datasets/iidai/avatar
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 22, 2024
Dataset provided by
IBM (http://ibm.com/)
Authors
IBM Research - University of Illinois Urbana Champaign Discovery Accelerator Institute
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The iidai/avatar dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
otter_uniprot_bindingdb_chembl
huggingface.co
Updated Oct 18, 2023
+ more versions
Cite
IBM Research (2023). otter_uniprot_bindingdb_chembl [Dataset]. https://huggingface.co/datasets/ibm-research/otter_uniprot_bindingdb_chembl
Dataset updated
Oct 18, 2023
Dataset provided by
IBM Research
IBM (http://ibm.com/)
Authors
IBM Research
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Otter UBC Dataset Card

UBC is a dataset comprising entities (proteins/drugs) from UniProt (U), BindingDB (B) and ChEMBL (C). It contains 6,207,654 triples.

Dataset details

UniProt

UniProt comprises 573,227 proteins from SwissProt, the manually curated subset of UniProt entries, including attributes with different modalities such as the sequence (567,483 of them), full name, organism, protein family, description of its function, catalytics… See the full description on the dataset page: https://huggingface.co/datasets/ibm-research/otter_uniprot_bindingdb_chembl.
Data from: Coleção de Brachiopoda do Museu Nacional - UFRJ
portal.obis.org
gbif.org
zip
Updated Dec 11, 2023
Cite
Universidade Federal do Rio de Janeiro (2023). Coleção de Brachiopoda do Museu Nacional - UFRJ [Dataset]. https://portal.obis.org/dataset/c67e6d09-ee1d-4da4-9b21-bfaf558d60f1
Available download formats: zip
Dataset updated
Dec 11, 2023
Dataset provided by
Federal University of Rio de Janeiro (https://ufrj.br/)
Description
Brachiopoda Collection of the Museu Nacional - UFRJ
DBPedia
processor1.francecentral.cloudapp.azure.com
ckan.govdata.de
+3 more
Updated Dec 12, 2016
+ more versions
Cite
DBpedia.org (2016). DBPedia [Dataset]. http://processor1.francecentral.cloudapp.azure.com/pl/dataset/dbpedia
Available download formats: HTML (http://publications.europa.eu/resource/authority/file-type/html)
Dataset updated
Dec 12, 2016
Dataset provided by
DBpedia (http://dbpedia.org/)
License
CC BY-SA (http://dcat-ap.de/def/licenses/cc-by-sa)
Description
DBpedia is a joint project of Leipzig University, Freie Universität Berlin and OpenLink Software that extracts structured information from Wikipedia and makes it available on the web as linked data. DBpedia also makes it possible to link this data with information from other web applications. The datasets are available under the GNU Free Documentation License and are linked to other free data collections.
Modified bAbI dialog task
gitee.com
github.com
Updated Dec 31, 2024
Cite
IBM (2024). Modified bAbI dialog task [Dataset]. https://gitee.com/mirrors_ibm/modified-bAbI-dialog-tasks?skip_mobile=true
Dataset updated
Dec 31, 2024
Dataset provided by
IBM (http://ibm.com/)
Description
The modified-bAbI dialog tasks dataset is an extension of original-bAbI-dialog-tasks, as described in the paper "Learning End-to-End Goal-Oriented Dialog with maximal User task success and minimal Human Agent use". We modify the original-bAbI dialog tasks by removing and replacing certain user behaviors in the training and validation data. The test set is left untouched. This simulates a scenario in which new user behaviors arise at test (deployment) time that were not seen during training, and hence allows us to test our proposed method. It also mimics real-world data collection via crowdsourcing, in the sense that certain user behaviors are missing from the training data.
Pfkit 2 Dataset
universe.roboflow.com
zip
Updated Jul 3, 2025
Cite
ibm (2025). Pfkit 2 Dataset [Dataset]. https://universe.roboflow.com/ibm-pdnwf/pfkit-2
Available download formats: zip
Dataset updated
Jul 3, 2025
Dataset provided by
IBM (http://ibm.com/)
Authors
ibm
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Ppe Kits Ppe Fire Fire And Smoke Bounding Boxes
Description
Pfkit 2

## Overview

Pfkit 2 is a dataset for object detection tasks. It contains Ppe Kits Ppe Fire Fire And Smoke annotations for 9,893 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Extended dialog bAbI tasks and CBT-OOV datasets
github.com
Updated Jul 22, 2024
Cite
IBM (2024). Extended dialog bAbI tasks and CBT-OOV datasets [Dataset]. https://github.com/IBM/ne-table-datasets
Dataset updated
Jul 22, 2024
Dataset provided by
IBM (http://ibm.com/)
Description
Many Natural Language Processing (NLP) tasks depend on Named Entities (NEs) contained in texts and in external knowledge sources. While handling them is easy for humans, current neural methods that rely on learned word embeddings may not perform well on these tasks, especially in the presence of Out-Of-Vocabulary (OOV) or rare NEs. The datasets contain extended versions of dialog bAbI tasks 1, 2 and 4 and OOV versions of the CBT test set.
Naturalistic Variation in Goal-Oriented Dialog datasets
github.com
Updated Jul 22, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
IBM (2024). Naturalistic Variation in Goal-Oriented Dialog datasets [Dataset]. https://github.com/IBM/naturalistic-variation-goal-oriented-dialog-datasets
Dataset updated
Jul 22, 2024
Dataset provided by
IBM (http://ibm.com/)
Description
The datasets are new and more effective testbeds for bAbI dialog task 5 and the Stanford Multi-Domain (SMD) dataset, incorporating naturalistic variation by the user. Existing benchmarks used to evaluate end-to-end neural dialog systems lack a key component: the natural variation present in human conversations. Most datasets are constructed through crowdsourcing, where crowd workers follow a fixed template of instructions while enacting the role of a user or agent. This results in straightforward, somewhat routine, and mostly trouble-free conversations, as crowd workers do not think to represent the full range of actions that occur naturally with real users. We observe a significant drop in performance (more than 60% in Ent. F1 on SMD and 85% in per-dialog accuracy on the bAbI task) for recent state-of-the-art end-to-end neural methods such as BossNet and GLMP on both updated datasets.
identity_group_abuse_robustness
huggingface.co
Updated Aug 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
IBM Research (2024). identity_group_abuse_robustness [Dataset]. https://huggingface.co/datasets/ibm-research/identity_group_abuse_robustness
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 19, 2024
Dataset provided by
IBM Research
IBM (http://ibm.com/)
Authors
IBM Research
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset Card for "identity_group_abuse-robustness"

Dataset Summary

identity_group_abuse-robustness is an expanded version of the identity group abuse dataset (https://aclanthology.org/2022.naacl-main.410/) with perturbations of the original input questions and passages. It is intended as a benchmark for evaluating the robustness of question-answering models to these perturbations.

Data Instances

identity_group_abuse-robustness

Size of… See the full description on the dataset page: https://huggingface.co/datasets/ibm-research/identity_group_abuse_robustness.
Data from: Coleção de Sipuncula do Museu Nacional - UFRJ
obis.org
gbif.org
zip
Updated Dec 11, 2023
Cite
Universidade Federal do Rio de Janeiro (2023). Coleção de Sipuncula do Museu Nacional - UFRJ [Dataset]. https://obis.org/dataset/df41ae5f-7fc7-40c4-967b-e8c0668741c7
Available download formats: zip
Dataset updated
Dec 11, 2023
Dataset provided by
Federal University of Rio de Janeiro (https://ufrj.br/)
Description
Sipuncula Collection of the Museu Nacional - UFRJ
Twitter Conversations Dataset for Conversational Document Prediction (CDP) task
github.com
Updated Jul 22, 2024
+ more versions
Cite
IBM (2024). Twitter Conversations Dataset for Conversational Document Prediction (CDP) task [Dataset]. https://github.com/IBM/twitter-customer-care-document-prediction
Dataset updated
Jul 22, 2024
Dataset provided by
IBM (http://ibm.com/)
Description
The dataset contains Twitter conversations for the Conversational Document Prediction (CDP) task. It includes conversations between users and customer care agents of 25 organizations on the Twitter platform. Each conversation ends with a customer care agent providing a URL to a document that resolves the user's issue. The task is to predict the document given a dialog context.
Data from: Coleção de Ascidiacea do Museu Nacional - UFRJ
obis.org
gbif.org
zip
Updated Dec 11, 2023
Cite
Universidade Federal do Rio de Janeiro (2023). Coleção de Ascidiacea do Museu Nacional - UFRJ [Dataset]. https://obis.org/dataset/f1ac4ec4-6133-414a-bc4c-0f215def0cc5
Available download formats: zip
Dataset updated
Dec 11, 2023
Dataset provided by
Universidade Federal do Rio de Janeiro
Description
Ascidiacea Collection of the Museu Nacional - UFRJ
Pascal Xml To Yolo Txt Dataset
universe.roboflow.com
zip
Updated Nov 7, 2022
Cite
IBM PubLayNet (2022). Pascal Xml To Yolo Txt Dataset [Dataset]. https://universe.roboflow.com/ibm-publaynet/pascal-xml-to-yolo-txt
Available download formats: zip
Dataset updated
Nov 7, 2022
Dataset provided by
IBM (http://ibm.com/)
Authors
IBM PubLayNet
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Documents Bounding Boxes
Description
Pascal XML To YOLO TXT

## Overview

Pascal XML To YOLO TXT is a dataset for object detection tasks. It contains Documents annotations for 8,143 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
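The dataset name suggests annotations converted from Pascal VOC XML into YOLO TXT labels. The snippet below is a minimal sketch of that kind of conversion, not the method used to build this dataset. It assumes the standard Pascal VOC layout (`size/width`, `size/height`, and `object/bndbox` with `xmin`, `ymin`, `xmax`, `ymax`) and the usual YOLO label line `class x_center y_center width height`, with every value normalized by the image size. The helper `voc_to_yolo` and the sample annotation are hypothetical.

```python
import xml.etree.ElementTree as ET


def voc_to_yolo(xml_text: str, class_names: list) -> list:
    """Hypothetical helper: convert one Pascal VOC XML annotation into YOLO label lines.

    Assumes the standard VOC layout and YOLO's normalized
    "class x_center y_center width height" line format.
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.findtext("width"))
    img_h = float(size.findtext("height"))

    lines = []
    for obj in root.iter("object"):
        # Class index is the position of the label in the supplied class list.
        cls_id = class_names.index(obj.findtext("name"))
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # YOLO stores the box centre and size, each divided by the image dimensions.
        x_c = (xmin + xmax) / 2 / img_w
        y_c = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{cls_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines


# Hypothetical sample annotation with one "Documents" box.
SAMPLE = """<annotation>
  <size><width>200</width><height>100</height><depth>3</depth></size>
  <object>
    <name>Documents</name>
    <bndbox><xmin>50</xmin><ymin>25</ymin><xmax>150</xmax><ymax>75</ymax></bndbox>
  </object>
</annotation>"""

print(voc_to_yolo(SAMPLE, ["Documents"]))
# → ['0 0.500000 0.500000 0.500000 0.500000']
```

Some converters also shift coordinates by one pixel to account for VOC's 1-based indexing; this sketch keeps the raw corner values for simplicity.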
Permuted bAbI dialog task
github.com
paperswithcode.com
+1 more
Updated Jul 22, 2024
Cite
IBM (2024). Permuted bAbI dialog task [Dataset]. https://github.com/IBM/permuted-bAbI-dialog-tasks
Dataset updated
Jul 22, 2024
Dataset provided by
IBM (http://ibm.com/)
Description
The permuted-bAbI dialog tasks dataset is an extension of original-bAbI-dialog-tasks, as described in the paper "Learning End-to-End Goal-Oriented Dialog with Multiple Answers". We modify the original-bAbI dialog tasks by introducing multiple valid next utterances, which allows end-to-end goal-oriented dialog systems to be evaluated in a more realistic setting.
Road Accident Detection Dataset
universe.roboflow.com
zip
Updated May 30, 2025
Cite
IBM (2025). Road Accident Detection Dataset [Dataset]. https://universe.roboflow.com/ibm-oj5gs/road-accident-detection-eioit/model/1
Available download formats: zip
Dataset updated
May 30, 2025
Dataset authored and provided by
IBM (http://ibm.com/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Accidental Non_accidnetal Bounding Boxes
Description
Road Accident Detection

## Overview

Road Accident Detection is a dataset for object detection tasks. It contains Accidental Non_accidnetal annotations for 3,208 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
BookTest
opendatalab.com
paperswithcode.com
zip
Updated Mar 22, 2023
Cite
IBM Watson (2023). BookTest [Dataset]. https://opendatalab.com/OpenDataLab/BookTest
Available download formats: zip
Dataset updated
Mar 22, 2023
Dataset provided by
IBM (http://ibm.com/)
Description
BookTest is a new dataset similar to the popular Children’s Book Test (CBT), but more than 60 times larger.
SynthTabNet
opendatalab.com
zip
Updated Mar 22, 2023
Cite
IBM Research (2023). SynthTabNet [Dataset]. https://opendatalab.com/OpenDataLab/SynthTabNet
Available download formats: zip
Dataset updated
Mar 22, 2023
Dataset provided by
IBM (http://ibm.com/)
License
CDLA-Permissive-1.0 (https://cdla.dev/permissive-1-0/)
Description
SynthTabNet is a dataset of 600k PNG images of synthetically generated table layouts, with annotations in JSONL files.
otter_dude
huggingface.co
Updated Aug 16, 2023
Cite
IBM Research (2023). otter_dude [Dataset]. https://huggingface.co/datasets/ibm-research/otter_dude
Dataset updated
Aug 16, 2023
Dataset provided by
IBM Research
IBM (http://ibm.com/)
Authors
IBM Research
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Otter DUDe Dataset Card

Otter DUDe includes 1,452,568 instances of drug-target interactions.

Dataset details

DUDe

DUDe comprises 22,886 active compounds and their corresponding affinities towards 102 targets. For our study, we used a preprocessed version of DUDe, which includes 1,452,568 instances of drug-target interactions. To prevent any data leakage, we eliminated the negative interactions and the overlapping triples with the TDC DTI… See the full description on the dataset page: https://huggingface.co/datasets/ibm-research/otter_dude.
