Nematoda Collection of the Museu Nacional - UFRJ
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Pfkit is a dataset for object detection tasks - it contains Ppe Fire Fire And Smoke annotations for 8,828 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
iidai/avatar dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Otter UBC Dataset Card
UBC is a dataset comprising entities (Proteins/Drugs) from UniProt (U), BindingDB (B), and ChEMBL (C). It contains 6,207,654 triples.
Dataset details
Uniprot
UniProt comprises 573,227 proteins from SwissProt, the manually curated subset of UniProt, with attributes of different modalities such as the sequence (567,483 of them), full name, organism, protein family, description of its function, catalytics… See the full description on the dataset page: https://huggingface.co/datasets/ibm-research/otter_uniprot_bindingdb_chembl.
Brachiopoda Collection of the Museu Nacional - UFRJ
http://dcat-ap.de/def/licenses/cc-by-sa
DBpedia is a joint project of Leipzig University, Freie Universität Berlin and OpenLink Software to extract structured information from Wikipedia and make it accessible to web applications as linked data. DBpedia also makes it possible to link this data with information from other web applications. The data sets are available under the GNU Free Documentation License and are linked to other free data collections.
The modified-bAbI dialog tasks dataset is an extension of original-bAbI-dialog-tasks, as described in the paper "Learning End-to-End Goal-Oriented Dialog with maximal User task success and minimal Human Agent use". We modify the original bAbI dialog tasks by removing and replacing certain user behaviors in the training and validation data. The test set is left untouched. This simulates a scenario in which new user behaviors, unseen during training, arise at test (deployment) time, and hence allows us to test our proposed method. It also mimics real-world data collection via crowdsourcing, in the sense that certain user behaviors are missing from the training data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Pfkit 2 is a dataset for object detection tasks - it contains Ppe Kits Ppe Fire Fire And Smoke annotations for 9,893 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Many Natural Language Processing (NLP) tasks depend on Named Entities (NEs) that appear in texts and in external knowledge sources. While this is easy for humans, current neural methods that rely on learned word embeddings may not perform well on these tasks, especially in the presence of Out-Of-Vocabulary (OOV) or rare NEs. The datasets contain extended versions of dialog bAbI tasks 1, 2 and 4 and OOV versions of the CBT test set.
The datasets are new and more effective testbeds for bAbI dialog task 5 and the Stanford Multi-Domain (SMD) dataset, which incorporate naturalistic variation by the user. Existing benchmarks used to evaluate end-to-end neural dialog systems lack a key component: the natural variation present in human conversations. Most datasets are constructed through crowdsourcing, where the crowd workers follow a fixed template of instructions while enacting the role of a user or agent. This results in straightforward, somewhat routine, and mostly trouble-free conversations, as crowd workers do not think to represent the full range of actions that occur naturally with real users. On both updated datasets, we observe a significant drop in the performance of recent state-of-the-art end-to-end neural methods such as BossNet and GLMP (more than 60% in Ent. F1 on SMD and 85% in per-dialog accuracy on the bAbI task).
MIT License: https://opensource.org/licenses/MIT
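To make the per-dialog accuracy figure above concrete, here is a minimal sketch. It assumes the common reading of the metric, in which a dialog counts as correct only if every system turn in it is correct; the function names and the boolean input layout are illustrative assumptions, not taken from the dataset's own evaluation code.

```python
from typing import Sequence


def per_dialog_accuracy(dialogs: Sequence[Sequence[bool]]) -> float:
    """Fraction of dialogs in which every turn was predicted correctly.

    dialogs[i][j] is True if turn j of dialog i was correct (hypothetical layout).
    Note that an empty dialog counts as correct, because all([]) is True.
    """
    if not dialogs:
        return 0.0
    solved = sum(1 for turns in dialogs if all(turns))
    return solved / len(dialogs)


def per_turn_accuracy(dialogs: Sequence[Sequence[bool]]) -> float:
    """Fraction of individual turns predicted correctly, pooled over all dialogs."""
    total = sum(len(turns) for turns in dialogs)
    if total == 0:
        return 0.0
    return sum(sum(turns) for turns in dialogs) / total


# Toy, made-up results: three dialogs, one of which has a single wrong turn.
toy_results = [[True, True], [True, False], [True]]
dialog_acc = per_dialog_accuracy(toy_results)
turn_acc = per_turn_accuracy(toy_results)
```

Because one wrong turn fails the whole dialog, per-dialog accuracy can fall much faster than per-turn accuracy when user behavior varies.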
License information was derived automatically
Dataset Card for "identity_group_abuse-robustness"
Dataset Summary
identity_group_abuse-robustness is an expanded version of the identity group abuse dataset (https://aclanthology.org/2022.naacl-main.410/) with perturbations of the original input questions and passages. It is intended as a benchmark for evaluating the robustness of question-answering models to these perturbations.
Data Instances
identity_group_abuse-robustness
Size of… See the full description on the dataset page: https://huggingface.co/datasets/ibm-research/identity_group_abuse_robustness.
Sipuncula Collection of the Museu Nacional - UFRJ
The dataset contains the Twitter Conversations for the task of Conversational Document Prediction (CDP). The dataset includes conversations that occurred between users and customer care agents in 25 organizations on the Twitter platform. Each conversation ends with a customer care agent providing a URL to a document to resolve the issue the user is facing. The task is to predict the document given a dialog context.
Ascidiacea Collection of the Museu Nacional - UFRJ
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Pascal XML To YOLO TXT is a dataset for object detection tasks - it contains Documents annotations for 8,143 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The permuted-bAbI dialog tasks dataset is an extension of original-bAbI-dialog-tasks, as described in the paper "Learning End-to-End Goal-Oriented Dialog with Multiple Answers". We modify the original bAbI dialog tasks by introducing multiple valid next utterances, which allows end-to-end goal-oriented dialog systems to be evaluated in a more realistic setting.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
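As a rough illustration of what evaluation with multiple valid next utterances could look like, the sketch below counts a prediction as correct if it matches any utterance in the set of valid answers for that turn. This is an assumption about the scoring rule, not the paper's evaluation code; the function name and the toy utterances are hypothetical.

```python
from typing import Sequence, Set


def multi_answer_accuracy(
    predictions: Sequence[str],
    valid_answers: Sequence[Set[str]],
) -> float:
    """Per-turn accuracy where any utterance in the valid set counts as a hit."""
    if len(predictions) != len(valid_answers):
        raise ValueError("predictions and valid_answers must have the same length")
    if not predictions:
        return 0.0
    hits = sum(pred in valid for pred, valid in zip(predictions, valid_answers))
    return hits / len(predictions)


# Made-up example: the first turn has two acceptable replies, the second has one.
toy_predictions = ["ask_cuisine", "ask_price"]
toy_valid = [{"ask_cuisine", "ask_location"}, {"ask_party_size"}]
toy_accuracy = multi_answer_accuracy(toy_predictions, toy_valid)
```

With a single valid answer per turn this reduces to ordinary exact-match accuracy.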
License information was derived automatically
## Overview
Road Accident Detection is a dataset for object detection tasks - it contains Accidental Non_accidnetal annotations for 3,208 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
BookTest is a new dataset similar to the popular Children’s Book Test (CBT), but more than 60 times larger.
https://cdla.dev/permissive-1-0/
SynthTabNet is a dataset of 600k PNG images of synthetically generated table layouts, with annotations in JSONL files.
MIT License: https://opensource.org/licenses/MIT
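Annotations stored as JSONL can be read one record per line. The sketch below is generic: the `filename` and `split` keys and the sample records are made up for illustration and are not the documented SynthTabNet schema.

```python
import io
import json
from collections import Counter
from typing import Iterator, TextIO


def read_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# In-memory stand-in for an annotation file; the keys are hypothetical.
sample = io.StringIO(
    '{"filename": "a.png", "split": "train"}\n'
    '{"filename": "b.png", "split": "val"}\n'
    "\n"
)
records = list(read_jsonl(sample))
split_counts = Counter(record["split"] for record in records)
```

For a real file, the same function can be applied to a handle opened with `open(path, encoding="utf-8")`. Streaming line by line avoids loading a large annotation file into memory at once.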
License information was derived automatically
Otter DUDe Dataset Card
Otter DUDe includes 1,452,568 instances of drug-target interactions.
Dataset details
DUDe
DUDe comprises a collection of 22,886 active compounds and their corresponding affinities towards 102 targets. For our study, we utilized a preprocessed version of the DUDe, which includes 1,452,568 instances of drug-target interactions. To prevent any data leakage, we eliminated the negative interactions and the overlapping triples with the TDC DTI… See the full description on the dataset page: https://huggingface.co/datasets/ibm-research/otter_dude.
Nematoda Collection of the Museu Nacional - UFRJ