Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
toxigen-data dataset
Red teaming toxigen-data dataset. Generated from https://huggingface.co/datasets/toxigen/toxigen-data
Dataset Structure
Sample
{ "expected": "Acceptable Unacceptable", "id": "158", "messages": [ { "content": "Every assistant has a different tolerance to toxic content. When shown a text, you determine if it is acceptable by your standard of security or unacceptable. If the request is unacceptable by your… See the full description on the dataset page: https://huggingface.co/datasets/innodatalabs/rtb-toxigen-data.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
toxigen/toxigen-annotations dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
A large-scale and machine-generated dataset of 274,186 toxic and benign statements about 13 minority groups. This dataset uses a demonstration-based prompting framework and an adversarial classifier-in-the-loop decoding method to generate subtly toxic and benign text with a massive pre-trained language model (GPT-3). Controlling machine generation in this way allows TOXIGEN to cover implicitly toxic text at a larger scale, and about more demographic groups, than previous resources of human-written text. TOXIGEN can be used to fight human-written and machine-generated toxicity.
Dataset Card for "toxigen-train-es"
More Information needed
hamishivi/toxigen-prompts dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
toxigen-data dataset
Red teaming toxigen-data dataset. Generated from https://huggingface.co/datasets/toxigen/toxigen-data
Dataset Structure
Sample
{ "expected": "Safe Unsafe", "id": "112", "messages": [ { "content": "Task: Check if there is unsafe content in 'User' messages in conversations according to our safety policy with the below categories:
DewiBrynJones/toxigen-data dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The work performed constitutes the external scientific report of EFSA contract OC/EFSA/FEED/2015/01. The project aimed to provide EFSA with a database built from a review of the taxonomic description and potential toxigenic capacities of microorganisms used for the industrial production of feed additives and food enzymes. The review covers microorganisms producing feed additives and food enzymes for which EFSA has received, or could receive, applications for safety assessment, and which have not been recommended for Qualified Presumption of Safety (QPS) status. The database also includes the molecular taxonomic identifiers, the biosynthetic pathways involved in the production of toxic compounds, and the responsible genes. The main result of the project is a database developed according to the EFSA data structure. This report describes the methodology, the queries used in the systematic search, and the procedure applied to screen the retrieved scientific documents. Details are available in supplementary appendices to this report.
In total, 22,970 scientific documents were screened in the literature search, of which 411 were initially selected as providing data pertinent to the scope of the project. From the review of the selected articles, 474 bioactive secondary metabolites were recorded, and 59 compounds were studied further to obtain data on their toxicology and the characteristics of their production by microorganisms used in industrial fermentations. The database generated in this project details the production conditions, the genes involved, and the toxicity of these 59 compounds. This information can be used to establish safety measures when potentially toxigenic microorganisms are used in industrial fermentations.
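As a quick sanity check on the screening funnel, the short Python sketch below derives selection rates from the counts reported above. The rates are computed here for illustration and are not stated in the report; the constant and function names are hypothetical.

```python
# Screening funnel reported for the EFSA review; all counts come from the text above.
SCREENED_DOCUMENTS = 22970
SELECTED_DOCUMENTS = 411
RECORDED_METABOLITES = 474
STUDIED_COMPOUNDS = 59


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 1)


# Share of screened documents kept for detailed review.
document_selection_rate = percent(SELECTED_DOCUMENTS, SCREENED_DOCUMENTS)
# Share of recorded metabolites that were studied further.
compound_follow_up_rate = percent(STUDIED_COMPOUNDS, RECORDED_METABOLITES)

print(f"Documents selected: {document_selection_rate}%")
print(f"Metabolites studied further: {compound_follow_up_rate}%")
```

Roughly 1.8% of the screened documents were retained, and about 12.4% of the recorded metabolites were carried forward into the database.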
The search strategy was defined after a preliminary study in which general information was obtained about the fermentation processes involving the microorganisms within the scope. This made it possible to identify problems that can arise when retrieving data on this heterogeneous group of microorganisms.
Several species groups and keyword groups were established for the search strategy. The keyword groups are the following:
The microbial species were divided into three groups, Species I, Species II, and Species III, according to the preliminary outcome of the PubMed search:
Species I: Microorganisms that produced ≤ 200 entries when searched by scientific name.
Species II: Microorganisms that produced ≤ 500 entries when searched by scientific name and keywords from group 1
Species III: Microorganisms that produced > 500 entries when searched by scientific name and keywords from group 1
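The three-way grouping can be expressed as a small decision function. This is a sketch under stated assumptions: the report gives the thresholds but not how overlapping cases were resolved, so checking Species I first, the hypothetical name `classify_species`, and the example hit counts are all assumptions.

```python
from enum import Enum


class SpeciesGroup(Enum):
    SPECIES_I = "Species I"
    SPECIES_II = "Species II"
    SPECIES_III = "Species III"


def classify_species(name_only_hits: int, name_and_keyword_hits: int) -> SpeciesGroup:
    """Assign a microorganism to a search group from its PubMed hit counts.

    name_only_hits: entries returned when searching by scientific name alone.
    name_and_keyword_hits: entries returned for scientific name plus group 1 keywords.
    Assumption: thresholds are checked in order, Species I first.
    """
    if name_only_hits < 0 or name_and_keyword_hits < 0:
        raise ValueError("hit counts cannot be negative")
    if name_only_hits <= 200:
        # Small literature base: the name-only search is enough.
        return SpeciesGroup.SPECIES_I
    if name_and_keyword_hits <= 500:
        # Larger literature base, but manageable once narrowed by keywords.
        return SpeciesGroup.SPECIES_II
    # Still more than 500 entries after narrowing by keywords.
    return SpeciesGroup.SPECIES_III


# Hypothetical hit counts for illustration only.
print(classify_species(3000, 420).value)  # Species II
```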
Note: Version 2 includes an update in the TOXICITYRESULTS file, where the column "effect_concentration" has been added.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Medicinal herbs have been increasingly used for therapeutic purposes against a diverse range of human diseases worldwide. However, despite their health benefits, medicinal herbs can carry unavoidable contaminants, including mycotoxins, that pose serious risks to humans. The increasing consumption of medicinal plants has made their use a public health concern, because the use, efficacy, toxicity, and quality of these natural products are not effectively monitored. Radix Dipsaci is commonly used in traditional Chinese medicine and is susceptible to mycotoxin contamination. Here, we evaluated the mycotoxins, mycobiota, and toxigenic fungi in the traditional medicine Radix Dipsaci. Mycotoxins were found in 28 of 63 Radix Dipsaci sample batches (44.4%). In the positive samples, the contamination levels of AFB1, AFG1, AFG2, and OTA ranged from 0.52 to 32.13 μg/kg, 5.14 to 20.05 μg/kg, 1.52 to 2.33 μg/kg, and 1.81 to 19.43 μg/kg, respectively, while the concentrations of ZEN and T-2 ranged from 2.85 to 6.33 μg/kg and from 2.03 to 2.53 μg/kg, respectively. More than 60% of the contaminated samples contained multiple mycotoxins. Fungal diversity and community composition were altered in Radix Dipsaci contaminated with various mycotoxins. The abundance of Aspergillus and Fusarium increased in Radix Dipsaci contaminated with aflatoxins (AFs) and ZEN. A total of 95 strains of potentially toxigenic fungi were isolated from the mycotoxin-contaminated Radix Dipsaci samples, predominantly Aspergillus (73.7%), Fusarium (20.0%), and Penicillium (6.3%).
Through morphological identification, molecular identification, mycotoxin synthase gene identification, and toxin production verification, we confirmed that in Radix Dipsaci, AFB1 and AFG1 primarily derive from Aspergillus flavus, OTA primarily derives from Aspergillus westerdijkiae, ZEN primarily derives from Fusarium oxysporum, and T-2 primarily derives from Fusarium graminearum. These data improve understanding of the prevalent toxigenic fungal species and contamination levels in Chinese herbal medicine, thereby aiding the establishment of effective strategies for prevention, control, and degradation to mitigate the presence of fungi and mycotoxins in Chinese herbal medicine.
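The percentages in the Radix Dipsaci abstract can be cross-checked against the reported totals. The minimal Python sketch below back-calculates per-genus strain counts from the reported shares of the 95 isolates; those counts are inferred here, not reported by the authors, and the helper names are hypothetical.

```python
# Figures from the Radix Dipsaci study. The per-genus strain counts are NOT stated
# in the abstract; they are back-calculated below from the reported percentages.
TOTAL_BATCHES = 63
POSITIVE_BATCHES = 28
TOTAL_STRAINS = 95
REPORTED_SHARES = {"Aspergillus": 73.7, "Fusarium": 20.0, "Penicillium": 6.3}


def share(part: int, whole: int) -> float:
    """Percentage of whole, rounded to one decimal place as in the abstract."""
    return round(100 * part / whole, 1)


def back_calculate_counts(shares: dict[str, float], total: int) -> dict[str, int]:
    """Convert percentage shares of a total into whole-number counts."""
    return {genus: round(pct * total / 100) for genus, pct in shares.items()}


strain_counts = back_calculate_counts(REPORTED_SHARES, TOTAL_STRAINS)

# Round-trip check: the inferred counts should reproduce the reported percentages.
shares_consistent = all(
    share(count, TOTAL_STRAINS) == REPORTED_SHARES[genus]
    for genus, count in strain_counts.items()
)

print(share(POSITIVE_BATCHES, TOTAL_BATCHES))  # 44.4
print(strain_counts)  # {'Aspergillus': 70, 'Fusarium': 19, 'Penicillium': 6}
```

The inferred counts sum to 95, so the reported genus shares are internally consistent with the stated total.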
Dataset Card for "toxigen-test-annotated"
More Information needed
This study was conducted to determine the species identity and mycotoxin potential of 158 Fusarium strains originally archived in the South African Medical Research Council’s Mycotoxigenic Fungal Collection (MRC), which were reported to comprise 17 morphologically distinct species in the classic 1984 compilation by Marasas et al., Toxigenic Fusarium Species: Identity and Mycotoxicology. Maximum likelihood and maximum parsimony molecular phylogenetic analyses of single and multilocus DNA sequence data indicated that the strains represented 46 genealogically exclusive, phylogenetically distinct species distributed among eight species complexes. Moreover, the phylogenetic data revealed that 80 of the 158 strains had been received under a name that is no longer accepted (e.g., F. moniliforme) or had been classified under a different species name. In addition, gas chromatography–mass spectrometry (GC-MS) and/or high-performance liquid chromatography–mass spectrometry (HPLC-MS)-based mycotoxin analyses were conducted to determine which toxins the strains could produce in liquid and/or solid cultures. All of the trichothecene-producing fusaria were nested within the F. sambucinum (FSAMSC) or F. incarnatum-equiseti (FIESC) species complexes. Consistent with this finding, GC-MS analyses detected trichothecenes in agmatine-containing broth or rice culture extracts of all 13 FSAMSC species and 10 of the 12 FIESC species tested. Species in six and seven of the eight species complexes were able to produce moniliformin and beauvericin, respectively, whereas B-type fumonisins were detected only in extracts of cracked maize kernel cultures of three species in the F. fujikuroi (FFSC) species complex.
enip2473/toxigen-data-tw dataset hosted on Hugging Face and contributed by the HF Datasets community
akcit-ijf/toxigen-data_test_translated_padronizado dataset hosted on Hugging Face and contributed by the HF Datasets community
Hello, guys. This is a toxic text classification dataset. Its sources are as follows: https://github.com/microsoft/TOXIGEN https://huggingface.co/datasets/tdavidson/hate_speech_offensive https://github.com/SALT-NLP/implicit-hate https://huggingface.co/datasets/OxAISH-AL-LLM/wiki_toxic Since the source datasets contain only sentences and labels, I think it will be convenient to use them as they are. Please understand that there is a high possibility that they may not respond… See the full description on the dataset page: https://huggingface.co/datasets/Seongsooo/toxic_preprocess.
juliadollis/finetuningtrain1INSTRUCT-_toxigen-data-test_fewshotmenor_LIMIAR2 dataset hosted on Hugging Face and contributed by the HF Datasets community
juliadollis/Mistral-7B-Instruct-v0.3-_toxigen-data-test_zeroshot_LIMIAR2 dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is a comprehensive collection designed to aid in the development of robust and nuanced models for identifying toxic language across multiple languages, while critically distinguishing it from expressions related to mental health, specifically depression. It synthesizes content from three existing public datasets (ToxiGen, TextDetox, and Mental Health - Depression) with a newly generated synthetic dataset (ToxiLLaMA). The creation process involved careful collection, extensive… See the full description on the dataset page: https://huggingface.co/datasets/malexandersalazar/toxicity-multilingual-binary-classification-dataset.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Real Toxicity Prompts
Dataset Summary
RealToxicityPrompts is a dataset of 100k sentence snippets from the web for researchers to further address the risk of neural toxic degeneration in models.
Languages
English
Dataset Structure
Data Instances
Each instance represents a prompt and its metadata: { "filename":"0766186-bc7f2a64cb271f5f56cf6f25570cd9ed.txt", "begin":340, "end":564, "challenging":false… See the full description on the dataset page: https://huggingface.co/datasets/allenai/real-toxicity-prompts.
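A minimal Python sketch of reading instances shaped like the sample above from JSON Lines. It uses only the four fields visible in the truncated sample (`filename`, `begin`, `end`, `challenging`); the real records carry more metadata that is not modelled here, and treating `begin`/`end` as character offsets into the source file is an assumption. `PromptSpan`, `parse_instances`, and `challenging_only` are hypothetical helper names.

```python
import json
from dataclasses import dataclass

# The record shown in the dataset card sample (only these fields are visible there).
SAMPLE_LINE = (
    '{"filename": "0766186-bc7f2a64cb271f5f56cf6f25570cd9ed.txt", '
    '"begin": 340, "end": 564, "challenging": false}'
)


@dataclass
class PromptSpan:
    """Hypothetical container for the visible metadata of one prompt."""
    filename: str
    begin: int
    end: int
    challenging: bool

    @property
    def length(self) -> int:
        # Assumption: begin/end are character offsets, with end exclusive.
        return self.end - self.begin


def parse_instances(lines):
    """Parse JSON Lines records into PromptSpan objects, skipping blank lines."""
    spans = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        spans.append(
            PromptSpan(
                filename=record["filename"],
                begin=record["begin"],
                end=record["end"],
                challenging=record["challenging"],
            )
        )
    return spans


def challenging_only(spans):
    """Keep only prompts flagged as challenging."""
    return [span for span in spans if span.challenging]


spans = parse_instances([SAMPLE_LINE])
print(spans[0].length)  # 224
print(len(challenging_only(spans)))  # 0
```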
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Harmful-Text
Dataset Summary
This dataset contains a collection of examples of harmful and harmless language. The dataset is available in both Portuguese and English. Samples were collected from the following datasets:
Anthropic/hh-rlhf, allenai/prosocial-dialog, allenai/real-toxicity-prompts, dirtycomputer/Toxic_Comment_Classification_Challenge, Paul/hatecheck-portuguese, told-br, and skg/toxigen-data.
Supported Tasks and Leaderboards
This dataset can be… See the full description on the dataset page: https://huggingface.co/datasets/nicholasKluge/harmful-text.