3 datasets found

Nvidia's Aegis-AI-Content-Safety-Dataset-1.0 Dataset
paperswithcode.com
Updated Apr 8, 2024
Cite
(2024). Nvidia's Aegis-AI-Content-Safety-Dataset-1.0 Dataset [Dataset]. https://paperswithcode.com/dataset/https-huggingface-co-datasets-nvidia-aegis-ai
Dataset updated
Apr 8, 2024
Description
Aegis AI Content Safety Dataset is an open-source content safety dataset (CC-BY-4.0), which adheres to Nvidia's content safety taxonomy, covering 13 critical risk categories (see Dataset Description).

Dataset Details

Dataset Description

The Aegis AI Content Safety Dataset comprises approximately 11,000 manually annotated interactions between humans and LLMs, split into 10,798 training samples and 1,199 test samples.

To curate the dataset, we use the Hugging Face version of human preference data about harmlessness from Anthropic HH-RLHF. We extract only the prompts, and elicit responses from Mistral-7B-v0.1. Mistral excels at instruction following and generates high quality responses for the content moderation categories. We use examples in the system prompt to ensure diversity by instructing Mistral to not generate similar responses. Our data comprises four different formats: user prompt only, system prompt with user prompt, single turn user prompt with Mistral response, and multi-turn user prompt with Mistral responses.
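The four formats listed above can be told apart by which parts of an interaction are present. The following is a minimal sketch of that distinction, not the dataset's actual schema: the `Interaction` class and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Interaction:
    # Hypothetical container; these field names are assumptions,
    # not the column names used by the published dataset.
    user_turns: List[str]
    assistant_turns: List[str] = field(default_factory=list)
    system_prompt: Optional[str] = None


def interaction_format(x: Interaction) -> str:
    """Return which of the four described formats an interaction matches."""
    if not x.assistant_turns:
        # No model response: either a bare prompt or a prompt with a system prompt.
        if x.system_prompt:
            return "system prompt with user prompt"
        return "user prompt only"
    if len(x.user_turns) == 1 and len(x.assistant_turns) == 1:
        return "single turn user prompt with Mistral response"
    return "multi-turn user prompt with Mistral responses"
```

For example, `interaction_format(Interaction(user_turns=["hello"]))` falls into the "user prompt only" case.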

Data from: SEMFIRE forest dataset for semantic segmentation and data...

zenodo.org

application/gzip, bin +2

Updated Jan 20, 2022

Cite

Dominik Bittner; Maria Eduarda Andrada; David Portugal; João Filipe Ferreira (2022). SEMFIRE forest dataset for semantic segmentation and data augmentation [Dataset]. http://doi.org/10.5281/zenodo.5819064

Available download formats: zip, application/gzip, bin, txt

Unique identifier

https://doi.org/10.5281/zenodo.5819064

Dataset updated

Jan 20, 2022

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Dominik Bittner; Maria Eduarda Andrada; David Portugal; João Filipe Ferreira

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

SEMFIRE Datasets (Forest environment dataset)

These datasets are used for semantic segmentation and data augmentation and contain various forestry scenes. They were collected as part of the research work conducted by the Institute of Systems and Robotics, University of Coimbra team within the scope of the Safety, Exploration and Maintenance of Forests with Ecological Robotics (SEMFIRE, ref. CENTRO-01-0247-FEDER-032691) research project coordinated by Ingeniarius Ltd.

The semantic segmentation algorithms attempt to identify various semantic classes (e.g. background, live flammable materials, trunks, canopies etc.) in the images of the datasets.

The datasets include diverse image types, e.g. original camera images and their labeled images. In total the SEMFIRE datasets include about 1700 image pairs. Each dataset includes corresponding .bag files.

To launch these .bag files in your ROS environment, follow the instructions in the accompanying GitHub repository.

Description of each dataset:

2019_2020_quinta_do_bolao_coimbra: Robot moving on a path through a forest environment
2020_ctcv_parking_lot_coimbra: Robot moving in a circle in a parking lot for testing
2020_sete_fontes_forest: A set of forest images acquired by hand-held apparatus

Each dataset consists of the following directories:

images directory: diverse image types, e.g. original camera images and their labeled images
rosbags directory: .bag files corresponding to the images directory

Each images directory consists of the following directories:

img: original camera images
lbl: single channel images (ground truth) with corresponding labels for each image in img
lbl_colored: label images from lbl, colorized according to the semantic classes (for more details see the dataset descriptions)
lbl_overlaid: camera images in img overlaid with corresponding labels (colored)
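Given this layout, a training pipeline typically needs each camera image matched to its ground-truth label. The sketch below pairs files from img and lbl under one assumption that the listing does not confirm: that an image and its label share the same file name stem. The function name is illustrative only.

```python
from pathlib import Path
from typing import List, Tuple


def pair_images_with_labels(images_dir: Path) -> List[Tuple[Path, Path]]:
    """Pair each file in images_dir/img with a file in images_dir/lbl.

    Assumption (not stated in the dataset description): an image and its
    label have the same file name stem, e.g. img/0001.png and lbl/0001.png.
    Images without a matching label are skipped.
    """
    img_dir = images_dir / "img"
    lbl_dir = images_dir / "lbl"
    labels_by_stem = {p.stem: p for p in lbl_dir.iterdir() if p.is_file()}

    pairs: List[Tuple[Path, Path]] = []
    for img in sorted(img_dir.iterdir()):
        lbl = labels_by_stem.get(img.stem)
        if img.is_file() and lbl is not None:
            pairs.append((img, lbl))
    return pairs
```

If the real file naming differs, only the lookup key (`p.stem`) needs to change.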

Each rosbags directory contains .bag files with the following topics:

2019_2020_quinta_do_bolao_coimbra_rosbags:
- /back_lslidar_packet
- /dalsa_camera_720p/compressed
- /flir_ax8/compressed
- /front_lslidar_packet
- /gps_fix
- /gps_time
- /gps_vel
- /imu/data
- /realsense/aligned_depth_to_color/image_raw
- /realsense/color/camera_info
- /realsense/color/image_raw/compressed
- /realsense/depth/camera_info
- /realsense/depth/image_rect_raw/compressed
- /realsense/extrinsics/depth_to_color
2020_ctcv_parking_lot_coimbra_rosbags:
- /dalsa_camera_720p/compressed
- /gps_fix
- /gps_ime
- /fused_point_cloud
- /imu/data
- /imu/mag
- /imu/rpy
2020_sete_fontes_forest_rosbags:
- /realsense/camera_info
- /realsense/depth_compressed/compressedDepth
- /realsense/nir/left/compressed
- /realsense/nir/right/compressed
- /realsense/rgb/compressed
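Because the three recordings expose different sensor streams, it can help to check which topics are available before writing a reader. The sketch below transcribes the topic lists above into plain Python sets (the names are copied as they appear in this listing, including /gps_ime) and offers two small lookups; the helper names are illustrative, and nothing here reads an actual .bag file.

```python
from typing import Dict, List, Set

# Topic names transcribed verbatim from the listing above.
TOPICS: Dict[str, Set[str]] = {
    "2019_2020_quinta_do_bolao_coimbra": {
        "/back_lslidar_packet",
        "/dalsa_camera_720p/compressed",
        "/flir_ax8/compressed",
        "/front_lslidar_packet",
        "/gps_fix",
        "/gps_time",
        "/gps_vel",
        "/imu/data",
        "/realsense/aligned_depth_to_color/image_raw",
        "/realsense/color/camera_info",
        "/realsense/color/image_raw/compressed",
        "/realsense/depth/camera_info",
        "/realsense/depth/image_rect_raw/compressed",
        "/realsense/extrinsics/depth_to_color",
    },
    "2020_ctcv_parking_lot_coimbra": {
        "/dalsa_camera_720p/compressed",
        "/gps_fix",
        "/gps_ime",
        "/fused_point_cloud",
        "/imu/data",
        "/imu/mag",
        "/imu/rpy",
    },
    "2020_sete_fontes_forest": {
        "/realsense/camera_info",
        "/realsense/depth_compressed/compressedDepth",
        "/realsense/nir/left/compressed",
        "/realsense/nir/right/compressed",
        "/realsense/rgb/compressed",
    },
}


def datasets_with_topic(topic: str) -> List[str]:
    """Return the names of all datasets whose rosbags list the given topic."""
    return sorted(name for name, topics in TOPICS.items() if topic in topics)


def shared_topics(first: str, second: str) -> List[str]:
    """Return the topics listed for both datasets, sorted alphabetically."""
    return sorted(TOPICS[first] & TOPICS[second])
```

For instance, `shared_topics("2019_2020_quinta_do_bolao_coimbra", "2020_ctcv_parking_lot_coimbra")` yields the camera, GPS fix, and IMU data topics that both robot recordings have in common.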

All datasets include a detailed description as a text file. They also include a rosbag_info.txt file that describes the contents of the .bag files and each ROS topic.

The following table shows the statistical description of typical Portuguese woodland configurations with structured plantations of Pinus pinaster (Pp, pine trees) and Eucalyptus globulus (Eg, eucalyptus).

Parameter	"Low density" structured plantation	"High density" structured plantation
Tree density (assuming plantation in rows spaced 3 m apart in all cases)	Eg: 900 trees/ha; Pp: 450 trees/ha	Eg: 1400 trees/ha; Pp: 1250 trees/ha
Average heights and corresponding ages of plantation trees	Eg: 12 m (6 years old); Pp: 10 m (15 years old)	Eg: 12 m (6 years old); Pp: 10 m (15 years old)
Maximum heights and corresponding fully-matured ages of plantation trees	Eg: 20 m (11 years old); Pp: 30 m (40 years old)	Eg: 20 m (11 years old); Pp: 30 m (40 years old)
Diameter at chest level (DCL, at 1.3 m) of plantation trees (average/maximum)	Eg: 15 cm/25 cm; Pp: 20 cm/50 cm	Eg: 15 cm/25 cm; Pp: 20 cm/50 cm
Natural density of herbaceous plants	30% of woodland area	30% of woodland area
Natural density of bush and shrubbery	30% of woodland area	30% of woodland area
Natural density of arboreal plants (not part of plantation)	5% of woodland area	5% of woodland area
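The tree densities in the table can be translated into an approximate distance between neighbouring trees within a row. The sketch below assumes a regular rectangular grid (rows 3 m apart, trees evenly spaced along each row), which is a simplification not stated in the source; the function name is illustrative.

```python
SQUARE_METRES_PER_HECTARE = 10_000.0


def in_row_spacing_m(trees_per_ha: float, row_spacing_m: float = 3.0) -> float:
    """Estimate the distance between trees along a row, in metres.

    Assumes a regular rectangular grid, so each tree occupies
    row_spacing_m * in_row_spacing square metres of ground.
    """
    if trees_per_ha <= 0 or row_spacing_m <= 0:
        raise ValueError("density and row spacing must be positive")
    return SQUARE_METRES_PER_HECTARE / (trees_per_ha * row_spacing_m)


# Densities taken from the table above (trees per hectare).
DENSITIES = {
    ("low", "Eg"): 900,
    ("low", "Pp"): 450,
    ("high", "Eg"): 1400,
    ("high", "Pp"): 1250,
}

if __name__ == "__main__":
    for (config, species), density in DENSITIES.items():
        spacing = in_row_spacing_m(density)
        print(f"{config} density, {species}: about {spacing:.2f} m between trees")
```

Under this assumption, the low-density eucalyptus figure of 900 trees/ha corresponds to roughly 3.7 m between trees in a row, and the high-density pine figure of 1250 trees/ha to roughly 2.7 m.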

alpaca-gpt4
huggingface.co
opendatalab.com
Updated Apr 14, 2023
Cite
alpaca-gpt4 [Dataset]. https://huggingface.co/datasets/vicgalle/alpaca-gpt4
Dataset updated
Apr 14, 2023
Authors
Victor Gallego
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Dataset Card for "alpaca-gpt4"

This dataset contains English instruction-following data generated by GPT-4 using Alpaca prompts, intended for fine-tuning LLMs. The dataset was originally shared in this repository: https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM. This is just a wrapper for compatibility with Hugging Face's datasets library.
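Since the card describes the dataset as a wrapper for the datasets library, loading it should come down to a single `load_dataset` call with the repository ID from the dataset URL. The sketch below derives that ID from the URL and defers the third-party import until the loader is called; the helper names are illustrative, and the loader needs the `datasets` package and network access.

```python
from urllib.parse import urlparse

DATASET_URL = "https://huggingface.co/datasets/vicgalle/alpaca-gpt4"


def repo_id_from_url(url: str) -> str:
    """Extract '<owner>/<name>' from a Hugging Face dataset page URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:])


def load_alpaca_gpt4():
    """Download the dataset via the Hugging Face datasets library.

    The import is done lazily so this module can be imported even when
    the third-party `datasets` package is not installed.
    """
    from datasets import load_dataset

    return load_dataset(repo_id_from_url(DATASET_URL))
```

Calling `load_alpaca_gpt4()` returns whatever splits the repository defines; inspect the returned object rather than assuming particular column names.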

Dataset structure

It contains 52K instruction-following examples generated by GPT-4 using the same prompts as in Alpaca. The dataset has… See the full description on the dataset page: https://huggingface.co/datasets/vicgalle/alpaca-gpt4.