Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created using LeRobot.
Dataset Structure
meta/info.json:

```json
{
  "codebase_version": "v2.1",
  "robot_type": "so101_follower_t",
  "total_episodes": 1,
  "total_frames": 416,
  "total_tasks": 1,
  "total_videos": 0,
  "total_chunks": 1,
  "chunks_size": 1000,
  "fps": 100,
  "splits": { "train": "0:1" },
  "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
  "video_path": …
```

See the full description on the dataset page: https://huggingface.co/datasets/pepijn223/test-dataset-upload.
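The `data_path` entry is a Python format template; a minimal sketch of how it resolves for the first episode (the indices are chosen as an example):

```python
# Resolve the data_path template for episode 0 in chunk 0.
template = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
print(template.format(episode_chunk=0, episode_index=0))
# -> data/chunk-000/episode_000000.parquet
```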
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Stepnosing Aug Upload is a dataset for object detection tasks - it contains Objects annotations for 884 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
https://choosealicense.com/licenses/odbl/
Dataset Card
This dataset contains a single Hugging Face split, named 'all_samples'. Each sample contains a single Hugging Face feature, named "sample". Samples are instances of plaid.containers.sample.Sample. Mesh objects included in samples follow the CGNS standard and can be converted to Muscat.Containers.Mesh.Mesh. Example of commands:

```python
import pickle
from datasets import load_dataset
from plaid.containers.sample import Sample

dataset = …
```

See the full description on the dataset page: https://huggingface.co/datasets/PLAID-datasets/AirfRANS_original.
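A minimal loading sketch, assuming each "sample" entry stores a pickled plaid Sample (the unpickling step is an assumption based on the imports above; check the dataset page for the exact recipe):

```python
import pickle
from datasets import load_dataset
from plaid.containers.sample import Sample

# Load the single 'all_samples' split; each row holds one pickled sample.
dataset = load_dataset("PLAID-datasets/AirfRANS_original", split="all_samples")

# Assumption: unpickling the "sample" bytes yields a plaid Sample
# (or data a Sample can be constructed from).
sample = pickle.loads(dataset[0]["sample"])
```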
TRAINING DATASET: Hands-On Uploading Data (Download This File)
ibrahimndaw/test-upload dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by Luke Manoli
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Upload_cvat is a dataset for object detection tasks - it contains Weld annotations for 1,676 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description from the SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery GitHub repository. (The "Note" below was added by the Roboflow team.)
This is a single-class dataset consisting of tiles of satellite imagery labeled with potential 'targets'. Labelers were instructed to draw boxes around anything they suspected might be a paraglider wing that went missing in a remote area of Nevada. Volunteers were shown examples of similar objects already in the environment for comparison. The missing wing, as it was found after 3 weeks, is shown below.
![anomaly](https://michaeltpublic.s3.amazonaws.com/images/anomaly_small.jpg)
The dataset contains the following:
| Set | Images | Annotations |
|---|---|---|
| Train | 1808 | 3048 |
| Validate | 490 | 747 |
| Test | 254 | 411 |
| Total | 2552 | 4206 |
The data is in COCO format and is directly compatible with Faster R-CNN as implemented in Facebook's Detectron2 (see the registration sketch after the download steps below).
Download the data here: sarnet.zip
Or follow these steps:

```bash
# download the dataset
wget https://michaeltpublic.s3.amazonaws.com/sarnet.zip

# extract the files
unzip sarnet.zip
```
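Since the annotations are COCO JSON, one way to use them with Detectron2 is `register_coco_instances`; a minimal sketch, assuming hypothetical file locations inside the extracted archive (the actual paths and split names may differ):

```python
from detectron2.data.datasets import register_coco_instances

# Hypothetical paths; adjust to the layout of the extracted sarnet.zip.
register_coco_instances(
    "sarnet_train",                   # dataset name used in Detectron2 configs
    {},                               # extra metadata (none needed here)
    "sarnet/annotations/train.json",  # COCO JSON annotation file
    "sarnet/images/train",            # image root directory
)
```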
*Note:* with Roboflow, you can download the data here (original, raw images, with annotations): https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue/ (download v1, original_raw-images). Download the dataset in COCO JSON format, or another format of choice, and import it to Roboflow after unzipping the folder to get started on your project.
Get started with a Faster R-CNN model pretrained on SaRNet: SaRNet_Demo.ipynb
Source code for the paper is located here: SaRNet_train_test.ipynb
```bibtex
@misc{thoreau2021sarnet,
  title={SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery},
  author={Michael Thoreau and Frazer Wilson},
  year={2021},
  eprint={2107.12469},
  archivePrefix={arXiv},
  primaryClass={eess.IV}
}
```
The source data was generously provided by Planet Labs, Airbus Defence and Space, and Maxar Technologies.
https://creativecommons.org/publicdomain/zero/1.0/
This data was collected as a course project for the immersive data science course (by General Assembly and Misk Academy).
This dataset is in CSV format. It consists of 5717 rows and 15 columns, where each row is a dataset on Kaggle and each column represents a feature of that dataset.

|Feature|Description|
|-------|-----------|
|title| dataset name |
|usability| dataset usability rating by Kaggle |
|num_of_files| number of files associated with the dataset |
|types_of_files| types of files associated with the dataset |
|files_size| size of the dataset files |
|vote_counts| total vote count by dataset viewers |
|medal| reward to popular datasets measured by the number of upvotes (votes by novices are excluded from medal calculation) [Bronze = 5 votes, Silver = 20 votes, Gold = 50 votes] |
|url_reference| reference to the dataset page on Kaggle in the format: www.kaggle.com/url_reference |
|keywords| topics tagged with the dataset |
|num_of_columns| number of features in the dataset |
|views| number of views |
|downloads| number of downloads |
|download_per_view| download-per-view ratio |
|date_created| dataset creation date |
|last_updated| date of the last update |
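A minimal loading sketch with pandas (the CSV filename is an assumption; use whatever name the file carries in your download):

```python
import pandas as pd

# Hypothetical filename for the downloaded CSV.
df = pd.read_csv("kaggle_datasets.csv")

print(df.shape)  # expected: (5717, 15)
print(df[["title", "vote_counts", "medal"]].head())
```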
I would like to thank all my GA instructors for their continuous help and support
All data were taken from https://www.kaggle.com, collected on 30 Jan 2021.
Using this dataset, one could try to predict properties of newly uploaded datasets, such as the number of votes, number of downloads, medal type, etc.
This is a PDF document created by the Department of Information Technology (DoIT) and the Governor's Office of Performance Improvement to assist in training Maryland state employees on the use of the Open Data Portal, https://opendata.maryland.gov. This document covers direct data entry, uploading Excel spreadsheets, connecting source databases, and transposing data. Please note that this tutorial is intended for use by state employees, as non-state users cannot upload datasets to the Open Data Portal.
BACI Dataset Documentation

BACI provides data on bilateral trade flows for 200 countries at the product level (5,000 products). Products correspond to the "Harmonized System" nomenclature (6-digit codes). BACI relies on data from the United Nations Statistical Division (Comtrade dataset). Since countries report both their imports and their exports to the United Nations, the raw data may contain duplicate flows: trade from country i to country j may be reported by i as an export to j and by j as an import from i. The reported values should match, but in practice they are virtually never identical, for two reasons:

- Import values are reported CIF (cost, insurance and freight) while exports are reported FOB (free on board).
- Mistakes are made, because of uncertainty about the final destination of exports, discrepancies in the classification of a given product, etc.

Licensed under the Etalab Open Licence v2.0; original data downloaded from http://www.cepii.fr/CEPII/en/bdd_modele/bdd_modele_item.asp?id=37
Attribution 2.5 (CC BY 2.5): https://creativecommons.org/licenses/by/2.5/
License information was derived automatically
The Measuring Broadband Australia (MBA) program relies on households across Australia volunteering to receive a Whitebox that tests the performance of their fixed-line broadband services. Thousands of tests are run and these measurements are used to calculate average speeds achieved and other metrics for different volunteer groups, such as volunteers on the NBN fixed-line services and volunteers on NBN fixed wireless services. The summary data released includes test results for all Whiteboxes used in MBA Report 18 for download, upload, latency and outages metrics. The results are de-identified to protect the privacy of the volunteers and the integrity of the MBA program.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high costs and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research.
Documentation for the DORIS-MAE dataset is publicly available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset. This upload contains both DORIS-MAE dataset version 1 and ada-002 vector embeddings for all queries and related abstracts (used in candidate pool creation). DORIS-MAE dataset version 1 comprises four main sub-datasets, each serving distinct purposes.
The Query dataset contains 100 human-crafted complex queries spanning five categories: ML, NLP, CV, AI, and Composite. Each category has 20 associated queries. Queries are broken down into aspects (ranging from 3 to 9 per query) and sub-aspects (from 0 to 6 per aspect, with 0 signifying no further breakdown required). For each query, a corresponding candidate pool of relevant paper abstracts, ranging from 99 to 138, is provided.
The Corpus dataset is composed of 363,133 abstracts from computer science papers, published between 2011-2021, and sourced from arXiv. Each entry includes title, original abstract, URL, primary and secondary categories, as well as citation information retrieved from Semantic Scholar. A masked version of each abstract is also provided, facilitating the automated creation of queries.
The Annotation dataset includes generated annotations for all 165,144 question pairs, each comprising an aspect/sub-aspect and a corresponding paper abstract from the query's candidate pool. It includes the original text generated by ChatGPT (version chatgpt-3.5-turbo-0301) explaining its decision-making process, along with a three-level relevance score (0, 1, or 2) representing ChatGPT's final decision.
Finally, the Test Set dataset contains human annotations for a random selection of 250 question pairs used in hypothesis testing. It includes each of the three human annotators' final decisions, recorded as a three-level relevance score (0, 1, or 2).
The file "ada_embedding_for_DORIS-MAE_v1.pickle" contains text embeddings for the DORIS-MAE dataset, generated by OpenAI's ada-002 model. The structure of the file is as follows:
```
ada_embedding_for_DORIS-MAE_v1.pickle
├── "Query"
│   ├── query_id_1 (Embedding of query_1)
│   ├── query_id_2 (Embedding of query_2)
│   ├── query_id_3 (Embedding of query_3)
│   └── ...
└── "Corpus"
    ├── corpus_id_1 (Embedding of abstract_1)
    ├── corpus_id_2 (Embedding of abstract_2)
    ├── corpus_id_3 (Embedding of abstract_3)
    └── ...
```
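A minimal sketch for reading the file, following the structure above (the specific IDs inside "Query" and "Corpus" are placeholders):

```python
import pickle

# Load the embedding file and index into the two top-level dictionaries.
with open("ada_embedding_for_DORIS-MAE_v1.pickle", "rb") as f:
    embeddings = pickle.load(f)

query_embeddings = embeddings["Query"]    # maps query_id -> ada-002 embedding
corpus_embeddings = embeddings["Corpus"]  # maps corpus_id -> ada-002 embedding
print(len(query_embeddings), len(corpus_embeddings))
```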
william-yudi/jean-upload dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset offers a diverse collection of images curated to support the development of computer vision models for detecting and inspecting Fire Safety Equipment (FSE) and related components. Images were collected from a variety of public buildings in Germany, including university buildings, student dormitories, and shopping malls. The dataset consists of self-captured images using mobile cameras, providing a broad range of real-world scenarios for FSE detection.
In the journal paper associated with these image datasets, the open-source dataset FireNet (Boehm et al. 2019) was additionally utilized for training. However, to comply with licensing and distribution regulations, images from FireNet have been excluded from this dataset. Interested users can visit the FireNet repository directly to access and download those images if additional data is required. The provided weights (.pt), however, are trained on the provided self-made images and FireNet using YOLOv8.
The dataset is organized into six sub-datasets, each corresponding to a specific FSE-related machine learning service:
Service 1: FSE Detection - This sub-dataset provides the foundation for FSE inspection, focusing on the detection of primary FSE components like fire blankets, fire extinguishers, manual call points, and smoke detectors.
Service 2: FSE Marking Detection - Building on the first service, this sub-dataset includes images and annotations for detecting FSE marking signs.
Service 3: Condition Check - Modal - This sub-dataset addresses the inspection of FSE condition in a modal manner, focusing on instances where fire extinguishers might be blocked or otherwise non-compliant. This dataset includes semantic segmentation annotations of fire extinguishers. For upload reasons, this set is split into 3_1_FSE Condition Check_modal_train_data (containing training images and annotations) and 3_1_FSE Condition Check_modal_val_data_and_weights (containing validation images, annotations and the best weights).
Service 4: Condition Check - Amodal - Extending the modal condition check, this sub-dataset involves amodal detection to identify and infer the state of FSE components even when they are partially obscured. This dataset includes semantic segmentation annotations of fire extinguishers. For upload reasons, this set is split into 4_1_FSE Condition Check_amodal_train_data (containing training images and annotations) and 4_1_FSE Condition Check_amodal_val_data_and_weights (containing validation images, annotations and the best weights).
Service 5: Details Extraction - Inspection Tags - This sub-dataset provides a detailed examination of the inspection tags on fire extinguishers. It includes annotations for extracting semantic information such as the next maintenance date, contributing to a thorough evaluation of FSE maintenance practices.
Service 6: Details Extraction - Fire Classes Symbols - The final sub-dataset focuses on identifying fire class symbols on fire extinguishers.
This dataset is intended for researchers and practitioners in the field of computer vision, particularly those engaged in building safety and compliance initiatives.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
16th Run Uploaded Images E40 is a dataset for object detection tasks - it contains Tomato Ripeness annotations for 3,143 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
https://data.gov.tw/license
This dataset provides the annual import volume of 10 categories of products, allowing use by academic units, businesses, and the general public.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
# HPA Cell Image Segmentation Dataset
This dataset includes annotated cell images obtained from the Human Protein Atlas (http://www.proteinatlas.org); each image contains 4 channels (Microtubules, ER, Nuclei, and Protein of Interest). The cells in each image are annotated with polygons and saved in GeoJSON format, produced with the Kaibu (https://kaibu.org) annotation tool.
hpa_cell_segmentation_dataset_v2_512x512_4train_159test.zip is an example dataset for running deep learning-based interactive annotation tools in ImJoy (https://github.com/imjoy-team/imjoy-interactive-segmentation).
hpa_dataset_v2.zip is the fully annotated image segmentation dataset.
Utility functions in Python for reading the GeoJSON annotations can be found here: https://github.com/imjoy-team/kaibu-utils/blob/main/kaibu_utils/__init__.py
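For reference, a minimal sketch of reading one GeoJSON annotation file with the standard library (the filename is an assumption, and it assumes Polygon geometries as described above; kaibu-utils provides ready-made helpers):

```python
import json

# Load one GeoJSON annotation file and inspect its polygon features.
with open("cell_annotation.json") as f:
    collection = json.load(f)

for feature in collection["features"]:
    geometry = feature["geometry"]
    # Each cell is a polygon; the first ring holds its outline vertices.
    print(geometry["type"], len(geometry["coordinates"][0]), "vertices")
```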
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Manuscript database (Mol Ecol)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Two different types of datasets are uploaded. The first dataset is intended for detection purposes. It consists of a total of 1,680 goat images, where the face, eye, mouth, and ear bounding boxes are given in YOLO format. The second dataset is for face recognition and facial expression analysis; in total, 1,311 images were captured from 10 individuals on a Chinese farm.
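As a reminder of what YOLO-format labels look like, here is a minimal parsing sketch (the label layout is the standard YOLO convention of class index plus normalized box center and size; the function name is ours):

```python
# Convert one YOLO label line ("cls xc yc w h", normalized) to pixel coordinates.
def parse_yolo_line(line: str, img_w: int, img_h: int):
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    return int(cls), x_min, y_min, w * img_w, h * img_h

print(parse_yolo_line("0 0.5 0.5 0.25 0.25", 640, 480))
# -> (0, 240.0, 180.0, 160.0, 120.0)
```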