100+ datasets found
  1. INSTANCE Dataset (the Italian seismic dataset for machine learning)

    • paperswithcode.com
    Updated Nov 29, 2021
    + more versions
    Cite
    (2021). INSTANCE Dataset [Dataset]. https://paperswithcode.com/dataset/instance
    Explore at:
    Dataset updated
    Nov 29, 2021
    Description

    INSTANCE is a data collection of more than 1.3 million seismic waveforms originating from a selection of about 54,000 earthquakes that have occurred since 2005 in Italy and surrounding regions, together with seismic noise recordings randomly extracted from event-free time windows of the continuous waveform archive. The purpose is to provide reference datasets for developing and testing seismic data processing routines based on machine learning and deep learning frameworks. The primary source of this information is ISIDe (Italian Seismological Instrumental and Parametric Data-Base) for earthquakes and the Italian node of EIDA (http://eida.ingv.it) for seismic data. All waveforms have been cut to a 120 s window, preprocessed and resampled at 100 Hz. For each trace we provide a large number of parameters as metadata, either derived from event information or computed from the trace data. The associated metadata allow identification of the source, the station, and the path travelled by the seismic waves, as well as assessment of trace quality. The total size of the data collection is about 330 GB. Waveform files are available either in counts or in ground-motion units, in HDF5 format to facilitate fast access from commonly used machine learning frameworks.
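
    Since the waveforms ship as HDF5, a minimal sketch for inspecting one of the archive files with h5py is shown below; the file name is a placeholder and the internal layout is only probed, not assumed (the actual structure is documented with the data itself).

    ```python
    # Minimal sketch, assuming one of the HDF5 waveform archives has been downloaded.
    import h5py
    import numpy as np

    with h5py.File("instance_waveforms.hdf5", "r") as f:   # hypothetical file name
        f.visit(print)                                      # list all groups/datasets actually in the file
        # Grab the first dataset found and read it as a 100 Hz, 120 s waveform array:
        name = f.visititems(lambda n, o: n if isinstance(o, h5py.Dataset) else None)
        trace = np.asarray(f[name])
        print(name, trace.shape, trace.dtype)
    ```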

  2. Data for Testing of instance segmentation

    • kaggle.com
    Updated Dec 12, 2021
    Cite
    Srinjoy Bhuiya (2021). Data for Testing of instance segmentation [Dataset]. https://www.kaggle.com/datasets/srinjoybhuiya/data-for-testing-of-instance-segmentation/data
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Dec 12, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Srinjoy Bhuiya
    Description

    Dataset

    This dataset was created by Srinjoy Bhuiya

    Contents

  3. ReSyRIS: Real-Synthetic Rock Instance Segmentation dataset

    • zenodo.org
    zip
    Updated Mar 4, 2023
    Cite
    Wout Boerdijk; Marcus Gerhard Müller; Maximilian Durner; Rudolph Triebel (2023). ReSyRIS: Real-Synthetic Rock Instance Segmentation dataset [Dataset]. http://doi.org/10.5281/zenodo.7691201
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 4, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wout Boerdijk; Marcus Gerhard Müller; Maximilian Durner; Rudolph Triebel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    # ReSyRIS
    
    The Real-Synthetic Rock Instance Segmentation dataset (ReSyRIS) is created for training and evaluation of rock segmentation, detection and instance segmentation in (quasi-)extra-terrestrial environments. It consists of a set of annotated, real images of rocks on a lunar-like surface, a precisely mimicked synthetic version thereof, and respective synthetic assets for training data generation.
    
    In the folders, you find the following structure:
    
    - `stone_models`: all 36 .obj files of the 3D-reconstructed stones
    - `test_data_realworld`: the real-world recordings with accompanying ground truth
    - `test_data_synthetic`: the synthetic renderings approximately matching the real-world recordings, with accompanying ground truth
    - `oaisys`: config files for rendering synthetic training data with oaisys
    
    If you find this dataset useful for your work, please consider citing our paper: https://elib.dlr.de/194113/.
    
  4. Dataset for "Instance dataset for a multiprocessor scheduling problem with...

    • researchdata.se
    Updated Jul 4, 2024
    + more versions
    Cite
    Emil Karlsson (2024). Dataset for "Instance dataset for a multiprocessor scheduling problem with multiple time windows and time lags: Similar instances with large differences in difficulty" [Dataset]. http://doi.org/10.48360/etww-2281
    Explore at:
    Available download formats: (18656), (33538), (400598223), (8)
    Dataset updated
    Jul 4, 2024
    Dataset provided by
    Linköping University
    Authors
    Emil Karlsson
    Description

    Here lies the raw dataset for the publication "Instance dataset for a multiprocessor scheduling problem with multiple time windows and time lags: Similar instances with large differences in difficulty".

    The following files and folders are made available here:

    - README.md: describes the repository
    - instances: folder that contains the instances of the dataset (.zip files) together with files containing supporting information about the instances (.txt files)
    - main.py: main entry point for solving an instance with a Constraint Programming model
    - requirements.txt: file listing the Python packages needed to run the tests
    - LICENSE: file containing the license
    - data.py: contains dataclasses that represent a problem definition
    - instance_io.py: contains methods for parsing instances from disk and writing instances to disk
    - cp_solve.py: methods for creating a Constraint Programming model of the problem with IBM ILOG CP Optimizer
    - create_sbatch.py: entry point for creating sbatch files
    - sbatch: folder that contains SBATCH files for the instances in the dataset

    The dataset was originally published in DiVA and moved to SND in 2024.

  5. Data from: FISBe: A real-world benchmark dataset for instance segmentation...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 2, 2024
    Cite
    Mais, Lisa (2024). FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10875062
    Explore at:
    Dataset updated
    Apr 2, 2024
    Dataset provided by
    Kandarpa, Ramya
    Reinke, Annika
    Kainmueller, Dagmar
    Hirsch, Peter
    Rumberger, Josef Lorenz
    Ihrke, Gudrun
    Maier-Hein, Lena
    Mais, Lisa
    Managan, Claire
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General

    For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

    Summary

    A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains

    30 completely labeled (segmented) images

    71 partly labeled images

    altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes between 30-60 min on average, yet a difficult one can take up to 4 hours)

    To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects

    A set of metrics and a novel ranking score for respective meaningful method benchmarking

    An evaluation of three baseline methods in terms of the above metrics and score

    Abstract

    Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

    Dataset documentation:

    We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:

    FISBe Datasheet

    Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

    Files

    fisbe_v1.0_{completely,partly}.zip

    contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.

    fisbe_v1.0_mips.zip

    maximum intensity projections of all samples, for convenience.

    sample_list_per_split.txt

    a simple list of all samples and the subset they are in, for convenience.

    view_data.py

    a simple python script to visualize samples, see below for more information on how to use it.

    dim_neurons_val_and_test_sets.json

    a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.

    Readme.md

    general information

    How to work with the image files

    Each sample consists of a single 3D MCFO image of neurons of the fruit fly. For each image, we provide a pixel-wise instance segmentation for all separable neurons. Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification). The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file. The segmentation mask for each neuron is stored in a separate channel. The order of dimensions is CZYX.

    We recommend working in a virtual environment, e.g. using conda:

    conda create -y -n flylight-env -c conda-forge python=3.9
    conda activate flylight-env

    How to open zarr files

    Install the python zarr package:

    pip install zarr

    Open a zarr file with:

    import zarr
    raw = zarr.open(<path/to/sample.zarr>, mode='r', path="volumes/raw")
    seg = zarr.open(<path/to/sample.zarr>, mode='r', path="volumes/gt_instances")

    Optionally:

    import numpy as np
    raw_np = np.array(raw)

    Zarr arrays are read lazily on demand. Many functions that expect numpy arrays also work with zarr arrays. Optionally, the arrays can also explicitly be converted to numpy arrays.
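
    Given the CZYX layout and one ground-truth channel per neuron, individual instance masks can be pulled out channel by channel. A minimal sketch, assuming `seg` was opened as shown above:

    ```python
    # Sketch only: assumes `seg` is the "volumes/gt_instances" zarr array
    # (CZYX layout, one channel per annotated neuron instance).
    import numpy as np

    for c in range(seg.shape[0]):            # C axis: one channel per neuron
        instance_mask = np.asarray(seg[c])   # single 3D (Z, Y, X) mask, loaded on demand
        n_voxels = int((instance_mask > 0).sum())
        print(f"instance {c}: {n_voxels} labeled voxels")
    ```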

    How to view zarr image files

    We recommend using napari to view the image data.

    Install napari:

    pip install "napari[all]"

    Save the following Python script:

    import zarr, sys, napari

    raw = zarr.load(sys.argv[1], mode='r', path="volumes/raw")
    gts = zarr.load(sys.argv[1], mode='r', path="volumes/gt_instances")

    viewer = napari.Viewer(ndisplay=3)
    for idx, gt in enumerate(gts):
        viewer.add_labels(gt, rendering='translucent', blending='additive', name=f'gt_{idx}')
    viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')
    viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')
    viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')
    napari.run()

    Execute:

    python view_data.py /R9F03-20181030_62_B5.zarr

    Metrics

    S: Average of avF1 and C

    avF1: Average F1 Score

    C: Average ground truth coverage

    clDice_TP: Average true positives clDice

    FS: Number of false splits

    FM: Number of false merges

    tp: Relative number of true positives

    For more information on our selected metrics and formal definitions please see our paper.

    Baseline

    To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt, application-specific color clustering from Duan et al. For detailed information on the methods and the quantitative results, please see our paper.

    License

    The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Citation

    If you use FISBe in your research, please use the following BibTeX entry:

    @misc{mais2024fisbe,
      title         = {FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures},
      author        = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller},
      year          = 2024,
      eprint        = {2404.00130},
      archivePrefix = {arXiv},
      primaryClass  = {cs.CV}
    }

    Acknowledgments

    We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuable discussions. P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program. This work was co-funded by Helmholtz Imaging.

    Changelog

    There have been no changes to the dataset so far. All future changes will be listed on the changelog page.

    Contributing

    If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying GitHub repository.

    All contributions are welcome!

  6. Stereo Instances on Surfaces (STIOS)

    • data.niaid.nih.gov
    Updated Apr 23, 2021
    Cite
    Boerdijk, Wout (2021). Stereo Instances on Surfaces (STIOS) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4706906
    Explore at:
    Dataset updated
    Apr 23, 2021
    Dataset provided by
    Durner, Maximilian
    Boerdijk, Wout
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    The Stereo Instances on Surfaces Dataset (STIOS) was created for the evaluation of instance-based algorithms. It is a representative dataset intended to achieve uniform comparability for instance detection and segmentation with different input modalities (RGB, RGB-D, stereo RGB). STIOS is mainly aimed at robotic applications (e.g. object manipulation), which is why the dataset focuses on horizontal surfaces.

    Sensors

    STIOS contains recordings from two different sensors: an rc_visard 65 color and a Stereolabs ZED camera. Besides stereo RGB (left and right RGB images), the internally generated depth maps are also saved for both sensors. In addition, the ZED sensor provides normal images and point cloud data, which are also included in STIOS. Since some objects and surfaces have little texture, which would negatively affect depth-map quality, an additional LED projector with a random point pattern is used when recording the depth images (only for the rc_visard 65 color). Consequently, for the rc_visard 65 color, STIOS includes RGB images and the resulting depth maps both with and without a projected pattern.

    The large number of different input modalities should enable evaluation of a wide variety of methods. As you can see in the picture, the ZED sensor was mounted above the rc_visard 65 lenses to get a similar viewing angle. This enables an evaluation between the sensors, whereby comparisons can be made about the generalization of a method with regard to sensors or the quality of the input modality.

    Objects

    The dataset contains the following objects from the YCB video dataset and thus covers several application areas such as unknown instance segmentation, instance detection and segmentation (detection + classification):

    003_cracker_box, 005_tomato_soup_can, 006_mustard_bottle, 007_tuna_fish_can, 008_pudding_box, 010_potted_meat_can, 011_banana, 019_pitcher_base, 021_bleach_cleanser, 024_bowl, 025_mug, 035_power_drill, 037_scissors, 052_extra_large_clamp, 061_foam_brick.

    Due to the widespread use of these objects in robotic applications, 3D models exist for each object, which can be used to generate synthetic training data, e.g. for instance detection based on RGB-D. To guarantee an evenly distributed occurrence of the 15 objects, 4-6 objects are selected at random for each sample. The arrangement of the objects is either easy (objects do not touch) or difficult (objects may touch or lie on top of each other).

    Surroundings

    The dataset contains 8 different environments in order to cover the variation of environmental parameters such as lighting, background or scene surfaces. Scenes were recorded in the following environments: office carpet, workbench, white table, wooden table, conveyor belt, lab floor, wooden plank and tool cabinet.

    The scenes were chosen carefully to ensure that they contain surfaces that are both friendly and challenging for stereo sensors. STIOS therefore contains low-texture surfaces (e.g. white table, conveyor belt) and texture-rich surfaces (e.g. lab floor, wooden plank). This variation of surfaces and environments allows methods to be evaluated in terms of robustness against, and generalization to, various environmental parameters.

    For each scene surface, 3 easy and 3 difficult samples are generated from 4 manually set camera angles (approx. 0.3-1 m distance). As the illustration shows, even with the easy object arrangement the objects can occlude each other at some camera angles. The 6 samples per camera setting result in 24 samples per environment for each sensor, i.e. a total of 192 samples per sensor.

    Annotations

    For each of these samples (192x2) all object instances in the left camera image were annotated manually (instance mask + object class). The annotations are available in the form of 8-bit grayscale images, which represent the semantic classes in the image. Since each object appears only once in the image, object instance masks can also be obtained from this format at the same time.
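
    Because each object class occurs at most once per image, instance masks and bounding boxes can be recovered directly from the 8-bit grayscale annotation. A minimal sketch follows; the file path is a placeholder, and the stios-utils repository linked below covers this more completely.

    ```python
    # Minimal sketch, assuming an 8-bit grayscale ground-truth image in which each
    # non-zero gray value encodes one semantic class (= one instance in STIOS).
    import cv2
    import numpy as np

    gt = cv2.imread("gt/000000.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
    for class_id in np.unique(gt):
        if class_id == 0:                                     # 0 assumed to be background
            continue
        mask = (gt == class_id).astype(np.uint8)              # binary instance mask
        ys, xs = np.nonzero(mask)
        print(f"class {class_id}: bbox=({xs.min()}, {ys.min()}, {xs.max()}, {ys.max()})")
    ```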

    The dataset is structured as follows:

    STIOS
    |--rc_visard
    |  |--conveyor_belt
    |  |  |--left_rgb
    |  |  |--right_rgb
    |  |  |--gt
    |  |  |--depth
    |  |  |--left_rgb_pattern
    |  |  |--right_rgb_pattern
    |  |  |--depth_pattern
    |  |--lab_floor
    |  |-- ...
    |--zed
    |  |--conveyor_belt
    |  |  |--left_rgb
    |  |  |--right_rgb
    |  |  |--gt
    |  |  |--depth
    |  |  |--normals
    |  |  |--pcd
    |  |--lab_floor
    |  |--...

    We also provide code utilities which allow visualization of images and annotations of STIOS and contain various utility functions to e.g. generate bounding box annotations from the semantic grayscale images. Please find them here: https://github.com/DLR-RM/stios-utils.

    Citation

    If STIOS is useful for your research please cite

    @misc{durner2021unknown,
      title         = {Unknown Object Segmentation from Stereo Images},
      author        = {Maximilian Durner and Wout Boerdijk and Martin Sundermeyer and Werner Friedl and Zoltan-Csaba Marton and Rudolph Triebel},
      year          = {2021},
      eprint        = {2103.06796},
      archivePrefix = {arXiv},
      primaryClass  = {cs.CV}
    }

    STIOS in projects

    Unknown Object Segmentation from Stereo Images M. Durner, W. Boerdijk, M. Sundermeyer, W. Friedl, Z.-C. Marton, and R. Triebel. "Unknown Object Segmentation from Stereo Images", arXiv preprint arXiv:2103.06796 (2021).

    This method enables the segmentation of unknown object instances that are located on horizontal surfaces (e.g. tables, floors). Due to the often incomplete depth data in robotic applications, stereo RGB images are used here. On the one hand, STIOS is employed to demonstrate the use of stereo images for unknown instance segmentation, and on the other hand, to compare against existing work, which for the most part directly accesses depth data.

    "What's This?" - Learning to Segment Unknown Objects from Manipulation Sequences W. Boerdijk, M. Sundermeyer, M. Durner, and R. Triebel. "'What's This?' - Learning to Segment Unknown Objects from Manipulation Sequences", International Conference on Robotics and Automation (ICRA), 2021 (to appear).

    This work deals with the segmentation of objects that have been grasped by a robotic arm. With the help of this method it is possible to generate object-specific image data in an automated process. This data can then be used for training object detectors or segmentation approaches. In order to show the usability of the generated data, STIOS is used as an evaluation data set for instance segmentation on RGB images.

  7. SUMS and Electronic Services Production Data Warehouse Query Instance

    • catalog.data.gov
    • data.wu.ac.at
    Updated May 22, 2025
    Cite
    Social Security Administration (2025). SUMS and Electronic Services Production Data Warehouse Query Instance [Dataset]. https://catalog.data.gov/dataset/sums-and-electronic-services-production-data-warehouse-query-instance
    Explore at:
    Dataset updated
    May 22, 2025
    Dataset provided by
    Social Security Administration (http://www.ssa.gov/)
    Description

    Stores information relating to the data warehouse for query access.

  8. Data from: TrafficCAM: A Versatile Dataset for Traffic Flow Segmentation

    • figshare.com
    bin
    Updated Mar 13, 2024
    Cite
    Zhongying Deng; Yanqi Cheng; Lihao Liu; Shujun Wang; Rihuan Ke; Carola-Bibiane Schönlieb; Angelica I Aviles-Rivero (2024). TrafficCAM: A Versatile Dataset for Traffic Flow Segmentation [Dataset]. http://doi.org/10.6084/m9.figshare.25399681.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Mar 13, 2024
    Dataset provided by
    figshare
    Authors
    Zhongying Deng; Yanqi Cheng; Lihao Liu; Shujun Wang; Rihuan Ke; Carola-Bibiane Schönlieb; Angelica I Aviles-Rivero
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Traffic flow analysis is revolutionising traffic management. By qualifying traffic flow data, traffic control bureaus can provide drivers with real-time alerts, advise the fastest routes, and thereby optimise transportation logistics and reduce congestion. Existing traffic flow datasets have two major limitations: they feature a limited number of classes, usually restricted to one type of vehicle, and they lack unlabelled data. In this paper, we introduce a new benchmark traffic flow image dataset called TrafficCAM. Our dataset distinguishes itself by two major highlights. Firstly, TrafficCAM provides both pixel-level and instance-level semantic labelling along with a large range of types of vehicles and pedestrians. It is composed of a large and diverse set of video sequences recorded in streets from eight Indian cities with stationary cameras. Secondly, TrafficCAM aims to establish a new benchmark for developing fully-supervised tasks and, importantly, semi-supervised learning techniques. It is the first dataset that provides a vast amount of unlabelled data, helping to better capture traffic flow qualification under a low-cost annotation requirement. More precisely, our dataset has 4,364 image frames with semantic and instance annotations along with 58,689 unlabelled image frames. We validate our new dataset through a large and comprehensive range of experiments on several state-of-the-art approaches under four different settings: fully-supervised semantic and instance segmentation, and semi-supervised semantic and instance segmentation tasks.

  9. Data from: Lastinstance Dataset

    • universe.roboflow.com
    zip
    Updated Mar 12, 2025
    Cite
    last (2025). Lastinstance Dataset [Dataset]. https://universe.roboflow.com/last-rrvlu/lastinstance/model/2
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 12, 2025
    Dataset authored and provided by
    last
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Wd Polygons
    Description

    Lastinstance

    ## Overview
    
    Lastinstance is a dataset for instance segmentation tasks - it contains Wd annotations for 300 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
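
    For example, a download via the Roboflow Python package might look like the following sketch; the API key and export format are placeholders, while the workspace/project slugs are taken from the dataset URL above.

    ```python
    # Sketch only: requires `pip install roboflow` and a valid API key.
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")                      # placeholder key
    project = rf.workspace("last-rrvlu").project("lastinstance")
    dataset = project.version(2).download("coco")              # export format is an assumption
    print(dataset.location)                                    # local folder with images + annotations
    ```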
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  10. First instance decisions on asylum applications by type of decision - annual...

    • data.europa.eu
    csv, html, tsv, xml
    Updated Jun 14, 2016
    Cite
    Eurostat (2016). First instance decisions on asylum applications by type of decision - annual aggregated data [Dataset]. https://data.europa.eu/data/datasets/zcma8lg9bm7du07hnemv6q?locale=en
    Explore at:
    Available download formats: csv, xml, html, tsv
    Dataset updated
    Jun 14, 2016
    Dataset authored and provided by
    Eurostat (https://ec.europa.eu/eurostat)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Decisions granted by the respective authority acting as a first instance of the administrative or judicial asylum procedure in the receiving country during the reference year (aggregated quarterly data). Includes decisions granting Geneva Convention status, subsidiary protection or humanitarian status, as well as rejections.

  11. Data from: Night and Day Instance Segmented Park (NDISPark) Dataset: a...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 11, 2023
    Cite
    Amato, Giuseppe (2023). Night and Day Instance Segmented Park (NDISPark) Dataset: a Collection of Images taken by Day and by Night for Vehicle Detection, Segmentation and Counting in Parking Areas [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6560822
    Explore at:
    Dataset updated
    Sep 11, 2023
    Dataset provided by
    Ciampi, Luca
    Santiago, Carlos
    Gennaro, Claudio
    Amato, Giuseppe
    Costeira, Joao Paulo
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    The Dataset

    A collection of images of parking lots for vehicle detection, segmentation, and counting. Each image is manually labeled with pixel-wise masks and bounding boxes localizing vehicle instances. The dataset includes about 250 images depicting several parking areas and covering most of the problematic situations found in real scenarios: seven different cameras capture the images under various weather conditions and viewing angles. Another challenging aspect is the presence of partial occlusion patterns in many scenes, such as obstacles (trees, lampposts, other cars) and shadowed cars. The main peculiarity is that images are taken both during the day and at night, showing utterly different lighting conditions.

    We suggest a three-way split (train-validation-test). The train split contains images taken during the daytime, while the validation and test splits include images gathered at night. In line with these splits, we provide the following annotation files (a small loading sketch follows the list):

    train_coco_annotations.json and val_coco_annotations.json --> JSON files that follow the standard MS COCO data format (for more info see https://cocodataset.org/#format-data) for the training and the validation splits, respectively. All the vehicles are labeled with the COCO category 'car'. They are suitable for vehicle detection and instance segmentation.

    train_dot_annotations.csv and val_dot_annotations.csv --> CSV files that contain xy coordinates of the centroids of the vehicles for the training and the validation splits, respectively. Dot annotation is commonly used for the visual counting task.

    ground_truth_test_counting.csv --> CSV file that contains the number of vehicles present in each image. It is only suitable for testing vehicle counting solutions.
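
    A minimal sketch for reading these annotation files (file names as listed above; the COCO fields used are the standard ones, and the CSV is assumed to carry a header row):

    ```python
    # Sketch only: load the COCO-format annotations and the dot (centroid) annotations.
    import json
    import csv

    with open("train_coco_annotations.json") as f:
        coco = json.load(f)
    print(len(coco["images"]), "images,", len(coco["annotations"]), "vehicle annotations")

    with open("train_dot_annotations.csv") as f:
        dots = list(csv.DictReader(f))   # assumes a header row naming the centroid columns
    print(len(dots), "dot annotations")
    ```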

    Citing our work

    If you found this dataset useful, please cite the following paper

    @inproceedings{Ciampi_visapp_2021,
      doi       = {10.5220/0010303401850195},
      url       = {https://doi.org/10.5220%2F0010303401850195},
      year      = 2021,
      publisher = {{SCITEPRESS} - Science and Technology Publications},
      author    = {Luca Ciampi and Carlos Santiago and Joao Costeira and Claudio Gennaro and Giuseppe Amato},
      title     = {Domain Adaptation for Traffic Density Estimation},
      booktitle = {Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications}
    }

    and this Zenodo Dataset

    @dataset{ciampi_ndispark_6560823,
      author    = {Luca Ciampi and Carlos Santiago and Joao Costeira and Claudio Gennaro and Giuseppe Amato},
      title     = {{Night and Day Instance Segmented Park (NDISPark) Dataset: a Collection of Images taken by Day and by Night for Vehicle Detection, Segmentation and Counting in Parking Areas}},
      month     = may,
      year      = 2022,
      publisher = {Zenodo},
      version   = {1.0.0},
      doi       = {10.5281/zenodo.6560823},
      url       = {https://doi.org/10.5281/zenodo.6560823}
    }

    Contact Information

    If you would like further information about the dataset or if you experience any issues downloading files, please contact us at luca.ciampi@isti.cnr.it

  12. Bonn Roof Material + Satellite Imagery Dataset

    • figshare.com
    zip
    Updated Apr 18, 2025
    + more versions
    Cite
    Julian Huang; Yue Lin; Alex Nhancololo (2025). Bonn Roof Material + Satellite Imagery Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28713194.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 18, 2025
    Dataset provided by
    figshare
    Authors
    Julian Huang; Yue Lin; Alex Nhancololo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bonn
    Description

    This dataset consists of annotated high-resolution aerial imagery of roof materials in Bonn, Germany, in the Ultralytics YOLO instance segmentation dataset format. Aerial imagery was sourced from OpenAerialMap, specifically from the Maxar Open Data Program. Roof material labels and building outlines were sourced from OpenStreetMap. Images and labels are split into training, validation, and test sets, meant for future machine learning models to be trained upon, for both building segmentation and roof type classification. The dataset is intended for applications such as informing studies on thermal efficiency, roof durability, heritage conservation, or socioeconomic analyses. There are six roof material types: roof tiles, tar paper, metal, concrete, gravel, and glass. Note: the data is provided as a .zip due to file upload limits. Please find a more detailed dataset description in the README.md.
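
    Since the labels follow the Ultralytics YOLO instance segmentation format, training a segmentation model on the provided splits could look roughly like the sketch below; the data.yaml path and model checkpoint are assumptions, not part of the dataset description.

    ```python
    # Sketch only: assumes the unzipped dataset ships a YOLO-style data.yaml that
    # points to the train/val/test image folders and lists the six roof classes.
    from ultralytics import YOLO

    model = YOLO("yolov8n-seg.pt")              # small instance segmentation checkpoint
    model.train(data="bonn_roof/data.yaml",     # hypothetical path to the dataset config
                epochs=100, imgsz=640)
    metrics = model.val()                       # evaluate on the validation split
    ```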

  13. 3xm 80 160 (RGB-D Instance Seg. for bin-picking)

    • kaggle.com
    Updated Nov 5, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tobia Ippolito (2024). 3xm 80 160 (RGB-D Instance Seg. for bin-picking) [Dataset]. https://www.kaggle.com/datasets/tobiaippolito/3xm-80-160/code
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Nov 5, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tobia Ippolito
    License

    GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0.html

    Description

    In short

    This dataset was used to investigate the influence of the number of unique 3D models (shapes) and materials (textures) on the shape-texture bias, performance and generalization of deep neural network instance segmentation in my bachelor's thesis.

    • one of nine datasets created in Unreal Engine 5 with an NVIDIA RTX A4500
    • uses 80 unique shapes and 160 unique textures
    • RGB, depth and solution masks are available
    • 20,000 scenes
    • ready-to-use dataloader, training and inference -> see next section

    Usage

    You can load the images like:

    import cv2
    
    image = cv2.imread(img_path)
    if image is None:
      raise FileNotFoundError(f"Error during data loading: there is no '{img_path}'")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        
    depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED)
    if len(depth.shape) > 2:
      _, depth, _, _ = cv2.split(depth)
          
    mask = cv2.imread(mask_path, cv2.IMREAD_UNCHANGED)  # cv2.IMREAD_GRAYSCALE)
    

    For easy use I recommend using my own code. You can use it directly to train Mask R-CNN or just use the dataloader. Both are shown below:

    First: Clone my torch github project into your project

    ```
    cd ./path/to/your/project
    git clone https://github.com/xXAI-botXx/torch-mask-rcnn-instance-segmentation.git
    ```

    Second: Install the anaconda env (optional)

    ```
    cd ./path/to/your/project
    cd ./torch-mask-rcnn-instance-segmentation
    conda env create -f conda_env.yml
    ```

    Third: You are ready to use

    Using only the dataloader for your custom project:

    ```python
    import os
    import sys
    import numpy as np
    import matplotlib.pyplot as plt
    import cv2
    from torch.utils.data import DataLoader

    sys.path.append("./torch-mask-rcnn-instance-segmentation")

    from maskrcnn_toolkit import DATA_LOADING_MODE, Dual_Dir_Dataset, collate_fn, extract_and_visualize_mask

    data_mode = DATA_LOADING_MODE.ALL

    dataset = Dual_Dir_Dataset(img_dir="/path/to/rgb-folder", depth_dir="/path/to/depth-folder",
                               mask_dir="/path/to/mask-folder", transform=None, amount=1,
                               start_idx=0, end_idx=0, image_name="...", data_mode=data_mode,
                               use_mask=True, use_depth=False, log_path="./logs",
                               width=1920, height=1080, should_log=True, should_print=True,
                               should_verify=False)
    data_loader = DataLoader(dataset, batch_size=5, shuffle=True, num_workers=4, collate_fn=collate_fn)

    # plot
    for data in data_loader:
        for batch_idx in range(len(data[0])):
            if len(data) == 3:
                image = data[0][batch_idx].cpu().unsqueeze(0)
                masks = data[1][batch_idx]["masks"]
                masks = masks.cpu()
                name = data[2][batch_idx]
            else:
                image = data[0][batch_idx].cpu().unsqueeze(0)
                name = data[1][batch_idx]

            image = image.cpu().numpy().squeeze(0)
            image = np.transpose(image, (1, 2, 0))  # Convert to HWC

            # Remove 4th channel if existing
            if image.shape[2] == 4:
                depth = image[:, :, 3]
                image = image[:, :, :3]
            else:
                depth = None

            masks_gt = masks.cpu().numpy()
            masks_gt = np.transpose(masks_gt, (1, 2, 0))
            mask = extract_and_visualize_mask(masks_gt, image=None, ax=None, visualize=False, color_map=None, soft_join=False)

            # plot
            cols = 1
            if depth is not None:
                cols += 1
            if mask is not None:
                cols += 1

            fig, ax = plt.subplots(nrows=1, ncols=cols, figsize=(20, 15*cols))
            fig.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=0.05, hspace=0.05)

            plot_idx = 0
            ax[plot_idx].imshow(image)
            ax[plot_idx].set_title("RGB Input Image")
            ax[plot_idx].axis("off")

            if depth is not None:
                plot_idx += 1
                ax[plot_idx].imshow(depth, cmap="gray")
                ax[plot_idx].set_title("Depth Input Image")
                ax[plot_idx].axis("off")

            if mask is not None:
                plot_idx += 1
                ax[plot_idx].imshow(mask)
                ax[plot_idx].set_title("Mask Ground Truth")
                ax[plot_idx].axis("off")

            plt.show()
    ```

    **Using the whole Mask R-CNN training pipeline:**
    ```python
    import sys
    sys.path.append("./torch-mask-rcnn-instance-segmentation")
    
    from maskrcnn_toolkit import DATA_LOADING_MODE, train
    
    
    # set the vars as you need
    
    WEIGHTS_PATH = None   # Path to the model weights file
    USE_DEPTH = False      # Whether to include depth information -> as rgb and depth on green channel
    VERIFY_DATA = False     # True is recommended
    
    GROUND_PATH = "D:/3xM"  
    DATASET_NAME = "3xM_Dataset_80_160"
    IMG_DIR = os.path.join(G...
    
  14. Data from: MosaicFusion: Diffusion Models as Data Augmenters for Large...

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation [Dataset]. https://service.tib.eu/ldmservice/dataset/mosaicfusion--diffusion-models-as-data-augmenters-for-large-vocabulary-instance-segmentation
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    MosaicFusion: A simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation.

  15. Data from: Instance Space Analysis of Search-Based Software Testing

    • ieee-dataport.org
    Updated Feb 17, 2022
    + more versions
    Cite
    Neelofar Neelofar (2022). Instance Space Analysis of Search-Based Software Testing [Dataset]. https://ieee-dataport.org/documents/instance-space-analysis-search-based-software-testing
    Explore at:
    Dataset updated
    Feb 17, 2022
    Authors
    Neelofar Neelofar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    however

  16. Fraunhofer EZRT XXL-CT Instance Segmentation Me163

    • data.niaid.nih.gov
    Updated Jun 25, 2024
    + more versions
    Cite
    Fuchs, Theobald (2024). Fraunhofer EZRT XXL-CT Instance Segmentation Me163 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7446803
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    Gerth, Stefan
    Fuchs, Theobald
    Böhnel, Michael
    Hempfer, Andreas
    Salamon, Michael
    Thomas Wittenberg
    Reims, Nils
    Gruber, Roland
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ’Me 163’ was a Second World War fighter airplane and a result of the German air force's secret developments. One of these airplanes is currently owned and displayed in the historic aircraft exhibition of the ’Deutsches Museum’ in Munich, Germany. To gain insights with respect to its history, design and state of preservation, a complete CT scan was obtained using an industrial XXL computed tomography scanner. Using the CT data from the Me 163, all its details can be visually examined at various levels, ranging from the complete hull down to single sprockets and rivets. However, while a trained human observer can identify and interpret the volumetric data with all its parts and connections, a virtual dissection of the airplane and all its different parts would be quite desirable. This means that an instance segmentation of all components and objects of interest into disjoint entities from the CT data is necessary.

    As no adequate computer-assisted tools for automated or semi-automated segmentation of such XXL airplane data are currently available, in a first step an interactive data annotation and object labeling process has been established. So far, seven sub-volumes of the Me 163 airplane have been annotated and labeled; the results can potentially be used for various new applications in the fields of digital heritage, non-destructive testing, or machine learning. These annotated and labeled data sets are available here.

  17. Data from: DeepScoresV2

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 7, 2023
    Cite
    Satyawan, Yvan Putra (2023). DeepScoresV2 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4012192
    Explore at:
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    Schmidhuber, Jürgen
    Satyawan, Yvan Putra
    Stadelmann, Thilo
    Pacha, Alexander
    Tuggener, Lukas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The DeepScoresV2 Dataset for Music Object Detection contains digitally rendered images of written sheet music, together with the corresponding ground truth to fit various types of machine learning models. A total of 151 million different instances of music symbols, belonging to 135 different classes, are annotated. The full dataset contains 255,385 images. For most research, the dense version, containing 1714 of the most diverse and interesting images, should suffice.

    The dataset contains ground truth in the form of:

    Non-oriented bounding boxes

    Oriented bounding boxes

    Semantic segmentation

    Instance segmentation

    The accompanying paper, The DeepScoresV2 Dataset and Benchmark for Music Object Detection, published at ICPR 2020, can be found here:

    https://digitalcollection.zhaw.ch/handle/11475/20647

    A toolkit for convenient loading and inspection of the data can be found here:

    https://github.com/yvan674/obb_anns

    Code to train baseline models can be found here:

    https://github.com/tuggeluk/mmdetection/tree/DSV2_Baseline_FasterRCNN

    https://github.com/tuggeluk/DeepWatershedDetection/tree/dwd_old

  18. Furniture BBox To Segmentation (SAM)

    • kaggle.com
    Updated Jun 16, 2023
    Cite
    Nicolaas Regnier (2023). Furniture BBox To Segmentation (SAM) [Dataset]. https://www.kaggle.com/datasets/nicolaasregnier/furniture/data
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nicolaas Regnier
    Description

    This dataset is a subset (the test images) of a larger Roboflow dataset (https://universe.roboflow.com/minoj-selvaraj/furniture-sfocl). It contains images of chairs and sofas, with labels in YOLO format for their bounding boxes.

    The purpose of this dataset is to build an infrastructure that minimizes time spent drawing segmentations. It provides code for applying SAM (https://segment-anything.com/) to images with bounding boxes in order to obtain segmentation annotations that can be used for training or to reduce annotation time.

    I made a dataset class that handles the SAM prediction (https://github.com/regs08/my_SAM/tree/master/Dataset), as well as some plotting and analysis.

    The dataset contains the training data under Original Images; the sam-preds are the output I got when I applied SAM (see the attached notebooks). If you want to train on the instance segmentation data, look no further than sam_preds_training_set, which contains train/val/test sets.
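
    A minimal sketch of the box-to-mask step with the official segment-anything package; the checkpoint path, model type, image path and example box are placeholders, not values from this dataset.

    ```python
    # Sketch only: prompt SAM with a bounding box to get a segmentation mask.
    import cv2
    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor

    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")      # placeholder checkpoint
    predictor = SamPredictor(sam)

    image = cv2.cvtColor(cv2.imread("chair.jpg"), cv2.COLOR_BGR2RGB)   # placeholder image
    predictor.set_image(image)

    box = np.array([50, 40, 420, 380])             # x_min, y_min, x_max, y_max from the bbox label
    masks, scores, _ = predictor.predict(box=box, multimask_output=False)
    print(masks.shape, scores)                     # (1, H, W) boolean mask + confidence score
    ```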

  19. Quality Performance Warehouse Staging Instance

    • catalog.data.gov
    • data.wu.ac.at
    Updated May 22, 2025
    Cite
    Social Security Administration (2025). Quality Performance Warehouse Staging Instance [Dataset]. https://catalog.data.gov/dataset/quality-performance-warehouse-staging-instance
    Explore at:
    Dataset updated
    May 22, 2025
    Dataset provided by
    Social Security Administration (http://www.ssa.gov/)
    Description

    Staging instance utilized to capture and report consistently on quality performance management data and to establish a common point for managing quality and performance.

  20. Cv Project 4 C Dataset

    • universe.roboflow.com
    zip
    Updated May 4, 2024
    Cite
    Cvproject (2024). Cv Project 4 C Dataset [Dataset]. https://universe.roboflow.com/cvproject-d8hm5/cv-project-4-c/model/5
    Explore at:
    Available download formats: zip
    Dataset updated
    May 4, 2024
    Dataset authored and provided by
    Cvproject
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Cattle Polygons
    Description

    Dataset Description:

    The dataset consists of 6570 grayscale images, handpicked and curated for instance segmentation tasks. These images have been annotated to delineate individual object instances, providing a comprehensive dataset for training and evaluating instance segmentation models.

    Data Collection Process:

    The images within the dataset were collected through a rigorous process involving multiple sources and datasets. Leveraging the capabilities of Roboflow Universe, the team behind the project meticulously handpicked images from various publicly available sources and datasets relevant to the domain of interest. These sources may include online repositories, research datasets, and proprietary collections, ensuring a diverse and representative sample of data.

    Preprocessing and Data Integration:

    To ensure uniformity and consistency across the dataset, several preprocessing techniques were applied. First, the images were automatically oriented to correct any orientation discrepancies. Next, they were resized to a standardized resolution of 640x640 pixels, facilitating efficient training and inference. Moreover, to simplify the data and focus on the essential features, the images were converted to grayscale.
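
    The resizing and grayscale conversion described above are straightforward to reproduce; a minimal sketch follows (the file names are placeholders).

    ```python
    # Sketch only: mirrors the stated preprocessing - resize to 640x640 and convert to grayscale.
    import cv2

    image = cv2.imread("cattle_0001.jpg")            # placeholder input image
    image = cv2.resize(image, (640, 640))            # standardized 640x640 resolution
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # single-channel grayscale
    cv2.imwrite("cattle_0001_preprocessed.png", gray)
    ```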

    Furthermore, to augment the dataset and enhance its diversity, multiple datasets were combined and integrated into a single cohesive collection. This involved harmonizing annotation formats, resolving potential conflicts, and ensuring compatibility across different datasets. Through meticulous preprocessing and integration efforts, disparate datasets were seamlessly merged into a unified dataset, enriching its variability and ensuring comprehensive coverage of object instances and scenarios.

    Model Details:

    The instance segmentation model deployed for this dataset is built upon Roboflow 3.0 architecture, leveraging the Fast variant for efficient inference. Trained using the COCO instance segmentation dataset as its checkpoint, the model exhibits robust performance in accurately delineating object boundaries and classifying instances within the images.

    Performance Metrics:

    The model achieves impressive performance metrics, including a mAP of 76.5%, precision of 76.7%, and recall of 73.5%. These metrics underscore the model's effectiveness in accurately localizing and classifying object instances, demonstrating its suitability for various computer vision tasks.

    Conclusion:

    In summary, the dataset represents a culmination of meticulous data collection, preprocessing, and integration efforts, resulting in a comprehensive resource for instance segmentation tasks. By combining multiple datasets and leveraging advanced preprocessing techniques, the dataset offers diverse and representative imagery, enabling robust model training and evaluation. With the high-performance instance segmentation model and impressive performance metrics, the dataset serves as a valuable asset for researchers, developers, and practitioners in the field of computer vision.

    For further information and access to the dataset, please visit Roboflow Universe.
