32 datasets found
  1. ADE20K

    • huggingface.co
    • datasetninja.com
    • +1 more
    Cite
    Laureηt Fainsin, ADE20K [Dataset]. https://huggingface.co/datasets/1aurent/ADE20K
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Laureηt Fainsin
    License

    BSD (https://choosealicense.com/licenses/bsd/)

    Description

    ADE20K Dataset

      Description
    

    ADE20K is composed of more than 27K images from the SUN and Places databases. Images are fully annotated with objects, spanning over 3K object categories. Many of the images also contain object parts, and parts of parts. We also provide the original annotated polygons, as well as object instances for amodal segmentation. Images are also anonymized, blurring faces and license plates.

      Images
    

    MIT, CSAIL does not own the copyright of the… See the full description on the dataset page: https://huggingface.co/datasets/1aurent/ADE20K.
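
    Below is a minimal sketch of loading this mirror with the Hugging Face `datasets` library. The split name and the available columns are assumptions; check the dataset viewer on the dataset page for the actual schema.

    ```python
    # A sketch, assuming the standard Hugging Face datasets API.
    # The split name ("train") is an assumption; inspect the dataset
    # page for the splits and columns actually provided.
    from datasets import load_dataset

    ds = load_dataset("1aurent/ADE20K", split="train")
    print(ds.column_names)  # e.g. image/annotation columns, per the schema
    print(ds[0])
    ```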

  2. Ade20k Dataset Part 3 Dataset

    • universe.roboflow.com
    zip
    Updated Feb 13, 2024
    Cite
    Ade20k dataset (2024). Ade20k Dataset Part 3 Dataset [Dataset]. https://universe.roboflow.com/ade20k-dataset/ade20k-dataset-part-3
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 13, 2024
    Dataset authored and provided by
    Ade20k dataset
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Wall Polygons
    Description

    Ade20k Dataset Part 3

    ## Overview
    
    Ade20k Dataset Part 3 is a dataset for instance segmentation tasks - it contains Wall annotations for 7,212 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
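    
    For the download route, here is a minimal sketch using the `roboflow` Python package. The API key placeholder, version number, and export format ("coco") are assumptions; the workspace and project slugs come from this dataset's URL.
    
    ```python
    # A sketch of the Roboflow download flow, assuming the `roboflow` pip package.
    # The API key, version number, and export format are assumptions.
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")
    project = rf.workspace("ade20k-dataset").project("ade20k-dataset-part-3")
    dataset = project.version(1).download("coco")
    print(dataset.location)  # local folder with images and annotations
    ```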
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. ade20k

    • huggingface.co
    Updated Jul 15, 2025
    + more versions
    Cite
    Masoud KA (2025). ade20k [Dataset]. https://huggingface.co/datasets/msdkhairi/ade20k
    Explore at:
    Dataset updated
    Jul 15, 2025
    Authors
    Masoud KA
    Description

    msdkhairi/ade20k dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. Ade20k Dataset V5.0.1 Dataset

    • universe.roboflow.com
    zip
    Updated Mar 4, 2024
    + more versions
    Cite
    ankit (2024). Ade20k Dataset V5.0.1 Dataset [Dataset]. https://universe.roboflow.com/ankit-qd32q/ade20k-dataset-v5.0.1/dataset/2
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 4, 2024
    Dataset authored and provided by
    ankit
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Wall KZSU Polygons
    Description

    Ade20k Dataset V5.0.1

    ## Overview
    
    Ade20k Dataset V5.0.1 is a dataset for instance segmentation tasks - it contains Wall KZSU annotations for 6,442 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  5. Example of classes included in the ADE20K dataset [25].

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Takahiro Oga; Ryosuke Harakawa; Sayaka Minewaki; Yo Umeki; Yoko Matsuda; Masahiro Iwahashi (2023). Example of classes included in the ADE20K dataset [25]. [Dataset]. http://doi.org/10.1371/journal.pone.0243073.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Takahiro Oga; Ryosuke Harakawa; Sayaka Minewaki; Yo Umeki; Yoko Matsuda; Masahiro Iwahashi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example of classes included in the ADE20K dataset [25].

  6. ade20k

    • huggingface.co
    Updated Dec 26, 2024
    + more versions
    Cite
    ade20k [Dataset]. https://huggingface.co/datasets/MewtwoX23/ade20k
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 26, 2024
    Authors
    mewtwo
    Description

    MewtwoX23/ade20k dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. ADE20K dataset for semantic segmentation

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    Zheng, Heliang; Fu, Jianlong; Zha, Zheng-Jun; Luo, Jiebo; Mei, Tao (2024). Dataset: ADE20K dataset for semantic segmentation. https://doi.org/10.57702/ild2772z [Dataset]. https://service.tib.eu/ldmservice/dataset/ade20k-dataset-for-semantic-segmentation
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The dataset used in the paper is ADE20K for semantic segmentation.

  8. ADE20k

    • kaggle.com
    Updated Apr 21, 2025
    Cite
    Shubhjyot (2025). ADE20k [Dataset]. https://www.kaggle.com/datasets/shubhjyot/ade20k
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shubhjyot
    Description

    Dataset

    This dataset was created by Shubhjyot


  9. ade20k-nano

    • huggingface.co
    Updated Jul 31, 2024
    Cite
    ade20k-nano [Dataset]. https://huggingface.co/datasets/qubvel-hf/ade20k-nano
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 31, 2024
    Authors
    Pavel Iakubovskii
    Description

    qubvel-hf/ade20k-nano dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. ade20k-panoptic-demo-imagefolder

    • huggingface.co
    Updated Dec 24, 2022
    + more versions
    Cite
    Niels Rogge (2022). ade20k-panoptic-demo-imagefolder [Dataset]. https://huggingface.co/datasets/nielsr/ade20k-panoptic-demo-imagefolder
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 24, 2022
    Authors
    Niels Rogge
    Description

    Dataset Card for "ade20k-panoptic-demo-imagefolder"

    More Information needed

  11. ade20k-fix

    • huggingface.co
    Updated Sep 26, 2024
    + more versions
    Cite
    ade20k-fix [Dataset]. https://huggingface.co/datasets/inpaint-context/ade20k-fix
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 26, 2024
    Dataset authored and provided by
    inpaint-context
    Description

    inpaint-context/ade20k-fix dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. Trained Model for DeepLabV3Plus for MLPerf Mobile Inference

    • explore.openaire.eu
    Updated Jul 30, 2020
    Cite
    Ulia Tseng; Yu-Syuan Xu; Jimmy Chiang (2020). Trained Model for DeepLabV3Plus for MLPerf Mobile Inference [Dataset]. http://doi.org/10.5281/zenodo.3966762
    Explore at:
    Dataset updated
    Jul 30, 2020
    Authors
    Ulia Tseng; Yu-Syuan Xu; Jimmy Chiang
    Description

    Application: Semantic Segmentation
    ML Task: DeepLabV3Plus
    Framework: TensorFlow 1.15 for training/ckpt and pb evaluation; TensorFlow 2.2 for post-training quantization/tflite evaluation
    Precision: FP32/INT8
    Is Quantized: yes
    Is ONNX: no
    Dataset: ADE20K
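
    A minimal sketch of running the quantized .tflite artifact with the standard TensorFlow Lite Interpreter API; the file name and the dummy input are assumptions, so check the model's own input details for the real shape and dtype.

    ```python
    # A sketch, assuming the standard tf.lite.Interpreter API (TF 2.x).
    # The model filename is an assumption; use the file from this record.
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="deeplabv3plus.tflite")
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Dummy input shaped/typed to the model's expectations (e.g. 1 x H x W x 3).
    image = np.zeros(inp["shape"], dtype=inp["dtype"])
    interpreter.set_tensor(inp["index"], image)
    interpreter.invoke()
    seg_map = interpreter.get_tensor(out["index"])  # per-pixel class output
    ```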

  13. ADE20K

    • huggingface.co
    Updated Sep 12, 2024
    + more versions
    Cite
    Jia (2024). ADE20K [Dataset]. https://huggingface.co/datasets/Taoyang1/ADE20K
    Explore at:
    Dataset updated
    Sep 12, 2024
    Authors
    Jia
    Description

    Taoyang1/ADE20K dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. ADE20K: A Dataset for Semantic Segmentation

    • service.tib.eu
    Updated Dec 3, 2024
    Cite
    Bolei Zhou; Hang Zhao; Xavier Puig; Sanja Fidler; Adela Barriuso; Antonio Torralba (2024). Dataset: ADE20K: A Dataset for Semantic Segmentation. https://doi.org/10.57702/cc046a6u [Dataset]. https://service.tib.eu/ldmservice/dataset/ade20k--a-dataset-for-semantic-segmentation
    Explore at:
    Dataset updated
    Dec 3, 2024
    Description

    ADE20K: A Dataset for Semantic Segmentation

  15. ADE20K

    • huggingface.co
    Updated Mar 30, 2025
    Cite
    Zhenfeng Su (2025). ADE20K [Dataset]. https://huggingface.co/datasets/ranksu/ADE20K
    Explore at:
    Dataset updated
    Mar 30, 2025
    Authors
    Zhenfeng Su
    License

    Apache License 2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    ranksu/ADE20K dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. Labeled high-resolution orthoimagery time-series of an alluvial river corridor; Elwha River, Washington, USA.

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 20, 2023
    Cite
    Daniel Buscombe; Daniel Buscombe (2023). Labeled high-resolution orthoimagery time-series of an alluvial river corridor; Elwha River, Washington, USA. [Dataset]. http://doi.org/10.5281/zenodo.10155783
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 20, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniel Buscombe; Daniel Buscombe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 18, 2023
    Area covered
    Washington, United States, Elwha River
    Description

    Labeled high-resolution orthoimagery time-series of an alluvial river corridor; Elwha River, Washington, USA.

    Daniel Buscombe, Marda Science LLC

    There are two datasets in this data release:

    1. Model training dataset. A manually (or semi-manually) labeled image dataset that was used to train and evaluate a machine (deep) learning model designed to identify subaerial accumulations of large wood, alluvial sediment, water, and vegetation in orthoimagery of alluvial river corridors in forested catchments.

    2. Model output dataset. A labeled image dataset that uses the aforementioned model to estimate subaerial accumulations of large wood, alluvial sediment, water, and vegetation in a larger orthoimagery dataset of alluvial river corridors in forested catchments.

    All of these label data are derived from raw gridded data that originate from the U.S. Geological Survey (Ritchie et al., 2018). That dataset consists of 14 orthoimages of the Middle Reach (MR, between the former Aldwell and Mills reservoirs) and 14 corresponding orthoimages of the Lower Reach (LR, downstream of the former Mills reservoir) of the Elwha River, Washington, collected between 2012-04-07 and 2017-09-22. The orthoimagery was generated using SfM photogrammetry (following Over et al., 2021) from a photographic camera mounted to an aircraft wing. The imagery captures channel change as it evolved under a ~20 Mt sediment pulse initiated by the removal of the two dams. The two reaches are the ~8 km long Middle Reach (MR) and the lower-gradient ~7 km long Lower Reach (LR).

    The orthoimagery has been labeled (pixelwise, either manually or by an automated process) according to the following classes (integer class in the label data in parentheses):

    1. vegetation / other (0)

    2. water (1)

    3. sediment (2)

    4. large wood (3)

    1. Model training dataset.

    Imagery was labeled using a combination of the open-source software Doodler (Buscombe et al., 2021; https://github.com/Doodleverse/dash_doodler) and hand-digitization in QGIS at 1:300 scale, rasterizing the polygons, then gridding and clipping them in the same way as all other gridded data. Doodler facilitates relatively labor-free dense multiclass labeling of natural imagery, enabling relatively rapid training dataset creation. The final training dataset consists of 4382 images and corresponding labels, each 1024 x 1024 pixels, representing just over 5% of the total dataset. The training data are sampled approximately equally in time and space across both reaches. All training and validation samples purposefully included all four label classes, to avoid model training and evaluation problems associated with class imbalance (Buscombe and Goldstein, 2022).

    Data are provided in geoTIFF format. The imagery and label grids are reprojected to be co-located in the NAD83(2011) / UTM zone 10N projection, and consist of 0.125 x 0.125 m pixels.

    Pixel-wise label measurements such as these facilitate development and evaluation of image segmentation, image classification, object-based image analysis (OBIA), and object-in-image detection models, as well as numerous other potential machine learning models, for the general purposes of river corridor classification, description, enumeration, inventory, and process or state quantification. For example, this dataset may serve in transfer-learning contexts for application in different river or coastal environments, or for different tasks or class ontologies.

    Files:

    1. Labels_used_for_model_training_Buscombe_Labeled_high_resolution_orthoimagery_time_series_of_an_alluvial_river_corridor_Elwha_River_Washington_USA.zip, 63 MB, label tiffs

    2. Model_training_images1of4.zip, 1.5 GB, imagery tiffs

    3. Model_training_images2of4.zip, 1.5 GB, imagery tiffs

    4. Model_training_images3of4.zip, 1.7 GB, imagery tiffs

    5. Model_training_images4of4.zip, 1.6 GB, imagery tiffs

    2. Model output dataset.

    Imagery was labeled using a deep-learning based semantic segmentation model (Buscombe, 2023) trained specifically for the task with the Segmentation Gym (Buscombe and Goldstein, 2022) modeling suite, which we used to fine-tune a SegFormer (Xie et al., 2021) deep learning model for semantic image segmentation. We take the instance (i.e. model architecture and trained weights) of the model of Xie et al. (2021), itself fine-tuned on the ADE20K dataset (Zhou et al., 2019) at a resolution of 512x512 pixels, and fine-tune it on our 1024x1024 pixel training data consisting of 4-class label images.
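
    For orientation, a minimal sketch of loading an ADE20K-pretrained SegFormer with Hugging Face `transformers` and re-heading it for the four classes above. This is not the authors' Segmentation Gym pipeline; the checkpoint name is a public ADE20K-finetuned SegFormer, used here only as an assumed example starting point.

    ```python
    # A sketch, not the authors' exact workflow (they used Segmentation Gym).
    # Re-heads an ADE20K-finetuned SegFormer for the 4 classes listed above.
    from transformers import SegformerForSemanticSegmentation

    model = SegformerForSemanticSegmentation.from_pretrained(
        "nvidia/segformer-b0-finetuned-ade-512-512",  # example public checkpoint
        num_labels=4,  # vegetation/other, water, sediment, large wood
        ignore_mismatched_sizes=True,  # replace the 150-class ADE20K head
    )
    # ...then fine-tune on the 1024x1024 training tiles described above.
    ```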

    The spatial extent of the imagery in the MR is 455157.2494695878122002,5316532.9804129302501678 : 457076.1244695878122002,5323771.7304129302501678. Imagery width is 15351 pixels and imagery height is 57910 pixels. The spatial extent of the imagery in the LR is 457704.9227139975992031,5326631.3750646486878395 : 459241.6727139975992031,5333311.0000646486878395. Imagery width is 12294 pixels and imagery height is 53437 pixels. Data are provided in Cloud-Optimized geoTIFF (COG) format. The imagery and label grids are reprojected to be co-located in the NAD83(2011) / UTM zone 10N projection, and consist of 0.125 x 0.125 m pixels. All grids have been clipped to the union of extents of active channel margins during the period of interest.

    Reach-wide pixel-wise measurements such as these facilitate comparison of wood and sediment storage at any scale or location. These data may be useful for studying the morphodynamics of wood-sediment interactions in other geomorphically complex channels, wood storage in channels, the role of wood in ecosystems and conservation or restoration efforts.

    Files:

    1. Elwha_MR_labels_Buscombe_Labeled_high_resolution_orthoimagery_time_series_of_an_alluvial_river_corridor_Elwha_River_Washington_USA.zip, 9.67 MB, label COGs from Elwha River Middle Reach (MR)

    2. ElwhaMR_imagery_part1_of_2.zip, 566 MB, imagery COGs from Elwha River Middle Reach (MR)

    3. ElwhaMR_imagery_part2_of_2.zip, 618 MB, imagery COGs from Elwha River Middle Reach (MR)

    4. Elwha_LR_labels_Buscombe_Labeled_high_resolution_orthoimagery_time_series_of_an_alluvial_river_corridor_Elwha_River_Washington_USA.zip, 10.96 MB, label COGs from Elwha River Lower Reach (LR)

    5. ElwhaLR_imagery_part1_of_2.zip, 622 MB, imagery COGs from Elwha River Lower Reach (LR)

    6. ElwhaLR_imagery_part2_of_2.zip, 617 MB, imagery COGs from Elwha River Lower Reach (LR)

    This dataset was created using open-source tools of the Doodleverse, a software ecosystem for geoscientific image segmentation, by Daniel Buscombe (https://github.com/dbuscombe-usgs) and Evan Goldstein (https://github.com/ebgoldstein). Thanks to the contributors of the Doodleverse! Thanks especially to Sharon Fitzpatrick (https://github.com/2320sharon) and Jaycee Favela for contributing labels.

    References

    • Buscombe, D. (2023). Doodleverse/Segmentation Gym SegFormer models for 4-class (other, water, sediment, wood) segmentation of RGB aerial orthomosaic imagery (v1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8172858

    • Buscombe, D., Goldstein, E. B., Sherwood, C. R., Bodine, C., Brown, J. A., Favela, J., et al. (2021). Human-in-the-loop segmentation of Earth surface imagery. Earth and Space Science, 9, e2021EA002085. https://doi.org/10.1029/2021EA002085

    • Buscombe, D., & Goldstein, E. B. (2022). A reproducible and reusable pipeline for segmentation of geoscientific imagery. Earth and Space Science, 9, e2022EA002332. https://doi.org/10.1029/2022EA002332 See: https://github.com/Doodleverse/segmentation_gym

    • Over, J.R., Ritchie, A.C., Kranenburg, C.J., Brown, J.A., Buscombe, D., Noble, T., Sherwood, C.R., Warrick, J.A., and Wernette, P.A., 2021, Processing coastal imagery with Agisoft Metashape Professional Edition, version 1.6—Structure from motion workflow documentation: U.S. Geological Survey Open-File Report 2021–1039, 46 p., https://doi.org/10.3133/ofr20211039.

    • Ritchie, A.C., Curran, C.A., Magirl, C.S., Bountry, J.A., Hilldale, R.C., Randle, T.J., and Duda, J.J., 2018, Data in support of 5-year sediment budget and morphodynamic analysis of Elwha River following dam removals: U.S. Geological Survey data release, https://doi.org/10.5066/F7PG1QWC.

    • Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M. and Luo, P., 2021. SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34, pp.12077-12090.

    • Zhou, B., Zhao, H., Puig, X., Xiao, T., Fidler, S., Barriuso, A. and Torralba, A., 2019. Semantic understanding of scenes through the ade20k dataset. International Journal of Computer Vision, 127, pp.302-321.


  17. MRL_Seq123_withSegmentation

    • zenodo.org
    bin
    Updated Sep 13, 2024
    Cite
    Laura Gaspar; Laura Gaspar (2024). MRL_Seq123_withSegmentation [Dataset]. http://doi.org/10.5281/zenodo.13760574
    Explore at:
    Available download formats: bin
    Dataset updated
    Sep 13, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Laura Gaspar; Laura Gaspar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Here is a set of three sequences recorded at the Mobile Robotic Laboratory (MRL) in ISR. The first bag is set to run up to 130 seconds, as a failure occurred after that point, but everything is correct until then. Along with sequence 1, there is a bag containing the semantic segmentation, obtained using the PSPNet network with the ADE20K dataset, with an opacity of 255. However, the filter.bag file contains only a subset of the classes, while the all.bag file contains all of them. Additionally, there are two more sequences with similar trajectories. The data include recordings from the RGBD camera (D435i), wheel odometry, stereo camera (Mynt Eye S1030), and 2D LiDAR (Hokuyo URG-04L). At this stage, the Mynt Eye did not yet have the IMU correction or the corrected right-camera info values.

    The transformations are defined as follows (approximately):

    RGB-D camera: 0.18 0.005 0.71 0 0.2007 0 base_link realsense_link
    Mynt Eye camera: 0.205 0.0 0.63 -1.57 0 -1.72 base_link mynteye_link

    It should be noted that the laser transformations have already been recorded on the bag. The segmented images are in BGR format.
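
    A minimal sketch of reading the segmentation bag with the ROS1 `rosbag` API and undoing the BGR channel order noted above; the topic name is an assumption, so list the actual topics with `rosbag info` first.

    ```python
    # A sketch, assuming ROS1 rosbag + cv_bridge. The topic name below is an
    # assumption; check `rosbag info all.bag` for the real image topic.
    import rosbag
    import cv2
    from cv_bridge import CvBridge

    bridge = CvBridge()
    with rosbag.Bag("all.bag") as bag:
        for topic, msg, t in bag.read_messages(topics=["/segmentation/image_raw"]):
            bgr = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
            rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)  # frames are stored as BGR
    ```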

  18. result

    • figshare.com
    application/x-rar
    Updated Dec 23, 2021
    Cite
    huiyu kuang (2021). result [Dataset]. http://doi.org/10.6084/m9.figshare.17429309.v1
    Explore at:
    Available download formats: application/x-rar
    Dataset updated
    Dec 23, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    huiyu kuang
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This is the ADE20K dataset after semantic segmentation.

  19. CV-Bench Dataset

    • library.toponeai.link
    Updated Jun 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shengbang Tong; Ellis Brown; Penghao Wu; Sanghyun Woo; Manoj Middepogu; Sai Charitha Akula; Jihan Yang; Shusheng Yang; Adithya Iyer; Xichen Pan; Ziteng Wang; Rob Fergus; Yann Lecun; Saining Xie (2024). CV-Bench Dataset [Dataset]. https://library.toponeai.link/dataset/cv-bench
    Explore at:
    Dataset updated
    Jun 25, 2024
    Authors
    Shengbang Tong; Ellis Brown; Penghao Wu; Sanghyun Woo; Manoj Middepogu; Sai Charitha Akula; Jihan Yang; Shusheng Yang; Adithya Iyer; Xichen Pan; Ziteng Wang; Rob Fergus; Yann Lecun; Saining Xie
    Description

    The Cambrian Vision-Centric Benchmark (CV-Bench) is designed to address the limitations of existing vision-centric benchmarks by providing a comprehensive evaluation framework for multimodal large language models (MLLMs). With 2,638 manually-inspected examples, CV-Bench significantly surpasses other vision-centric MLLM benchmarks, offering 3.5 times more examples than RealWorldQA and 8.8 times more than MMVP.

    Motivation and Content Summary:

    CV-Bench repurposes standard vision benchmarks such as ADE20K, COCO, and Omni3D to assess models on classic vision tasks within a multimodal context. Leveraging the rich ground truth annotations from these benchmarks, natural language questions are formulated to probe the fundamental 2D and 3D understanding of models.

    Potential Use Cases:

    • Evaluating the spatial relationship and object counting capabilities of models (2D understanding).
    • Assessing the depth order and relative distance understanding of models (3D understanding).
    • Benchmarking the performance of multimodal models in both vision-specific and cross-modal tasks.

    Dataset Characteristics:

    2D Understanding Tasks:

    • Spatial Relationship: Determine the relative position of an object with respect to the anchor object, considering left-right or top-bottom relationships.
    • Object Count: Determine the number of instances present in the image.

    3D Understanding Tasks:

    • Depth Order: Determine which of the two distinct objects is closer to the camera.
    • Relative Distance: Determine which of the two distinct objects is closer to the anchor object.

    | Type | Task | Description | Sources | # Samples |
    |------|------|-------------|---------|-----------|
    | 2D | Spatial Relationship | Determine the relative position of an object w.r.t. the anchor object. | ADE20K, COCO | 650 |
    | 2D | Object Count | Determine the number of instances present in the image. | ADE20K, COCO | 788 |
    | 3D | Depth Order | Determine which of the two distinct objects is closer to the camera. | Omni3D | 600 |
    | 3D | Relative Distance | Determine which of the two distinct objects is closer to the anchor object. | Omni3D | 600 |

    Curation Process:

    Questions for each task are programmatically constructed and then manually inspected to ensure clarity and accuracy. Any unclear, ambiguous, or erroneous questions are removed to maintain the benchmark's reliability.
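
    As an illustration of that programmatic construction, here is a hypothetical sketch of building a 2D spatial-relationship question from two ground-truth boxes; the function name, box format, and question wording are illustrative assumptions, not CV-Bench's actual code.

    ```python
    # A hypothetical sketch of CV-Bench-style question construction; the box
    # format (x, y, w, h, with x increasing rightward) is an assumption.
    def spatial_relationship_question(obj, anchor):
        """obj and anchor are (label, (x, y, w, h)) tuples."""
        obj_cx = obj[1][0] + obj[1][2] / 2           # object center x
        anchor_cx = anchor[1][0] + anchor[1][2] / 2  # anchor center x
        answer = "left" if obj_cx < anchor_cx else "right"
        question = f"Is the {obj[0]} to the left or to the right of the {anchor[0]}?"
        return question, answer

    q, a = spatial_relationship_question(("chair", (40, 80, 50, 90)),
                                         ("table", (200, 60, 120, 80)))
    print(q, "->", a)  # -> left
    ```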

  20. MSeg

    • opendatalab.com
    zip
    Updated Apr 20, 2023
    Cite
    Intel Labs (2023). MSeg [Dataset]. https://opendatalab.com/OpenDataLab/MSeg
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 20, 2023
    Dataset provided by
    Intel Labs
    University of California, Berkeley
    Georgia Institute of Technology
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MSeg is a composite dataset for multi-domain semantic segmentation. It unifies semantic segmentation datasets from different domains: COCO, ADE20K, Mapillary, IDD, BDD, Cityscapes, and SUN RGB-D. By reconciling the class taxonomies (merging and splitting classes), a unified taxonomy of 194 categories was obtained. To make the pixel-level annotations conform to this unified taxonomy, the authors conducted a large-scale annotation effort through the Mechanical Turk platform, generating compatible annotations in the dataset by relabeling object masks.
