Cityscapes data (dataset home page) contains labeled videos taken from vehicles driven in Germany. This version is a processed subsample created as part of the Pix2Pix paper. The dataset consists of still images extracted from the original videos, with the semantic segmentation labels rendered side by side with the original image in each file. This is one of the best datasets around for semantic segmentation tasks.
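Since each file in this Pix2Pix-style version stores the street photo and its label map side by side, a small splitting step is usually the first bit of preprocessing. The sketch below is a minimal, hypothetical example in Python: the file path is made up, and which half holds the photo and which holds the labels should be verified against a sample image.

```python
# Minimal sketch: split a side-by-side Pix2Pix-style Cityscapes image into
# its two halves. The path is hypothetical; check a sample to confirm which
# half is the photo and which is the segmentation label map.
from PIL import Image

def split_pair(path):
    pair = Image.open(path)
    w, h = pair.size
    left = pair.crop((0, 0, w // 2, h))    # one half (photo or labels)
    right = pair.crop((w // 2, 0, w, h))   # the other half
    return left, right

left, right = split_pair("cityscapes_pix2pix/train/1.jpg")  # hypothetical path
```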
This dataset is the same as what is available here from the Berkeley AI Research group.
The Cityscapes data available from cityscapes-dataset.com has the following license:
This dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data given that you agree:
- That the dataset comes "AS IS", without express or implied warranty. Although every effort has been made to ensure accuracy, we (Daimler AG, MPI Informatics, TU Darmstadt) do not accept any responsibility for errors or omissions.
- That you include a reference to the Cityscapes Dataset in any work that makes use of the dataset. For research papers, cite our preferred publication as listed on our website; for other media cite our preferred publication as listed on our website or link to the Cityscapes website.
- That you do not distribute this dataset or modified versions. It is permissible to distribute derivative works in as far as they are abstract representations of this dataset (such as models trained on it or additional annotations that do not directly include any of our data) and do not allow to recover the dataset or something similar in character.
- That you may not use the dataset or any derivative work for commercial purposes as, for example, licensing or selling the data, or using the data with a purpose to procure a commercial gain.
- That all rights not expressly granted to you are reserved by (Daimler AG, MPI Informatics, TU Darmstadt).
Can you identify what objects are where in these images taken from a vehicle?
This is a preprocessed version of the Cityscapes dataset intended for two tasks: depth estimation and semantic segmentation.
The dataset contains 128 × 256 images, their 19-class semantic segmentation labels, and inverse depth labels.
The original dataset is taken from this website, and the preprocessed version is taken from this website.
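As a rough illustration only, the preprocessed images, 19-class labels, and inverse-depth maps could be wrapped for training roughly as follows; the .npy file names and array shapes below are assumptions about the storage layout, not the published format.

```python
# Minimal sketch of a dataset wrapper for 128x256 images, 19-class
# segmentation labels, and inverse-depth maps. File names/shapes are assumed.
import numpy as np
from torch.utils.data import Dataset

class CityscapesDepthSeg(Dataset):
    def __init__(self, image_path, label_path, depth_path):
        self.images = np.load(image_path)   # assumed shape (N, 128, 256, 3)
        self.labels = np.load(label_path)   # assumed shape (N, 128, 256), values 0..18
        self.depths = np.load(depth_path)   # assumed shape (N, 128, 256), inverse depth

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        return self.images[i], self.labels[i], self.depths[i]
```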
The Cityscapes data available from cityscapes-dataset.com has the following license:
This dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data given that you agree:
M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The Cityscapes Dataset for Semantic Urban Scene Understanding,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
S. Liu, E. Johns, and A. J. Davison, "End-to-End Multi-Task Learning with Attention," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Effect of additional modules on segmentation performance: ablation study results on the Cityscapes dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance comparison of semantic segmentation methods on Cityscapes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cross-domain object detection is a key problem in the research of intelligent detection models. Unlike many improved algorithms built on two-stage detection models, we take a different route: this paper introduces a simple and efficient one-stage model that balances inference efficiency and detection precision and broadens the range of cross-domain object detection problems that can be addressed. We name this gradient reverse layer-based model YOLO-G; it greatly improves object detection precision in cross-domain scenarios. Specifically, we add a feature alignment branch after the backbone, to which a gradient reverse layer and a classifier are attached. With only a small increase in computational cost, performance is considerably enhanced. Experiments such as Cityscapes→Foggy Cityscapes, SIM10k→Cityscapes, PASCAL VOC→Clipart, and others indicate that, compared with most state-of-the-art (SOTA) algorithms, the proposed model achieves much better mean Average Precision (mAP). Furthermore, ablation experiments were performed on four components to confirm the reliability of the model. The project is available at https://github.com/airy975924806/yolo-G.
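For context on the gradient reverse layer the abstract refers to, here is a minimal PyTorch-style sketch of the general technique: the layer is the identity in the forward pass and negates (and scales) the gradient in the backward pass, so the classifier attached to the feature-alignment branch is trained adversarially against the backbone. This is an illustrative sketch of the generic mechanism, not the YOLO-G code.

```python
# Minimal sketch of a gradient reversal layer (GRL): identity forward,
# negated and scaled gradient backward. The lambda value is an assumption.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient flowing back into the backbone features.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```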
https://ega-archive.org/dacs/EGAC00001002525
Source clinical study data corresponding to figures reported in the paper "Anti-TIGIT antibody improves PD-L1 blockade through myeloid and Treg cells." PMID: 38418879; DOI: 10.1038/s41586-024-07121-9.
Introduction
This is the dataset for the pix2pix model, which aims to serve as a general-purpose solution to image-to-image translation problems.
Due to Kaggle's size limitations, only 4 datasets are available here.
One more dataset (Edges to Handbags) can be downloaded from the link provided in the Sources section.
Common tasks
More description of the actual model, some implementations, and all the community contributions can be found on the author's GitHub project page here: https://phillipi.github.io/pix2pix/
Sources
Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0
The Coralscapes dataset is the first general-purpose dense semantic segmentation dataset for coral reefs. Similar in scope to, and with the same structure as, the widely used Cityscapes dataset for urban scene understanding, Coralscapes allows semantic segmentation models to be benchmarked in a new, challenging domain. The dataset spans 2075 images at 1024×2048px resolution gathered from 35 dive sites in 5 countries in the Red Sea, labeled in a consistent and speculation-free manner, and contains 174k polygons over 39 benthic classes.
This repository provides a collection of scripts and instructions for working with the Coralscapes dataset. It includes the full codebase needed to train and evaluate models on this dataset, making it possible to reproduce the results in the paper. Additionally, it contains scripts and step-by-step guidance on how to use the trained models for inference and how to fine-tune the models on external datasets.
The dataset structure of the Coralscapes dataset follows the structure of the Cityscapes dataset:
`{root}/{type}/{split}/{site}/{site}_{seq:0>6}_{frame:0>6}_{type}{ext}`
The meaning of the individual elements is as follows (a short parsing sketch follows the list):

- `root`: the root folder of the Coralscapes dataset.
- `type`: the type/modality of data (`gtFine` for fine ground truth, `leftImg8bit` for left 8-bit images, `leftImg8bit_1080p`/`gtFine_1080p` for the images/ground truth in 1080p resolution, `leftImg8bit_videoframes` for the 19 preceding and 10 trailing video frames).
- `split`: the split, i.e. train/val/test. Note that not all kinds of data exist for all splits. Thus, do not be surprised to occasionally find empty folders.
- `site`: ID of the site in which this part of the dataset was recorded.
- `seq`: the sequence number, using 6 digits.
- `frame`: the frame number, using 6 digits.
- `ext`: `.png`
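As referenced above, a file name following this pattern can be parsed back into its components; the sketch below is our own illustration (the regular expression and the example name are not part of the dataset tooling).

```python
# Minimal sketch: parse a Coralscapes/Cityscapes-style file name of the form
# {site}_{seq:0>6}_{frame:0>6}_{type}{ext}. The example name is made up.
import re
from pathlib import Path

PATTERN = re.compile(r"(?P<site>.+)_(?P<seq>\d{6})_(?P<frame>\d{6})_(?P<type>.+)$")

def parse_name(path):
    m = PATTERN.match(Path(path).stem)
    if m is None:
        raise ValueError(f"unexpected file name: {path}")
    return m.group("site"), int(m.group("seq")), int(m.group("frame")), m.group("type")

# e.g. ('site12', 3, 42, 'leftImg8bit') for a hypothetical file name
print(parse_name("site12_000003_000042_leftImg8bit.png"))
```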
The files provided in the Zenodo repository are the following:
- coralscapes.7z contains the Coralscapes dataset, which includes the 2075 images and corresponding ground truth semantic segmentation masks at 1024×2048px resolution.
- coralscapes_1080p.7z contains the Coralscapes images and masks in their native 1080×1920px resolution.
- model_checkpoints.7z contains the checkpoints of the semantic segmentation models that have been fine-tuned on the Coralscapes dataset. This includes the following models: SegFormer (with a B2 and B5 backbone, trained with and without LoRA), DPT (with a DINOv2-Base and DINOv2-Giant backbone, trained with and without LoRA), a Linear segmentation model with a DINOv2-Base backbone, a UNet++ with a ResNet50 backbone, and DeepLabV3+ with a ResNet50 backbone.
- coralscapes_videoframes.7z contains the 19 preceding and 10 trailing video frames of each image in the Coralscapes dataset.
Datasets and Jupyter notebook corresponding to the paper "Infrastructure-scale sustainable energy planning in the cityscape: Transforming urban energy metabolism in East Asia," published in WIREs Energy and Environment. See the Jupyter notebook for additional explanations about the datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance metrics of our Sim2Real transfer model on different datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper presents a novel method for improving semantic segmentation performance in computer vision tasks. Our approach uses an enhanced UNet architecture built on an improved ResNet50 backbone: we replace the last layer of ResNet50 with deformable convolution to enhance feature representation. Additionally, we incorporate an attention mechanism, ECA-ASPP (Attention Spatial Pyramid Pooling), in the encoding path of UNet to capture multi-scale contextual information effectively. In the decoding path of UNet, we explore the use of attention mechanisms after concatenating low-level features with high-level features. Specifically, we investigate two types of attention mechanism: ECA (Efficient Channel Attention) and LKA (Large Kernel Attention). Our experiments demonstrate that incorporating attention after concatenation improves segmentation accuracy, and a comparison of the ECA and LKA modules in the decoder path shows that LKA outperforms ECA. This finding highlights the importance of exploring different attention mechanisms and their impact on segmentation performance. To evaluate the effectiveness of the proposed method, we conduct experiments on benchmark datasets, including Stanford and Cityscapes, as well as the newly introduced WildPASS and DensePASS datasets. The proposed method achieves state-of-the-art results, including mIoU of 85.79 on the Stanford dataset and 82.25 on the Cityscapes dataset.
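To make the "attention after concatenation" idea concrete, the sketch below shows a UNet-style decoder block that applies an ECA module to the concatenation of upsampled decoder features and encoder skip features. It is a minimal PyTorch-style illustration; channel counts, kernel sizes, and layer choices are assumptions rather than the paper's exact configuration.

```python
# Minimal sketch: ECA applied after concatenating skip and upsampled features
# in a UNet-style decoder block. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, k_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)

    def forward(self, x):                                   # x: (B, C, H, W)
        w = self.pool(x)                                    # (B, C, 1, 1)
        w = self.conv(w.squeeze(-1).transpose(1, 2))        # 1D conv across channels: (B, 1, C)
        w = torch.sigmoid(w.transpose(1, 2).unsqueeze(-1))  # back to (B, C, 1, 1)
        return x * w                                        # channel-wise re-weighting

class DecoderBlock(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.attn = ECA()                                   # attention applied after concatenation
        self.fuse = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = torch.cat([self.up(x), skip], dim=1)            # fuse low- and high-level features
        return self.fuse(self.attn(x))
```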
This dataset has been created as part of the Cam2BEV project. There, the datasets are used for the computation of a semantically segmented bird's eye view (BEV) image given the images of multiple vehicle-mounted cameras, as presented in the paper:
A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird’s Eye View (arXiv)
Lennart Reiher, Bastian Lampe, and Lutz Eckstein
Institute for Automotive Engineering (ika), RWTH Aachen University
360° Surround Cameras
Example images:

- Front: https://gitlab.ika.rwth-aachen.de/cam2bev/cam2bev-data/-/raw/master/1_FRLR/examples/front.png
- Rear: https://gitlab.ika.rwth-aachen.de/cam2bev/cam2bev-data/-/raw/master/1_FRLR/examples/rear.png
- Left: https://gitlab.ika.rwth-aachen.de/cam2bev/cam2bev-data/-/raw/master/1_FRLR/examples/left.png
- Right: https://gitlab.ika.rwth-aachen.de/cam2bev/cam2bev-data/-/raw/master/1_FRLR/examples/right.png
- BEV: https://gitlab.ika.rwth-aachen.de/cam2bev/cam2bev-data/-/raw/master/1_FRLR/examples/bev.png
- BEV + occlusion: https://gitlab.ika.rwth-aachen.de/cam2bev/cam2bev-data/-/raw/master/1_FRLR/examples/bev+occlusion.png
- Homography: https://gitlab.ika.rwth-aachen.de/cam2bev/cam2bev-data/-/raw/master/1_FRLR/examples/homography.png
| # Training Samples | # Validation Samples | # Vehicle Cameras | # Semantic Classes |
|---|---|---|---|
| 33199 | 3731 | 4 (front, rear, left, right) | 30 (Cityscapes) |
Note: The Cityscapes colors for the semantic classes Pedestrian and Rider are switched for technical reasons.
Front camera:
| Resolution (x,y) | Focal Length (x,y) | Principal Point (x,y) | Position (X,Y,Z) | Rotation (H, P, R) |
|---|---|---|---|---|
| 964, 604 | 278.283, 408.1295 | 482, 302 | 1.7, 0.0, 1.4 | 0.0, 0.0, 0.0 |
Rear camera:
| Resolution (x,y) | Focal Length (x,y) | Principal Point (x,y) | Position (X,Y,Z) | Rotation (H, P, R) |
|---|---|---|---|---|
| 964, 604 | 278.283, 408.1295 | 482, 302 | -0.6, 0.0, 1.4 | 3.1415, 0.0, 0.0 |
Left camera:
| Resolution (x,y) | Focal Length (x,y) | Principal Point (x,y) | Position (X,Y,Z) | Rotation (H, P, R) |
|---|---|---|---|---|
| 964, 604 | 278.283, 408.1295 | 482, 302 | 0.5, 0.5, 1.5 | 1.5708, 0.0, 0.0 |
Right camera:
| Resolution (x,y) | Focal Length (x,y) | Principal Point (x,y) | Position (X,Y,Z) | Rotation (H, P, R) |
|---|---|---|---|---|
| 964, 604 | 278.283, 408.1295 | 482, 302 | 0.5, -0.5, 1.5 | -1.5708, 0.0, 0.0 |
Drone (BEV) camera:
| Resolution (x,y) | Focal Length (x,y) | Principal Point (x,y) | Position (X,Y,Z) | Rotation (H, P, R) |
|---|---|---|---|---|
| 964, 604 | 682.578, 682.578 | 482, 302 | 0.0, 0.0, 50.0 | 0.0, 1.5708, -1.5708 |
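As a small worked example, the focal lengths and principal points in the tables above assemble into a standard pinhole intrinsic matrix K. The helper below is our own sketch (not part of Cam2BEV); the values are copied from the vehicle-camera rows and from the elevated camera row with focal length 682.578.

```python
# Minimal sketch: build the 3x3 pinhole intrinsic matrix K from the focal
# lengths (fx, fy) and principal point (cx, cy) listed in the tables above.
import numpy as np

def intrinsic_matrix(fx, fy, cx, cy):
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

K_vehicle = intrinsic_matrix(278.283, 408.1295, 482.0, 302.0)  # 964x604 vehicle cameras
K_bev = intrinsic_matrix(682.578, 682.578, 482.0, 302.0)       # elevated BEV view
```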
The original dataset is taken from this website.
The Cam2BEV data available from the corresponding website has the following license:
This dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching or scientific publications. Permission is granted to use the data given that you agree:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Machine software and hardware configuration.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Concat refers to the use of concatenation operations to fuse the output of DAPM. SFM: Selective Fuse Module. SSFM is the addition of the Spatial Attention Mechanism to each of the two branches of SFM. FSFM refers to the Focusing Selective Fuse Module.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ASPP is atrous spatial pyramid pooling. MASPP refers to replacing the 1 × 1 convolution layer of ASPP with a 3 × 3 depthwise separable convolution layer. DAPM is our Double Attention Pyramid Module.
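To illustrate the substitution the caption describes, here is a generic sketch of a 3 × 3 depthwise separable convolution block of the kind that could stand in for ASPP's 1 × 1 branch; the normalization and activation choices are assumptions, not the authors' exact module.

```python
# Minimal sketch of a 3x3 depthwise separable convolution block (depthwise
# 3x3 followed by a pointwise 1x1). BatchNorm/ReLU choices are assumptions.
import torch.nn as nn

class DepthwiseSeparableConv3x3(nn.Module):
    def __init__(self, in_ch, out_ch, dilation=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=dilation,
                                   dilation=dilation, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```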
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Experimental results of different methods on the “Cityscape to Foggy” dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ablation experimental results of the model on the “Cityscape to Foggy” dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data used in a paper that undertakes a comparative analysis of Kingston, Jamaica and Lagos, Nigeria, two historically rich, culturally vibrant, and environmentally challenged cities that embody the evolving tensions between indigenous aesthetics, colonial legacies, rapid urbanisation, and contemporary demands for sustainability.