Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Deep neural networks are now widely used for computer vision problems, including those involving geographic information system (GIS) data. This type of data is commonly used for urban analyses and spatial planning. We used orthophotographic images of two residential districts of Kielce, Poland, for research on automatic urban sprawl analysis with a Transformer-based neural network.
Orthophotomaps were obtained from the Kielce GIS portal. The map was then manually masked into building and building-surroundings classes. Finally, the orthophotomap and the corresponding classification mask were simultaneously divided into small tiles, an approach common in image preprocessing for the training phase of machine learning algorithms. The data contain the two original orthophotomaps of the Wietrznia and Pod Telegrafem residential districts with corresponding masks, as well as their tiled versions, ready to be used as training data for machine learning models.
The Transformer-based neural network was trained on the Wietrznia dataset for semantic segmentation of the tiles into building and surroundings classes. Inference was then run on the Pod Telegrafem dataset to test the model's generalization ability. The model's performance was satisfactory, so it can be used for automatic semantic segmentation of buildings. The tiling process can then be reversed to recover the complete classification mask. This mask can be used to calculate building areas and to monitor urban sprawl, if the research were repeated for GIS data covering a wider time horizon.
Since the dataset was collected from the Kielce GIS portal, as part of the data resource of the Polish Main Office of Geodesy and Cartography, it may be used only for non-profit and non-commercial purposes, in private or scientific applications, under the law "Ustawa z dnia 4 lutego 1994 r. o prawie autorskim i prawach pokrewnych (Dz.U. z 2006 r. nr 90 poz 631 z późn. zm.)" (Act of 4 February 1994 on Copyright and Related Rights).
There are no other legal or ethical considerations in reuse potential.
Data information is presented below.
- wietrznia_2019.jpg - orthophotomap of Wietrznia district - used for model's training, as an input (explanatory) image
- wietrznia_2019.png - classification mask of Wietrznia district - used for model's training, as a target image
- wietrznia_2019_validation.jpg - one image from Wietrznia district - used for model's validation during training phase
- pod_telegrafem_2019.jpg - orthophotomap of Pod Telegrafem district - used for model's evaluation after training phase
- wietrznia_2019 - folder with wietrznia_2019.jpg (image) and wietrznia_2019.png (annotation) images, divided into 810 tiles (512 x 512 pixels each); tiles with no information were manually removed so that the training data would contain only informative tiles; tiles were presented to the model during training (images and annotations for fitting the model to the data)
- wietrznia_2019_vaidation - folder with wietrznia_2019_validation.jpg image divided into 16 tiles (256 x 256 pixels each) - tiles were presented to the model during training (images for validating the model's efficiency); it was not part of the training data
- pod_telegrafem_2019 - folder with pod_telegrafem.jpg image divided into 196 tiles (256 x 256 pixels each) - tiles were presented to the model during inference (images for evaluating the model's robustness)
Dataset was created as described below.
Firstly, the orthophotomaps were collected from Kielce Geoportal (https://gis.kielce.eu). Kielce Geoportal offers a .pst recent map from April 2019. It is an orthophotomap with a resolution of 5 x 5 pixels, constructed from a plane flight at 700 meters over ground height, taken with a camera for vertical photos.
Downloading was done via WMS in the open-source QGIS software (https://www.qgis.org), as a 1:500 scale map, which was then converted to a 1200 dpi PNG image.
Secondly, the map of the Wietrznia residential district was manually labelled, also in QGIS, over the same extent as the orthophotomap. The annotation was based on land cover map information, also obtained from Kielce Geoportal. There are two classes: residential building and surrounding. The second map, of the Pod Telegrafem district, was not annotated, since it was used in the testing phase and imitates the situation in which no annotation exists for new data presented to the model.
Next, the images were converted to RGB JPG images, and the annotation map was converted to an 8-bit grayscale PNG image.
Finally, the Wietrznia data files were tiled into 512 x 512 pixel tiles with the Python PIL library. Tiles with no information or a relatively small amount of information (only white background or mostly white background) were manually removed. So, from the 29113 x 15938 pixel orthophotomap, only 810 tiles with corresponding annotations were left, ready to train the machine learning model for the semantic segmentation task. The Pod Telegrafem orthophotomap was tiled without manual removal, so the 7168 x 7168 pixel orthophotomap yielded 196 tiles with 256 x 256 pixel resolution. There was also an image of one residential building, used for the model's validation during the training phase; it was not part of the training data, but was part of the Wietrznia residential area. It was a 2048 x 2048 pixel orthophotomap, tiled into 16 tiles of 256 x 256 pixels each.
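The tiling step can be sketched in plain Python as a grid of crop boxes. This is a minimal illustration, not the authors' script: the function names `tile_boxes` and `grid_shape` are hypothetical, and the assumption is a regular, non-overlapping grid with smaller tiles at the right and bottom edges. Each box uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts.

```python
from math import ceil


def tile_boxes(width, height, tile=512):
    """Return (left, upper, right, lower) boxes covering a width x height image.

    Assumes a regular, non-overlapping grid; boxes on the right and bottom
    edges are clipped to the image, so they may be smaller than `tile`.
    """
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            right = min(left + tile, width)
            lower = min(top + tile, height)
            boxes.append((left, top, right, lower))
    return boxes


def grid_shape(width, height, tile=512):
    """Return (rows, columns) of the tile grid, counting partial edge tiles."""
    return ceil(height / tile), ceil(width / tile)


if __name__ == "__main__":
    # Image sizes quoted in the description above.
    for size, tile in [((7168, 7168), 512), ((7168, 7168), 256), ((2048, 2048), 512)]:
        print(size, tile, len(tile_boxes(*size, tile)))
```

With Pillow installed, each box could be passed to `Image.open(path).crop(box)` and saved as a tile. Reversing the process amounts to pasting each predicted tile back at its `(left, upper)` offset. On the stated image sizes, a 512-pixel grid gives 196 and 16 tiles for the Pod Telegrafem and validation images, whereas a 256-pixel grid would give 784 and 64, so the tile size is worth checking against the downloaded files.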
https://market.us/privacy-policy/
By 2034, the AI Annotation Market is expected to reach a valuation of USD 28.5 billion, expanding at a healthy CAGR of 28.6%
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The SkySeaLand Dataset is a high-resolution satellite imagery collection developed for object detection, classification, and aerial analysis tasks. It focuses on transportation-related objects observed from diverse geospatial contexts, offering precise YOLO-formatted annotations for four categories: airplane, boat, car, and ship.
This dataset bridges terrestrial, maritime, and aerial domains, providing a unified resource for developing and benchmarking computer vision models in complex real-world environments.
Annotations are provided in YOLO format, with one .txt file per image. The SkySeaLand Dataset is divided into subsets for training, validation, and testing.
This split ensures a balanced distribution for training, validating, and testing models, facilitating robust model evaluation and performance analysis.
| Class Name | Object Count |
|---|---|
| Airplane | 4,847 |
| Boat | 3,697 |
| Car | 6,932 |
| Ship | 3,627 |
The distribution among categories is moderately balanced: the largest class (Car) has about 1.9 times as many objects as the smallest (Ship), which supports stable multi-class training and evaluation.
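The balance of the classes can be checked directly from the table. The short sketch below computes each class's share of all annotated objects and the ratio between the largest and smallest class. The helper names are illustrative only.

```python
# Object counts copied from the class table above.
counts = {"Airplane": 4847, "Boat": 3697, "Car": 6932, "Ship": 3627}


def class_shares(counts):
    """Return each class's fraction of the total object count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def imbalance_ratio(counts):
    """Return the ratio between the largest and the smallest class."""
    return max(counts.values()) / min(counts.values())


shares = class_shares(counts)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"largest/smallest: {imbalance_ratio(counts):.2f}")
```

A ratio below 2 is usually mild enough that class weighting is optional, although that judgement depends on the model and metric used.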
Each label file contains normalized bounding box annotations in YOLO format.
The format for each line is:
class_id x_center y_center width height
Where:
- class_id: The class of the object (refer to the table below).
- x_center, y_center: The center coordinates of the bounding box, normalized between 0 and 1 relative to the image width and height.
- width, height: The width and height of the bounding box, also normalized between 0 and 1.
| Class ID | Category |
|---|---|
| 0 | Airplane |
| 1 | Boat |
| 2 | Car |
| 3 | Ship |
All coordinates are normalized between 0 and 1 relative to the image width and height.
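To make the normalization concrete, the following sketch converts one label line into pixel-space corner coordinates. The `Box` type and `parse_yolo_line` function are hypothetical helpers, not part of the dataset, and the image size in the example is an assumption chosen for illustration.

```python
from typing import NamedTuple

# Class IDs as listed in the table above.
CLASS_NAMES = {0: "Airplane", 1: "Boat", 2: "Car", 3: "Ship"}


class Box(NamedTuple):
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_line(line, image_width, image_height):
    """Convert 'class_id x_center y_center width height' to pixel corners."""
    class_id, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    # Scale the normalized values back to pixels.
    cx, cy = xc * image_width, yc * image_height
    half_w, half_h = w * image_width / 2, h * image_height / 2
    return Box(CLASS_NAMES[int(class_id)], cx - half_w, cy - half_h, cx + half_w, cy + half_h)


if __name__ == "__main__":
    # Hypothetical label line on an assumed 640 x 480 image.
    print(parse_yolo_line("2 0.5 0.5 0.25 0.5", 640, 480))
```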
Data Source:
- Satellite imagery was obtained from Google Earth Pro under fair-use and research guidelines.
- The dataset was prepared solely for academic and educational computer vision research.
Annotation Tools:
- Manual annotations were performed and verified using:
  - CVAT (Computer Vision Annotation Tool)
  - Roboflow
These tools were used to ensure consistent annotation quality and accurate bounding box placement across all object classes.
Spatial prepositions have been studied in some detail from multiple disciplinary perspectives. However, neither the semantic similarity of these prepositions, nor the relationships between the multiple senses of different spatial prepositions, are well understood. In an empirical study of 24 spatial prepositions, we identify the degree and nature of semantic similarity and extract senses for three semantically similar groups of prepositions using t-SNE, DBSCAN clustering, and Venn diagrams. We validate the work by manual annotation with another data set. We find nuances in meaning among proximity and adjacency prepositions, such as the use of "close to" instead of "near" for pairs of lines, and the importance of proximity over contact for the "next to" preposition, in contrast to other adjacency prepositions.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Spatial expression annotation guidelines describing the process of manual annotation of documents in the Polish Corpus of Wrocław University of Technology (KPWr).
https://www.wiseguyreports.com/pages/privacy-policy
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019 - 2023 |
| REGIONS COVERED | North America, Europe, APAC, South America, MEA |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2024 | 1127.4 (USD Million) |
| MARKET SIZE 2025 | 1240.1 (USD Million) |
| MARKET SIZE 2035 | 3200.0 (USD Million) |
| SEGMENTS COVERED | Application, End Use, Service Type, Deployment Mode, Regional |
| COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA |
| KEY MARKET DYNAMICS | Increasing demand for AI technologies, Growth of autonomous vehicles, Advancements in LiDAR technology, Rising need for geospatial data, Expansion in 3D modeling applications |
| MARKET FORECAST UNITS | USD Million |
| KEY COMPANIES PROFILED | TechniMeasure, Amazon Web Services, Pointivo, Landmark Solutions, Autodesk, NVIDIA, Pix4D, Hexagon, Intel Corporation, Microsoft Azure, Faro Technologies, Google Cloud, Siemens, 3D Systems, Matterport, CGG |
| MARKET FORECAST PERIOD | 2025 - 2035 |
| KEY MARKET OPPORTUNITIES | Increasing demand for autonomous vehicles, Growth in AI and machine learning, Expansion of smart city projects, Rise in 3D modeling applications, Development of augmented and virtual reality |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 10.0% (2025 - 2035) |
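The growth rate in the table can be cross-checked against the market sizes with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below is only an arithmetic check on the figures quoted above. The function names are illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


if __name__ == "__main__":
    # Figures from the table, in USD million.
    print(f"implied CAGR 2025-2035: {cagr(1240.1, 3200.0, 10):.2%}")
    print(f"2035 value at exactly 10%: {project(1240.1, 0.10, 10):.1f}")
```

The implied rate comes out slightly under 10%, which is consistent with the rounded 2035 figure of 3200.0.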
The samples in the dataset are connected to a study of breast cancer intratumoral heterogeneity using spatial transcriptomic data and computational pathology. The dataset contains 14 samples from 3 patients (one with triple-negative breast cancer and two with HER2-positive breast cancer). Multiple regions of the tumor were collected for analysis. Each sample is one tumor region from one of the patients.
Libraries for spatial transcriptomics were prepared using Visium spatial gene expression kits (10x genomics). Sequencing was performed using the Illumina NovaSeq 6000 platform at the National Genomics Infrastructure, SciLifeLab in Solna, Sweden.
The dataset contains 28 fastq files, compressed with GNU zip (gzip), from paired-end RNA sequencing (10x Visium spatial transcriptomics). The metadata are described in the SND_metadata.xlsx file. The md5sum.txt file is provided for validation of data integrity. The total size of the dataset is approximately 300 GB.
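Integrity validation against md5sum.txt could look like the sketch below. It assumes the file uses the usual `md5sum` layout of one `<hash>  <filename>` pair per line, with filenames relative to the checksum file. That layout is an assumption, since the description does not state it. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(checksum_file):
    """Return {filename: True/False} for every entry in an md5sum-style file.

    Assumes '<hash>  <filename>' lines (an optional '*' before the filename
    marks binary mode) and resolves filenames relative to the checksum file.
    """
    base = Path(checksum_file).parent
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")
        results[name] = md5_of(base / name) == expected.lower()
    return results
```

Reading in chunks matters here because the individual fastq.gz files in a 300 GB dataset are far too large to load into memory at once.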
Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as non-geospatial oblique and nadir imagery. Images include a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files follow this naming convention: {datasource}{numberofclasses}{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes used to annotate the images, and {threedigitdatasetversion} is the three-digit code corresponding to the dataset version (in other words, 001 is version 1). Each zipped folder contains a collection of NPZ format files, each of which corresponds to an individual image. An individual NPZ file is named after the image that it represents and contains (1) a CSV file with detailed information for every image in the zip folder and (2) a collection of the following NPY files: orig_image.npy (original input image, unedited), image.npy (original input image after color balancing and normalization), classes.npy (list of classes annotated and present in the labelled image), doodles.npy (integer image of all image annotations), color_doodles.npy (color image of doodles.npy), label.npy (labelled image created from the classes present in the annotations), and settings.npy (annotation and machine learning settings used to generate the labelled image from annotations).
All NPZ files can be extracted using the utilities available in Doodler (Buscombe, 2022). A merged CSV file containing detail information on the complete imagery collection is available at the top level of this data release, details of which are available in the Entity and Attribute section of this metadata file.
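Because an NPZ file is a zip archive of NPY arrays, its contents can be inspected with the standard library before loading anything. The sketch below lists the arrays in one file and reports which of the expected names are missing. It is a generic check, not one of the Doodler utilities, and the function names are hypothetical.

```python
import zipfile

# Array names taken from the description above.
EXPECTED_ARRAYS = (
    "orig_image", "image", "classes", "doodles",
    "color_doodles", "label", "settings",
)


def npz_array_names(npz_path):
    """List the array names stored in an NPZ file (members ending in .npy)."""
    with zipfile.ZipFile(npz_path) as archive:
        return sorted(
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        )


def missing_arrays(npz_path):
    """Return the expected array names that are absent from the NPZ file."""
    present = set(npz_array_names(npz_path))
    return [name for name in EXPECTED_ARRAYS if name not in present]
```

Once the expected members are confirmed, the arrays themselves can be read with `numpy.load`, which returns a mapping keyed by these same names.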
USGS Open-File Report 99-362 consists of the digital files used to create the published paper map, USGS OFR 99-362. The 1:63,360 scale map shows the bedrock geology of a special study area within the Chugach National Forest, Alaska. Digital files include ARC/Info coverages in export format of geology, structural data, and annotation, and a PDF file of the Open-File Report.
This data set maps and describes the geology of the Bachelor Mountain 7.5' quadrangle, Riverside County, California. Created using Environmental Systems Research Institute's ARC/INFO software, the data base consists of the following items: (1) a map coverage containing geologic contacts and units, (2) a coverage containing structural data, (3) a coverage containing geologic unit annotation and leaders, and (4) attribute tables for geologic units (polygons), contacts (arcs), and site-specific data (points). In addition, the data set includes the following graphic and text products: (1) a PostScript graphic plot-file containing the geologic map, topography, cultural data, a Correlation of Map Units (CMU) diagram, a Description of Map Units (DMU), and a key for point and line symbols, and (2) PDF files of the Readme (including the metadata file as an appendix), and the graphic produced by the PostScript plot file. The Bachelor Mountain quadrangle is located in the southern Perris block area of the Peninsular Ranges Province. Internally, the Perris block is a relatively stable area located between the Elsinore and San Jacinto Fault zones. In contrast to the rest of the quadrangle, the southern half is underlain almost entirely by young sedimentary units, chiefly the Pauba Formation of Pleistocene age. The Pauba Formation largely consists of well-indurated sandstone containing sparse cobble- to boulder-conglomerate beds. It is eroded into a gentle badlands topography in most of its extent. Remnants of scattered, discontinuous alluvial deposits suggest the Pauba Formation was covered by relatively thin younger Pleistocene sediments. The most extensive remnant of these younger deposits forms a surface of low relief at Buck Mesa, just north of Long Valley. The northern half of the quadrangle is underlain by Mesozoic metasedimentary rocks that are intruded by plutonic rocks of the Cretaceous Peninsular Ranges batholith.
The western part of these metamorphic rocks is mainly phyllite, grading eastward into quartzitic and schistose rocks. Metamorphic grade also increases eastward, to biotite, cordierite-biotite, and sillimanite schist. The oldest batholithic rocks in the quadrangle are massive hornblende gabbro, including the large body underlying Bachelor Mountain. Large masses of gabbro are included in granodiorite and tonalite plutons east of Bachelor Mountain. In the northwestern part of the quadrangle is the southeastern part of the Paloma Valley Ring complex. This complex makes up much of the northern part of the Murrieta quadrangle and the southern part of the Romoland quadrangle. In the Bachelor Mountain quadrangle, rocks of the complex are limited to foliated tonalite, which is the most mafic part of the complex. East of Skinner Reservoir (Lake Skinner), underlying the Tucalota Hills, is a series of north-trending massive-textured granodiorite plutons informally termed the granodiorite of Tucalota Hills (Morton, 1999). The geologic map data base contains original U.S. Geological Survey data generated by detailed field observation recorded on 1:24,000 scale aerial photographs. The map was created by transferring lines from the aerial photographs to a 1:24,000 scale topographic base. The map was digitized and lines, points, and polygons were subsequently edited using standard ARC/INFO commands. Digitizing and editing artifacts significant enough to display at a scale of 1:24,000 were corrected. Within the database, geologic contacts are represented as lines (arcs), geologic units as polygons, and site-specific data as points. Polygon, arc, and point attribute tables (.pat, .aat, and .pat, respectively) uniquely identify each geologic datum.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Imaging N-glycan spatial distribution in tissues using mass spectrometry imaging (MSI) is emerging as a promising tool in biological and clinical applications. However, there is currently no high-throughput tool for visualization and molecular annotation of N-glycans in MSI data, which significantly slows down data processing and hampers the applicability of this approach. Here, we present how METASPACE, an open-source cloud engine for molecular annotation of MSI data, can be used to automatically annotate, visualize, analyze, and interpret high-resolution mass spectrometry-based spatial N-glycomics data. METASPACE is an emerging tool in spatial metabolomics, but the lack of compatible glycan databases has limited its application for comprehensive N-glycan annotations from MSI data sets. We created NGlycDB, a public database of N-glycans, by adapting available glycan databases. We demonstrate the applicability of NGlycDB in METASPACE by analyzing MALDI-MSI data from formalin-fixed paraffin-embedded (FFPE) human kidney and mouse lung tissue sections. We added NGlycDB to METASPACE for public use, thus facilitating applications of MSI in glycobiology.
Database of a set of standard 3D virtual models at different stages of development from Carnegie Stages (CS) 12-23 (approximately 26-56 days post conception), in which various anatomical regions have been defined with a set of anatomical terms at various stages of development (known as an ontology). Experimental data is captured and converted to digital format and then mapped to the appropriate 3D model. The ontology is used to define sites of gene expression using a set of standard descriptions and to link the expression data to an "anatomical tree". Human data from stages CS12 to CS23 can be submitted to the HUDSEN Gene Expression Database. The anatomy ontology currently being used is based on the Edinburgh Human Developmental Anatomy Database, which encompasses all developing structures from CS1 to CS20 but is not detailed for developing brain structures. The ontology is being extended and refined (by Prof Luis Puelles, University of Murcia, Spain) and will be incorporated into the HUDSEN database as it is developed. Expression data is annotated using two methods to denote sites of expression in the embryo: spatial annotation and text annotation. Additionally, many aspects of the detection reagent and specimen are also annotated during this process (assignment of IDs, nucleotide sequences for probes, etc.). There are currently two main ways to search HUDSEN: using a gene/protein name or a named anatomical structure as the query term. The entire contents of the database can be browsed using the data browser. Results may be saved. The data in HUDSEN is generated both by researchers within the HUDSEN project and by the wider scientific community.
The HUDSEN human gene expression spatial database is a collaboration between the Institute of Human Genetics in Newcastle, UK, and the MRC Human Genetics Unit in Edinburgh, UK, and was developed as part of the Electronic Atlas of the Developing Human Brain (EADHB) project (funded by the NIH Human Brain Project). The database is based on the Edinburgh Mouse Atlas gene expression database (EMAGE), and is designed to be an openly available resource to the research community holding gene expression patterns during early human development.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Lot numbers are converted from an annotation feature class and are related to recorded subdivisions. This data is hosted on gis.yavapaiaz.gov as a feature web service and is in WGS 1984 Web Mercator Auxiliary Sphere (WKID 3857). This data was intended for general location mapping purposes in ArcGIS Online and County applications and is known to be entered but not updated as spatial adjustments are made to the tax parcels. This is being shared for Open Data. Access is granted to public agencies, educational institutions, non-profit organizations and private individuals for non-commercial purposes. For commercial use of the data see Arizona Revised Statutes 39-121.03. But per AGIC, Arizona Revised Statutes 27-178 section B in the Geospatial data sharing, "A public agency that shares geospatial data may exempt the data from commercial use fees prescribed in section 39-121.03, subsection A, paragraph 3." Regarding Arizona Revised Statutes 39-121.03 and the OpenData metadata sentence stating, geospatial data may exempt the data from commercial use fees prescribed in section 39-121.03, subsection A, paragraph 3.
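For readers combining this layer with longitude/latitude data, the sketch below shows the standard spherical Web Mercator forward projection that underlies WKID 3857. The function name and the sample coordinates are illustrative assumptions and are not taken from the county's service.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project WGS 84 longitude/latitude (degrees) to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


if __name__ == "__main__":
    # Hypothetical point roughly within Yavapai County, Arizona.
    print(lonlat_to_web_mercator(-112.47, 34.54))
```

In practice a GIS library would handle the transformation. The formula is shown only to clarify what the coordinate values in the service represent.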
This data set maps and describes the geology of the Perris 7.5' quadrangle, Riverside County, California. Created using Environmental Systems Research Institute's ARC/INFO software, the data base consists of the following items: (1) a map coverage containing geologic contacts and units, (2) a coverage containing structural data, (3) a coverage containing geologic unit annotation and leaders, and (4) attribute tables for geologic units (polygons), contacts (arcs), and site-specific data (points). In addition, the data set includes the following graphic and text products: (1) a PostScript graphic plot-file containing the geologic map, topography, cultural data, a Correlation of Map Units (CMU) diagram, a Description of Map Units (DMU), and a key for point and line symbols, and (2) PDF files of the Readme (including the metadata file as an appendix), and the graphic produced by the PostScript plot file. The Perris quadrangle is located in the northern part of the Peninsular Ranges Province within the central part of the Perris block, a relatively stable area, rectangular in plan, located between the Elsinore and San Jacinto fault zones. The quadrangle is underlain by Cretaceous age and older basement rocks. The Cretaceous plutonic rocks are part of the composite Peninsular Ranges batholith. A wide variety of intermediate composition granitic rocks are located in the quadrangle. These rocks are mainly of tonalitic composition but range from monzogranite to diorite. Most rock is faintly to intensely foliated. Many are heterogeneous and contain varying amounts of meso- and melanocratic discoidal-shaped inclusions. Some rocks are composed essentially of inclusion material and some are migmatitic. Included within these granitic rocks are a few septa of Paleozoic(?) schist of upper amphibolite metamorphic grade. Metamorphic rocks of probable Mesozoic age occur in the southwest corner of the quadrangle. Most of these rocks are well-foliated phyllite of Mesozoic age.
The metamorphic grade of these rocks is greenschist or sub-greenschist. Rocks of probable Paleozoic age occur as scattered masses within plutonic rocks in the northern part of the quadrangle. These rocks are of amphibolite grade and include cordierite and sillimanite biotite schist. In the center and southeast quarter of the quadrangle, biotite-hornblende tonalite of the Lakeview Mountains pluton is characterized by ubiquitous schlieren and by a lack of potassium feldspar. Masses of leucocratic and melanocratic rock occur scattered throughout the pluton. Mesocratic- to melanocratic discoidal-shaped inclusions are oriented parallel to the schlieren. A small body of comb-layered gabbro is located with the tonalite near the southern margin of the pluton. The tonalite contains rare-earth bearing, zoned pegmatite dikes. Biotite-hornblende tonalite located in the southwest part of the quadrangle is part of the Val Verde pluton. This tonalite is similar to that of the Lakeview Mountains pluton but lacks the ubiquitous schlieren and contains potassium feldspar. Diagonally crossing the quadrangle is the channel and flood plain of the ephemeral San Jacinto River. Most of the alluviated area west of the San Jacinto River consists of Pleistocene age fluvial deposits, which have a degraded upper surface that is preserved in some places near the contact with granitic rocks. The upper part of these deposits forms the Paloma surface of Woodford and others (1971). A modern- to Holocene-age drainage channel is within these older Pleistocene deposits. Younger Pleistocene alluvial fans emanate from the Lakeview Mountains east of the San Jacinto River.
This data set maps and describes the geology of the Lakeview 7.5' quadrangle, Riverside County, California. The quadrangle encompasses part of the northern Peninsular Ranges Province. Tonalitic granitic rocks of the Cretaceous Peninsular Ranges batholith dominate the bedrock areas, and include rocks ranging in composition from monzogranite to gabbro. The Lakeview Mountains are underlain chiefly by tonalite of the Lakeview pluton and related rocks. In the northeastern corner of the quadrangle, Tertiary sedimentary rocks of the San Timoteo beds of Frick (1921) and Mount Eden Formation of Fraser (1931) rest on Paleozoic schist, quartzite, gneiss, and marble having a well-developed east-dipping foliation. The Tertiary formations are much more extensively exposed in the San Timoteo Badlands to the northeast and southeast. These Tertiary and Paleozoic units are separated from the Lakeview Mountains by the San Jacinto Valley, which locally contains up to 3,000 m of Quaternary sediments. Two strands of the seismically active San Jacinto Fault zone bound the Valley: the Claremont Fault on the northeast side and the Casa Loma Fault on the southwest side. Numerous cracks and fissures related to both groundwater withdrawal and tectonic movements are developed in the Quaternary sediments, especially in the northern part of the quadrangle. Created using Environmental Systems Research Institute's ARC/INFO software, the database consists of the following items: (1) a map coverage containing faults, geologic contacts and units, (2) a coverage showing structural data, (3) a coverage containing geologic unit annotation and leaders, (4) five additional INFO data tables (.rel) that contain detailed, coded, geologic information such as texture, fabric, color, and mineralogy, and (5) line and point dictionaries, lines.rel and points.rel.
These additional data are accessible to the user through the utilization of ARC/INFO relate environments and provide the user access to as much or as little of the encoded data as required. In addition, the data set includes the following graphic and text products: (1) A PostScript graphic plot-file containing the geologic map, topography, cultural data, a Correlation of Map Units (CMU) diagram, a Description of Map Units (DMU), and a key for point and line symbols, and (2) PDF files of this Readme (including the metadata file as an appendix), the poly_attrib_code.txt (the polygon attribute coding), and the graphic produced by the Postscript plot file. The geologic map database contains original U.S. Geological Survey data generated by detailed field observation and by interpretation of aerial photographs. Within the database, geologic contacts are represented as lines (arcs), geologic units as polygons, and site-specific data as points. Polygon, arc, and point attribute tables (.pat, .aat, and .pat, respectively) uniquely identify each geologic datum.
We combined transcriptomics and proteomics to perform genome-free de novo gene annotation in Bombyx mori.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
SpaceNet LLC is a nonprofit organization dedicated to accelerating open source, artificial intelligence applied research for geospatial applications, specifically foundational mapping (i.e. building footprint & road network detection).
I have been experimenting with SAR image segmentation for the past few months and would like to share this high-quality dataset with the Kaggle community. It is the data from the SpaceNet 6 challenge and is freely available on AWS Open Data. This dataset contains only the training split; if you are interested in the testing split (SAR only) or the expanded SAR and optical dataset, you should follow the steps and download from AWS S3. I share the dataset here to cut out the steps of downloading the data and to make use of Kaggle's powerful cloud computing.
This openly-licensed dataset features a unique combination of half-meter Synthetic Aperture Radar (SAR) imagery from Capella Space and half-meter electro-optical (EO) imagery from Maxar.
SAR data are provided by Capella Space via an aerial-mounted sensor that collected 204 individual image strips from both north- and south-facing look angles. Each image strip features four polarizations (HH, HV, VH, and VV) and is preprocessed to display the intensity of backscatter in decibel units at half-meter spatial resolution.
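Expressing backscatter in decibels means applying a logarithmic scale to linear intensity, dB = 10 * log10(intensity). The sketch below shows that conversion and its inverse. The exact preprocessing applied to this dataset is not described here, so the clipping floor and function names are assumptions for illustration.

```python
import math


def to_decibels(intensity, floor=1e-10):
    """Convert linear backscatter intensity to decibels.

    `floor` is an assumed lower bound that avoids log10(0) for empty pixels.
    """
    return 10.0 * math.log10(max(intensity, floor))


def from_decibels(db):
    """Convert a decibel value back to linear intensity."""
    return 10.0 ** (db / 10.0)


if __name__ == "__main__":
    for value in (1.0, 0.1, 0.01):
        print(value, to_decibels(value))
```

The logarithmic scale compresses the very wide dynamic range of radar returns, which is why SAR tiles are often stored or displayed in decibels before being fed to a segmentation model.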
The 48k building footprint annotations are provided by the 3D Basisregistratie Adressen en Gebouwen (3DBAG) dataset with some additional quality control. The annotations also include building height statistics derived from a digital elevation model.
Shermeyer, J., Hogan, D., Brown, J., Etten, A.V., Weir, N., Pacifici, F., Hänsch, R., Bastidas, A., Soenen, S., Bacastow, T.M., & Lewis, R. (2020). SpaceNet 6: Multi-Sensor All Weather Mapping Dataset. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 768-777.
SAR imagery can be an answer to disaster analysis or frequent earth monitoring thanks to its active sensor, which images day and night and through any cloud cover. But SAR images have their own challenges, which require a trained eye, unlike optical images. Moreover, the launch of new high-resolution SAR satellites will yield massive quantities of earth observation data. Just like with any modern computer vision problem, this looks like a job for a deep learning model.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the data required to reproduce all analyses presented for the manuscript:
MuSpAn: A Toolbox for Multiscale Spatial Analysis
The data is organised into two main folders:
domains_for_figs_2_to_6 (MuSpAn domains)
Four domains of increasing size from regions within a healthy mouse colon (10x Genomics Colon Atlas panel).
Four samples of AKPT mouse tumors (10x Genomics 480 custom panel).
misc_checkpoint_data (Metadata - analysis checkpointing)
Colormap dictionaries for consistent visualization with the published figures.
Checkpointing files to support analyses requiring extended computation times.
Annotation data used for MuSpAn labeling.
The MuSpAn domains were created and saved using v1.2.0 of MuSpAn. This data is to be used with the associated Python notebooks, which can be found at:
https://github.com/joshwillmoore1/Supporting_material_muspan_paper
These notebooks both reproduce the analysis conducted in the study and serve as example material for MuSpAn usage, fully explained and linked to relevant documentation.
This data set maps and describes the geology of the Sunnymead 7.5' quadrangle, Riverside County, California. Created using Environmental Systems Research Institute's ARC/INFO software, the data base consists of the following items: (1) a map coverage containing geologic contacts and units, (2) a coverage containing structural data, (3) a coverage containing geologic unit annotation and leaders, and (4) attribute tables for geologic units (polygons), contacts (arcs), and site-specific data (points). In addition, the data set includes the following graphic and text products: (1) a PostScript graphic plot-file containing the geologic map, topography, cultural data, a Correlation of Map Units (CMU) diagram, a Description of Map Units (DMU), and a key for point and line symbols, and (2) PDF files of the Readme (including the metadata file as an appendix), and the graphic produced by the PostScript plot file. The Sunnymead quadrangle is located in the northern part of the Peninsular Ranges Province and is underlain by Cretaceous and older basement rocks. This part of the Peninsular Ranges Province is divided into the Perris block, located west of the San Jacinto fault, and the San Jacinto Mountains block to the east. The northwest quarter of the quadrangle is crossed diagonally by the San Jacinto fault zone, an important active major fault of the San Andreas fault system. The San Jacinto fault zone consists of a main trace and multiple discontinuous breaks. The main trace forms a dissected, west-facing fault scarp about 1,000 feet above the valley floor. A vaguely located fault in granitic rocks parallel to and west of the San Jacinto fault zone does not appear to cut Pleistocene age alluvial deposits. On the northern side of the San Jacinto fault zone is a thick section of Pliocene and Pleistocene continental sedimentary rocks, the upper part of the San Timoteo beds of Frick (1921). The area underlain by these rocks is termed the San Timoteo Badlands.
Most of these beds consist of coarse-grained sandstone, conglomeratic sandstone, and conglomerate. All the clasts within these beds were derived from Transverse Ranges basement rocks that are located to the north of the quadrangle. The San Timoteo beds have been deformed into a broad anticlinal structure produced by the sedimentary beds being compressed as they are translated around a restraining bend in the San Jacinto fault north of the El Casco quadrangle. A curving, diachronous fault produced by this compression is located in the western part of the badlands just east of the San Jacinto fault zone. The area west of the San Jacinto fault zone is underlain by plutonic rocks of the Cretaceous-age Peninsular Ranges batholith with a few small included pendants of schist and gneiss of probable Paleozoic age. Most of the plutonic rocks are of tonalite composition and are mainly biotite-hornblende tonalite. In the northwestern part of the quadrangle is the eastern part of the Box Springs granitic complex, a basinal-shaped complex that appears to be the distal part of a diapiric-shaped complex. Most of the alluviated area west of the San Jacinto fault zone consists of Pleistocene age fluvial deposits. Most of these deposits have a degraded upper surface. The upper surface of these deposits is preserved in some places near the contact with granitic rocks. The upper part of these deposits forms the Paloma surface of Woodford and others (1971). Holocene age alluvial fans emanate from the San Timoteo Badlands. The geologic map data base contains original U.S. Geological Survey data generated by detailed field observation recorded on 1:24,000 scale aerial photographs. The map was created by transferring lines from the aerial photographs to a 1:24,000 scale topographic base. The map was digitized and lines, points, and polygons were subsequently edited using standard ARC/INFO commands.
Digitizing and editing artifacts significant enough to display at a scale of 1:24,000 were corrected. Within the database, geologic contacts are represented as lines (arcs), geologic units are polygons, and site-specific data as points. Polygon, arc, and point attribute tables (.pat, .aat, and .pat, respectively) uniquely identify each geologic datum.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains annotations (i.e., polygons) for solar photovoltaic (PV) objects in the previously published dataset "Classification Training Dataset for Crop Types in Rwanda" published by RTI International (DOI: 10.34911/rdnt.r4p1fr [1]). These polygons are intended to enable the use of this dataset as a machine learning training dataset for solar PV identification in drone imagery. Note that this dataset contains ONLY the solar panel polygon labels and needs to be used with the original RGB UAV imagery “Drone Imagery Classification Training Dataset for Crop Types in Rwanda” (https://mlhub.earth/data/rti_rwanda_crop_type). The original dataset contains UAV imagery (RGB) in .tiff format from six provinces in Rwanda, each imaged in three phases, and our solar PV annotation dataset follows the same data structure, with a province and phase label in each subfolder.

Data processing:
Please refer to this GitHub repository for further details: https://github.com/BensonRen/Drone_based_solar_PV_detection. The original dataset is divided into 8000x8000 pixel image tiles and manually labeled with polygons (mainly rectangles) to indicate the presence of solar PV. These polygons are converted into pixel-wise, binary class annotations.

Other information:
1. The six provinces that the UAV imagery came from are: (1) Cyampirita (2) Kabarama (3) Kaberege (4) Kinyaga (5) Ngarama (6) Rwakigarati. The original data collection was staged across 18 phases, each of which collected a set of imagery from a given province (each province had 3 phases of collection). We annotated 15 of the 18 phases; the missing ones are Kabarama-Phase2, Kaberege-Phase3, and Kinyaga-Phase3, which were left out due to data compatibility issues.
2. The annotated polygons are transformed into binary maps the size of the image tiles, in which each pixel is either 0 or 1: 0 represents background and 1 represents solar PV pixels. These binary maps are in .png format, and each province/phase set has between 9 and 49 annotation patches. Using the code provided in the above repository, the same image patches can be cropped from the original RGB imagery.
3. Solar PV densities vary across the image patches. In total, 214 solar PV instances were labeled in the 15 phases.

Associated publications:
“Utilizing geospatial data for assessing energy security: Mapping small solar home systems using unmanned aerial vehicles and deep learning” [https://arxiv.org/abs/2201.05548]

This dataset is published under the CC-BY-NC-SA-4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/).
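The conversion from labeled polygons to pixel-wise binary maps can be pictured with the minimal sketch below. It is not the code from the repository above: it is a plain-Python illustration that marks a pixel as 1 when its center falls inside a polygon (even-odd ray casting), and the function names and pixel-center convention are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygons: Sequence[Sequence[Point]], width: int, height: int) -> List[List[int]]:
    """Return a height x width mask: 1 = inside any polygon, 0 = background."""
    mask = [[0] * width for _ in range(height)]
    for polygon in polygons:
        xs = [p[0] for p in polygon]
        ys = [p[1] for p in polygon]
        # Only visit pixels inside the polygon's bounding box.
        col_lo, col_hi = max(0, math.floor(min(xs))), min(width, math.ceil(max(xs)))
        row_lo, row_hi = max(0, math.floor(min(ys))), min(height, math.ceil(max(ys)))
        for row in range(row_lo, row_hi):
            for col in range(col_lo, col_hi):
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    mask[row][col] = 1
    return mask


if __name__ == "__main__":
    panel = [(1, 1), (4, 1), (4, 3), (1, 3)]  # a hypothetical rectangular label
    for row in rasterize([panel], width=6, height=5):
        print(row)
```

For full 8000x8000 tiles a library rasterizer would be far faster; the loop above is only meant to show what the 0/1 maps encode.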
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Today, deep neural networks are widely used in many computer vision problems, including those involving geographic information systems (GIS) data. This type of data is commonly used for urban analyses and spatial planning. We used orthophotographic images of two residential districts in Kielce, Poland, for research on automatic urban sprawl analysis with a Transformer-based neural network.

Orthophotomaps were obtained from the Kielce GIS portal. Then, the map was manually masked into building and building surroundings classes. Finally, the orthophotomap and the corresponding classification mask were simultaneously divided into small tiles. This approach is common in image data preprocessing for the training phase of machine learning algorithms. The data contains two original orthophotomaps, from the Wietrznia and Pod Telegrafem residential districts, with corresponding masks, and also their tiled versions, ready to be used as training data for machine learning models.

A Transformer-based neural network was trained on the Wietrznia dataset for semantic segmentation of the tiles into building and surroundings classes. After that, inference was used to test the model's generalization ability on the Pod Telegrafem dataset. The efficiency of the model was satisfactory, so it can be used for automatic semantic building segmentation. The process of dividing the images can then be reversed and the complete classification mask retrieved. This mask can be used for calculating building areas and for urban sprawl monitoring, if the research were repeated for GIS data from a wider time horizon.

Since the dataset was collected from the Kielce GIS portal, as part of the Polish Main Office of Geodesy and Cartography data resource, it may be used only for non-profit and non-commercial purposes, in private or scientific applications, under the law "Ustawa z dnia 4 lutego 1994 r. o prawie autorskim i prawach pokrewnych (Dz.U. z 2006 r. nr 90 poz 631 z późn. zm.)" (Act of 4 February 1994 on Copyright and Related Rights).
There are no other legal or ethical considerations in reuse potential.

Data information is presented below.
wietrznia_2019.jpg - orthophotomap of Wietrznia district - used for model's training, as an explanatory image
wietrznia_2019.png - classification mask of Wietrznia district - used for model's training, as a target image
wietrznia_2019_validation.jpg - one image from Wietrznia district - used for model's validation during training phase
pod_telegrafem_2019.jpg - orthophotomap of Pod Telegrafem district - used for model's evaluation after training phase
wietrznia_2019 - folder with wietrznia_2019.jpg (image) and wietrznia_2019.png (annotation) images, divided into 810 tiles (512 x 512 pixels each); tiles with no information were manually removed, so the training data would contain only informative tiles - tiles were presented to the model during training (images and annotations for fitting the model to the data)
wietrznia_2019_vaidation - folder with wietrznia_2019_validation.jpg image divided into 16 tiles (256 x 256 pixels each) - tiles were presented to the model during training (images for validating the model's efficiency); it was not part of the training data
pod_telegrafem_2019 - folder with pod_telegrafem.jpg image divided into 196 tiles (256 x 256 pixels each) - tiles were presented to the model during inference (images for evaluating the model's robustness)

Dataset was created as described below.
Firstly, the orthophotomaps were collected from Kielce Geoportal (https://gis.kielce.eu). Kielce Geoportal offers a recent .pst map from April 2019. It is an orthophotomap with a resolution of 5 x 5 pixels, constructed from a plane flight at a height of 700 meters above ground, taken with a camera for vertical photos.
Downloading was done via WMS in the open-source QGIS software (https://www.qgis.org), as a 1:500 scale map, then converted to a 1200 dpi PNG image.

Secondly, the map of the Wietrznia residential district was manually labelled, also in QGIS, in the same scope as the orthophotomap. The annotation was based on land cover map information, also obtained from Kielce Geoportal. There are two classes: residential building and surrounding. The second map, from the Pod Telegrafem district, was not annotated, since it was used in the testing phase and imitates the situation where there is no annotation for new data presented to the model.

Next, the images were converted to RGB JPG images, and the annotation map was converted to an 8-bit GRAY PNG image.

Finally, the Wietrznia data files were tiled into 512 x 512 pixel tiles with the Python PIL library. Tiles with no information or a relatively small amount of information (only white background or mostly white background) were manually removed. So, from the 29113 x 15938 pixel orthophotomap, only 810 tiles with corresponding annotations were left, ready to train the machine learning model for the semantic segmentation task. The Pod Telegrafem orthophotomap was tiled with no manual removal, so from the 7168 x 7168 pixel orthophotomap, 197 tiles with 256 x 256 pixel resolution were created. There was also an image of one residential building, used for the model's validation during the training phase; it was not part of the training data, but was part of the Wietrznia residential area. It was a 2048 x 2048 pixel orthophotomap, tiled into 16 tiles of 256 x 256 pixels each.
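The tiling, the reverse step of reassembling a full mask, and the building-area calculation mentioned in the description can be sketched as follows. This is a plain-Python illustration, not the authors' script: all function names are hypothetical, partial tiles at the image edges are assumed to be dropped, the white-background measure is an automated stand-in for the manual tile removal described above, and the class value and pixel area passed to `class_area` are placeholders.

```python
from typing import List, Sequence, Tuple

# (left, upper, right, lower): the same box order that PIL's Image.crop expects.
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile: int) -> List[Box]:
    """Boxes of full square tiles, row by row; partial edge tiles are dropped."""
    boxes = []
    for upper in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            boxes.append((left, upper, left + tile, upper + tile))
    return boxes


def white_fraction(gray_pixels: Sequence[int], white_level: int = 250) -> float:
    """Share of near-white pixels in a tile (assumed stand-in for manual review)."""
    if not gray_pixels:
        return 1.0
    return sum(1 for v in gray_pixels if v >= white_level) / len(gray_pixels)


def stitch(tiles: Sequence[Sequence[Sequence[int]]], boxes: Sequence[Box],
           width: int, height: int, fill: int = 0) -> List[List[int]]:
    """Reverse the tiling: paste each tile back at the position given by its box."""
    canvas = [[fill] * width for _ in range(height)]
    for tile_rows, (left, upper, right, _lower) in zip(tiles, boxes):
        for dy, row in enumerate(tile_rows):
            canvas[upper + dy][left:right] = list(row)
    return canvas


def class_area(mask: Sequence[List[int]], class_value: int, pixel_area: float) -> float:
    """Area covered by one class: pixel count times the ground area of a pixel."""
    return sum(row.count(class_value) for row in mask) * pixel_area


if __name__ == "__main__":
    boxes = tile_boxes(width=4, height=2, tile=2)
    predicted = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]  # toy per-tile masks
    full_mask = stitch(predicted, boxes, width=4, height=2)
    print(full_mask, class_area(full_mask, class_value=1, pixel_area=0.25))
```

With Pillow, each box returned by `tile_boxes` can be passed directly to `Image.crop` to cut the corresponding tile from the orthophotomap and from its mask, which keeps image and annotation tiles aligned.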