License: Creative Commons Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.
The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration). The filtered range data were generated by applying frequency-domain filtering to eliminate image disturbances (e.g., surface variations and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact of different image data types on deep convolutional neural network (DCNN) performance.
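As a quick orientation to the data layout, here is a minimal Python sketch for pairing a patch with its ground-truth crack map. The file and folder names below are hypothetical placeholders; the actual archive layout should be checked after download.

import numpy as np
from PIL import Image

# hypothetical paths; the real folder names depend on the archive layout
patch = np.array(Image.open("fused/patch_0001.png"))        # one 256x256 patch (fused type)
mask = np.array(Image.open("groundtruth/patch_0001.png"))   # pixel-level crack map
assert patch.shape[:2] == mask.shape[:2] == (256, 256)
crack_ratio = (mask > 0).mean()  # fraction of pixels labeled as crack
print(f"crack pixels: {crack_ratio:.2%}")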
If you share or use this dataset, please cite [4] and [5] in any relevant documentation.
In addition, an image dataset for crack classification has also been published at [6].
References:
[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873
[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605
[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434
[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678
[5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044
[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78
License: Creative Commons Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Context
The dataset tabulates the population of South Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for South Range. The dataset can be utilized to understand the population distribution of South Range by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in South Range. Additionally, it can be used to see how the gender ratio changes from birth to the most senior age group, and the male to female ratio across each age group, for South Range.
Key observations
Largest age group (population): Male # 20-24 years (49) | Female # 20-24 years (50). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
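For illustration, the key observation above can be reproduced with a few lines of pandas. This is a minimal sketch with hypothetical file and column names, since the exact headers are not listed in this record:

import pandas as pd

# hypothetical file and column names
df = pd.read_csv("south-range-population-by-age-and-gender.csv")
df["gender_ratio"] = df["male_population"] / df["female_population"]
largest_male = df.loc[df["male_population"].idxmax(), "age_group"]
largest_female = df.loc[df["female_population"].idxmax(), "age_group"]
print(largest_male, largest_female)  # per the key observations: 20-24 years for both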
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for South Range Population by Gender. You can refer to it here.
License: Creative Commons Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
File name definitions:
'...v_50_175_250_300...' - dataset for velocity ranges [50, 175] + [250, 300] m/s
'...v_175_250...' - dataset for velocity range [175, 250] m/s
'ANNdevelop...' - used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected
'ANNtest...' - used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart
Where to find the input (independent) and target (dependent) variable values for each dataset/Excel file?
input values in 'IN' sheet
target values in 'TARGET' sheet
Where to find the results from the best ANN model (for each target/output variable and each velocity range)?
open the corresponding Excel file; the expected (target) vs ANN (output) results are written in the 'TARGET vs OUTPUT' sheet
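As a minimal illustration of reading these sheets in Python (the file name below is a hypothetical placeholder; the sheet names are as documented above):

import pandas as pd

path = "ANNtest_v_175_250.xlsx"  # hypothetical file name
inputs = pd.read_excel(path, sheet_name="IN")       # input (independent) variables
targets = pd.read_excel(path, sheet_name="TARGET")  # target (dependent) variables
# for the best-ANN results files, the comparison lives in a third sheet:
results = pd.read_excel(path, sheet_name="TARGET vs OUTPUT")
print(inputs.shape, targets.shape, results.shape)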
Check reference below (to be added when the paper is published)
https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
This is a simple dataset for getting started with machine learning on point cloud data. It takes the original MNIST and converts each of the non-zero pixels into points in a 2D space. The idea is to classify each collection of points (rather than an image) with the same label as in MNIST. The source for generating this dataset can be found in this repository: cgarciae/point-cloud-mnist-2D
There are 2 files: train.csv and test.csv. Each file has the columns
label,x0,y0,v0,x1,y1,v1,...,x350,y350,v350
where
label contains the target label in the range [0, 9].
x{i} contains the x position of the pixel/point as viewed in a Cartesian plane, in the range [-1, 27].
y{i} contains the y position of the pixel/point as viewed in a Cartesian plane, in the range [-1, 27].
v{i} contains the value of the pixel in the range [-1, 255].
The maximum number of points found in an image was 351; images with fewer points were padded to this length using the following values:
x{i} = -1, y{i} = -1, v{i} = -1
To make the challenge more interesting, you can also try to solve the problem using a subset of points, e.g., the first N. Here are some visualizations of the dataset using different amounts of points:
50 points: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2Fbbf5393884480e3d24772344e079c898%2F50.png?generation=1579911143877077&alt=media
100 points: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F5a83f6f5f7c5791e3c1c8e9eba2d052b%2F100.png?generation=1579911238988368&alt=media
200 points: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F202098ed0da35c41ae45dfc32e865972%2F200.png?generation=1579911264286372&alt=media
All points: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F5c733566f8d689c5e0fd300440d04da2%2Fmax.png?generation=1579911289750248&alt=media
The histogram below, showing the distribution of the number of points per image in the dataset, can give you a general idea of how difficult each variation can be.
Points-per-image histogram: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F9eb3b463f77a887dae83a7af0eb08c7d%2Flengths.png?generation=1579911380397412&alt=media
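To get going with the files, here is a minimal Python sketch, assuming train.csv is in the working directory, that turns each row into an (N, 3) array of (x, y, v) points and drops the -1 padding described above:

import numpy as np
import pandas as pd

df = pd.read_csv("train.csv")
labels = df["label"].to_numpy()
# columns are ordered x0,y0,v0,x1,y1,v1,... so each row reshapes to 351 (x, y, v) triplets
points = df.drop(columns="label").to_numpy().reshape(len(df), 351, 3)

first = points[0]
real = first[first[:, 2] != -1]  # padded entries have v == -1 (real pixels are non-zero)
print(labels[0], real.shape)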
This dataset delineates the spatial range of wild pheasant populations in Minnesota as of 2002 by dividing the Minnesota state boundary into two units: pheasant range and non-range.
License: Creative Commons Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Species distribution data provide the foundation for a wide range of ecological research studies and conservation management decisions. Two major efforts to provide marine species distributions at a global scale are the International Union for Conservation of Nature (IUCN), which provides expert-generated range maps that outline the complete extent of a species' distribution; and AquaMaps, which provides model-generated species distribution maps that predict areas occupied by the species. Together these databases represent 24,586 species (93.1% within AquaMaps, 16.4% within IUCN), with only 2,330 shared species. Differences in intent and methodology can result in very different predictions of species distributions, which bear important implications for scientists and decision makers who rely upon these datasets when conducting research or informing conservation policy and management actions. Comparing distributions for the small subset of species with maps in both datasets, we found that AquaMaps and IUCN range maps show strong agreement for many well-studied species, but our analysis highlights several key examples in which introduced errors drive differences in predicted species ranges. In particular, we find that IUCN maps greatly overpredict coral presence into unsuitably deep waters, and we show that some AquaMaps computer-generated default maps (only 5.7% of which have been reviewed by experts) can produce odd discontinuities at the extremes of a species’ predicted range. We illustrate the scientific and management implications of these tradeoffs by repeating a global analysis of gaps in coverage of marine protected areas, and find significantly different results depending on how the two datasets are used. By highlighting tradeoffs between the two datasets, we hope to encourage increased collaboration between taxa experts and large scale species distribution modeling efforts to further improve these foundational datasets, helping to better inform science and policy recommendations around understanding, managing, and protecting marine biodiversity.
License: ECMWF ERA products licence, https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf
This dataset contains ERA5 surface level analysis parameter data. ERA5 is the 5th generation reanalysis project from the European Centre for Medium-Range Weather Forecasts (ECMWF) - see the linked documentation for further details. This dataset contains a limited selection of the available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool linked to from this record.
Model level analysis and surface forecast data to complement this dataset are also available. Data from a 10 member ensemble, run at lower spatial and temporal resolution, were also produced to provide an uncertainty estimate for the output from the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation producing data in this dataset.
The ERA5 global atmospheric reanalysis covers the period from 1979 to two months behind the present month. It follows on from the ERA-15, ERA-40 and ERA-Interim reanalysis projects.
An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data are subsequently reviewed ahead of being released by ECMWF as quality-assured data within 3 months. CEDA holds a 6 month rolling copy of the latest ERA5t data; see the related datasets linked to from this record. However, for the period 2000-2006 the initial ERA5 release was found to suffer from stratospheric temperature biases, so new runs to address this issue were performed, resulting in the ERA5.1 release (see linked datasets). Note, though, that Simmons et al. 2020 (technical memo 859) report that "ERA5.1 is very close to ERA5 in the lower and middle troposphere", but users of data from this period should read technical memo 859 for further details.
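As a minimal sketch for inspecting one of the netCDF files with xarray (the file name and the variable/coordinate names below are assumptions to be checked against the actual files):

import xarray as xr

ds = xr.open_dataset("era5_surface_analysis_example.nc")  # hypothetical file name
print(ds.data_vars)  # the limited selection of variables described above
# "t2m", "latitude" and "longitude" are assumed names; inspect ds first
print(ds["t2m"].sel(latitude=51.5, longitude=0.0, method="nearest"))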
License: MIT License, https://opensource.org/licenses/MIT
PPFT: Path Planning on Fifty x Thirty
Our dataset consists of 100 manually selected 50 × 30 maps from a randomly generated collection, each with 10 different start and goal positions. Therefore, there are 1000 samples in total. Our data conform to the standard of search-based algorithm environments in a continuous space. Each map includes the following parameters:
x range: the minimum and maximum x-coordinates of the environment boundary, as [x min, x max].
y range: the minimum and… See the full description on the dataset page: https://huggingface.co/datasets/SilinMeng0510/PPFT.
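A minimal sketch for pulling the dataset with the Hugging Face datasets library (the available splits and fields should be checked on the dataset page linked above):

from datasets import load_dataset

ds = load_dataset("SilinMeng0510/PPFT")  # repo id from the dataset page
print(ds)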
License: Creative Commons Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Context
The dataset tabulates the population of Grass Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Grass Range. The dataset can be utilized to understand the population distribution of Grass Range by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in Grass Range. Additionally, it can be used to see how the gender ratio changes from birth to the most senior age group, and the male to female ratio across each age group, for Grass Range.
Key observations
Largest age group (population): Male # 35-39 years (7) | Female # 70-74 years (36). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Grass Range Population by Gender. You can refer to it here.
License: ECMWF ERA products licence, https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf
This dataset contains ERA5 surface level analysis parameter data ensemble means (see the linked dataset for spreads). ERA5 is the 5th generation reanalysis project from the European Centre for Medium-Range Weather Forecasts (ECMWF) - see the linked documentation for further details. The ensemble means and spreads are calculated from the ERA5 10 member ensemble, run at a reduced resolution compared with the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of the available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool linked to from this record.
Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation: the sum of squared deviations is divided by 10 (N) rather than 9 (N-1). See linked datasets for ensemble member and ensemble mean data.
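A small numeric illustration of that note (toy values, not ERA5 data): numpy's ddof parameter selects between the two conventions.

import numpy as np

members = np.random.default_rng(0).normal(size=10)  # 10 toy ensemble members
spread = members.std(ddof=0)     # divides by N = 10, as used for these spreads
sample_sd = members.std(ddof=1)  # divides by N - 1 = 9; NOT what is stored here
print(spread, sample_sd)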
The ERA5 global atmospheric reanalysis covers the period from 1979 to two months behind the present month. It follows on from the ERA-15, ERA-40 and ERA-Interim reanalysis projects.
An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data are subsequently reviewed ahead of being released by ECMWF as quality-assured data within 3 months. CEDA holds a 6 month rolling copy of the latest ERA5t data; see the related datasets linked to from this record. However, for the period 2000-2006 the initial ERA5 release was found to suffer from stratospheric temperature biases, so new runs to address this issue were performed, resulting in the ERA5.1 release (see linked datasets). Note, though, that Simmons et al. 2020 (technical memo 859) report that "ERA5.1 is very close to ERA5 in the lower and middle troposphere", but users of data from this period should read technical memo 859 for further details.
License: ECMWF ERA products licence, https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf
This dataset contains ensemble spreads for the ERA5 initial release (ERA5t) surface level analysis parameter data ensemble means (see the linked dataset). ERA5t is the initial release of the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis project, available up to 5 days behind the present date. CEDA will maintain a 6 month rolling archive of these data with overlap to the verified ERA5 data - see the linked datasets on this record. The ensemble means and spreads are calculated from the ERA5t 10 member ensemble, run at a reduced resolution compared with the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of the available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool linked to from this record.
Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation: the sum of squared deviations is divided by 10 (N) rather than 9 (N-1). See linked datasets for ensemble member and ensemble mean data.
The ERA5 global atmospheric reanalysis covers the period from 1979 to two months behind the present month. It follows on from the ERA-15, ERA-40 and ERA-Interim reanalysis projects. An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data are subsequently reviewed and, if required, amended before the full ERA5 release. CEDA holds a 6 month rolling copy of the latest ERA5t data. See related datasets linked to from this record.
The Four-Dimensional Weather (4DWX) System is the product of seven years of R&D sponsored by the US Army Test and Evaluation Command (ATEC) and by the Defense Threat Reduction Agency (DTRA). This dataset contains data from Dugway Proving Ground (DPG) in Utah. The DPG site 1 reports every 5 minutes; the other stations all report every 15 minutes. For more information, see www.4dwx.org.
This archived Paleoclimatology Study is available from the NOAA National Centers for Environmental Information (NCEI), under the World Data Service (WDS) for Paleoclimatology. The associated NCEI study type is Fire. The data include parameters of fire history|tree ring with a geographic location of Utah, United States of America. The time period coverage is from 271 to -55 in calendar years before present (BP). See metadata information for parameter and study location details. Please cite this study when using the data.
License: Creative Commons Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Last update: February 2021.
The dataset folder includes both raw and post-processed radar data used for training and testing the networks proposed in Sect. VIII of the article "Machine Learning and Deep Learning Techniques for Colocated MIMO Radars: A Tutorial Overview".

The folder Human Activity Classification contains:
- a "Raw" folder, where 150 files acquired with our FMCW radar sensor are given inside the "doppler_dataset" zip folder; they are divided into 50 for walking, 50 for jumping and 50 for running;
- a "Post_process" folder, divided into:
  - a "Machine Learning" folder including "dataset_ML_doppler_real_activities.mat"; this dataset has been used for training and testing the SVM, K-NN and AdaBoost classifiers described in Sect. VIII-A. It contains the 150x4 matrix "X_meas", including the features described by eqs. (227)-(234), and the 150x1 vector of chars "labels_py", containing the associated labels.
  - a "Deep Learning" folder containing "dataset_DL_doppler_real_activities.mat"; this dataset is composed of 150 structs of data, where each struct, associated with a specific activity, includes:
    - the label associated with the considered activity;
    - the overall range variation from the beginning to the end of the motion, "delta_R";
    - the range-Doppler map, "RD_map";
    - the normalized spectrogram, "SP_Norm";
    - the Cadence Velocity Diagram, "CVD";
    - the period of the spectrogram, "per";
    - the peaks associated with the three greatest cadence frequencies, "peaks_cad";
    - the three strongest cadence frequencies and their normalized versions, "cad_freqs" and "cad_freqs_norm";
    - the strongest cadence frequency, "c1";
    - the three velocity profiles associated with the three strongest cadence frequencies, "matr_vex".
  The spectrogram images (SP_Norm) contained in this dataset were used for training and testing the CNN in Sect. VIII-A.

The folder Obstacle Detection contains:
- a "Raw" folder, where raw data acquired with our radar system and TOF camera in the presence of a multi-target or single-target scenario are given inside the "obst_detect_Raw_mat" zip folder. Note that each radar frame and each TOF camera image has its own time stamp, but since they come from different sensors they have to be synchronized.
- a "Post_process" folder, divided into:
  - a "Neural Net" folder containing "inputs_bis_1.mat" and "t_d_1.mat", where "inputs_bis_1.mat" contains the 32x1 vectors of features used for training and testing the feed-forward neural network described in Sect. VIII-B (see eqs. (243)-(251)), and "t_d_1.mat" contains the associated 2x1 vectors of labels (see eq. (235));
  - a "Yolo v2" folder containing the folder "Dataset_YOLO" and the table "obj_dataset_tab", where "Dataset_YOLO_v2" contains (inside the sub-folder "obj_dataset") the range-azimuth maps used for training the YOLO v2 network (see eqs. (257)-(258) and Fig. 30), and "obj_dataset_tab" contains the path, the bounding box and the label associated with the range-azimuth maps (see eqs. (256)-(266)).

Cite as: A. Davoli, G. Guerzoni and G. M. Vitetta, "Machine Learning and Deep Learning Techniques for Colocated MIMO Radars: A Tutorial Overview," in IEEE Access, vol. 9, pp. 33704-33755, 2021, doi: 10.1109/ACCESS.2021.3061424.
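A minimal Python sketch (scipy) for loading the post-processed machine-learning file; the variable names follow the description above, and the path assumes the file sits in the working directory:

from scipy.io import loadmat

ml = loadmat("dataset_ML_doppler_real_activities.mat")
X_meas = ml["X_meas"]      # 150x4 feature matrix, eqs. (227)-(234)
labels = ml["labels_py"]   # 150x1 vector of activity labels
print(X_meas.shape, labels.shape)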
The Geographic Names Information System (GNIS) actively seeks data from and partnerships with Government agencies at all levels and other interested organizations. The GNIS is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board on Geographic Names, a Federal inter-agency body chartered by public law to maintain uniform feature name usage throughout the Government and to promulgate standard names to the public. The GNIS is the official repository of domestic geographic names data; the official vehicle for geographic names use by all departments of the Federal Government; and the source for applying geographic names to Federal electronic and printed products of all types. See http://geonames.usgs.gov for additional information.
This study presents a catalog of 8107 molecular clouds that covers the entire Galactic plane and includes 98% of the 12CO emission observed within |b| <= 5 deg. The catalog was produced using a hierarchical cluster identification method applied to the result of a Gaussian decomposition of the Dame+ (2001ApJ...547..792D) data. The total H2 mass in the catalog is 1.2 x 10^9 Msun, in agreement with previous estimates. The authors find that 30% of the sight lines intersect only a single cloud, with another 25% intersecting only two clouds. The most probable cloud size is R ~ 30 pc. In contrast with the general idea, the authors find a rather large range of values of surface densities, Sigma = 2 to 300 Msun/pc^2, and a systematic decrease with increasing Galactic radius, R_gal. The cloud velocity dispersion and the normalization sigma_0 = sigma_v / R^(1/2) both decrease systematically with R_gal. When studied over the whole Galactic disk, there is a large dispersion in the line width-size relation and a significantly better correlation between sigma_v and Sigma R. The normalization of this correlation is constant to better than a factor of two for R_gal < 20 kpc. This relation is used to disentangle the ambiguity between near and far kinematic distances. The authors report a strong variation of the turbulent energy injection rate. In the outer Galaxy it may be maintained by accretion through the disk and/or onto the clouds, but neither source can drive the 100 times higher cloud-averaged injection rate in the inner Galaxy. The data set used in this catalog comes from that of Dame+ (2001ApJ...547..792D). Those authors combined observations obtained over a period of 20 yr with two telescopes, one in the north (first located in New York City and then moved to Cambridge, Massachusetts) and one in the south (Cerro Tololo, Chile). These 1.2 m telescopes have an angular resolution of ~8.5' at 115 GHz, the frequency of the 12CO 1-0 line. For the current study the authors used the data set covering the whole Galactic plane within +/- 5 deg in Galactic latitude. This table was created by the HEASARC in March 2019 based upon the CDS Catalog J/ApJ/834/57 file table1.dat. This is a service provided by the NASA HEASARC.
Our dataset provides detailed and precise insights into the business, commercial, and industrial aspects of any given area in the USA (including Point of Interest (POI) data and foot traffic). The dataset is divided into 150x150 sqm areas (geohash 7) and has over 50 variables.

- Use it for different applications: Our combined dataset, which includes POI and foot traffic data, can be employed for various purposes. Different data teams use it to guide retailers and FMCG brands in site selection, fuel marketing intelligence, analyze trade areas, and assess company risk. Our dataset has also proven to be useful for real estate investment.
- Get reliable data: Our datasets have been processed, enriched, and tested so your data team can use them more quickly and accurately.
- Ideal for training ML models: The high quality of our geographic information layers results from more than seven years of work dedicated to the deep understanding and modeling of geospatial Big Data. Among the features that distinguish this dataset is the use of anonymized and user-compliant mobile device GPS location data, enriched with other alternative and public data.
- Easy to use: Our dataset is user-friendly and can be easily integrated into your current models. Also, we can deliver your data in different formats, like .csv, according to your analysis requirements.
- Get personalized guidance: In addition to providing reliable datasets, we advise your analysts on their correct implementation. Our data scientists can guide your internal team on the optimal algorithms and models to get the most out of the information we provide (without compromising the security of your internal data).

Answer questions like:
- What places does my target user visit in a particular area? Which are the best areas to place a new POS?
- What is the average yearly income of users in a particular area?
- What is the influx of visits that my competition receives?
- What is the volume of traffic surrounding my current POS?

This dataset is useful for getting insights from industries like:
- Retail & FMCG
- Banking, Finance, and Investment
- Car Dealerships
- Real Estate
- Convenience Stores
- Pharma and medical laboratories
- Restaurant chains and franchises
- Clothing chains and franchises

Our dataset includes more than 50 variables, such as:
- Number of pedestrians seen in the area.
- Number of vehicles seen in the area.
- Average speed of movement of the vehicles seen in the area.
- Points of Interest (POIs) (in number and type) seen in the area (supermarkets, pharmacies, recreational locations, restaurants, offices, hotels, parking lots, wholesalers, financial services, pet services, shopping malls, among others).
- Average yearly income range (anonymized and aggregated) of the devices seen in the area.

Notes to better understand this dataset:
- POI confidence means the average confidence of POIs in the area. In this case, POIs are any kind of location, such as a restaurant, a hotel, or a library.
- Category confidences, for example "food_drinks_tobacco_retail_confidence", indicate how confident we are in the existence of food/drink/tobacco retail locations in the area.
- We added predictions for The Home Depot and Lowe's Home Improvement stores in the dataset sample. These predictions were the result of a machine-learning model that was trained with the data. Knowing where the current stores are, we can find the most similar areas for new stores to open.

How efficient is a Geohash?
Geohash is a faster, cost-effective geofencing option that reduces input data load and provides actionable information. Its benefits include faster querying, reduced cost, minimal configuration, and ease of use. Geohash ranges from 1 to 12 characters. The dataset can be split into variable-size geohashes, with the default being geohash7 (150m x 150m).
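To make the geohash-7 tiling concrete, here is a minimal, dependency-free Python sketch of standard geohash encoding (the coordinate used is a hypothetical example):

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet

def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    code, even, bits, ch = [], True, 0, 0
    while len(code) < precision:
        # bits alternate between longitude (even) and latitude (odd)
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        ch = (ch << 1) | (val >= mid)
        if val >= mid:
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:  # every 5 bits become one base32 character
            code.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(code)

# hypothetical coordinate; prints its 7-character cell id (~150 m x 150 m)
print(geohash_encode(40.7580, -73.9855))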
License: Creative Commons Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.
Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.
We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:
Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.
Each sample consists of a single 3d MCFO image of neurons of the fruit fly.
For each image, we provide a pixel-wise instance segmentation for all separable neurons.
Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification).
The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file.
The segmentation mask for each neuron is stored in a separate channel.
The order of dimensions is CZYX.
We recommend working in a virtual environment, e.g., by using conda:
conda create -y -n flylight-env -c conda-forge python=3.9
conda activate flylight-env
pip install zarr
import zarr

raw = zarr.open("<path_to_zarr>", mode='r', path="volumes/raw")
seg = zarr.open("<path_to_zarr>", mode='r', path="volumes/gt_instances")

# optional:
import numpy as np
raw_np = np.array(raw)
Zarr arrays are read lazily on-demand.
Many functions that expect numpy arrays also work with zarr arrays.
Optionally, the arrays can also explicitly be converted to numpy arrays.
We recommend using napari to view the image data.
pip install "napari[all]"
import zarr, sys, napari
raw = zarr.load(sys.argv[1], mode='r', path="volumes/raw")
gts = zarr.load(sys.argv[1], mode='r', path="volumes/gt_instances")

viewer = napari.Viewer(ndisplay=3)
for idx, gt in enumerate(gts):
    viewer.add_labels(gt, rendering='translucent', blending='additive', name=f'gt_{idx}')
viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')
viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')
viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')
napari.run()
python view_data.py <path_to_zarr>
For more information on our selected metrics and formal definitions please see our paper.
To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al..
For detailed information on the methods and the quantitative results please see our paper.
The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
If you use FISBe in your research, please use the following BibTeX entry:
@misc{mais2024fisbe,
title = {FISBe: A real-world benchmark dataset for instance
segmentation of long-range thin filamentous structures},
author = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya
Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena
Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller},
year = 2024,
eprint = {2404.00130},
archivePrefix ={arXiv},
primaryClass = {cs.CV}
}
We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuable discussions.
P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program.
This work was co-funded by Helmholtz Imaging.
There have been no changes to the dataset so far.
All future changes will be listed on the changelog page.
If you would like to contribute, have encountered any issues, or have any suggestions, please open an issue for the FISBe dataset in the accompanying GitHub repository.
All contributions are welcome!
License: Creative Commons Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
EEd: R2-score between original and estimated thickness values;
mEEn: median of R2-scores between original and estimated n(λ) arrays;
mEEk: median of R2-scores between original and estimated k(λ) arrays;
mEER: mean of R2-scores between original and estimated R(λ) arrays;
mEET: mean of R2-scores between original and estimated T(λ) arrays;
SR: success rate = number of successful occasions (nss) / total occasions (ns);
mFE: average number of function evaluations (iterations × population size).
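For reference, the R2-scores above follow (assuming the standard definition of the coefficient of determination):

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

where y_i are the original values, \hat{y}_i the estimated values, and \bar{y} the mean of the original values.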
License: Open Government Licence - Ontario, https://www.ontario.ca/page/open-government-licence-ontario
Shows areas where the health and prevalence of caribou can be linked to the attributes of the land that supports them.
Ontario's Woodland Caribou Recovery Strategy (2008) provides advice and recommendations on the approaches needed for the recovery of Woodland Caribou.
The strategy recommends the identification of ranges and local populations to:
Instructions for downloading this dataset:
This product requires the use of GIS software.
GIS: geographic information system