
Point Cloud Mnist 2D
kaggle.com
zip
Updated Feb 12, 2020
Cite
Cristian Garcia (2020). Point Cloud Mnist 2D [Dataset]. https://www.kaggle.com/datasets/cristiangarcia/pointcloudmnist2d/discussion
Available download formats: zip (34176926 bytes)
Dataset updated
Feb 12, 2020
Authors
Cristian Garcia
License
CC0 1.0 Universal (Public Domain Dedication) https://creativecommons.org/publicdomain/zero/1.0/
Description
Point Cloud MNIST 2D

This is a simple dataset for getting started with machine learning on point cloud data. It takes the original MNIST and converts each non-zero pixel into a point in 2D space. The task is to classify each collection of points (rather than each image) with the same label as in MNIST. The source for generating this dataset can be found in this repository: cgarciae/point-cloud-mnist-2D

Format

There are 2 files: train.csv and test.csv. Each file has the columns

label,x0,y0,v0,x1,y1,v1,...,x350,y350,v350

where

label contains the target label in the range [0, 9]

x{i} contains the x position of the pixel/point as viewed in a Cartesian plane, in the range [-1, 27].

y{i} contains the y position of the pixel/point as viewed in a Cartesian plane, in the range [-1, 27].

v{i} contains the value of the pixel, in the range [-1, 255].

Padding

The maximum number of points found in an image was 351; images with fewer points were padded to this length using the following values:

x{i} = -1

y{i} = -1

v{i} = -1
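A minimal NumPy sketch of unpacking this layout, assuming each CSV has a single header row and has already been loaded into a 2D numeric array (for example with `np.loadtxt("train.csv", delimiter=",", skiprows=1)`). The function name `split_point_cloud_rows` is hypothetical, not part of the dataset's tooling; padding is detected with the -1 value listed above.

```python
import numpy as np

MAX_POINTS = 351  # maximum number of points per image, per the description
PAD = -1          # padding value used for x, y and v


def split_point_cloud_rows(table: np.ndarray):
    """Split rows laid out as label,x0,y0,v0,...,x350,y350,v350.

    Returns (labels, xy, values, mask) with shapes
    (N,), (N, MAX_POINTS, 2), (N, MAX_POINTS) and (N, MAX_POINTS).
    """
    table = np.asarray(table)
    if table.ndim != 2 or table.shape[1] != 1 + 3 * MAX_POINTS:
        raise ValueError(f"expected {1 + 3 * MAX_POINTS} columns, got shape {table.shape}")
    labels = table[:, 0].astype(np.int64)
    # Group the remaining columns into (x, y, v) triples, one per point.
    points = table[:, 1:].reshape(len(table), MAX_POINTS, 3)
    xy = points[:, :, :2]
    values = points[:, :, 2]
    # A point is real (not padding) unless its pixel value is the padding value.
    mask = values != PAD
    return labels, xy, values, mask
```

Checking only `v` is enough here: padded points have all three fields set to -1, while real points come from non-zero pixels and so never have a value of -1.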

Subsamples

To make the challenge more interesting, you can also try to solve the problem using only a subset of the points, e.g. the first N. Here are some visualizations of the dataset using different numbers of points:

50 points

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2Fbbf5393884480e3d24772344e079c898%2F50.png?generation=1579911143877077&alt=media

100 points

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F5a83f6f5f7c5791e3c1c8e9eba2d052b%2F100.png?generation=1579911238988368&alt=media

200 points

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F202098ed0da35c41ae45dfc32e865972%2F200.png?generation=1579911264286372&alt=media

351 points

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F5c733566f8d689c5e0fd300440d04da2%2Fmax.png?generation=1579911289750248&alt=media
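Two ways of building such subsets are sketched with NumPy, assuming the points are already arranged as an array of shape (num_samples, 351, 3) holding (x, y, v). Taking the first N keeps the stored order, so padding stays at the end. The random variant is an assumption of this sketch rather than something the dataset prescribes: it samples up to N real points per cloud and re-pads with -1. Both function names are hypothetical.

```python
import numpy as np

PAD = -1  # padding value used for x, y and v in the dataset


def first_n_points(points: np.ndarray, n: int) -> np.ndarray:
    """Keep the first n stored points of each cloud (padding stays where it is)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return points[:, :n, :]


def random_n_points(points: np.ndarray, n: int, rng: np.random.Generator) -> np.ndarray:
    """Sample up to n real points per cloud without replacement, re-padding with PAD."""
    if n <= 0:
        raise ValueError("n must be positive")
    out = np.full((points.shape[0], n, points.shape[2]), PAD, dtype=points.dtype)
    for i, cloud in enumerate(points):
        real = cloud[cloud[:, 2] != PAD]  # drop padded rows (v == -1)
        k = min(n, len(real))
        chosen = rng.choice(len(real), size=k, replace=False)
        out[i, :k] = real[chosen]
    return out
```

A random subset removes any bias from the order in which pixels were converted to points, at the cost of a non-deterministic input unless the generator is seeded.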

Distribution

The histogram below shows the distribution of the number of points per image in the dataset, which gives a general idea of how difficult each variation is.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F9eb3b463f77a887dae83a7af0eb08c7d%2Flengths.png?generation=1579911380397412&alt=media
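The per-image lengths behind a histogram like this one can be recomputed from the v{i} columns alone. A minimal sketch, assuming those columns are held in an array of shape (num_images, 351) with -1 marking padding; the helper names are hypothetical.

```python
import numpy as np

PAD = -1  # padding value used in the v{i} columns


def points_per_image(values: np.ndarray) -> np.ndarray:
    """Count real (non-padding) points per image."""
    return (values != PAD).sum(axis=1)


def length_histogram(values: np.ndarray, max_points: int = 351) -> np.ndarray:
    """hist[k] is the number of images with exactly k real points."""
    return np.bincount(points_per_image(values), minlength=max_points + 1)
```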
Data from: FISBe: A real-world benchmark dataset for instance segmentation...
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Apr 2, 2024
Cite
Mais, Lisa; Hirsch, Peter; Managan, Claire; Kandarpa, Ramya; Rumberger, Josef Lorenz; Reinke, Annika; Maier-Hein, Lena; Ihrke, Gudrun; Kainmueller, Dagmar (2024). FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10875062
Dataset updated
Apr 2, 2024
Dataset provided by
German Cancer Research Center
Max Delbrück Center
Max Delbrück Center for Molecular Medicine
Howard Hughes Medical Institute - Janelia Research Campus
Authors
Mais, Lisa; Hirsch, Peter; Managan, Claire; Kandarpa, Ramya; Rumberger, Josef Lorenz; Reinke, Annika; Maier-Hein, Lena; Ihrke, Gudrun; Kainmueller, Dagmar
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
General

For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

Summary

A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains

30 completely labeled (segmented) images

71 partly labeled images

altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes 30-60 min on average, and a difficult one can take up to 4 hours)

To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects

A set of metrics and a novel ranking score for meaningful method benchmarking

An evaluation of three baseline methods in terms of the above metrics and score

Abstract

Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

Dataset documentation:

We provide detailed documentation of our dataset, following the Datasheets for Datasets questionnaire:

FISBe Datasheet

Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

Files

fisbe_v1.0_{completely,partly}.zip

contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.

fisbe_v1.0_mips.zip

maximum intensity projections of all samples, for convenience.

sample_list_per_split.txt

a simple list of all samples and the subset they are in, for convenience.

view_data.py

a simple python script to visualize samples, see below for more information on how to use it.

dim_neurons_val_and_test_sets.json

a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.

Readme.md

general information

How to work with the image files

Each sample consists of a single 3d MCFO image of neurons of the fruit fly. For each image, we provide a pixel-wise instance segmentation for all separable neurons. Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification). The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file. The segmentation mask for each neuron is stored in a separate channel. The order of dimensions is CZYX.
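Because each neuron's mask is stored in its own channel and the dimension order is CZYX, per-instance statistics reduce to operations over the first axis. A minimal NumPy sketch, assuming `gt` holds the gt_instances array (a zarr or NumPy array of shape C x Z x Y x X, nonzero where the neuron is present). Both function names are hypothetical, and letting the highest channel win where masks overlap is an illustrative tie-break, not a convention defined by the dataset.

```python
import numpy as np


def instance_voxel_counts(gt) -> np.ndarray:
    """Number of foreground voxels per instance channel of a CZYX mask stack."""
    gt = np.asarray(gt)
    return (gt.reshape(gt.shape[0], -1) != 0).sum(axis=1)


def collapse_to_labels(gt) -> np.ndarray:
    """Merge per-channel masks (CZYX) into a single ZYX label volume.

    Channel c becomes label c + 1 and 0 is background. Where masks overlap,
    the highest channel index wins (an arbitrary choice for illustration).
    """
    gt = np.asarray(gt)
    labels = np.zeros(gt.shape[1:], dtype=np.int32)
    for c in range(gt.shape[0]):
        labels[gt[c] != 0] = c + 1
    return labels
```

A separate channel per instance can represent overlapping neurons; a single label volume cannot, so the collapsed form is mainly useful for tools that expect one label per voxel.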

We recommend working in a virtual environment, e.g., by using conda:

conda create -y -n flylight-env -c conda-forge python=3.9
conda activate flylight-env

How to open zarr files

Install the python zarr package:

pip install zarr

Open a zarr file with:

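import zarr

zarr_path = "path/to/sample.zarr"  # placeholder: set this to one sample's .zarr file
raw = zarr.open(zarr_path, mode='r', path="volumes/raw")
seg = zarr.open(zarr_path, mode='r', path="volumes/gt_instances")

Optional, to convert an array to NumPy:

import numpy as np
raw_np = np.array(raw)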

Zarr arrays are read lazily on demand. Many functions that expect NumPy arrays also work with zarr arrays. Optionally, the arrays can also be explicitly converted to NumPy arrays.

How to view zarr image files

We recommend using napari to view the image data.

Install napari:

pip install "napari[all]"

Save the following Python script:

import sys

import napari
import zarr

# zarr.load reads each array fully into memory as a NumPy array.
raw = zarr.load(sys.argv[1], path="volumes/raw")
gts = zarr.load(sys.argv[1], path="volumes/gt_instances")

viewer = napari.Viewer(ndisplay=3)
# One labels layer per neuron: each channel of gt_instances holds one instance mask.
for idx, gt in enumerate(gts):
    viewer.add_labels(
        gt, rendering='translucent', blending='additive', name=f'gt_{idx}')
viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')
viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')
viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')
napari.run()

Execute:

python view_data.py /R9F03-20181030_62_B5.zarr

Metrics

S: Average of avF1 and C

avF1: Average F1 Score

C: Average ground truth coverage

clDice_TP: Average true positives clDice

FS: Number of false splits

FM: Number of false merges

tp: Relative number of true positives
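As a worked example of the first definition, S is the plain mean of avF1 and C, and methods can then be ranked by it. A minimal sketch with hypothetical function names; avF1 and C themselves are defined in the paper and are not recomputed here.

```python
def overall_score(av_f1: float, coverage: float) -> float:
    """S = (avF1 + C) / 2: the mean of the average F1 score and the average GT coverage."""
    return (av_f1 + coverage) / 2.0


def rank_methods(results: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Sort methods by S, highest first. `results` maps a method name to (avF1, C)."""
    scored = [(name, overall_score(f1, cov)) for name, (f1, cov) in results.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```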

For more information on our selected metrics and formal definitions please see our paper.

Baseline

To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt, application-specific color clustering from Duan et al. For detailed information on the methods and the quantitative results please see our paper.

License

The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Citation

If you use FISBe in your research, please use the following BibTeX entry:

@misc{mais2024fisbe,
  title = {FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures},
  author = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller},
  year = 2024,
  eprint = {2404.00130},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV}
}

Acknowledgments

We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuable discussions. P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program. This work was co-funded by Helmholtz Imaging.

Changelog

There have been no changes to the dataset so far. All future changes will be listed on the changelog page.

Contributing

If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.

All contributions are welcome!
Data from: Jornada Experimental Range (USDA-ARS) monthly stocking data and...
catalog.data.gov
datasetcatalog.nlm.nih.gov
Updated Apr 21, 2025
Cite
Agricultural Research Service (2025). Jornada Experimental Range (USDA-ARS) monthly stocking data and pasture shape files from 1915 to 1952 [Dataset]. https://catalog.data.gov/dataset/jornada-experimental-range-usda-ars-monthly-stocking-data-and-pasture-shape-files-from-191
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service https://www.ars.usda.gov/
Description
This data package contains two types of data for the Jornada Experimental Range (JER) from 1915 to 1952: 1) shape files containing polygons and attribute tables that represent the pasture configurations on the JER, and 2) monthly stocking data for these pastures. The livestock represented in the stocking data comprise cattle, horses, sheep, and goats. Grazing goats were infrequent and are grouped with sheep in the source data, so in this data set they are included in the sheep category. Stocking data are expressed in animal unit months (AUM), which are based on metabolic weight. This data package provides finer-resolution AUM data than knb-lter-jrn.210412001 (https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-jrn&identifier=210412001), which presents the annual stocking data for the entire JER from 1916 to 2001. The stocking data in this package begin in June 1915 and continue through December 1952, the last year for which the researchers on this project have verified and digitized historical pasture configurations on the JER.
Path loss at 5G high frequency range in South Asia
kaggle.com
Updated Apr 25, 2023
Cite
S M MEHEDI ZAMAN (2023). Path loss at 5G high frequency range in South Asia [Dataset]. https://www.kaggle.com/datasets/smmehedizaman/path-loss-at-5g-high-frequency-range-in-south-asia
Available formats: Croissant (a metadata format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Apr 25, 2023
Dataset provided by
Kaggle http://kaggle.com/
Authors
S M MEHEDI ZAMAN
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
Asia, South Asia
Description
This dataset has been generated using the NYUSIM 3.0 mm-Wave channel simulator software, which takes into account atmospheric data such as rain rate, humidity, barometric pressure, and temperature. The input data were collected over the course of a year in South Asia, so the dataset captures the seasonal variations in mm-wave channel characteristics in this region. The dataset includes a total of 2835 records, each of which contains T-R Separation Distance (m), Time Delay (ns), Received Power (dBm), Phase (rad), Azimuth AoD (degree), Elevation AoD (degree), Azimuth AoA (degree), Elevation AoA (degree), RMS Delay Spread (ns), Season, Frequency and Path Loss (dB). Four main seasons are considered in this dataset: Spring, Summer, Fall, and Winter. Each season is subdivided into three parts (low, medium, and high) to capture the atmospheric variation within a season. To simulate the path loss, realistic Tx and Rx heights, an NLoS environment, and mean human blockage attenuation effects have been taken into account. The data have been preprocessed and normalized for consistency and ease of use. Researchers in mm-wave communications and networking can use this dataset to study the impact of atmospheric conditions on mm-wave channel characteristics and to develop more accurate models for predicting channel behavior. The dataset can also be used to evaluate the performance of different communication protocols and signal processing techniques under varying weather conditions. Note that although the data were collected specifically in the South Asia region, the dataset may also be applicable to other regions whose weather patterns closely resemble those of South Asia.
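A minimal sketch of one such study, averaging path loss per season and frequency with only the Python standard library. The column names `Season`, `Frequency` and `Path Loss (dB)` are taken from the field list above and are assumed, not verified, to match the CSV headers exactly; since the data have been normalized, the averages are in whatever units the released file uses.

```python
from collections import defaultdict
from statistics import mean


def mean_path_loss(rows, keys=("Season", "Frequency"), target="Path Loss (dB)"):
    """Average the target column over each combination of the key columns.

    `rows` is an iterable of dicts, e.g. from csv.DictReader. The column
    names are assumptions based on the dataset description.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(float(row[target]))
    return {group: mean(values) for group, values in groups.items()}
```

For example, `mean_path_loss(csv.DictReader(open(path, newline="")))`, where `path` is the (hypothetical) location of the downloaded CSV, returns a dict keyed by `(season, frequency)` tuples.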

Acknowledgements

The paper in which the dataset was proposed is available at: https://ieeexplore.ieee.org/abstract/document/10307972

Citation

If you use this dataset, please cite the following paper:

Rashed Hasan Ratul, S. M. Mehedi Zaman, Hasib Arman Chowdhury, Md. Zayed Hassan Sagor, Mohammad Tawhid Kawser, and Mirza Muntasir Nishat, “Atmospheric Influence on the Path Loss at High Frequencies for Deployment of 5G Cellular Communication Networks,” 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), 2023, pp. 1–6. https://doi.org/10.1109/ICCCNT56998.2023.10307972

BibTeX

```bibtex
@inproceedings{Ratul2023Atmospheric,
  author = {Ratul, Rashed Hasan and Zaman, S. M. Mehedi and Chowdhury, Hasib Arman and Sagor, Md. Zayed Hassan and Kawser, Mohammad Tawhid and Nishat, Mirza Muntasir},
  title = {Atmospheric Influence on the Path Loss at High Frequencies for Deployment of {5G} Cellular Communication Networks},
  booktitle = {2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT)},
  year = {2023},
  pages = {1--6},
  doi = {10.1109/ICCCNT56998.2023.10307972},
  keywords = {Wireless communication; Fluctuations; Rain; 5G mobile communication; Atmospheric modeling; Simulation; Predictive models; 5G-NR; mm-wave propagation; path loss; atmospheric influence; NYUSIM; ML}
}
```
TIGER/Line Shapefile, 2016, Series Information for the Address Range-Feature...
catalog.data.gov
s.cnmilf.com
Updated Dec 2, 2020
Cite
(2020). TIGER/Line Shapefile, 2016, Series Information for the Address Range-Feature County-based Shapefile [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-2016-series-information-for-the-address-range-feature-county-based-shapefi
Dataset updated
Dec 2, 2020
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or the shapefiles can be combined to cover the entire nation. The Address Ranges Feature Shapefile (ADDRFEAT.dbf) contains the geospatial edge geometry and attributes of all unsuppressed address ranges for a county or county equivalent area. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number, and all numbers of a specified parity in between, along an edge side relative to the direction in which the edge is coded. Single-address address ranges have been suppressed to maintain the confidentiality of the addresses they describe. Multiple coincident address range feature edge records are represented in the shapefile if more than one left or right address range is associated with the edge. The ADDRFEAT shapefile contains a record for each address range to street name combination. Address ranges associated with more than one street name are also represented by multiple coincident address range feature edge records. Note that the ADDRFEAT shapefile includes all unsuppressed address ranges, whereas the All Lines Shapefile (EDGES.shp) only includes the most inclusive address range associated with each side of a street edge. The TIGER/Line shapefiles contain potential address ranges, not individual addresses. The address ranges in the TIGER/Line Files are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.
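The address-range definition above can be made concrete by expanding one range into its potential structure numbers. A minimal sketch, assuming plain integer structure numbers (real house numbers can be more complex) and a hypothetical function name; as noted above, the numbers it returns need not correspond to existing structures.

```python
def potential_structure_numbers(from_hn: int, to_hn: int) -> list[int]:
    """All structure numbers from the first to the last that share their parity.

    Ranges may run in either direction relative to how the edge is coded,
    so a descending range is returned in descending order.
    """
    if (from_hn - to_hn) % 2 != 0:
        raise ValueError("endpoints of a single-parity range must share parity")
    step = 2 if to_hn >= from_hn else -2
    # Stop one past to_hn in the direction of travel so to_hn is included.
    return list(range(from_hn, to_hn + step // 2, step))
```

For example, a range coded from 100 to 106 yields 100, 102, 104 and 106, and one coded from 105 to 101 yields 105, 103 and 101.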
Fused Image dataset for convolutional neural Network-based crack Detection...
zenodo.org
data.niaid.nih.gov
zip
Updated Apr 20, 2023
Cite
Shanglian Zhou; Carlos Canchila; Wei Song (2023). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Dataset]. http://doi.org/10.5281/zenodo.6383044
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6383044
Dataset updated
Apr 20, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Shanglian Zhou; Carlos Canchila; Wei Song
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.
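Because FIND pairs each 256x256 patch with a pixel-level ground-truth crack map, segmentation outputs can be scored pixel by pixel. A minimal NumPy sketch of such scoring, assuming binary maps where nonzero marks a crack pixel. The function name and the convention of scoring empty-versus-empty cases as 1.0 are assumptions, and the exact metrics used in the benchmark study [4] may differ.

```python
import numpy as np


def crack_segmentation_scores(pred, gt) -> dict:
    """Pixel-level precision, recall, F1 and IoU for same-shaped binary crack maps."""
    p = np.asarray(pred) != 0
    g = np.asarray(gt) != 0
    if p.shape != g.shape:
        raise ValueError(f"shape mismatch: {p.shape} vs {g.shape}")
    tp = int(np.logical_and(p, g).sum())   # crack predicted and present
    fp = int(np.logical_and(p, ~g).sum())  # crack predicted but absent
    fn = int(np.logical_and(~p, g).sum())  # crack present but missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}
```

Crack pixels are usually a small fraction of each patch, which is why overlap-based scores such as F1 and IoU are more informative here than plain pixel accuracy.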

The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration). The filtered range data were generated by applying frequency-domain filtering to eliminate image disturbances (e.g., surface variations and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact of different types of image data on deep convolutional neural network (DCNN) performance.

If you share or use this dataset, please cite [4] and [5] in any relevant documentation.

In addition, an image dataset for crack classification has also been published at [6].

References:

[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873

[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605

[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434

[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678

[5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044

[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78
Turkey Vulture Range - CWHR B108 [ds1441]
catalog.data.gov
data.ca.gov
Updated Jul 24, 2025
Cite
California Department of Fish and Wildlife (2025). Turkey Vulture Range - CWHR B108 [ds1441] [Dataset]. https://catalog.data.gov/dataset/turkey-vulture-range-cwhr-b108-ds1441-07dc3
Dataset updated
Jul 24, 2025
Dataset provided by
California Department of Fish and Wildlife
Description
Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.
Street Network Database SND
catalog.data.gov
data.seattle.gov
Updated Oct 4, 2025
Cite
City of Seattle ArcGIS Online (2025). Street Network Database SND [Dataset]. https://catalog.data.gov/dataset/street-network-database-snd-1712b
Dataset updated
Oct 4, 2025
Dataset provided by
City of Seattle ArcGIS Online
Description
The pathway representation consists of segment and intersection elements. A segment is a linear graphic element that represents a continuous physical travel path terminated by a path end (dead end) or a physical intersection with other travel paths. Segments have one street name, one address range and one set of segment characteristics. A segment may have no alias street names or several. Segment types included are Freeways, Highways, Streets, Alleys (named only), Railroads, Walkways, and Bike lanes. SNDSEG_PV is a linear feature class representing the SND Segment Feature, with attributes for Street name, Address Range, Alias Street name and segment Characteristics objects. Part of the Address Range objects and all of the Street name objects are logically shared with the Discrete Address Point-Master Address File layer. Appropriate uses include:

Cartography - Used to depict the City's transportation network location and connections, typically on smaller-scale maps or images where a single-line representation is appropriate. Used to depict specific classifications of roadway use, also typically at smaller scales. Used to label transportation network feature names, typically on larger-scale maps. Used to label address ranges with associated transportation network features, typically on larger-scale maps.

Geocode reference - Used as a source for derived reference data for address validation and theoretical address location.

Address Range data repository - This data store is the City's address range repository, defining address ranges in association with transportation network features.

Polygon boundary reference - Used to define various area boundaries in other feature classes where they coincide with the transportation network. Does not contain polygon features.

Address based extracts - Used to create flat-file extracts, typically indexed by address, with reference to business data typically associated with transportation network features.
Thematic linear location reference - By providing unique, stable identifiers for each linear feature, thematic data are associated with specific transportation network features via these identifiers.

Thematic intersection location reference - By providing unique, stable identifiers for each intersection feature, thematic data are associated with specific transportation network features via these identifiers.

Network route tracing - Used as a source for derived reference data used to determine point-to-point travel paths or optimal stop allocation along a travel path.

Topological connections with segments - Used to provide a specific definition of location for each transportation network feature, and of the connections between transportation network features (it defines where the streets are and the relationships between them, e.g., 4th Ave is west of 5th Ave and 4th Ave does intersect with Cherry St).

Event location reference - Used as a source for derived reference data used for event location and linear referencing.

Data source is TRANSPO.SNDSEG_PV. Updated weekly.
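The network route tracing use can be illustrated with a minimal Dijkstra sketch that treats intersections as nodes and segments as undirected weighted edges. The tuple layout `(segment_id, from_node, to_node, length)` is a hypothetical simplification, not the SNDSEG_PV schema, and one-way restrictions and turn costs are ignored.

```python
import heapq
from collections import defaultdict


def shortest_path(segments, start, goal):
    """Dijkstra over (segment_id, from_node, to_node, length) tuples.

    Returns (total_length, [segment ids along the path]) or None if the
    goal is unreachable. Node ids must be mutually comparable (e.g. all
    ints or all strings) because they break ties in the priority queue.
    """
    graph = defaultdict(list)
    for seg_id, a, b, length in segments:
        graph[a].append((b, length, seg_id))
        graph[b].append((a, length, seg_id))

    best = {start: 0.0}
    previous = {}  # node -> (previous node, segment used to reach it)
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            path = []
            while node != start:
                node, seg_id = previous[node]
                path.append(seg_id)
            return dist, path[::-1]
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, length, seg_id in graph[node]:
            candidate = dist + length
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = (node, seg_id)
                heapq.heappush(queue, (candidate, neighbor))
    return None
```

Returning segment ids rather than node ids matches the description's emphasis on stable segment identifiers, which is what thematic data would be joined on.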
Data from: Variable Terrestrial GPS Telemetry Detection Rates: Parts 1 -...
catalog.data.gov
data.usgs.gov
Updated Nov 27, 2025
Cite
U.S. Geological Survey (2025). Variable Terrestrial GPS Telemetry Detection Rates: Parts 1 - 7—Data [Dataset]. https://catalog.data.gov/dataset/variable-terrestrial-gps-telemetry-detection-rates-parts-1-7data
Dataset updated
Nov 27, 2025
Dataset provided by
U.S. Geological Survey
Description
Studies utilizing Global Positioning System (GPS) telemetry rarely achieve 100% fix success rates (FSR). Many assessments of wildlife resource use do not account for missing data, either assuming data loss is random or because of a lack of practical treatment for systematic data loss. Several studies have explored how the environment, technological features, and animal behavior influence rates of missing data in GPS telemetry, but previous spatially explicit models developed to correct for sampling bias have been specific to small study areas, to a small range of data loss, or to a single species, limiting their general utility. Here we explore environmental effects on GPS fix acquisition rates across a wide range of environmental conditions and detection rates for bias correction of terrestrial GPS-derived large mammal habitat use. We also evaluate patterns in missing data that relate to potential animal activities that change the orientation of the antennae, and characterize the home-range probability of GPS detection for 4 focal species: cougars (Puma concolor), desert bighorn sheep (Ovis canadensis nelsoni), Rocky Mountain elk (Cervus elaphus ssp. nelsoni) and mule deer (Odocoileus hemionus).

Part 1, Positive Openness Raster (raster dataset): Openness is an angular measure of the relationship between surface relief and horizontal distance. For angles less than 90 degrees it is equivalent to the internal angle of a cone with its apex at a DEM location, and it is constrained by neighboring elevations within a specified radial distance. A 480 meter search radius was used for this calculation of positive openness. Openness incorporates the terrain line-of-sight or viewshed concept and is calculated from multiple zenith and nadir angles, here along eight azimuths. Positive openness measures openness above the surface, with high values for convex forms and low values for concave forms (Yokoyama et al. 2002).
We calculated positive openness using a custom Python script, following the methods of Yokoyama et al. (2002) and using a USGS National Elevation Dataset as input.

Part 2, Northern Arizona GPS Test Collar (csv): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall overstory vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and showed little bias in predictions. The model's predictive ability was evaluated using two independent datasets from stationary test collars of different make/model and fix interval programming, placed at different study sites. No statistically significant differences (95% CI) between predicted and observed FSRs suggest that changes in technological factors have minor influence on the model's ability to predict FSR in new study areas in the southwestern US. The model training data are provided here for fix attempts by hour. This table can be linked with the site location shapefile using the site field.

Part 3, Probability Raster (raster dataset): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall overstory vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and showed little bias in predictions.
The models predictive ability was evaluated using two independent datasets from stationary test collars of different make/model, fix interval programing, and placed at different study sites. No statistically significant differences (95% CI) between predicted and observed FSRs, suggest changes in technological factors have minor influence on the models ability to predict FSR in new study areas in the southwestern US. We evaluated GPS telemetry datasets by comparing the mean probability of a successful GPS fix across study animals home-ranges, to the actual observed FSR of GPS downloaded deployed collars on cougars (Puma concolor), desert bighorn sheep (Ovis canadensis nelsoni), Rocky Mountain elk (Cervus elaphus ssp. nelsoni) and mule deer (Odocoileus hemionus). Comparing the mean probability of acquisition within study animals home-ranges and observed FSRs of GPS downloaded collars resulted in a approximatly 1:1 linear relationship with an r-sq= 0.68. Part 4, GPS Test Collar Sites (shapefile): Bias correction in GPS telemetry data-sets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall over-story vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and showed little bias in predictions. The model's predictive ability was evaluated using two independent data-sets from stationary test collars of different make/model, fix interval programming, and placed at different study sites. No statistically significant differences (95% CI) between predicted and observed FSRs, suggest changes in technological factors have minor influence on the models ability to predict FSR in new study areas in the southwestern US. 
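The evaluation above rests on two pieces of arithmetic: an FSR (overall and by hour) computed from fix attempts, and a linear fit of observed FSR against each animal's mean predicted fix probability. A minimal sketch, assuming fix attempts are available as hypothetical (hour, success) pairs and using ordinary least squares for the fit; the function names are illustrative and are not part of the dataset or the authors' code.

```python
from collections import defaultdict

def fix_success_rate(attempts):
    """Overall FSR: successful fixes divided by attempted fixes.

    `attempts` is a list of (hour, success) pairs: hour of day 0-23 and
    True when the collar acquired a fix on that attempt.
    """
    if not attempts:
        raise ValueError("no fix attempts")
    return sum(1 for _, ok in attempts if ok) / len(attempts)

def fsr_by_hour(attempts):
    """FSR for each hour of day that has at least one fix attempt."""
    tried = defaultdict(int)
    succeeded = defaultdict(int)
    for hour, ok in attempts:
        tried[hour] += 1
        if ok:
            succeeded[hour] += 1
    return {hour: succeeded[hour] / tried[hour] for hour in sorted(tried)}

def fit_line(x, y):
    """Ordinary least-squares fit y = a + b * x; returns (a, b, r_squared).

    Here x would be each animal's mean predicted fix probability over its
    home range and y its observed FSR; a 1:1 relationship means a ~ 0, b ~ 1.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return intercept, slope, r_squared
```

For example, the attempts `[(0, True), (0, False), (12, True), (12, True)]` give an overall FSR of 0.75 and an hourly FSR of 0.5 at hour 0 and 1.0 at hour 12. The fit returns r² as the squared correlation, which is the quantity reported above.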
Part 5, Cougar Home Ranges (shapefile): Cougar home ranges were calculated to compare the mean probability of GPS fix acquisition across each home range with the actual fix success rate (FSR) of the collar, as a means of evaluating whether characteristics of an animal's home range affect observed FSR. We estimated home ranges with the Local Convex Hull (LoCoH) method using the 90th isopleth. Only data obtained by GPS download from retrieved units were used. Satellite-delivered data were omitted from the analysis for animals whose collars were lost or damaged, because satellite delivery tends to lose an additional 10% of data. Comparisons with the home-range mean probability of fix were also used as a reference for assessing whether the frequency with which animals use areas of low GPS acquisition rates may play a role in observed FSRs. Part 6, Cougar Fix Success Rate by Hour (csv): Cougar GPS collar fix success varied by hour of day, suggesting that circadian rhythms with bouts of rest during daylight hours may change the orientation of the GPS receiver and affect its ability to acquire fixes. Raw data on overall fix success rates (FSR) and FSR by hour were used to predict relative reductions in FSR. Data include only direct GPS download datasets. Satellite-delivered data were omitted from the analysis for animals whose collars were lost or damaged, because satellite delivery tends to lose approximately an additional 10% of data. Part 7, Openness Python Script version 2.0: This Python script was used to calculate positive openness from a 30-meter digital elevation model for a large geographic area in Arizona, California, Nevada, and Utah. A scientific research project used the script to explore environmental effects on GPS fix acquisition rates across a wide range of environmental conditions and detection rates, for bias correction of terrestrial GPS-derived, large-mammal habitat use.
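The positive-openness definition in Part 1 can be illustrated with a small, dependency-free sketch. It is not the authors' Python script: it walks outward along the eight azimuths up to a search radius (480 m in the text), records the largest elevation angle on each profile, converts it to a zenith angle (90 degrees minus that angle), and averages over azimuths, following Yokoyama et al. (2002). The nested-list DEM and the choice to skip azimuths that leave the grid are assumptions made for illustration.

```python
import math

# Eight azimuths as (row step, column step): N, NE, E, SE, S, SW, W, NW.
DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def positive_openness(dem, row, col, cell_size, radius):
    """Positive openness (degrees) at dem[row][col].

    For each azimuth, step outward one cell at a time while the horizontal
    distance stays within `radius`, keep the largest elevation angle to any
    cell on that profile, and record the zenith angle 90 - max_angle.
    Azimuths with no cell inside the grid are skipped.
    """
    z0 = dem[row][col]
    nrows, ncols = len(dem), len(dem[0])
    zeniths = []
    for dr, dc in DIRECTIONS:
        step_len = cell_size * math.hypot(dr, dc)  # diagonal steps are longer
        max_angle = None
        k = 1
        while k * step_len <= radius:
            r, c = row + k * dr, col + k * dc
            if not (0 <= r < nrows and 0 <= c < ncols):
                break
            angle = math.degrees(math.atan2(dem[r][c] - z0, k * step_len))
            if max_angle is None or angle > max_angle:
                max_angle = angle
            k += 1
        if max_angle is not None:
            zeniths.append(90.0 - max_angle)
    if not zeniths:
        raise ValueError("no neighbouring cells within the search radius")
    return sum(zeniths) / len(zeniths)
```

On flat terrain every zenith angle is 90 degrees, so openness is 90; a cell on a peak (convex form) scores above 90 and a cell in a pit (concave form) below 90, matching the description above. For a 30 m DEM and a 480 m radius, the call would be `positive_openness(dem, r, c, 30.0, 480.0)`.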
USA POI & Foot Traffic Enriched Geospatial Dataset by Predik Data-Driven
app.mobito.io
USA POI & Foot Traffic Enriched Geospatial Dataset by Predik Data-Driven [Dataset]. https://app.mobito.io/data-product/usa-enriched-geospatial-framework-dataset
Area covered
United States
Description
Our dataset provides detailed and precise insights into the business, commercial, and industrial aspects of any given area in the USA, including Point of Interest (POI) data and foot traffic. The dataset is divided into 150 m x 150 m areas (geohash 7) and has over 50 variables.
- Use it for different applications: Our combined dataset, which includes POI and foot traffic data, can be employed for various purposes. Different data teams use it to guide retailers and FMCG brands in site selection, fuel marketing intelligence, analyze trade areas, and assess company risk. Our dataset has also proven useful for real estate investment.
- Get reliable data: Our datasets have been processed, enriched, and tested so your data team can use them more quickly and accurately.
- Ideal for training ML models: The high quality of our geographic information layers results from more than seven years of work dedicated to the deep understanding and modeling of geospatial Big Data. Among the features that distinguish this dataset is the use of anonymized and user-compliant mobile device GPS location data, enriched with other alternative and public data.
- Easy to use: Our dataset is user-friendly and can be easily integrated into your current models. We can also deliver your data in different formats, such as .csv, according to your analysis requirements.
- Get personalized guidance: In addition to providing reliable datasets, we advise your analysts on their correct implementation. Our data scientists can guide your internal team on the optimal algorithms and models to get the most out of the information we provide (without compromising the security of your internal data).
Answer questions like:
- What places does my target user visit in a particular area?
- Which are the best areas to place a new POS?
- What is the average yearly income of users in a particular area?
- What is the influx of visits that my competition receives?
- What is the volume of traffic surrounding my current POS?
This dataset is useful for getting insights from industries like:
- Retail & FMCG
- Banking, Finance, and Investment
- Car Dealerships
- Real Estate
- Convenience Stores
- Pharma and medical laboratories
- Restaurant chains and franchises
- Clothing chains and franchises
Our dataset includes more than 50 variables, such as:
- Number of pedestrians seen in the area.
- Number of vehicles seen in the area.
- Average speed of movement of the vehicles seen in the area.
- Points of Interest (POIs) (in number and type) seen in the area (supermarkets, pharmacies, recreational locations, restaurants, offices, hotels, parking lots, wholesalers, financial services, pet services, shopping malls, among others).
- Average yearly income range (anonymized and aggregated) of the devices seen in the area.
Notes to better understand this dataset:
- POI confidence means the average confidence of POIs in the area. In this case, POIs are any kind of location, such as a restaurant, a hotel, or a library.
- Category confidences: for example, "food_drinks_tobacco_retail_confidence" indicates how confident we are in the existence of food/drink/tobacco retail locations in the area.
- We added predictions for The Home Depot and Lowe's Home Improvement stores in the dataset sample. These predictions were the result of a machine-learning model trained with the data. Knowing where the current stores are, we can find the most similar areas for new stores to open.
How efficient is a geohash? Geohash is a faster, cost-effective geofencing option that reduces input data load and provides actionable information. Its benefits include faster querying, reduced cost, minimal configuration, and ease of use. Geohashes range from 1 to 12 characters. The dataset can be split into variable-size geohashes, with the default being geohash 7 (about 150 m x 150 m).
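The listing does not say how its geohash cells are computed, so the following is a sketch of the standard public geohash algorithm, not the vendor's code: bisect the longitude and latitude intervals alternately (starting with longitude), collect one bit per bisection, and emit one base-32 character per 5 bits. The helper `cell_size_m` is an illustrative approximation that uses roughly 111,320 m per degree; it shows why a 7-character cell is about 153 m on a side at the equator and narrower (east-west) at higher latitudes.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value, nbits = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)

def cell_size_m(precision, lat=0.0):
    """Approximate (width, height) in metres of a geohash cell at latitude `lat`."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    metres_per_degree = 111_320.0  # rough value; assumption for illustration
    width = 360.0 / 2 ** lon_bits * metres_per_degree * math.cos(math.radians(lat))
    height = 180.0 / 2 ** lat_bits * metres_per_degree
    return width, height
```

For example, `geohash_encode(57.64911, 10.40744, 11)` yields `"u4pruydqqvj"`, and truncating to 7 characters gives the enclosing geohash-7 cell `"u4pruyd"`. Because geohashes share prefixes with their parent cells, coarser aggregation amounts to truncating the string.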
Virginia Opossum Range - CWHR M001 [ds1799]
data-cdfw.opendata.arcgis.com
data.cnra.ca.gov
Updated Mar 4, 2020
California Department of Fish and Wildlife (2020). Virginia Opossum Range - CWHR M001 [ds1799] [Dataset]. https://data-cdfw.opendata.arcgis.com/datasets/CDFW::virginia-opossum-range-cwhr-m001-ds1799
Dataset updated
Mar 4, 2020
Dataset authored and provided by
California Department of Fish and Wildlife (https://wildlife.ca.gov/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents, such as a lookup table of available range maps (including species code, species name, and range map revision history); a full set of CWHR GIS data; .pdf files of each range map or species life history account; and a User Guide.
Data from: A comprehensive analysis of autocorrelation and bias in home...
datasetcatalog.nlm.nih.gov
borealisdata.ca
Updated May 19, 2021
Schabo, Dana G.; Ullmann, Wiebke; de Paula Cunha, Rogerio; Markham, A. Catherine; Alberts, Susan C.; Selva, Nuria; Koch, Flávia; Ali, Abdullahi H.; Zwijacz-Kozica, Tomasz; Thompson, Peter; Sergiel, Agnieszka; Mueller, Thomas; Dekker, Jasja; Ramalho, Emiliano E.; Patterson, Bruce D.; Morato, Ronaldo G.; Farwig, Nina; da Silva, Marina X.; LaPoint, Scott; Beyer, Dean; Medici, Emilia Patricia; Goheen, Jacob R.; Noonan, Michael J.; Olson, Kirk A.; Jeltsch, Florian; Belant, Jerrold L.; Fichtel, Claudia; Fleming, Christen H.; Akre, Tom S.; Ford, Adam T.; Nathan, Ran; Böhning-Gaese, Katrin; Fagan, William F.; Blaum, Niels; Tucker, Marlee A.; Antunes, Pamela C.; Drescher-Lehman, Jonathan; Rosner, Sascha; Calabrese, Justin M.; Paviolo, Agustin; Cullen Jr., Laury; Fischer, Christina; Spiegel, Orr; Altmann, Jeanne; Zięba, Filip; Oliveira-Santos, Luiz Gustavo R.; Kappeler, Peter M.; Kauffman, Matthew; Janssen, René (2021). Data from: A comprehensive analysis of autocorrelation and bias in home range estimation [Dataset]. http://doi.org/10.5683/SP2/OAJTAO
Unique identifier
https://doi.org/10.5683/SP2/OAJTAO
Dataset updated
May 19, 2021
Authors
Schabo, Dana G.; Ullmann, Wiebke; de Paula Cunha, Rogerio; Markham, A. Catherine; Alberts, Susan C.; Selva, Nuria; Koch, Flávia; Ali, Abdullahi H.; Zwijacz-Kozica, Tomasz; Thompson, Peter; Sergiel, Agnieszka; Mueller, Thomas; Dekker, Jasja; Ramalho, Emiliano E.; Patterson, Bruce D.; Morato, Ronaldo G.; Farwig, Nina; da Silva, Marina X.; LaPoint, Scott; Beyer, Dean; Medici, Emilia Patricia; Goheen, Jacob R.; Noonan, Michael J.; Olson, Kirk A.; Jeltsch, Florian; Belant, Jerrold L.; Fichtel, Claudia; Fleming, Christen H.; Akre, Tom S.; Ford, Adam T.; Nathan, Ran; Böhning-Gaese, Katrin; Fagan, William F.; Blaum, Niels; Tucker, Marlee A.; Antunes, Pamela C.; Drescher-Lehman, Jonathan; Rosner, Sascha; Calabrese, Justin M.; Paviolo, Agustin; Cullen Jr., Laury; Fischer, Christina; Spiegel, Orr; Altmann, Jeanne; Zięba, Filip; Oliveira-Santos, Luiz Gustavo R.; Kappeler, Peter M.; Kauffman, Matthew; Janssen, René
Description
Abstract: Home range estimation is routine practice in ecological research. While advances in animal tracking technology have increased our capacity to collect data to support home range analysis, these same advances have also resulted in increasingly autocorrelated data. Consequently, the question of which home range estimator to use on modern, highly autocorrelated tracking data remains open. This question is particularly relevant given that most estimators assume independently sampled data. Here, we provide a comprehensive evaluation of the effects of autocorrelation on home range estimation. We base our study on an extensive dataset of GPS locations from 369 individuals representing 27 species distributed across 5 continents. We first assemble a broad array of home range estimators, including Kernel Density Estimation (KDE) with four bandwidth optimizers (Gaussian reference function, autocorrelated-Gaussian reference function (AKDE), Silverman's rule of thumb, and least squares cross-validation), Minimum Convex Polygon, and Local Convex Hull methods. Notably, all of these estimators except AKDE assume independent and identically distributed (IID) data. We then employ half-sample cross-validation to objectively quantify estimator performance, and the recently introduced effective sample size for home range area estimation ($\hat{N}_\mathrm{area}$) to quantify the information content of each dataset. We found that AKDE 95% area estimates were larger than conventional IID-based estimates by a mean factor of 2. The median proportion of cross-validated locations included in the holdout sets by AKDE 95% (or 50%) estimates was 95.3% (or 50.1%), confirming that the larger AKDE ranges were appropriately selective at the specified quantile. Conversely, conventional estimates exhibited negative bias that increased with decreasing $\hat{N}_\mathrm{area}$.
To contextualize our empirical results, we performed a detailed simulation study to tease apart how sampling frequency, sampling duration, and the focal animal's movement conspire to affect range estimates. Paralleling our empirical results, the simulation study demonstrated that AKDE was generally more accurate than conventional methods, particularly for small $\hat{N}_\mathrm{area}$. While 72% of the 369 empirical datasets had > 1000 total observations, only 4% had an $\hat{N}_\mathrm{area}$ > 1000, and 30% had an $\hat{N}_\mathrm{area}$ < 30. In this frequently encountered scenario of small $\hat{N}_\mathrm{area}$, AKDE was the only estimator capable of producing an accurate home range estimate on autocorrelated data.
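Minimum Convex Polygon is one of the conventional IID estimators listed in the abstract, and the half-sample cross-validation idea can be shown with it. A minimal sketch, not the study's pipeline: it fits a 100% MCP (the convex hull, via Andrew's monotone chain) to the first half of a track, computes its area with the shoelace formula, and reports the share of second-half locations that fall inside. It ignores autocorrelation, which is exactly the limitation the study examines, and the function names are illustrative.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive when o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula for a simple polygon given as ordered vertices."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2

def inside_convex(poly, p):
    """True if p lies inside or on a counter-clockwise convex polygon."""
    return all(cross(poly[i], poly[(i + 1) % len(poly)], p) >= 0
               for i in range(len(poly)))

def half_sample_inclusion(track):
    """Fit a 100% MCP to the first half of a track (list of (x, y) tuples)
    and return the share of second-half locations that fall inside it."""
    half = len(track) // 2
    hull = convex_hull(track[:half])
    if len(hull) < 3:
        raise ValueError("first half of the track does not span a polygon")
    holdout = track[half:]
    return sum(inside_convex(hull, p) for p in holdout) / len(holdout)
```

With autocorrelated data, the first half of a track often covers only part of the range actually used, so the holdout inclusion of such an estimator tends to fall below its nominal coverage; this is the kind of negative bias the abstract reports for conventional estimates.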
Lunar Orbiter Laser Altimeter (LOLA) one-way Laser Ranging Full Rate Data...
catalog.data.gov
gimi9.com
Updated Apr 10, 2025
NASA/GSFC/SED/ESD/GGL/CDDIS (2025). Lunar Orbiter Laser Altimeter (LOLA) one-way Laser Ranging Full Rate Data (all ranges collected, ground stations, aggregate of normal points daily) from NASA CDDIS [Dataset]. https://catalog.data.gov/dataset/lunar-orbiter-laser-altimeter-lola-one-way-laser-ranging-full-rate-data-all-ranges-collect
Dataset updated
Apr 10, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
Lunar Orbiter Laser Altimeter (LOLA) one-way laser ranging full rate data. These files contain the full rate data (all ranges collected) as delivered from the ground stations participating in one-way ranging. Each file is an aggregate of the full rate data collected for every station on a particular day. Note that these files do not constitute the official data delivered by the LOLA mission; for those data, please visit the LOLA Planetary Data System archive listed in the reference. The ground-station-only data may be useful for those who wish to do their own transmit-receive pairing with onboard spacecraft data.
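To illustrate what "transmit-receive pairing" involves, here is a deliberately simplified sketch. It assumes that ground fire times and spacecraft receive times are already on one synchronized time scale (in seconds) and ignores clock offsets, light-time geometry, atmospheric delay, and relativistic corrections that real processing requires. The function name and the time-of-flight window bounds are hypothetical; the file format of the delivered data is not described here.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def pair_one_way(fire_times, receive_times, min_tof, max_tof):
    """Pair each ground fire time with the earliest unused spacecraft receive
    time whose delay lies in [min_tof, max_tof] seconds.

    Returns a list of (fire_time, receive_time, one_way_range_m) tuples, with
    range = c * (receive - fire) under the simplifying assumptions above.
    """
    receives = sorted(receive_times)
    pairs = []
    j = 0
    for t_fire in sorted(fire_times):
        # Skip receives that arrive too soon to belong to this shot; since
        # fire times are sorted, they are also too soon for later shots.
        while j < len(receives) and receives[j] - t_fire < min_tof:
            j += 1
        if j < len(receives) and receives[j] - t_fire <= max_tof:
            t_rx = receives[j]
            pairs.append((t_fire, t_rx, SPEED_OF_LIGHT * (t_rx - t_fire)))
            j += 1  # each receive event is used at most once
    return pairs
```

A window of roughly 1.2 to 1.4 s brackets the light time from Earth to a lunar orbiter. Receive events that fall outside every window (for example, noise detections) are simply left unpaired.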
Brown Creeper Range - CWHR B364 [ds1593]
data.cnra.ca.gov
data.ca.gov
Updated Feb 15, 2020
California Department of Fish and Wildlife (2020). Brown Creeper Range - CWHR B364 [ds1593] [Dataset]. https://data.cnra.ca.gov/dataset/brown-creeper-range-cwhr-b364-ds1593
Available download formats: ArcGIS GeoServices REST API, KML, ZIP, CSV, GeoJSON, HTML
Dataset updated
Feb 15, 2020
Dataset authored and provided by
California Department of Fish and Wildlife (https://wildlife.ca.gov/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents, such as a lookup table of available range maps (including species code, species name, and range map revision history); a full set of CWHR GIS data; .pdf files of each range map or species life history account; and a User Guide.
Data from: The evolution of environmental tolerance and range size: A...
borealisdata.ca
search.dataone.org
Updated Nov 18, 2022
Seema Sheth; Amy L. Angert (2022). Data from: The evolution of environmental tolerance and range size: A comparison of geographically restricted and widespread Mimulus [Dataset]. http://doi.org/10.5683/SP2/TOBQS2
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.5683/SP2/TOBQS2
Dataset updated
Nov 18, 2022
Dataset provided by
Borealis
Authors
Seema Sheth; Amy L. Angert
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
western North America
Description
Abstract: The geographic ranges of closely related species can vary dramatically, yet we do not fully grasp the mechanisms underlying such variation. The niche breadth hypothesis posits that species that have evolved broad environmental tolerances can achieve larger geographic ranges than species with narrow environmental tolerances. In turn, plasticity and genetic variation in ecologically important traits and adaptation to environmentally variable areas can facilitate the evolution of broad environmental tolerance. We used five pairs of western North American monkeyflowers to experimentally test these ideas by quantifying performance across eight temperature regimes. In four species pairs, species with broader thermal tolerances had larger geographic ranges, supporting the niche breadth hypothesis. As predicted, species with broader thermal tolerances also had more within-population genetic variation in thermal reaction norms and experienced greater thermal variation across their geographic ranges than species with narrow thermal tolerances. Species with narrow thermal tolerance may be particularly vulnerable to changing climatic conditions due to a lack of plasticity and insufficient genetic variation to respond to novel selection pressures. Conversely, species experiencing high variation in temperature across their ranges may be buffered against extinction due to climatic changes because they have evolved tolerance to a broad range of temperatures.
Usage notes
Mean thermal performance data for 10 Mimulus species
File name: Mimulus_thermal_performance_data.csv
This data file includes mean performance, measured as relative growth rate in stem length or leaf number, for each family of each Mimulus species at 15, 20, 25, 30, 35, 40, 45, and 50 degrees Celsius (see publication for further details). Column names are as follows:
- species: species of Mimulus.
- family: unique ID corresponding to full-sibling seed family.
- temperature: daytime temperature of growth chamber.
- RGR_stem: mean relative growth rate in stem length (in units of cm/cm/day) across individuals of each family.
- RGR_leaf: mean relative growth rate in leaf number (in units of leaf #/leaf #/day) across individuals of each family.
Raw thermal performance data for 10 Mimulus species
File name: Mimulus_thermal_performance_data_raw.csv
This data file includes raw thermal performance, measured as relative growth rate in stem length or leaf number, for each individual of each Mimulus species at 15, 20, 25, 30, 35, 40, 45, and 50 degrees Celsius (see publication for further details). Column names are as follows:
- species: species of Mimulus.
- family: unique ID corresponding to full-sibling seed family.
- temperature: daytime temperature of growth chamber.
- stemLen1: length of primary stem in centimeters prior to imposing temperature treatment.
- leafNum1: number of true leaves > 1 mm in length prior to imposing temperature treatment.
- stemLen2: length of primary stem in centimeters after 7 days of exposure to temperature treatment.
- leafNum2: number of true leaves > 1 mm in length after 7 days of exposure to temperature treatment.
- RGR_stem: relative growth rate in stem length (in units of cm/cm/day).
- RGR_leaf: relative growth rate in leaf number (in units of leaf #/leaf #/day).
Note that we originally provided only mean thermal performance data for each family in the data file above. We have now uploaded a new file of raw data from which these means were calculated.
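The usage notes give RGR units (cm/cm/day, leaf #/leaf #/day) and a 7-day exposure, but not the formula. Assuming the usual log-ratio definition, RGR = (ln size_after - ln size_before) / days (an assumption to verify against the RGR_stem and RGR_leaf columns of the raw file), the per-individual values and the family means in the mean file could be reproduced roughly as below. Rows are assumed to be dictionaries of strings, as produced by Python's `csv.DictReader`; the function names are illustrative.

```python
import math
from collections import defaultdict

DAYS = 7  # exposure period stated in the usage notes

def rgr(size_before, size_after, days=DAYS):
    """Relative growth rate, assuming RGR = (ln(after) - ln(before)) / days."""
    return (math.log(size_after) - math.log(size_before)) / days

def family_means(rows):
    """Mean RGR_stem and RGR_leaf per (species, family, temperature).

    `rows` are dicts with string values, e.g. from csv.DictReader over
    Mimulus_thermal_performance_data_raw.csv.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # stem sum, leaf sum, count
    for row in rows:
        key = (row["species"], row["family"], row["temperature"])
        acc = sums[key]
        acc[0] += float(row["RGR_stem"])
        acc[1] += float(row["RGR_leaf"])
        acc[2] += 1
    return {key: (stem / n, leaf / n) for key, (stem, leaf, n) in sums.items()}
```

Note that a leaf count of zero at either time point would make the log-ratio undefined, so such individuals would need separate handling.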
Data from: Current and projected research data storage needs of Agricultural...
agdatacommons.nal.usda.gov
datasets.ai
Updated Nov 30, 2023
Cynthia Parr (2023). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. http://doi.org/10.15482/USDA.ADC/1346946
Available download formats: PDF
Unique identifier
https://doi.org/10.15482/USDA.ADC/1346946
Dataset updated
Nov 30, 2023
Dataset provided by
Ag Data Commons
Authors
Cynthia Parr
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of the data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey that is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit or to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
Larger storage ranges cover vastly different amounts of data, so the implications could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," the 47 respondents who indicated they had 10 to 100 TB or over 100 TB of total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used the actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per-person storage needs, we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users, we used the actual reported values or estimated likely values.
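The per-person calculation described above reduces to a single division. A minimal sketch, with a hypothetical function name: it takes the high end of the reported range (or, for Big Data users, an actual or estimated value when one is available) and divides by G, the number of people a response covers (G = 1 for an individual response). Whether the Big Data value was also divided by G is not stated in the text; the sketch assumes it was.

```python
def per_person_storage(high_end_tb, group_size=1, actual_tb=None):
    """Per-person storage need in TB.

    high_end_tb: high end of the storage range selected in the survey.
    group_size:  G, the number of individuals covered by the response.
    actual_tb:   actual or estimated value for a Big Data user, if known;
                 assumption: it replaces the range high end and is also
                 divided by G.
    """
    if group_size < 1:
        raise ValueError("group size G must be at least 1")
    value = actual_tb if actual_tb is not None else high_end_tb
    return value / group_size
```

Using the high end of each range makes the estimate deliberately conservative for Small Data users, which is why the follow-up values matter most for the largest ranges.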

Resources in this dataset:
- Resource Title: Appendix A: ARS data storage survey questions.
  File Name: Appendix A.pdf
  Resource Description: The full list of questions asked, with the possible responses. The survey was not administered using this PDF; the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here.
  Resource Software Recommended: Adobe Acrobat (https://get.adobe.com/reader/)
- Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
  File Name: Machine-readable survey response data.csv
  Resource Description: CSV file that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).
- Resource Title: Responses from ARS Researcher Data Storage Survey.
  File Name: Data Storage Survey Data for public release.xlsx
  Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
  Resource Software Recommended: Microsoft Excel (https://products.office.com/en-us/excel)
BLM ID Range Improvements Poly
catalog.data.gov
datasets.ai
Updated Nov 11, 2025
Bureau of Land Management (2025). BLM ID Range Improvements Poly [Dataset]. https://catalog.data.gov/dataset/blm-id-range-improvements-poly-4bd25
Dataset updated
Nov 11, 2025
Dataset provided by
Bureau of Land Management (http://www.blm.gov/)
Description
This geodatabase of point, line, and polygon features is an effort to consolidate all of the range improvement locations on BLM-managed land in Idaho into one database. Currently, the polygon feature class has some data for all of the BLM field offices except the Coeur d'Alene and Cottonwood field offices. Range improvements are structures intended to enhance rangeland resources, including wildlife, watershed, and livestock management. Examples of range improvements include water troughs, spring headboxes, culverts, fences, water pipelines, gates, wildlife guzzlers, artificial nest structures, reservoirs, developed springs, corrals, exclosures, etc. These structures were first tracked by the Bureau of Land Management (BLM) in the early 1960s in the Job Documentation Report (JDR) System, a predominantly paper-based tracking system. In 1988 the JDRs were migrated into, and replaced by, the automated Range Improvement Project System (RIPS); version 2.0 of RIPS is in use today. It tracks inventory, status, objectives, treatment, maintenance cycle, maintenance inspection, monetary contributions, and reporting. Not all range improvements are documented in the RIPS database; some older range improvements may have been built before the JDR tracking system was established, and there may also be unauthorized projects that are not in RIPS. Official project files of paper maps, reports, NEPA documents, checklists, etc., document the status of each project and are physically kept in the office with management authority for that project area. In addition, project data are entered into the RIPS system so that managers can access the data to track progress, run reports, analyze the data, etc. Before Geographic Information System (GIS) technology, most offices kept paper atlases or overlay systems that mapped the locations of the range improvements.
The objective of this geodatabase is to migrate the location of historic range improvement projects into a GIS for geospatial use with other data and to centralize the range improvement data for the state. This data set is a work in progress and does not have all range improvement projects that are on BLM lands. Some field offices have not migrated their data into this database, and others are partially completed. New projects may have been built but have not been entered into the system. Historic or unauthorized projects may not have case files and are being mapped and documented as they are found. Many field offices are trying to verify the locations and status of range improvements with GPS, and locations may change or projects that have been abandoned or removed on the ground may be deleted. Attributes may be incomplete or inaccurate. This data was created using the standard for range improvements set forth in Idaho IM 2009-044, dated 6/30/2009. However, it does not have all of the fields the standard requires. Fields that are missing from the polygon feature class that are in the standard are: ALLOT_NO, POLY_TYPE, MGMT_AGCY, ADMIN_ST, and ADMIN_OFF. The polygon feature class also does not have a coincident line feature class, so some of the fields from the polygon arc feature class are included in the polygon feature class: COORD_SRC, COORD_SRC2, DEF_FET, DEF_FEAT2, ACCURACY, CREATE_DT, CREATE_BY, MODIFY_DT, MODIFY_BY, GPS_DATE, and DATAFILE. There is no National BLM standard for GIS range improvement data at this time. For more information contact us at blm_id_stateoffice@blm.gov.
Data from: Native ranges of freshwater fishes of North America
catalog.data.gov
data.usgs.gov
Updated Nov 27, 2025
U.S. Geological Survey (2025). Native ranges of freshwater fishes of North America [Dataset]. https://catalog.data.gov/dataset/native-ranges-of-freshwater-fishes-of-north-america
Dataset updated
Nov 27, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
North America
Description
Background: The Nonindigenous Aquatic Species (NAS) Database functions as a repository and clearinghouse for the occurrence of nonindigenous aquatic species information from across the United States. The Database contains locality information on more than 1,300 species introduced as early as 1800, including freshwater vertebrates and invertebrates, aquatic plants, and marine fishes. Taxa include both foreign species and North American native species that have been translocated outside of their natural range. Locality data are derived from many sources, including scientific literature; Federal, State, and local natural resource monitoring programs; museum collections; news agencies; and direct submission through online reporting forms. To effectively identify and record new introductions for North American native taxa, a robust estimate of their natural native ranges is required. Previously, the NAS Database has used native range information for fishes provided by NatureServe, which was collected from State natural heritage program inventory data and published State fish books. Although these range maps represent an essential first step in assembling native range data, the NatureServe data has varied for many species due to initial data assumptions (i.e., species presence = nativity). Additionally, NatureServe native ranges exhibit watershed gaps for many species. NAS program staff members have made thousands of corrections to these data internally and periodically communicate these changes back to NatureServe. Methods: Native ranges were developed from several data sources. Dr. Dana Infante, Michigan State University, provided the NAS program with occurrence (presence) data from 40-50 Federal, State, museum, and university data providers gathered during her work on the National Fish Habitat Partnership (NFHP). Although many data providers have offered datasets with no restrictions, some have restrictions on redistribution. 
In addition to the NFHP data, we utilized occurrence datasets for United States museum collections from Biodiversity Information Serving Our Nation (BISON), National Science Foundation's VertNet, FishNet 2 (fish collections in natural history museums, universities, and other institutions), Multistate Aquatic Resources Information System (MARIS) data, and Global Biodiversity Information Facility (GBIF), along with a review of State fish books and other primary literature, to complete native range data maintained locally in the NAS Database. Occurrence datasets will be combined into larger, species-specific datasets for further processing at the hydrologic unit code (HUC) level. We will use GIS analyses to identify watershed occurrence at the eight-digit (HUC8) and twelve-digit (HUC12) levels, using the 2015 version of the Watershed Boundary Dataset. HUCs containing known nonindigenous occurrences will be removed from the native range. Watershed gaps (i.e., a HUC that lies between two HUCs identified as part of the native range) will be investigated using historical literature to distinguish data gaps from actual range gaps. We will supply native range data by HUC8 (and HUC12 where possible) for the 320 species listed below. These data will be provided as a comma-separated values (CSV) file and made available on the NAS website via a web services application programming interface (API).
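The HUC processing steps described above can be sketched in Python. This is a minimal illustration, not the NAS program's actual workflow: the record layout (species, HUC12 string pairs), the function names, and the adjacency map are hypothetical, and "lies between two native HUCs" is simplified to "borders at least two native HUCs". The sketch relies on a HUC12 code beginning with the eight digits of its parent HUC8, and it applies the removal rule literally, dropping any HUC8 that contains a nonindigenous occurrence.

```python
from collections import defaultdict


def by_species(records):
    """Group (species, huc12) occurrence records into per-species HUC12 sets."""
    grouped = defaultdict(set)
    for species, huc12 in records:
        grouped[species].add(huc12)
    return dict(grouped)


def huc8_of(huc12):
    """Return the parent HUC8 of a 12-digit HUC (its first eight digits)."""
    if len(huc12) != 12 or not huc12.isdigit():
        raise ValueError(f"not a 12-digit HUC: {huc12!r}")
    return huc12[:8]


def native_range(occurrence_huc12s, nonindigenous_huc12s):
    """Return (native HUC8 set, native HUC12 set) for one species.

    Any HUC that contains a known nonindigenous occurrence is removed,
    at both the HUC12 and the HUC8 level.
    """
    bad12 = set(nonindigenous_huc12s)
    bad8 = {huc8_of(h) for h in bad12}
    native12 = set(occurrence_huc12s) - bad12
    native8 = {huc8_of(h) for h in native12} - bad8
    return native8, native12


def watershed_gaps(native, adjacency):
    """Flag HUCs outside the native range that border two or more native HUCs.

    `adjacency` (hypothetical input) maps each HUC code to the set of HUC
    codes sharing a boundary with it, e.g. derived from WBD polygons.
    Flagged HUCs are candidates for literature review, not confirmed gaps.
    """
    native = set(native)
    return {
        huc
        for huc, neighbours in adjacency.items()
        if huc not in native and len(set(neighbours) & native) >= 2
    }
```

For example, with made-up codes, `native_range(["010100020101", "010100030101"], ["010100030101"])` returns `({"01010002"}, {"010100020101"})`. Keeping the gap check as a flagging step mirrors the description, where each candidate gap is resolved against historical literature rather than filled automatically.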
American Pika Range - CWHR M043 [ds903]
data.cnra.ca.gov
data.ca.gov
Updated Feb 24, 2020
Cite
California Department of Fish and Wildlife (2020). American Pika Range - CWHR M043 [ds903] [Dataset]. https://data.cnra.ca.gov/dataset/american-pika-range-cwhr-m043-ds903
Available download formats: arcgis geoservices rest api, zip, kml, csv, html, geojson
Dataset updated
Feb 24, 2020
Dataset authored and provided by
California Department of Fish and Wildlife (https://wildlife.ca.gov/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents, such as a lookup table of available range maps (including species code, species name, and range map revision history), a full set of CWHR GIS data, PDF files of each range map and species life history account, and a User Guide.
Job Dataset
kaggle.com
zip
Updated Sep 17, 2023
Cite
Ravender Singh Rana (2023). Job Dataset [Dataset]. https://www.kaggle.com/datasets/ravindrasinghrana/job-description-dataset
Available download formats: zip (479575920 bytes)
Dataset updated
Sep 17, 2023
Authors
Ravender Singh Rana
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Job Dataset

This dataset provides a comprehensive collection of synthetic job postings to facilitate research and analysis in the field of job market trends, natural language processing (NLP), and machine learning. Created for educational and research purposes, this dataset offers a diverse set of job listings across various industries and job types.

Descriptions for each of the columns in the dataset:

Job Id: A unique identifier for each job posting.

Experience: The required or preferred years of experience for the job.

Qualifications: The educational qualifications needed for the job.

Salary Range: The range of salaries or compensation offered for the position.

Location: The city or area where the job is located.

Country: The country where the job is located.

Latitude: The latitude coordinate of the job location.

Longitude: The longitude coordinate of the job location.

Work Type: The type of employment (e.g., full-time, part-time, contract).

Company Size: The approximate size or scale of the hiring company.

Job Posting Date: The date when the job posting was made public.

Preference: Special preferences or requirements for applicants (e.g., Only Male, Only Female, or Both).

Contact Person: The name of the contact person or recruiter for the job.

Contact: Contact information for job inquiries.

Job Title: The job title or position being advertised.

Role: The role or category of the job (e.g., software developer, marketing manager).

Job Portal: The platform or website where the job was posted.

Job Description: A detailed description of the job responsibilities and requirements.

Benefits: Information about benefits offered with the job (e.g., health insurance, retirement plans).

Skills: The skills or qualifications required for the job.

Responsibilities: Specific responsibilities and duties associated with the job.

Company Name: The name of the hiring company.

Company Profile: A brief overview of the company's background and mission.
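The column list above maps naturally onto a typed record. The sketch below is an illustration only: it assumes the data are read as rows of column-name/value pairs (for example from a CSV file), that the header names match the column names exactly as described, that Latitude and Longitude are decimal numbers, and that Job Posting Date is written as an ISO date (YYYY-MM-DD). The class and function names are hypothetical, and all other fields stay as strings because their formats are not specified.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class JobPosting:
    # One field per documented column, in the documented order.
    job_id: str
    experience: str
    qualifications: str
    salary_range: str
    location: str
    country: str
    latitude: float
    longitude: float
    work_type: str
    company_size: str
    job_posting_date: date
    preference: str
    contact_person: str
    contact: str
    job_title: str
    role: str
    job_portal: str
    job_description: str
    benefits: str
    skills: str
    responsibilities: str
    company_name: str
    company_profile: str


def parse_row(row):
    """Convert a {column name: string value} dict into a JobPosting.

    Column names are turned into field names by lower-casing them and
    replacing spaces with underscores ("Job Id" -> "job_id").
    """
    fields = {name.lower().replace(" ", "_"): value for name, value in row.items()}
    fields["latitude"] = float(fields["latitude"])
    fields["longitude"] = float(fields["longitude"])
    # Assumption: dates are stored as ISO strings such as "2022-03-15".
    fields["job_posting_date"] = date.fromisoformat(fields["job_posting_date"])
    return JobPosting(**fields)
```

Each dict produced by Python's `csv.DictReader` can be passed to `parse_row`. A header that does not correspond to one of the 23 fields makes the `JobPosting(...)` call raise `TypeError`, which surfaces schema mismatches early instead of silently dropping columns.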

Potential Use Cases:

Building predictive models to forecast job market trends.

Enhancing job recommendation systems for job seekers.

Developing NLP models for resume parsing and job matching.

Analyzing regional job market disparities and opportunities.

Exploring salary prediction models for various job roles.

Acknowledgements:

We would like to express our gratitude to the Python Faker library for its invaluable contribution to the dataset generation process. Additionally, we appreciate the guidance provided by ChatGPT in fine-tuning the dataset, ensuring its quality, and adhering to ethical standards.

Note:

Please note that the examples provided are fictional and for illustrative purposes. You can tailor the descriptions and examples to match the specifics of your dataset. The dataset is not suitable for real-world applications and should be used only for research and experimentation. You can also reach me via email at: rrana157@gmail.com



Data from: FISBe: A real-world benchmark dataset for instance segmentation...


Data from: Jornada Experimental Range (USDA-ARS) monthly stocking data and...

Path loss at 5G high frequency range in South Asia


TIGER/Line Shapefile, 2016, Series Information for the Address Range-Feature...

Fused Image dataset for convolutional neural Network-based crack Detection...

Turkey Vulture Range - CWHR B108 [ds1441]

Street Network Database SND

Data from: Variable Terrestrial GPS Telemetry Detection Rates: Parts 1 -...

USA POI & Foot Traffic Enriched Geospatial Dataset by Predik Data-Driven

Virginia Opossum Range - CWHR M001 [ds1799]

Data from: A comprehensive analysis of autocorrelation and bias in home...

Lunar Orbiter Laser Altimeter (LOLA) one-way Laser Ranging Full Rate Data...

Brown Creeper Range - CWHR B364 [ds1593]

Data from: The evolution of environmental tolerance and range size: A...

Data from: Current and projected research data storage needs of Agricultural...

BLM ID Range Improvements Poly

Data from: Native ranges of freshwater fishes of North America

American Pika Range - CWHR M043 [ds903]

Job Dataset
