100+ datasets found
  1. Fused Image dataset for convolutional neural Network-based crack Detection...

    • zenodo.org
    • explore.openaire.eu
    • +1more
    zip
    Updated Apr 20, 2023
    Cite
    Shanglian Zhou; Carlos Canchila; Wei Song (2023). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Dataset]. http://doi.org/10.5281/zenodo.6383044
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 20, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shanglian Zhou; Carlos Canchila; Wei Song
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.

    The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration feature). The filtered range data were generated by applying frequency domain filtering to eliminate image disturbances (e.g., surface variations and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact of different types of image data on deep convolutional neural network (DCNN) performance.

    If you share or use this dataset, please cite [4] and [5] in any relevant documentation.

    In addition, an image dataset for crack classification has also been published at [6].

    References:

    [1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873

    [2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605

    [3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434

    [4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678

    [5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044

    [6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78

  2. Grass Range, MT Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    Cite
    Neilsberg Research (2025). Grass Range, MT Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/grass-range-mt-population-by-gender/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Grass Range, Montana
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Grass Range by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Grass Range across both sexes and to determine which sex constitutes the majority.

    Key observations

    Females form a considerable majority, accounting for 71.13% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender :

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: This column shows the population of each gender in Grass Range.
    • % of Total Population: This column displays the percentage distribution of each gender as a proportion of Grass Range's total population. Please note that the sum of all percentages may not equal one due to rounding of values.
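    The relationship between the Population and % of Total Population columns can be sketched as follows. This is a minimal illustration only: the counts are hypothetical placeholders rather than values from the dataset, and the function name `percent_of_total` is an assumption introduced here.

```python
def percent_of_total(counts):
    """Return each group's share of the total population, in percent."""
    total = sum(counts.values())
    return {group: round(100.0 * n / total, 2) for group, n in counts.items()}

# Hypothetical counts, for illustration only (not taken from the dataset).
counts = {"Male": 30, "Female": 70}
shares = percent_of_total(counts)
print(shares)
```

    Because each share is rounded independently, the rounded values need not sum exactly to the whole, which is the caveat noted in the column description.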

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Grass Range Population by Race & Ethnicity.

  3. Geospatial Database of Hydroclimate Variables, Spring Mountains and Sheep...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Sep 18, 2024
    Cite
    U.S. Geological Survey (2024). Geospatial Database of Hydroclimate Variables, Spring Mountains and Sheep Range, Clark County, Nevada [Dataset]. https://catalog.data.gov/dataset/geospatial-database-of-hydroclimate-variables-spring-mountains-and-sheep-range-clark-count
    Explore at:
    Dataset updated
    Sep 18, 2024
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Spring Mountains, Nevada, Clark County, Sheep Range
    Description

    This point feature class contains 81,481 points arranged in a 270-meter spaced grid that covers the Spring Mountains and Sheep Range in Clark County, Nevada. Points are attributed with hydroclimate variables and ancillary data compiled to support efforts to characterize ecological zones.

  4. Wallhack1.8k Dataset | Data Augmentation Techniques for Cross-Domain WiFi...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 4, 2025
    Cite
    Kampel, Martin (2025). Wallhack1.8k Dataset | Data Augmentation Techniques for Cross-Domain WiFi CSI-Based Human Activity Recognition [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8188998
    Explore at:
    Dataset updated
    Apr 4, 2025
    Dataset provided by
    Strohmayer, Julian
    Kampel, Martin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the Wallhack1.8k dataset for WiFi-based long-range activity recognition in Line-of-Sight (LoS) and Non-Line-of-Sight (NLoS)/Through-Wall scenarios, as proposed in [1,2], as well as the CAD models (of 3D-printable parts) of the WiFi systems proposed in [2].

    PyTorch Dataloader

    A minimal PyTorch dataloader for the Wallhack1.8k dataset is provided at: https://github.com/StrohmayerJ/wallhack1.8k

    Dataset Description

    The Wallhack1.8k dataset comprises 1,806 CSI amplitude spectrograms (and raw WiFi packet time series) corresponding to three activity classes: "no presence," "walking," and "walking + arm-waving." WiFi packets were transmitted at a frequency of 100 Hz, and each spectrogram captures a temporal context of approximately 4 seconds (400 WiFi packets).

    To assess cross-scenario and cross-system generalization, WiFi packet sequences were collected in LoS and through-wall (NLoS) scenarios, utilizing two different WiFi systems (BQ: biquad antenna and PIFA: printed inverted-F antenna). The dataset is structured accordingly:

    LOS/BQ/ <- WiFi packets collected in the LoS scenario using the BQ system

    LOS/PIFA/ <- WiFi packets collected in the LoS scenario using the PIFA system

    NLOS/BQ/ <- WiFi packets collected in the NLoS scenario using the BQ system

    NLOS/PIFA/ <- WiFi packets collected in the NLoS scenario using the PIFA system

    These directories contain the raw WiFi packet time series (see Table 1). Each row represents a single WiFi packet with the complex CSI vector H being stored in the "data" field and the class label being stored in the "class" field. H is of the form [I, R, I, R, ..., I, R], where two consecutive entries represent imaginary and real parts of complex numbers (the Channel Frequency Responses of subcarriers). Taking the absolute value of H (e.g., via numpy.abs(H)) yields the subcarrier amplitudes A.
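    The amplitude computation described above can be sketched as follows. This is a minimal illustration, assuming the "data" field has already been parsed into a flat numeric sequence and that the interleaved values are paired into complex numbers before taking the absolute value; the function name `csi_amplitudes` and the sample values are hypothetical.

```python
import numpy as np

def csi_amplitudes(h_interleaved):
    """Convert an interleaved [I, R, I, R, ...] vector into subcarrier amplitudes."""
    h = np.asarray(h_interleaved, dtype=np.float64)
    if h.size % 2 != 0:
        raise ValueError("expected an even number of entries (imaginary/real pairs)")
    imag = h[0::2]  # entries 0, 2, 4, ... are imaginary parts
    real = h[1::2]  # entries 1, 3, 5, ... are real parts
    return np.abs(real + 1j * imag)

# Hypothetical three-subcarrier packet, for illustration only.
example = [3, 4, 0, -5, 8, 6]
amplitudes = csi_amplitudes(example)
print(amplitudes)
```

    Pairing the values first matters: applying numpy.abs directly to the interleaved sequence would return the magnitude of each individual entry rather than one amplitude per subcarrier.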

    To extract the 52 L-LTF subcarriers used in [1], the following indices of A are to be selected:

    # 52 L-LTF subcarriers
    csi_valid_subcarrier_index = []
    csi_valid_subcarrier_index += [i for i in range(6, 32)]
    csi_valid_subcarrier_index += [i for i in range(33, 59)]

    Additional 56 HT-LTF subcarriers can be selected via:

    # 56 HT-LTF subcarriers

    csi_valid_subcarrier_index += [i for i in range(66, 94)]
    csi_valid_subcarrier_index += [i for i in range(95, 123)]

    For more details on subcarrier selection, see ESP-IDF (Section Wi-Fi Channel State Information) and esp-csi.
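    As an illustration of how these index lists might be applied, the sketch below selects the listed subcarriers from an amplitude matrix. The matrix here is random placeholder data, and its shape (400 packets by 128 subcarriers) is an assumption made for the example, not a property stated by the dataset.

```python
import numpy as np

# Index ranges as listed in the dataset description.
l_ltf_index = list(range(6, 32)) + list(range(33, 59))     # 52 L-LTF subcarriers
ht_ltf_index = list(range(66, 94)) + list(range(95, 123))  # 56 HT-LTF subcarriers

# Hypothetical amplitude matrix A: one row per packet, one column per subcarrier.
rng = np.random.default_rng(0)
A = rng.random((400, 128))

A_l_ltf = A[:, l_ltf_index]               # L-LTF subcarriers only
A_all = A[:, l_ltf_index + ht_ltf_index]  # L-LTF plus HT-LTF subcarriers
print(A_l_ltf.shape, A_all.shape)
```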

    Extracted amplitude spectrograms with the corresponding label files of the train/validation/test split: "trainLabels.csv," "validationLabels.csv," and "testLabels.csv," can be found in the spectrograms/ directory.

    The columns in the label files correspond to the following: [Spectrogram index, Class label, Room label]

    Spectrogram index: [0, ..., n]

    Class label: [0,1,2], where 0 = "no presence", 1 = "walking", and 2 = "walking + arm-waving."

    Room label: [0,1,2,3,4,5], where labels 1-5 correspond to the room number in the NLoS scenario (see Fig. 3 in [1]). The label 0 corresponds to no room and is used for the "no presence" class.
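    A label file with these three columns could be read along the following lines. This is a sketch under stated assumptions: it assumes comma-separated rows without a header line, and the sample rows and the helper name `parse_label_rows` are hypothetical.

```python
import csv
import io

# Class label mapping as given in the dataset description.
CLASS_NAMES = {0: "no presence", 1: "walking", 2: "walking + arm-waving"}

def parse_label_rows(text):
    """Parse rows of the form: spectrogram index, class label, room label."""
    rows = []
    for index, class_label, room_label in csv.reader(io.StringIO(text)):
        rows.append({
            "index": int(index),
            "class": CLASS_NAMES[int(class_label)],
            "room": int(room_label),
        })
    return rows

# Hypothetical rows, for illustration only.
sample = "0,0,0\n1,1,3\n2,2,5\n"
for row in parse_label_rows(sample):
    print(row)
```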

    Dataset Overview:

    Table 1: Raw WiFi packet sequences.

    Scenario | System | "no presence" / label 0 | "walking" / label 1 | "walking + arm-waving" / label 2 | Total
    LoS | BQ | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv and w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv and ww5.csv | 11
    LoS | PIFA | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv and w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv and ww5.csv | 11
    NLoS | BQ | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv and w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv and ww5.csv | 11
    NLoS | PIFA | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv and w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv and ww5.csv | 11
    Total | | 4 | 20 | 20 | 44

    Table 2: Sample/Spectrogram distribution across activity classes in Wallhack1.8k.

    Scenario | System | "no presence" / label 0 | "walking" / label 1 | "walking + arm-waving" / label 2 | Total
    LoS | BQ | 149 | 154 | 155 | 458
    LoS | PIFA | 149 | 160 | 152 | 461
    NLoS | BQ | 148 | 150 | 152 | 450
    NLoS | PIFA | 143 | 147 | 147 | 437
    Total | | 589 | 611 | 606 | 1,806

    Download and Use

    This data may be used for non-commercial research purposes only. If you publish material based on this data, we request that you include a reference to one of our papers [1,2].

    [1] Strohmayer, Julian, and Martin Kampel. (2024). “Data Augmentation Techniques for Cross-Domain WiFi CSI-Based Human Activity Recognition”, In IFIP International Conference on Artificial Intelligence Applications and Innovations (pp. 42-56). Cham: Springer Nature Switzerland, doi: https://doi.org/10.1007/978-3-031-63211-2_4.

    [2] Strohmayer, Julian, and Martin Kampel. “Directional Antenna Systems for Long-Range Through-Wall Human Activity Recognition,” 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2024, pp. 3594-3599, doi: https://doi.org/10.1109/ICIP51287.2024.10647666.

    BibTeX citations:

    @inproceedings{strohmayer2024data,
      title={Data Augmentation Techniques for Cross-Domain WiFi CSI-Based Human Activity Recognition},
      author={Strohmayer, Julian and Kampel, Martin},
      booktitle={IFIP International Conference on Artificial Intelligence Applications and Innovations},
      pages={42--56},
      year={2024},
      organization={Springer}
    }

    @INPROCEEDINGS{10647666,
      author={Strohmayer, Julian and Kampel, Martin},
      booktitle={2024 IEEE International Conference on Image Processing (ICIP)},
      title={Directional Antenna Systems for Long-Range Through-Wall Human Activity Recognition},
      year={2024},
      volume={},
      number={},
      pages={3594-3599},
      keywords={Visualization;Accuracy;System performance;Directional antennas;Directive antennas;Reflector antennas;Sensors;Human Activity Recognition;WiFi;Channel State Information;Through-Wall Sensing;ESP32},
      doi={10.1109/ICIP51287.2024.10647666}
    }

  5. GLAS/ICESat L1B Global Waveform-based Range Corrections Data (HDF5) V034

    • catalog.data.gov
    • s.cnmilf.com
    • +4more
    Updated Apr 10, 2025
    Cite
    NASA NSIDC DAAC (2025). GLAS/ICESat L1B Global Waveform-based Range Corrections Data (HDF5) V034 [Dataset]. https://catalog.data.gov/dataset/glas-icesat-l1b-global-waveform-based-range-corrections-data-hdf5-v034
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    GLAH05 Level-1B waveform parameterization data include output parameters from the waveform characterization procedure and other parameters required to calculate surface slope and relief characteristics. GLAH05 contains parameterizations of both the transmitted and received pulses and other characteristics from which elevation and footprint-scale roughness and slope are calculated. The received pulse characterization uses two implementations of the retracking algorithms: one tuned for ice sheets, called the standard parameterization, used to calculate surface elevation for ice sheets, oceans, and sea ice; and another for land (the alternative parameterization). Each data granule has an associated browse product.

  6. Home range and body size data compiled from the literature for marine and...

    • bco-dmo.org
    • search.dataone.org
    csv, pdf, tsv, txt
    Updated Jan 31, 2019
    Cite
    Malin Pinsky; Doug McCauley (2019). Home range and body size data compiled from the literature for marine and terrestrial vertebrates [Dataset]. http://doi.org/10.1575/1912/bco-dmo.752795.1
    Explore at:
    Available download formats: txt (89 bytes), pdf (154613 bytes), csv (148306 bytes), pdf (38070 bytes), tsv (32947 bytes)
    Dataset updated
    Jan 31, 2019
    Dataset provided by
    Biological and Chemical Data Management Office
    Authors
    Malin Pinsky; Doug McCauley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    BM, HR, Refs, Group, System, Species
    Description

    Home range and body size data compiled from the literature for marine and terrestrial vertebrates.

    These data were published in McCauley et al. (2015) Table S2.

  7. ECMWF ERA5t: model level analysis parameter data

    • catalogue.ceda.ac.uk
    • data-search.nerc.ac.uk
    Updated Jun 19, 2023
    Cite
    European Centre for Medium-Range Weather Forecasts (ECMWF) (2023). ECMWF ERA5t: model level analysis parameter data [Dataset]. https://catalogue.ceda.ac.uk/uuid/8177330a5f2443059b7107188c2ab3c1
    Explore at:
    Dataset updated
    Jun 19, 2023
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    European Centre for Medium-Range Weather Forecasts (ECMWF)
    License

    https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf

    Area covered
    Earth
    Variables measured
    time, latitude, longitude, Temperature, Geopotential, geopotential, eastward_wind, northward_wind, air_temperature, Specific humidity, and 8 more
    Description

    This dataset contains ERA5 initial release (ERA5t) model level analysis parameter data. ERA5t is the initial release of the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis project, available up to 5 days behind the present date. CEDA will maintain a 6 month rolling archive of these data with overlap to the verified ERA5 data - see linked datasets on this record. This dataset contains a limited selection of all available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool, linked to from this record.

    Surface level analysis and forecast data to complement this dataset are also available. Data from a 10 member ensemble, run at lower spatial and temporal resolution, were also produced to provide an uncertainty estimate for the output from the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation producing data in this dataset.

  8. ECMWF ERA5: surface level analysis parameter data

    • catalogue.ceda.ac.uk
    Updated Jun 7, 2023
    Cite
    European Centre for Medium-Range Weather Forecasts (ECMWF) (2023). ECMWF ERA5: surface level analysis parameter data [Dataset]. https://catalogue.ceda.ac.uk/uuid/c1145ccc4b6d4310a4fc7cce61041b63
    Explore at:
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    European Centre for Medium-Range Weather Forecasts (ECMWF)
    License

    https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf

    Area covered
    Earth
    Variables measured
    cloud_area_fraction, sea_ice_area_fraction, air_pressure_at_sea_level, lwe_thickness_of_surface_snow_amount, lwe_thickness_of_atmosphere_mass_content_of_water_vapor
    Description

    This dataset contains ERA5 surface level analysis parameter data. ERA5 is the 5th generation reanalysis project from the European Centre for Medium-Range Weather Forecasts (ECMWF) - see linked documentation for further details. This dataset contains a limited selection of all available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool, linked to from this record.

    Model level analysis and surface forecast data to complement this dataset are also available. Data from a 10 member ensemble, run at lower spatial and temporal resolution, were also produced to provide an uncertainty estimate for the output from the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation producing data in this dataset.

    The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim reanalysis projects.

    An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data are subsequently reviewed before being released by ECMWF as quality-assured data within 3 months. CEDA holds a 6 month rolling copy of the latest ERA5t data. See related datasets linked to from this record. However, for the period 2000-2006 the initial ERA5 release was found to suffer from stratospheric temperature biases, so new runs were performed to address this issue, resulting in the ERA5.1 release (see linked datasets). Note, though, that Simmons et al. 2020 (technical memo 859) report that "ERA5.1 is very close to ERA5 in the lower and middle troposphere", but users of data from this period should read technical memo 859 for further details.

  9. Data from: A comprehensive analysis of autocorrelation and bias in home...

    • borealisdata.ca
    • search.dataone.org
    • +1more
    Updated May 19, 2021
    Cite
    Michael J. Noonan; Marlee A. Tucker; Christen H. Fleming; Tom S. Akre; Susan C. Alberts; Abdullahi H. Ali; Jeanne Altmann; Pamela C. Antunes; Jerrold L. Belant; Dean Beyer; Niels Blaum; Katrin Böhning-Gaese; Laury Cullen Jr.; Rogerio de Paula Cunha; Jasja Dekker; Jonathan Drescher-Lehman; Nina Farwig; Claudia Fichtel; Christina Fischer; Adam T. Ford; Jacob R. Goheen; René Janssen; Florian Jeltsch; Matthew Kauffman; Peter M. Kappeler; Flávia Koch; Scott LaPoint; A. Catherine Markham; Emilia Patricia Medici; Ronaldo G. Morato; Ran Nathan; Luiz Gustavo R. Oliveira-Santos; Kirk A. Olson; Bruce D. Patterson; Agustin Paviolo; Emiliano E. Ramalho; Sascha Rosner; Nuria Selva; Agnieszka Sergiel; Marina X. da Silva; Orr Spiegel; Peter Thompson; Wiebke Ullmann; Filip Zięba; Tomasz Zwijacz-Kozica; William F. Fagan; Thomas Mueller; Justin M. Calabrese (2021). Data from: A comprehensive analysis of autocorrelation and bias in home range estimation [Dataset]. http://doi.org/10.5683/SP2/OAJTAO
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 19, 2021
    Dataset provided by
    Borealis
    Authors
    Michael J. Noonan; Marlee A. Tucker; Christen H. Fleming; Tom S. Akre; Susan C. Alberts; Abdullahi H. Ali; Jeanne Altmann; Pamela C. Antunes; Jerrold L. Belant; Dean Beyer; Niels Blaum; Katrin Böhning-Gaese; Laury Cullen Jr.; Rogerio de Paula Cunha; Jasja Dekker; Jonathan Drescher-Lehman; Nina Farwig; Claudia Fichtel; Christina Fischer; Adam T. Ford; Jacob R. Goheen; René Janssen; Florian Jeltsch; Matthew Kauffman; Peter M. Kappeler; Flávia Koch; Scott LaPoint; A. Catherine Markham; Emilia Patricia Medici; Ronaldo G. Morato; Ran Nathan; Luiz Gustavo R. Oliveira-Santos; Kirk A. Olson; Bruce D. Patterson; Agustin Paviolo; Emiliano E. Ramalho; Sascha Rosner; Nuria Selva; Agnieszka Sergiel; Marina X. da Silva; Orr Spiegel; Peter Thompson; Wiebke Ullmann; Filip Zięba; Tomasz Zwijacz-Kozica; William F. Fagan; Thomas Mueller; Justin M. Calabrese
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Global
    Dataset funded by
    National Science Foundation
    Description

    Abstract

    Home range estimation is routine practice in ecological research. While advances in animal tracking technology have increased our capacity to collect data to support home range analysis, these same advances have also resulted in increasingly autocorrelated data. Consequently, the question of which home range estimator to use on modern, highly autocorrelated tracking data remains open. This question is particularly relevant given that most estimators assume independently sampled data. Here, we provide a comprehensive evaluation of the effects of autocorrelation on home range estimation. We base our study on an extensive dataset of GPS locations from 369 individuals representing 27 species distributed across 5 continents. We first assemble a broad array of home range estimators, including Kernel Density Estimation (KDE) with four bandwidth optimizers (Gaussian reference function, autocorrelated-Gaussian reference function (AKDE), Silverman's rule of thumb, and least squares cross-validation), Minimum Convex Polygon, and Local Convex Hull methods. Notably, all of these estimators except AKDE assume independent and identically distributed (IID) data. We then employ half-sample cross-validation to objectively quantify estimator performance, and the recently introduced effective sample size for home range area estimation ($\hat{N}_\mathrm{area}$) to quantify the information content of each dataset. We found that AKDE 95% area estimates were larger than conventional IID-based estimates by a mean factor of 2. The median number of cross-validated locations included in the holdout sets by AKDE 95% (or 50%) estimates was 95.3% (or 50.1%), confirming the larger AKDE ranges were appropriately selective at the specified quantile. Conversely, conventional estimates exhibited negative bias that increased with decreasing $\hat{N}_\mathrm{area}$.

    To contextualize our empirical results, we performed a detailed simulation study to tease apart how sampling frequency, sampling duration, and the focal animal's movement conspire to affect range estimates. Paralleling our empirical results, the simulation study demonstrated that AKDE was generally more accurate than conventional methods, particularly for small $\hat{N}_\mathrm{area}$. While 72% of the 369 empirical datasets had >1000 total observations, only 4% had an $\hat{N}_\mathrm{area}$ >1000, where 30% had an $\hat{N}_\mathrm{area}$ <30. In this frequently encountered scenario of small $\hat{N}_\mathrm{area}$, AKDE was the only estimator capable of producing an accurate home range estimate on autocorrelated data.

    Usage notes

    Empirical GPS tracking data: Anonymised, empirical tracking data used to estimate home range areas based on various home range estimators. File: Anonymised_Data.zip
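    Of the estimators named in the abstract, the Minimum Convex Polygon is the simplest to illustrate. The sketch below computes the area of the convex hull enclosing all locations (a 100% polygon, not the 95% variant), using Andrew's monotone chain and the shoelace formula. It is not the authors' implementation, and the coordinates are hypothetical planar values rather than data from the archive.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical planar locations (arbitrary units), for illustration only.
locations = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
hull = convex_hull(locations)
mcp_area = polygon_area(hull)
print(hull, mcp_area)
```

    The hull depends only on the outermost locations and ignores how the points were sampled in time, which is consistent with the abstract's grouping of this method among the estimators that assume IID data.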

  10. Product Retail Price Survey 2017-2025

    • kaggle.com
    Updated Apr 10, 2025
    Cite
    Aradhana Hirapara (2025). Product Retail Price Survey 2017-2025 [Dataset]. https://www.kaggle.com/datasets/aradhanahirapara/product-retail-price-survey-2017-2025
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aradhana Hirapara
    Description

    This dataset contains monthly retail price data for a wide range of consumer products sold in various Canadian provinces over several years. It has been enriched with tax, category, and classification metadata for deeper insights.

    Usefulness of the Dataset

    This dataset can be used for:

    Use Case: Description
    Price Trend Analysis: Track price movements over time, province, and product category.
    Inflation Studies: Examine inflation on essentials vs non-essentials over time.
    Regional Price Comparison: Analyze cost disparities for the same goods across provinces.
    Tax Policy Impact: Understand how tax laws affect consumer pricing by region.
    Budget Optimization: Identify high-cost vs low-cost essentials for better planning.
    Machine Learning Integration: Use in models for price prediction or consumer segmentation.

    Purpose and Use Cases

    This dataset is ideal for:

    🏛️ Policy Analysis

    Understand how federal and provincial taxes shape price access — especially for essentials like milk, bread, or medications.

    🧍‍♀️ Consumer Insights

    See how costs for personal care, food, and baby goods evolve month-over-month in each region.

    💸 Inflation & Seasonality

    Analyze how monthly or yearly trends (e.g., holiday spikes or inflation events) affect product pricing.

    🌍 Social Impact Studies

    Measure product accessibility gaps between provinces for low-income consumers or high-tax regions.

    🛍️ Retail & Budget Planning

    Guide families, retailers, or policymakers on where and when to buy or subsidize certain products.

  11. Dataset for "ConfSolv: Prediction of solute conformer free energies across a...

    • data.niaid.nih.gov
    Updated Oct 25, 2023
    Cite
    Frederik Sandfort (2023). Dataset for "ConfSolv: Prediction of solute conformer free energies across a range of solvents" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8292519
    Explore at:
    Dataset updated
    Oct 25, 2023
    Dataset provided by
    Kevin A. Spiekermann
    Zipei Tan
    Frederik Sandfort
    Volker Settels
    Florence Vermeire
    Philipp Eiden
    William H. Green
    Angiras Menon
    Lagnajit Pattanaik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains three archives.

    The first archive, full_dataset.zip, contains geometries and free energies for nearly 44,000 solute molecules with almost 9 million conformers, in 42 different solvents. The geometries and gas phase free energies are computed using density functional theory (DFT). The solvation free energy for each conformer is computed using COSMO-RS and the solution free energies are computed using the sum of the gas phase free energies and the solvation free energies. The geometries for each solute conformer are provided as ASE_atoms_objects within a pandas DataFrame, found in the compressed file dft coords.pkl.gz within full_dataset.zip. The gas-phase energies, solvation free energies, and solution free energies are also provided as a pandas DataFrame in the compressed file free_energy.pkl.gz within full_dataset.zip. Ten example data splits for both random and scaffold split types are also provided in the ZIP archive for training models. Scaffold split index 0 is used to generate results in the corresponding publication.

    The second archive, refined_conf_search.zip, contains geometries and free energies for a representative sample of 28 solute molecules from the full dataset that were subject to a refined conformer search and thus had more conformers located. The format of the data is identical to full_dataset.zip.

    The third archive contains one folder for each solvent for which we have provided free energies in full_dataset.zip. Each folder contains the .cosmo file for every solvent conformer used in the COSMOtherm calculations, a dummy input file for the COSMOtherm calculations, and a CSV file that contains the electronic energy of each solvent conformer that needs to be substituted for "EH_Line" in the dummy input file.
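
    The relationship between the three energy terms described above can be sketched as follows. This is a minimal illustration only: the record layout, the field names (`g_gas`, `dg_solv`) and the numbers are hypothetical and are not the column names or values stored in free_energy.pkl.gz.

```python
# Hypothetical records: one entry per (conformer, solvent) pair.
# Field names and values are illustrative assumptions, not taken from the dataset.
records = [
    {"conformer": 0, "solvent": "water", "g_gas": -10.0, "dg_solv": -2.5},
    {"conformer": 1, "solvent": "water", "g_gas": -9.0, "dg_solv": -4.0},
]


def solution_free_energy(g_gas: float, dg_solv: float) -> float:
    """Solution free energy = gas-phase free energy + solvation free energy."""
    return g_gas + dg_solv


def lowest_conformer(rows):
    """Return the id of the conformer with the lowest solution free energy."""
    best = min(rows, key=lambda r: solution_free_energy(r["g_gas"], r["dg_solv"]))
    return best["conformer"]
```

    With the real files, the DataFrames could presumably be loaded with `pandas.read_pickle`, which infers gzip compression from the `.gz` extension; the geometry file would also need the ASE package installed to unpickle the atoms objects.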

  12. NZ Roads: Address Range Road - Dataset - data.govt.nz - discover and use...

    • catalogue.data.govt.nz
    Updated Dec 9, 2015
    (2015). NZ Roads: Address Range Road - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/nz-roads-address-range-road
    Dataset updated
    Dec 9, 2015
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New Zealand
    Description

    Please read: This is the Address Range Road table and is part of the set of NZ Roads tables. The Address Range Road table provides an identifier that groups one or more Road Sections that have a non-standard address range. This currently includes roads that consist of multiple address ranges. The address range values are not held or provided as part of this dataset. The NZ Roads dataset includes eight data tables and eleven lookup tables. The dataset has been sourced from LINZ’s NZ Roads database, a database for the management of national roads, including those managed for addressing purposes. This set of normalised tables replaces the Landonline: Road Centre Line layer and the Landonline: Road Name and Landonline: Road Name Association tables currently published on LDS. These centrelines are required to indicate the presence of an authoritative road name. Named centrelines are not intended to represent the exact location of a road formation. Named centrelines do not indicate the presence of legal access. For a simplified version of the data contained within these tables see NZ Roads (Addressing), which aggregates geometries based on road name, and NZ Roads Subsections (Addressing), which holds the individual geometries. Please refer to the NZ Roads Data Dictionary for detailed metadata and information about this layer.

  13. Data from: GALILEO VENUS RANGE FIX RAW DATA V1.0

    • catalog.data.gov
    • datasets.ai
    Updated Apr 10, 2025
    National Aeronautics and Space Administration (2025). GALILEO VENUS RANGE FIX RAW DATA V1.0 [Dataset]. https://catalog.data.gov/dataset/galileo-venus-range-fix-raw-data-v1-0-0943a
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Raw radio tracking data used to determine the precise distance to Venus (and improve knowledge of the Astronomical Unit) from the Galileo flyby on 10 February 1990.

  14. Home range size and habitat availability data for 39 individual European...

    • data-search.nerc.ac.uk
    • hosted-metadata.bgs.ac.uk
    • +2more
    zip
    Updated Mar 26, 2020
    NERC EDS Environmental Information Data Centre (2020). Home range size and habitat availability data for 39 individual European nightjars on the Humberhead Peatlands NNR from 2015-2018 [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/api/records/d5cc1b92-6862-4475-8aa1-5936786d12ab
    Available download formats: zip
    Dataset updated
    Mar 26, 2020
    Dataset provided by
    University of York
    NERC EDS Environmental Information Data Centre
    License

    http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations

    https://eidc.ceh.ac.uk/licences/OGL/plain

    Time period covered
    Jan 1, 2015 - Dec 31, 2018
    Description

    This dataset contains home range size, habitat availability and selection ratio data, calculated from GPS data fixes collected from individual European nightjars in four consecutive years (2015-2018). Home ranges are 95% areas of use, presented in hectares. Habitat availability data are presented as the percentage (%) of each habitat category (n = 6, pooled from 14 original habitat types) available to each individual within their 95% home range. Selection ratios are Manly Selection Ratios for 14 habitat types and express the extent to which each habitat type is used by each individual bird, compared to how much of it is available. Selection Ratios >1 express positive selection – i.e. used more than expected, given availability. Selection Ratios <1 express avoidance – i.e. used less than expected, given availability. Full details about this dataset can be found at https://doi.org/10.5285/d5cc1b92-6862-4475-8aa1-5936786d12ab
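
    To make the interpretation of the ratios concrete, the sketch below computes the simplest form of a selection ratio: the proportion of use divided by the proportion available. It is an illustration under stated assumptions, not the authors' procedure; the habitat names, fix counts and areas are hypothetical, and the dataset may have been produced with a different study design or estimator.

```python
def selection_ratios(used: dict, available: dict) -> dict:
    """Selection ratio per habitat = proportion used / proportion available."""
    total_used = sum(used.values())
    total_avail = sum(available.values())
    ratios = {}
    for habitat, avail in available.items():
        if avail == 0:
            continue  # ratio is undefined when a habitat is not available
        p_used = used.get(habitat, 0) / total_used
        p_avail = avail / total_avail
        ratios[habitat] = p_used / p_avail
    return ratios


# Hypothetical bird: GPS fixes per habitat and available area (ha) per habitat.
fixes = {"heath": 60, "woodland": 30, "grassland": 10}
area_ha = {"heath": 20.0, "woodland": 30.0, "grassland": 50.0}
ratios = selection_ratios(fixes, area_ha)
```

    In this made-up example heath comes out above 1 (positive selection), woodland at about 1 (used in proportion to availability) and grassland below 1 (avoidance).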

  15. PROVE Surface albedo of Jornada Experimental Range, New Mexico, 1997 -...

    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    • data.nasa.gov
    Updated Apr 8, 2025
    nasa.gov (2025). PROVE Surface albedo of Jornada Experimental Range, New Mexico, 1997 - Dataset - NASA Open Data Portal [Dataset]. https://data.staging.idas-ds1.appdat.jsc.nasa.gov/dataset/prove-surface-albedo-of-jornada-experimental-range-new-mexico-1997-03c07
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Area covered
    Jornada, New Mexico
    Description

    The objective of this study was to determine the spatial variations in field measurements of broadband albedo as related to the ground cover and under a range of solar conditions during the Prototype Validation Exercise (PROVE) at the Jornada Experimental Range in New Mexico on May 20-30, 1997.

  16. Endemic Mammal Richness in California, Range Weighted (Data Basin Dataset)

    • hub.arcgis.com
    Updated Apr 20, 2011
    mkoo (2011). Endemic Mammal Richness in California, Range Weighted (Data Basin Dataset) [Dataset]. https://hub.arcgis.com/content/c5d971cdbb6e4f4ab8bfcfa368623f59
    Dataset updated
    Apr 20, 2011
    Dataset authored and provided by
    mkoo
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Project Goals: To identify regions of recently evolved endemic (neo-endemism) mammal species in California and thereby infer areas of rapid evolutionary diversification, which may help guide conservation prioritization and future planning for protected areas. Four species-based GIS rasters were produced of mammalian endemism (see reference for details). This is: Richness of species distribution models weighted by inverse range size.

    Abstract: The high rate of anthropogenic impact on natural systems mandates protection of the evolutionary processes that generate and sustain biological diversity. Environmental drivers of diversification include spatial heterogeneity of abiotic and biotic agents of divergent selection, features that suppress gene flow, and climatic or geological processes that open new niche space. To explore how well such proxies perform as surrogates for conservation planning, we need first to map areas with rapid diversification — ‘evolutionary hotspots’. Here we combine estimates of range size and divergence time to map spatial patterns of neo-endemism for mammals of California, a global biodiversity hotspot. Neo-endemism is explored at two scales: (i) endemic species, weighted by the inverse of range size and mtDNA sequence divergence from sisters; and (ii) as a surrogate for spatial patterns of phenotypic divergence, endemic subspecies, again using inverse-weighting of range size. The species-level analysis revealed foci of narrowly endemic, young taxa in the central Sierra Nevada, northern and central coast, and Tehachapi and Peninsular Ranges. The subspecies endemism-richness analysis supported the last four areas as hotspots for diversification, but also highlighted additional coastal areas (Monterey to north of San Francisco Bay) and the Inyo Valley to the east. We suggest these hotspots reflect the major processes shaping mammal neo-endemism: steep environmental gradients, biotic admixture areas, and areas with recent geological/climate change. Anthropogenic changes to both environment and land use will have direct impacts on regions of rapid divergence. However, despite widespread changes to land cover in California, the majority of the hotspots identified here occur in areas with relatively intact ecological landscapes. The geographical scope of conserving evolutionary process is beyond the scale of any single agency or nongovernmental organization. Choosing which land to closely protect and/or purchase will always require close coordination between agencies.

    Citation: DAVIS, E.B., KOO, M.S., CONROY, C., PATTON, J.L. & MORITZ, C. (2008) The California Hotspots Project: identifying regions of rapid diversification of mammals. Molecular Ecology 17, 120-138.

    This dataset was reviewed in another manner.

    Spatial Resolution: 0.0083333338 DD

    This layer package was loaded using Data Basin.
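
    The idea of richness "weighted by inverse range size" can be sketched as below: each species contributes 1 / (its range size) to every grid cell it occupies, so narrowly distributed species count for more. This is only an illustration; measuring range size as the number of occupied cells, and the species names and grid, are assumptions, and the additional weighting by sequence divergence mentioned in the abstract is not modelled.

```python
def range_weighted_richness(presence: dict) -> dict:
    """Map each grid cell to the sum of 1/range_size over the species present.

    `presence` maps a species name to the set of grid cells it occupies;
    range size is taken to be the number of occupied cells (an assumption).
    """
    scores = {}
    for species, cells in presence.items():
        if not cells:
            continue  # a species with no occupied cells contributes nothing
        weight = 1.0 / len(cells)
        for cell in cells:
            scores[cell] = scores.get(cell, 0.0) + weight
    return scores


# Hypothetical 2x2 grid with three species of increasing range size.
presence = {
    "sp_a": {(0, 0)},
    "sp_b": {(0, 0), (0, 1)},
    "sp_c": {(0, 0), (0, 1), (1, 0), (1, 1)},
}
scores = range_weighted_richness(presence)
```

    Cell (0, 0) holds all three species, including the single-cell endemic, and so receives the highest score, while cells occupied only by the widespread species score lowest.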

  17. SI-HDR Dataset

    • paperswithcode.com
    Updated Aug 12, 2023
    Param Hanji; Rafał K. Mantiuk; Gabriel Eilertsen; Saghi Hajisharif; Jonas Unger (2023). SI-HDR Dataset [Dataset]. https://paperswithcode.com/dataset/si-hdr
    Dataset updated
    Aug 12, 2023
    Authors
    Param Hanji; Rafał K. Mantiuk; Gabriel Eilertsen; Saghi Hajisharif; Jonas Unger
    Description

    The dataset consists of 181 HDR images. Each image includes: 1) a RAW exposure stack, 2) an HDR image, 3) simulated camera images at two different exposures, and 4) results of 6 single-image HDR reconstruction methods: Endo et al. 2017, Eilertsen et al. 2017, Marnerides et al. 2018, Lee et al. 2018, Liu et al. 2020, and Santos et al. 2020.

    Project web page

    More details can be found at: https://www.cl.cam.ac.uk/research/rainbow/projects/sihdr_benchmark/

    Overview

    This dataset contains 181 RAW exposure stacks selected to cover a wide range of image content and lighting conditions. Each scene is composed of 5 RAW exposures and merged into an HDR image using the estimator that accounts for photon noise [3]. A simple color correction was applied using a reference white point and all merged HDR images were resized to 1920×1280 pixels.

    The primary purpose of the dataset was to compare various single image HDR (SI-HDR) methods [1]. Thus, we selected a wide variety of content covering nature, portraits, cities, indoor and outdoor, daylight and night scenes. After merging and resizing, we simulated captures by applying a custom CRF and added realistic camera noise based on estimated noise parameters of Canon 5D Mark III.

    The simulated captures were inputs to six selected SI-HDR methods. You can view the reconstructions of various methods for select scenes on our interactive viewer. For the remaining scenes, please download the appropriate zip files. We conducted a rigorous pairwise comparison experiment on these images to find that widely-used metrics did not correlate well with subjective data. We then proposed an improved evaluation protocol for SI-HDR [1].

    If you find this dataset useful, please cite [1].

    References

    [1] Param Hanji, Rafał K. Mantiuk, Gabriel Eilertsen, Saghi Hajisharif, and Jonas Unger. 2022. “Comparison of single image hdr reconstruction methods — the caveats of quality assessment.” In Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings (SIGGRAPH ’22 Conference Proceedings). [Online]. Available: https://www.cl.cam.ac.uk/research/rainbow/projects/sihdr_benchmark/

    [2] Gabriel Eilertsen, Saghi Hajisharif, Param Hanji, Apostolia Tsirikoglou, Rafał K. Mantiuk, and Jonas Unger. 2021. “How to cheat with metrics in single-image HDR reconstruction.” In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops. 3998–4007.

    [3] Param Hanji, Fangcheng Zhong, and Rafał K. Mantiuk. 2020. “Noise-Aware Merging of High Dynamic Range Image Stacks without Camera Calibration.” In Advances in Image Manipulation (ECCV workshop). Springer, 376–391. [Online]. Available: https://www.cl.cam.ac.uk/research/rainbow/projects/noise-aware-merging/

  18. Film Circulation dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2024
    Loist, Skadi (2024). Film Circulation dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7887671
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Loist, Skadi
    Samoilova, Evgenia (Zhenya)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”

    A peer-reviewed data paper for this dataset is in review to be published in NECSUS_European Journal of Media Studies - an open access journal aiming at enhancing data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org

    Please cite this when using the dataset.

    Detailed description of the dataset:

    1 Film Dataset: Festival Programs

    The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

    The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.

    The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.

    The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
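
    The difference between the long and wide tables described above can be sketched as follows. This is an illustration, not the authors' code: only the variable name `fest` comes from the description, while the other field names and the rows are hypothetical.

```python
# Hypothetical long-format rows: one row per film and sample festival.
long_rows = [
    {"film_id": "F1", "fest": "Berlinale", "year": 2017, "month": 2},
    {"film_id": "F1", "fest": "Frameline", "year": 2017, "month": 6},
    {"film_id": "F2", "fest": "Frameline", "year": 2017, "month": 6},
]


def long_to_wide(rows):
    """Keep one row per film: the chronologically first sample festival."""
    wide = {}
    for row in sorted(rows, key=lambda r: (r["year"], r["month"])):
        wide.setdefault(row["film_id"], row)  # first occurrence wins
    return wide


wide = long_to_wide(long_rows)
```

    Here film F1 appears twice in the long table but once in the wide table, where `fest` records Berlinale because that screening came first.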

    2 Survey Dataset

    The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

    The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.

    The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.

    The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.

    3 IMDb & Scripts

    The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.

    The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.

    The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.

    The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.

    The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.

    The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to each crew member. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.

    The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.

    The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.

    The csv file “3_imdb-dataset_release-info_long” contains data about non-festival releases (e.g., theatrical, digital, TV, DVD/Blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.

    The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.

    The dataset includes 8 text files containing the script for webscraping. They were written using the R-3.6.3 version for Windows.

    The R script “r_1_unite_data” demonstrates the structure of the dataset that we use in the following steps to identify, scrape, and match the film data.

    The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in the “r_1_unite_data” and uses various R packages to create a search URL for each film from the core dataset on the IMDb website. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then potentially using an alternative title and a basic search if no matches are found in the advanced search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records from the IMDb website. The script then defines a loop that matches (including matching scores) each film in the core dataset with suggested films on the IMDb search page. Matching was done using data on directors, production year (+/- one year), and title. Titles were compared with a fuzzy matching approach using two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA algorithm is used to match titles that may have typos or minor variations.
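
    The two title-matching measures named above can be illustrated with the following Python sketch. It is not the authors' R script: the function names, the choice of character bigrams (`q = 2`) for the cosine measure, and the helper `year_matches` are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of two adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def cosine_similarity(a: str, b: str, q: int = 2) -> float:
    """Cosine similarity between the q-gram count vectors of two strings."""
    grams_a = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    grams_b = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    dot = sum(count * grams_b[gram] for gram, count in grams_a.items())
    norm_a = sqrt(sum(v * v for v in grams_a.values()))
    norm_b = sqrt(sum(v * v for v in grams_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def year_matches(core_year: int, imdb_year: int) -> bool:
    """Production years match if they differ by at most one year."""
    return abs(core_year - imdb_year) <= 1
```

    A swapped pair of letters, as in "teh" versus "the", costs a single OSA edit, which is why this measure suits titles with typos, whereas the cosine measure rewards titles that share most of their character sequences regardless of small local edits.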

    The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of the following five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible doubles in the dataset and identifies them for a manual check.

    The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.

    The script “r_5a_extracting_info_sample” uses the function defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. This script does that for the first 100 films to check that everything works. Scraping the entire dataset took a few hours, so a test with a subsample of 100 films is advisable.

    The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.

    The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time, to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.

    The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It gives information on the amount of missing values and errors.

    4 Festival Library Dataset

    The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.

    The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories, units of measurement, data sources and coding and missing data.

    The csv file “4_festival-library_dataset_imdb-and-survey” contains data on all unique festivals collected from both IMDb and survey sources. This dataset appears in wide format, all information for each festival is listed in one row. This

  19. ckanext-extend_search

    • catalog.civicdataecosystem.org
    ckanext-extend_search [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-extend_search
    Description

    The extend_search extension enhances the CKAN data catalog by adding advanced search capabilities. It focuses on improving how users find datasets by introducing date range filtering based on the 'modified-on' metadata, and enables searching datasets by custodian. By incorporating these features, extend_search makes it easier for users to discover relevant datasets within a CKAN instance.

    Key Features:

    Date Range Search Filter: Allows users to filter datasets based on a date range applied to the 'modified-on' metadata field. This feature utilizes the bootstrap-daterangepicker library, crediting Dan Grossman’s work, to provide a user-friendly interface for selecting date ranges.

    Custodian Search Filter: Introduces the ability to search datasets based on the custodian responsible for the dataset. This facilitates finding datasets managed by specific organizations or individuals.

    Technical Integration: The extension is installed via standard CKAN extension installation procedures. This involves cloning the repository, installing the required Python packages using pip, installing the extension using setup.py, and enabling the extend_search plugin in the CKAN configuration file (.ini).

    Benefits & Impact: By implementing the extend_search extension, CKAN installations can improve the findability of datasets, saving users time and effort. Date range filtering is specifically useful when searching for recently updated datasets, while custodian filtering is helpful when looking for datasets managed by specific entities.

  20. Data from: Dataset for evaluation of range-based people tracker classifiers...

    • portalcientifico.unileon.es
    • data.niaid.nih.gov
    Updated 2021
    Álvarez-Aparicio, Claudia; Guerrero-Higueras, Ángel Manuel (2021). Dataset for evaluation of range-based people tracker classifiers in mobile robots [Dataset]. https://portalcientifico.unileon.es/documentos/668fc461b9e7c03b01bdb93c?lang=ca
    Dataset updated
    2021
    Authors
    Álvarez-Aparicio, Claudia; Guerrero-Higueras, Ángel Manuel
    Description

    This dataset can be used to evaluate the performance of different approaches for detecting and tracking people by using lidar sensors. Information contained in the dataset is especially suitable to be used as test data for neural network-based classifiers. This dataset contains 25 Rosbag files recorded in different locations with the Orbi-One robot standing still. Two sorts of Rosbag files have been recorded. In 17 Rosbag files (1-17), people were standing still in the scene. They were placed in known locations to get ground-truth data. The locations where the people were placed for each rosbag are the following:

    1.bag: [1]
    2.bag: [1, 2]
    3.bag: [1, 2, 3]
    4.bag: [2, 3, 4]
    5.bag: [1, 2, 4]
    6.bag: [5, 6, 7]
    7.bag: [6, 7]
    8.bag: [6, 7, 8]
    9.bag: [11, 12, 13, 14]
    10.bag: [11, 12, 13, 14, 15]
    11.bag: [11, 12, 13]
    12.bag: [13, 15]
    13.bag: [14, 15]
    14.bag: [10, 11]
    15.bag: [11]
    16.bag: [6]
    17.bag: [6, 7]

    The (x, y) positions of each point on the map are the following:

    1: [1.30, 0.76]
    2: [2.10, 1.56]
    3: [2.90, 1.16]
    4: [3.70, 0.55]
    5: [6.53, 1.75]
    6: [7.73, 1.16]
    7: [8.93, 1.75]
    8: [9.73, 0.75]
    9: [14.16, 1.14]
    10: [15.36, 0.14]
    11: [16.56, 1.76]
    12: [16.96, 0.14]
    13: [17.76, 0.54]
    14: [18.16, 1.54]

    The remaining 8 Rosbag files (18-25) were recorded without people in the scene in order to evaluate the True Negatives rate.
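
    For evaluation scripts, the two listings above can be combined into a ground-truth lookup, as in the sketch below. The values are copied from the listings; the names `BAG_LOCATIONS`, `LOCATION_XY` and `ground_truth` are hypothetical, and returning `None` for a location without coordinates is a design assumption.

```python
# Location ids occupied by people in each rosbag, as listed above.
BAG_LOCATIONS = {
    1: [1], 2: [1, 2], 3: [1, 2, 3], 4: [2, 3, 4], 5: [1, 2, 4],
    6: [5, 6, 7], 7: [6, 7], 8: [6, 7, 8], 9: [11, 12, 13, 14],
    10: [11, 12, 13, 14, 15], 11: [11, 12, 13], 12: [13, 15],
    13: [14, 15], 14: [10, 11], 15: [11], 16: [6], 17: [6, 7],
}

# (x, y) map position of each location id, as listed above.
LOCATION_XY = {
    1: (1.30, 0.76), 2: (2.10, 1.56), 3: (2.90, 1.16), 4: (3.70, 0.55),
    5: (6.53, 1.75), 6: (7.73, 1.16), 7: (8.93, 1.75), 8: (9.73, 0.75),
    9: (14.16, 1.14), 10: (15.36, 0.14), 11: (16.56, 1.76),
    12: (16.96, 0.14), 13: (17.76, 0.54), 14: (18.16, 1.54),
}


def ground_truth(bag: int):
    """Return the (x, y) positions of people in a rosbag.

    Bags without listed people (18-25) give an empty list. A location id
    with no listed coordinates maps to None instead of raising an error.
    """
    return [LOCATION_XY.get(loc) for loc in BAG_LOCATIONS.get(bag, [])]
```

    Note that location 15 is used in bags 10, 12 and 13 but has no coordinates in the listing above, so the lookup returns `None` for it.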

Shanglian Zhou; Carlos Canchila; Wei Song (2023). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Dataset]. http://doi.org/10.5281/zenodo.6383044

Fused Image dataset for convolutional neural Network-based crack Detection (FIND)

2 scholarly articles cite this dataset
Available download formats: zip
Dataset updated
Apr 20, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Shanglian Zhou; Carlos Canchila; Wei Song
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.

The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration feature). The filtered range data were generated by applying frequency domain filtering to eliminate image disturbances (e.g., surface variations, and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact from different types of image data on deep convolutional neural network (DCNN) performance.

If you share or use this dataset, please cite [4] and [5] in any relevant documentation.

In addition, an image dataset for crack classification has also been published at [6].

References:

[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873

[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605

[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434

[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678

[5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044

[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78
