48 datasets found
  1. KEYWORD SEARCH IN TEXT CUBE: FINDING TOP-K RELEVANT CELLS - Dataset - NASA...

    • data.nasa.gov
    Updated Mar 31, 2025
    + more versions
    Cite
    nasa.gov (2025). KEYWORD SEARCH IN TEXT CUBE: FINDING TOP-K RELEVANT CELLS - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/keyword-search-in-text-cube-finding-top-k-relevant-cells
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    KEYWORD SEARCH IN TEXT CUBE: FINDING TOP-K RELEVANT CELLS. Bolin Ding, Yintao Yu, Bo Zhao, Cindy Xide Lin, Jiawei Han, and Chengxiang Zhai.

    Abstract: We study the problem of keyword search in a data cube with one or more text-rich dimensions (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (e.g., a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword query, our goal is to find the top-k most relevant cells (ranked according to the relevance scores of cell documents w.r.t. the given query) in the text cube. We define a keyword-based query language and apply an IR-style relevance model for scoring and ranking cell documents in the text cube. We propose two efficient approaches to find the top-k answers. Both support a general class of IR-style relevance scoring formulas that satisfy certain basic and common properties. One uses more time for pre-processing and less time for answering online queries; the other is more efficient in pre-processing but consumes more time for online queries. Experimental studies on the ASRS dataset verify the efficiency and effectiveness of the proposed approaches.
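    The ranking task in this abstract, stripped of the paper's pruning machinery, is: score each cell document against the query and keep the top-k. A toy sketch follows; the TF-IDF formula and all names are illustrative assumptions, not the paper's relevance model or algorithms.

```python
from collections import Counter
from math import log

def tf_idf_score(query_terms, doc_tokens, df, n_docs):
    """Simple TF-IDF relevance of one cell document to the query (a stand-in
    for the general IR-style scoring formulas the paper supports)."""
    tf = Counter(doc_tokens)
    return sum(
        tf[t] * log(1 + n_docs / (1 + df.get(t, 0)))
        for t in query_terms
    )

def top_k_cells(cells, query_terms, k):
    """cells: {cell_id: tokens of the concatenated cell document}.
    Returns the k cell ids with the highest relevance scores."""
    n_docs = len(cells)
    df = Counter()                       # document frequency per term
    for tokens in cells.values():
        df.update(set(tokens))
    scored = [
        (tf_idf_score(query_terms, tokens, df, n_docs), cid)
        for cid, tokens in cells.items()
    ]
    return [cid for _, cid in sorted(scored, reverse=True)[:k]]

# Tiny made-up text cube: cell ids are (year, topic) attribute combinations.
cells = {
    ("2014", "engine"): "engine failure during climb engine".split(),
    ("2014", "weather"): "turbulence reported near storm".split(),
    ("2015", "engine"): "engine fire warning on takeoff".split(),
}
print(top_k_cells(cells, ["engine"], 2))
# -> [('2014', 'engine'), ('2015', 'engine')]
```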

  2. Data from: Topic Modeling for OLAP on Multidimensional Text Databases: Topic...

    • catalog.data.gov
    Updated Apr 10, 2025
    + more versions
    Cite
    Dashlink (2025). Topic Modeling for OLAP on Multidimensional Text Databases: Topic Cube and its Applications [Dataset]. https://catalog.data.gov/dataset/topic-modeling-for-olap-on-multidimensional-text-databases-topic-cube-and-its-applications
    Dataset provided by
    Dashlink
    Description

    As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. Although online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand, probabilistic topic models are among the most effective approaches to latent topic analysis and mining on text data. In this paper, we study a new data model called topic cube to combine OLAP with probabilistic topic modeling and enable OLAP on the dimension of text data in a multidimensional text database. Topic cube extends the traditional data cube to cope with a topic hierarchy and stores probabilistic content measures of text documents learned through a probabilistic topic model. To materialize topic cubes efficiently, we propose two heuristic aggregations to speed up the iterative Expectation-Maximization (EM) algorithm for estimating topic models by leveraging the models learned on component data cells to choose a good starting point for iteration. Experimental results show that these heuristic aggregations are much faster than the baseline method of computing each topic cube from scratch. We also discuss some potential uses of topic cube and show sample experimental results.

  3. A global land-use data cube 1992-2020 based on the Human Appropriation of...

    • zenodo.org
    Available formats: zip
    Updated Apr 25, 2025
    + more versions
    Cite
    Sarah Matej; Florian Weidinger; Lisa Kaufmann; Nicolas Roux; Simone Gingrich; Helmut Haberl; Fridolin Krausmann; Karl-Heinz Erb (2025). A global land-use data cube 1992-2020 based on the Human Appropriation of Net Primary Production: Dataset 1 [Dataset]. http://doi.org/10.5281/zenodo.13990766
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sarah Matej; Florian Weidinger; Lisa Kaufmann; Nicolas Roux; Simone Gingrich; Helmut Haberl; Fridolin Krausmann; Karl-Heinz Erb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the LUIcube, a global dataset on land-use at 30 arcsecond spatial resolution. The LUIcube includes information on area, the change in NPP due to land conversions (HANPPluc), the harvested NPP (including losses, HANPPharv), and the NPP remaining in ecosystems after harvest (NPPeco) for 32 land-use classes in annual time-steps from 1992 to 2020. A detailed description of the LUIcube is available in the accompanying publication.

    The layers of land-use areas are provided in square kilometers (km²) per grid cell. All NPP flows are provided in tC/yr per grid cell. Adding HANPPharv to NPPeco results in the actual NPP available before harvest (NPPact=NPPeco+HANPPharv), and adding HANPPluc to NPPact results in the potential NPP available in the hypothetical absence of land use (NPPpot=NPPact+HANPPluc) for the given land-use class. Area-intensive values (in gC/m²/yr) can be calculated by dividing the NPP flows by the area of the respective land-use class per grid cell. HANPP in % of NPPpot can be calculated by summing up HANPPharv and HANPPluc and dividing it by NPPpot. Areas and NPP flows of land-use classes can be aggregated to calculate their overall HANPP.
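    The NPP identities in this description can be checked with a few lines of arithmetic; all values below are invented for illustration.

```python
# Made-up per-grid-cell values for one land-use class (tC/yr and km²).
hanpp_harv = 40.0   # harvested NPP, including losses (HANPPharv)
npp_eco    = 60.0   # NPP remaining in ecosystems after harvest (NPPeco)
hanpp_luc  = 25.0   # NPP change due to land conversion (HANPPluc)
area_km2   = 2.0    # area of this land-use class in the grid cell

npp_act = npp_eco + hanpp_harv          # NPPact = NPPeco + HANPPharv
npp_pot = npp_act + hanpp_luc           # NPPpot = NPPact + HANPPluc
hanpp_pct = 100 * (hanpp_harv + hanpp_luc) / npp_pot  # HANPP in % of NPPpot

# Area-intensive value: tC/yr per km² equals gC/m²/yr numerically,
# since 1 tC = 1e6 gC and 1 km² = 1e6 m².
npp_act_intensity = npp_act / area_km2  # gC/m²/yr

print(npp_act, npp_pot, round(hanpp_pct, 1), npp_act_intensity)
# -> 100.0 125.0 52.0 50.0
```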

    This Zenodo repository provides data on following land-use classes: unused productive wilderness areas (WILD-core); productive wilderness areas that are sporadically used at very low intensity (WILD-periphery); unused unproductive wilderness areas (WILD-nps); forestry areas, mainly coniferous (FO-con); forestry areas, mainly non-coniferous (FO-ncon); settlements, urban areas and infrastructure (BU-builtup)

  4. Indices of Multiple Deprivation 2010, Employment Score - Dataset -...

    • ckan.publishing.service.gov.uk
    Updated Oct 27, 2014
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2014). Indices of Multiple Deprivation 2010, Employment Score - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/indices-of-multiple-deprivation-2010-employment-score
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Score for each LSOA in the Employment Deprivation domain. The English Indices of Deprivation provide a relative measure of deprivation at small area level across England. Areas are ranked from least deprived to most deprived on seven different dimensions of deprivation and an overall composite measure of multiple deprivation. Most of the data underlying the 2010 indices are for the year 2008. The indices have been constructed by the Social Disadvantage Research Centre at the University of Oxford for the Department for Communities and Local Government. All figures can only be reproduced if the source (Department for Communities and Local Government, Indices of Deprivation 2010) is fully acknowledged. The domains used in the Indices of Deprivation 2010 are: income deprivation; employment deprivation; health deprivation and disability; education deprivation; crime deprivation; barriers to housing and services deprivation; and living environment deprivation. Each of these domains has its own scores and ranks, allowing users to focus on specific aspects of deprivation. Because the indices give a relative measure, they can tell you if one area is more deprived than another but not by how much. For example, if an area has a rank of 40 it is not half as deprived as a place with a rank of 20. The Index of Multiple Deprivation was constructed by combining scores from the seven domains. When comparing areas, a higher deprivation score indicates a higher proportion of people living there who are classed as deprived. But as for ranks, deprivation scores can only tell you if one area is more deprived than another, but not by how much. This dataset was created from a spreadsheet provided by the Department of Communities and Local Government, which can be downloaded here. The method for calculating the IMD score and underlying indicators is detailed in the report 'The English Indices of Deprivation 2010: Technical Report'. 
The data is represented here as Linked Data, using the Data Cube ontology.

  5. Index of Multiple Deprivation Score, 2010 - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Oct 27, 2014
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2014). Index of Multiple Deprivation Score, 2010 - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/index-of-multiple-deprivation-score-2010
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    This dataset contains the scores underlying the Index of Multiple Deprivation, 2010. These figures combine values of many indicators into a single score that indicates the overall level of deprivation in each LSOA. A high number indicates a high level of deprivation. The English Indices of Deprivation provide a relative measure of deprivation at small area level across England. Areas are ranked from least deprived to most deprived on seven different dimensions of deprivation and an overall composite measure of multiple deprivation. Most of the data underlying the 2010 indices are for the year 2008. The indices have been constructed by the Social Disadvantage Research Centre at the University of Oxford for the Department for Communities and Local Government. All figures can only be reproduced if the source (Department for Communities and Local Government, Indices of Deprivation 2010) is fully acknowledged. The domains used in the Indices of Deprivation 2010 are: income deprivation; employment deprivation; health deprivation and disability; education deprivation; crime deprivation; barriers to housing and services deprivation; and living environment deprivation. Each of these domains has its own scores and ranks, allowing users to focus on specific aspects of deprivation. Because the indices give a relative measure, they can tell you if one area is more deprived than another but not by how much. For example, if an area has a rank of 40 it is not half as deprived as a place with a rank of 20. The Index of Multiple Deprivation was constructed by combining scores from the seven domains. When comparing areas, a higher deprivation score indicates a higher proportion of people living there who are classed as deprived. But as for ranks, deprivation scores can only tell you if one area is more deprived than another, but not by how much. 
This dataset was created from a spreadsheet provided by the Department of Communities and Local Government, which can be downloaded here. The method for calculating the IMD score and underlying indicators is detailed in the report 'The English Indices of Deprivation 2010: Technical Report'. The data is represented here as Linked Data, using the Data Cube ontology.

  6. DataSheet1_A Regional Earth System Data Lab for Understanding Ecosystem...

    • frontiersin.figshare.com
    Available formats: pdf
    Updated Jun 2, 2023
    Cite
    Lina M. Estupinan-Suarez; Fabian Gans; Alexander Brenning; Victor H. Gutierrez-Velez; Maria C. Londono; Daniel E. Pabon-Moreno; Germán Poveda; Markus Reichstein; Björn Reu; Carlos A. Sierra; Ulrich Weber; Miguel D. Mahecha (2023). DataSheet1_A Regional Earth System Data Lab for Understanding Ecosystem Dynamics: An Example from Tropical South America.pdf [Dataset]. http://doi.org/10.3389/feart.2021.613395.s001
    Dataset provided by
    Frontiers
    Authors
    Lina M. Estupinan-Suarez; Fabian Gans; Alexander Brenning; Victor H. Gutierrez-Velez; Maria C. Londono; Daniel E. Pabon-Moreno; Germán Poveda; Markus Reichstein; Björn Reu; Carlos A. Sierra; Ulrich Weber; Miguel D. Mahecha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Earth
    Description

    Tropical ecosystems experience particularly fast transformations largely as a consequence of land use and climate change. Consequences for ecosystem functioning and services are hard to predict and require analyzing multiple data sets simultaneously. Today, we are equipped with a wide range of spatio-temporal observation-based data streams that monitor the rapid transformations of tropical ecosystems in terms of state variables (e.g., biomass, leaf area, soil moisture) but also in terms of ecosystem processes (e.g., gross primary production, evapotranspiration, runoff). However, the underexplored joint potential of such data streams, combined with deficient access to data and processing, constrain our understanding of ecosystem functioning, despite the importance of tropical ecosystems in the regional-to-global carbon and water cycling. Our objectives are: 1. To facilitate access to regional “Analysis Ready Data Cubes” and enable efficient processing 2. To contribute to the understanding of ecosystem functioning and atmosphere-biosphere interactions. 3. To get a dynamic perspective of environmental conditions for biodiversity. To achieve our objectives, we developed a regional variant of an “Earth System Data Lab” (RegESDL) tailored to address the challenges of northern South America. The study region extensively covers natural ecosystems such as rainforest and savannas, and includes strong topographic gradients (0–6,500 masl). Currently, environmental threats such as deforestation and ecosystem degradation continue to increase. In this contribution, we show the value of the approach for characterizing ecosystem functioning through the efficient implementation of time series and dimensionality reduction analysis at pixel level. Specifically, we present an analysis of seasonality as it is manifested in multiple indicators of ecosystem primary production. 
We demonstrate that the RegESDL has the ability to underscore contrasting patterns of ecosystem seasonality and therefore has the potential to contribute to the characterization of ecosystem function. These results illustrate the potential of the RegESDL to explore complex land-surface processes and the need for further exploration. The paper concludes with some suggestions for developing future big-data infrastructures and its applications in the tropics.

  7. Data from: Corescan© Hyperspectral Core Imager, Mark III system data...

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Nov 26, 2025
    Cite
    U.S. Geological Survey (2025). Corescan© Hyperspectral Core Imager, Mark III system data collected for the characterization of mineral resources near Nabesna, Alaska, 2014-2016 [Dataset]. https://catalog.data.gov/dataset/corescan-hyperspectral-core-imager-mark-iii-system-data-collected-for-the-characteriz-2014
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Alaska, Nabesna
    Description

    Corescan© Hyperspectral Core Imager Mark III (HCI-III) system data were acquired for hand samples, and subsequent billets made from the hand samples, collected during the U.S. Geological Survey (USGS) 2014, 2015, and 2016 field seasons in the Nabesna area of the eastern Alaska Range. This area contains exposed porphyry deposits and hand samples were collected throughout the region in support of the HyMap imaging spectrometer survey (https://doi.org/10.5066/F7DN435W) (Kokaly and others, 2017a). The HCI-III system consists of three different components. The first is an imaging spectrometer which collects reflectance data with a spatial resolution of approximately 500 nanometers (nm) for 514 spectral channels covering the 450-2,500 nm wavelength range of the electromagnetic spectrum (Martini and others, 2017). The second is a spectrally calibrated RGB camera that collects high resolution imagery of the samples with a 50 micrometer (μm) pixel size. The third component is a three-dimensional (3D) laser profiler that measures sample texture, surface features and shape with a vertical resolution of 20 μm (Martini and others, 2017). The imaging spectrometer raw data were collected with an average bandpass of approximately 6 nm across the Short Wave Infrared (SWIR) but smoothing functions applied by Corescan during the conversion of raw data to reflectance result in a relative bandpass of approximately 13 nm in the data delivered to the USGS. Wavelength evaluations of the imaging spectrometer data revealed that the supplied wavelength values should be shifted and, thus, adjustments were made to the wavelength positions (Kokaly and others, 2017c). The wavelength and bandpass evaluation results are provided in the 'Calibration' section of this data release and were used to adjust the Corescan reflectance data. The calibrated Corescan data were combined into a reflectance data cube mosaic and are provided in the 'HyperspectralCalibrated' section. 
    Calibrated reflectance data from Corescan were processed using the Material Identification and Characterization Algorithm (MICA), a module of the USGS PRISM (Processing Routines in IDL for Spectroscopic Measurements) software (Kokaly, 2011). MICA identifies the spectrally predominant mineral(s) in each pixel of imaging spectrometer data by comparing continuum-removed spectral features in the pixel's reflectance spectrum to continuum-removed absorption features in reference spectra of minerals and other materials. For each pixel, the reference spectrum with the highest fit value identifies the predominant mineral class. White mica wavelength position was computed for each pixel with spectrally predominant muscovite or illite, using a function of the USGS PRISM software (Kokaly, 2011). The white mica wavelength values were output as a classification image, with classes in 1 nm increments. A total of 63 hand samples and four billets were analyzed using the HCI-III system in three scans. An index map of the samples was generated for each scan.

    DATA RELEASE ORGANIZATION
    The data are organized by analysis and data types, with a brief description here and more detail within the metadata. The Corescan file naming convention is project number, project name, tray number, date of scan, internal processing record number, row number within the tray, data type and the file type, for example, .bin.
    /Calibration -- Results of wavelength position and bandpass analysis. File formats: .csv, .jpg.
    /Hyperspectral -- Corescan hyperspectral reflectance data cubes, with each scan as a separate image. File formats: .procSpecRefl.bin, .ers, .hdr.
    /HyperspectralCalibrated -- Calibrated hyperspectral reflectance and hyperspectral mosaic reflectance data cube (.dat) with header file (.hdr). The individual samples are identified in image indexes (.jpg) of the Corescan scans. File formats: .dat, .hdr.
    /LaserProfiler -- Corescan laser profile data. File formats: .procProf3d.bin, .ers, .hdr.
    /MineralPredominance -- Datasets of the predominant mineral class derived from the calibrated reflectance data. File formats: .dat, .hdr, .clr, .csv, .mcf, .tif, and .tfw.
    /RGBImage -- Corescan three-band (RGB) color photography collected concurrently with the hyperspectral and 3D laser profiles. File formats: .procRgbImage.bin, .ers, .hdr.
    /WhiteMicaWavelength -- Datasets of the wavelength position of the white mica 2,200 nm Al-OH absorption feature. File formats: .dat, .hdr, .clr, .csv, .tif, and .tfw.
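    The matching step described above (label each pixel with the best-fitting reference spectrum) can be illustrated with a toy sketch. The reference spectra, the correlation-based fit measure, and all values are invented; the actual MICA/PRISM implementation uses continuum-removed absorption-feature fitting, not raw correlation.

```python
import numpy as np

# Hypothetical 4-band reference spectra for two minerals (made-up values).
references = {
    "muscovite": np.array([0.9, 0.5, 0.2, 0.6]),
    "kaolinite": np.array([0.8, 0.3, 0.4, 0.9]),
}

def predominant_mineral(pixel_spectrum):
    """Return the reference with the highest fit to the pixel spectrum.
    Fit here is Pearson correlation, a stand-in for MICA's fit value."""
    fits = {
        name: float(np.corrcoef(pixel_spectrum, ref)[0, 1])
        for name, ref in references.items()
    }
    return max(fits, key=fits.get)

# A pixel whose spectral shape closely tracks the muscovite reference.
pixel = np.array([0.88, 0.48, 0.22, 0.58])
print(predominant_mineral(pixel))
# -> muscovite
```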

  8. MCCN Case Study 4 - Validating gridded data products

    • researchdata.edu.au
    • adelaide.figshare.com
    Updated Nov 13, 2025
    Cite
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja (2025). MCCN Case Study 4 - Validating gridded data products [Dataset]. http://doi.org/10.25909/29176553.V1
    Dataset provided by
    The University of Adelaide
    Authors
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MCCN project delivers tools that help the agricultural sector understand crop-environment relationships, specifically by facilitating the generation of data cubes for spatiotemporal data. This repository contains Jupyter notebooks that demonstrate the functionality of the MCCN data cube components.

    The dataset contains input files for the case study (source_data), RO-Crate metadata (ro-crate-metadata.json), results from the case study (results), and Jupyter Notebook (MCCN-CASE 4.ipynb)

    Research Activity Identifier (RAiD)

    RAiD: https://doi.org/10.26292/8679d473

    Case Studies

    This repository contains code and sample data for the following case studies. Note that the analyses here demonstrate the software; the results should not be considered scientifically or statistically meaningful. No effort has been made to address bias in samples, and sample data may not be available at sufficient density to warrant analysis. All case studies end with generation of an RO-Crate data package including the source data, the notebook and generated outputs, including NetCDF exports of the data cubes themselves.

    Case Study 4 - Validating gridded data products

    Description

    Compare Bureau of Meteorology gridded daily maximum and minimum temperature data with data from weather stations across Western Australia.

    This is an example of comparing high-quality ground-based data from multiple sites with a data product derived from satellite imagery or modelling, so that one can assess its precision and accuracy for estimating the same variables at other sites.

    Data Sources

    The case study uses national weather data products from the Bureau of Meteorology for daily mean maximum/minimum temperature, accessible from http://www.bom.gov.au/jsp/awap/temp/index.jsp. Seven daily maximum and minimum temperature grids were downloaded for the dates 7 to 13 April 2025 inclusive. These data can be accessed in the source_data folder in the downloaded ASCII grid format (*.grid). These data will be loaded into the data cube as WGS84 Geotiff files. To avoid extra dependencies in this notebook, the data have already been converted using QGIS Desktop and are also included in the source_data folder (*.tiff).

    Comparison data for maximum and minimum air temperature were downloaded for all public weather stations in Western Australia from https://weather.agric.wa.gov.au/ for the 10 day period 4 to 13 April 2025. These are included in source_data as CSV files. These downloads do not include the coordinates for the weather stations. These were downloaded via the https://api.agric.wa.gov.au/v2/weather/openapi/#/Stations/getStations API method and are included in source_data as DPIRD_weather_stations.json.

    Dependencies

    • This notebook requires Python 3.10 or higher
    • Install relevant Python libraries with: pip install mccn-engine rocrate
    • Installing mccn-engine will install other dependencies

    Overview

    1. Convert weather station data to point measurements (longitude, latitude, date, temperature)
    2. Prepare STAC metadata records for each data source (separate records for each daily minimum and maximum layer from BOM, one for all weather station minima, and one for all weather station maxima)
    3. Load data cube
    4. Visualise cube
    5. Calculate differences between weather station values and BOM data for each station and date
    6. Identify sites with extreme differences (errors) for minimum and maximum temperature
    7. Identify sites with low differences for minimum and maximum temperature
    8. Cleanup and write results to RO-Crate
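    Step 5 of the overview (differences between weather station values and the gridded product) might look like the following sketch. The grid, station records, and nearest-cell sampling below are invented stand-ins for the BOM GeoTIFF layers and DPIRD station data loaded into the MCCN cube.

```python
import numpy as np

# Hypothetical regular grid over part of Western Australia.
lons = np.arange(115.0, 120.0, 0.05)   # grid cell centre longitudes
lats = np.arange(-35.0, -30.0, 0.05)   # grid cell centre latitudes
grid = np.random.default_rng(0).uniform(10, 35, size=(lats.size, lons.size))

# Hypothetical station observations (id, location, measured max temperature).
stations = [
    {"id": "A", "lon": 116.2, "lat": -31.9, "tmax": 28.4},
    {"id": "B", "lon": 118.7, "lat": -33.4, "tmax": 21.0},
]

for s in stations:
    j = np.abs(lons - s["lon"]).argmin()   # nearest grid column
    i = np.abs(lats - s["lat"]).argmin()   # nearest grid row
    diff = s["tmax"] - grid[i, j]          # station minus gridded value
    print(s["id"], round(float(diff), 2))
```

Sites with extreme `diff` values would then be flagged as having configuration or positioning issues (step 6), and sites with low `diff` kept as reliable (step 7).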

    Notes

    • Weather stations with high differences/errors are likely to have configuration or positioning issues and should not be treated as reliable.
    • Weather stations with low errors are suitable for use in local analysis.
    • The generally low difference between the measured values and the BOM products indicates the level of confidence that should be applied to use of these products for analyses where local measurements are not available.
    • In reality, at least some of these sites will have contributed to the BOM products, so the comparands are not truly independent.


  9. Sport and leisure facilities

    • data.opendatascience.eu
    Updated Jan 2, 2021
    Cite
    (2021). Sport and leisure facilities [Dataset]. https://data.opendatascience.eu/geonetwork/srv/search?type=dataset
    Description

    Overview: 142: Areas used for sports, leisure and recreation purposes. Traceability (lineage): This dataset was produced with a machine learning framework with several input datasets, specified in detail in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3 ). Scientific methodology: The single-class probability layers were generated with a spatiotemporal ensemble machine learning framework detailed in Witjes et al., 2022. The single-class uncertainty layers were calculated by taking the standard deviation of the three single-class probabilities predicted by the three components of the ensemble. The HCL (hard class) layers represent the class with the highest probability as predicted by the ensemble. Usability: The HCL layers have a decreasing average accuracy (weighted F1-score) at each subsequent level in the CLC hierarchy: 0.83 at level 1 (5 classes), 0.63 at level 2 (14 classes), and 0.49 at level 3 (43 classes). This means that the hard-class maps are more reliable when aggregating classes to a higher level in the hierarchy (e.g. 'Discontinuous Urban Fabric' and 'Continuous Urban Fabric' to 'Urban Fabric'). Some single-class probabilities may more closely represent actual patterns for classes that were overshadowed by unequal sample point distributions. Users are encouraged to set their own thresholds when postprocessing these datasets to optimize the accuracy for their specific use case. Uncertainty quantification: Uncertainty is quantified by taking the standard deviation of the probabilities predicted by the three components of the spatiotemporal ensemble model. Data validation approaches: The LULC classification was validated through spatial 5-fold cross-validation as detailed in the accompanying publication.
Completeness: The dataset has chunks of empty predictions in regions with complex coastlines (e.g. the Zeeland province in the Netherlands and the Mar da Palha bay area in Portugal). These are artifacts that will be avoided in subsequent versions of the LULC product. Consistency: The accuracy of the predictions was compared per year and per 30 km x 30 km tile across Europe to derive temporal and spatial consistency by calculating the standard deviation. The standard deviation of annual weighted F1-score was 0.135, while the standard deviation of weighted F1-score per tile was 0.150. This means the dataset is more consistent through time than through space: predictions are notably less accurate along the Mediterranean coast. The accompanying publication contains additional information and visualisations. Positional accuracy: The raster layers have a resolution of 30 m, identical to that of the Landsat data cube used as input features for the machine learning framework that predicted it. Temporal accuracy: The dataset contains predictions and uncertainty layers for each year between 2000 and 2019. Thematic accuracy: The maps reproduce the Corine Land Cover classification system, a hierarchical legend that consists of 5 classes at the highest level, 14 classes at the second level, and 44 classes at the third level. Class 523: Oceans was omitted due to computational constraints.
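    The uncertainty quantification described above (per-pixel standard deviation across the three ensemble members' predicted probabilities) reduces to a one-line NumPy operation; the probability values below are invented.

```python
import numpy as np

# Made-up single-class probability maps from three ensemble members (2x2 pixels).
p1 = np.array([[0.80, 0.10], [0.55, 0.90]])
p2 = np.array([[0.70, 0.20], [0.60, 0.85]])
p3 = np.array([[0.75, 0.15], [0.50, 0.95]])

stack = np.stack([p1, p2, p3])     # shape (3, H, W): members first
probability = stack.mean(axis=0)   # ensemble single-class probability layer
uncertainty = stack.std(axis=0)    # uncertainty layer: std across members

print(probability)
print(uncertainty.round(4))
```

The hard-class (HCL) layer would then be the argmax over the per-class probability layers at each pixel.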

  10. MCCN Case Study 6 - Environmental Correlates for Productivity

    • researchdata.edu.au
    Updated Nov 13, 2025
    + more versions
    Cite
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja (2025). MCCN Case Study 6 - Environmental Correlates for Productivity [Dataset]. http://doi.org/10.25909/29176682.V1
    Dataset provided by
    The University of Adelaide
    Authors
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MCCN project delivers tools that help the agricultural sector understand crop-environment relationships, specifically by facilitating the generation of data cubes for spatiotemporal data. This repository contains Jupyter notebooks that demonstrate the functionality of the MCCN data cube components.

    The dataset contains input files for the case study (source_data), RO-Crate metadata (ro-crate-metadata.json), results from the case study (results), and Jupyter Notebook (MCCN-CASE 6.ipynb)

    Research Activity Identifier (RAiD)

    RAiD: https://doi.org/10.26292/8679d473

    Case Studies

    This repository contains code and sample data for the following case studies. Note that the analyses here demonstrate the software; the results should not be considered scientifically or statistically meaningful. No effort has been made to address bias in samples, and sample data may not be available at sufficient density to warrant analysis. All case studies end with generation of an RO-Crate data package including the source data, the notebook and generated outputs, including NetCDF exports of the data cubes themselves.

    Case Study 6 - Environmental Correlates for Productivity

    Description

    Analyse the relationship between different environmental drivers and plant yield. This study demonstrates: 1) loading heterogeneous data sources into a cube, and 2) analysis and visualisation of drivers. It combines a suite of spatial variables at different scales across multiple sites to analyse the factors correlated with a variable of interest.

    Data Sources

    The dataset covers the Gilbert site in Queensland, which has multiple standard-sized plots for three years; we use data from 2022. The source files are part of the larger collection: Chapman, Scott and Smith, Daniel (2023). INVITA Core site UAV dataset. The University of Queensland. Data Collection. https://doi.org/10.48610/951f13c

    1. Boundary file - This is a shapefile defining the boundaries of all field plots at the Gilbert site. Each polygon represents a single plot and is associated with a unique Plot ID (e.g., 03_03_1). These plot IDs are essential for joining and aligning data across the orthomosaics and plot-level measurements.
    2. Orthomosaics - The site was imaged by UAV flights multiple times throughout the 2022 growing season, spanning from June to October. Each flight produced an orthorectified mosaic image using RGB and multispectral (MS) sensors.
    3. Plot-level measurements - Multispectral traits: indices (NDVI, NDRE, SAVI) calculated from MS sensor imagery; and biomass cuts: field-measured biomass sampled during different growth stages (used as a proxy for yield).
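    The multispectral traits above follow standard vegetation-index definitions. A minimal sketch of those formulas (the band values and the SAVI soil-brightness factor L = 0.5 are conventional assumptions, not values from this dataset):

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def ndre(nir, red_edge):
    """Normalized Difference Red Edge index."""
    return (nir - red_edge) / (nir + red_edge)

def savi(nir, red, L=0.5):
    """Soil-Adjusted Vegetation Index with soil-brightness factor L."""
    return (1 + L) * (nir - red) / (nir + red + L)

# Example reflectances for a single healthy-vegetation pixel (illustrative)
nir, red, red_edge = 0.6, 0.1, 0.3
indices = {"NDVI": ndvi(nir, red), "NDRE": ndre(nir, red_edge), "SAVI": savi(nir, red)}
```

    Applied per pixel of the MS orthomosaics and averaged within each Plot ID polygon, these yield the plot-level trait values.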


  11. MCCN Case Study 2 - Spatial projection via modelled data

    • adelaide.figshare.com
    • researchdata.edu.au
    zip
    Updated May 29, 2025
    Cite
    Donald Hobern; Alisha Aneja; Hoang Son Le; Lili Andres Hernandez; Rakesh David (2025). MCCN Case Study 2 - Spatial projection via modelled data [Dataset]. http://doi.org/10.25909/29176364.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 29, 2025
    Dataset provided by
    The University of Adelaide
    Authors
    Donald Hobern; Alisha Aneja; Hoang Son Le; Lili Andres Hernandez; Rakesh David
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MCCN project delivers tools to help the agricultural sector understand crop-environment relationships, specifically by facilitating the generation of data cubes for spatiotemporal data. This repository contains Jupyter notebooks to demonstrate the functionality of the MCCN data cube components.

    The dataset contains input files for the case study (source_data), RO-Crate metadata (ro-crate-metadata.json), results from the case study (results), and the Jupyter notebook (MCCN-CASE 2.ipynb)

    Research Activity Identifier (RAiD)

    RAiD: https://doi.org/10.26292/8679d473

    Case Studies

    This repository contains code and sample data for the following case studies. Note that the analyses here demonstrate the software; the results should not be considered scientifically or statistically meaningful. No effort has been made to address bias in samples, and sample data may not be available at sufficient density to warrant analysis. All case studies end with generation of an RO-Crate data package including the source data, the notebook and generated outputs, including NetCDF exports of the data cubes themselves.

    Case Study 2 - Spatial projection via modelled data

    Description

    Estimate soil pH and electrical conductivity (EC) at 45 cm depth across a farm based on values collected from soil samples. This study demonstrates: 1) description of spatial assets using STAC, 2) loading heterogeneous data sources into a cube, and 3) spatial projection in xarray using different algorithms offered by the pykrige and rioxarray packages.

    Data sources

    BradGinns_SOIL2004_SoilData.csv - Soil measurements from the University of Sydney Llara Campey farm site from 2004, corresponding to sites L1, L3 and L4, describing mid-depth, soil apparent electrical conductivity (ECa), GammaK, Clay, Silt, Sand, pH and soil electrical conductivity (EC).
    Llara_Campey_field_boundaries_poly.shp - Field boundary shapes for the University of Sydney Llara Campey farm site.

    Dependencies

    This notebook requires Python 3.10 or higher. Install the relevant Python libraries with: pip install mccn-engine rocrate rioxarray pykrige. Installing mccn-engine will install the other dependencies.

    Overview

    1. Select soil sample measurements for pH or EC at 45 cm depth.
    2. Split the sample measurements into an 80% subset to model interpolated layers and a 20% subset to test them.
    3. Generate STAC metadata for the layers.
    4. Load the data cube.
    5. Interpolate pH and EC across the site using the 80% subset and three 2D interpolation methods from rioxarray (nearest, linear and cubic) and one from pykrige (linear).
    6. Calculate the error between each layer of interpolated values and the measured values for the 20% set aside for testing.
    7. Compare the mean and standard deviation of the errors for each interpolation method.
    8. Clean up and package the results as an RO-Crate.

    Notes

    The granularity of variability in the soil data significantly compromises all methods. Depending on the 80/20 split, different methods may appear more reliable, but the pykrige linear method is most often best.
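    The interpolate-and-test loop can be sketched with SciPy's griddata standing in for the rioxarray nearest/linear/cubic methods (the pykrige linear variant would use pykrige.ok.OrdinaryKriging instead). The coordinates and pH values below are synthetic, not the Llara Campey measurements:

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(42)
pts = rng.uniform(0, 100, size=(200, 2))               # sample easting/northing (m)
ph = 6.0 + 0.01 * pts[:, 0] + rng.normal(0, 0.1, 200)  # synthetic pH at 45 cm depth

idx = rng.permutation(len(pts))                        # 80/20 train/test split
train, test = idx[:160], idx[160:]

errors = {}
for method in ("nearest", "linear", "cubic"):
    pred = griddata(pts[train], ph[train], pts[test], method=method)
    err = np.abs(pred - ph[test])                      # NaN for points outside the hull
    errors[method] = (np.nanmean(err), np.nanstd(err))
```

    Comparing the mean and standard deviation stored in errors mirrors step 7 of the overview; a different random split can change which method comes out best, as the notes observe.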

  12. Supplementary data for "Investigating seasonal velocity variations of...

    • zenodo.org
    bin, nc
    Updated May 8, 2025
    Cite
    Francesca Baldacchino; Whyjay Zheng; Kunpeng Wu; Vassiliy Kapitsa; Alexandr Yegorov; Tobias Bolch (2025). Supplementary data for "Investigating seasonal velocity variations of selected glaciers in High Mountain Asia" [Dataset]. http://doi.org/10.5281/zenodo.15366525
    Explore at:
    Available download formats: nc, bin
    Dataset updated
    May 8, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Francesca Baldacchino; Whyjay Zheng; Kunpeng Wu; Vassiliy Kapitsa; Alexandr Yegorov; Tobias Bolch
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    High-mountain Asia
    Description

    This submission contains the glacier speed time series and other relevant data for the analysis done in the study "Investigating seasonal velocity variations of selected glaciers in High Mountain Asia."

    Dataset Correspondence to: Whyjay Zheng (whyjz@csrsr.ncu.edu.tw)

    Data set content

    • {Name}_Global-ts-results*.nc: Data cube in NetCDF (.nc) format containing a time series of glacier speed maps for the selected single glacier {Name}. There are five glaciers analyzed in this study: Abramov, Khumbu, Petrov, Tuyusku, and Yanong.
    • {Name}_global-ts-refgeo.gpkg: The footprint of the glacier speed maps for the selected single glacier {Name}. In GeoPackage format.
    • {Name}_flowline_densified200m.gpkg: Selected flowline with vertex interval ~= 200 m for the selected single glacier {Name}. In GeoPackage format.
    • analysis_template.ipynb: Jupyter Notebook file containing template code for users to perform and reproduce basic analyses with the data cubes.
    • Readme.md: This readme file.

    {Name}_Global-ts-results*.nc file structure

    The data cube has a spatial spacing of 120 m (for x and y) and a temporal spacing of 6 days (for time).

    Coordinates:

    • time: datetime64 object specifying the temporal sampling points.
    • x: Easting in a specified CRS. The other two geopackage files use the same CRS as the .nc file.
    • y: Northing in a specified CRS.

    Data Variables:

    • speed: Glacier speed in m/day.
    • speed_uncertainty: Uncertainty (1-sigma) of glacier speed in m/day.
    • data_gap_days: Temporal duration between the sampling location and the nearest measurement, in days. For example, if data_gap_days is 3 on 2020-01-15, then the nearest speed measurement is either from 2020-01-12 or 2020-01-18.
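    The data_gap_days variable can be illustrated directly: for each point on the 6-day sampling grid, take the distance in days to the nearest actual measurement (the dates below are illustrative, not from the dataset):

```python
import numpy as np

# Sampling grid every 6 days; measurements available only on a sparse subset of days.
sample_times = np.arange(0, 60, 6)    # days since start of the series
measured = np.array([0, 12, 18, 42])  # days with actual speed measurements

# data_gap_days: distance (in days) from each sampling point to the nearest measurement
gaps = np.min(np.abs(sample_times[:, None] - measured[None, :]), axis=1)
```

    So a gap of 0 means a measurement falls exactly on that sampling point, and, as in the example above, a gap of 3 on 2020-01-15 means the nearest measurement is from 2020-01-12 or 2020-01-18.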

    How to cite

    We recommend that users cite the original article (Baldacchino et al.) and provide a link to the dataset landing page.

  13. Visualize A Space Time Cube in 3D

    • gemelo-digital-en-arcgis-gemelodigital.hub.arcgis.com
    Updated Dec 3, 2020
    Cite
    Society for Conservation GIS (2020). Visualize A Space Time Cube in 3D [Dataset]. https://gemelo-digital-en-arcgis-gemelodigital.hub.arcgis.com/maps/acddde8dae114381889b436fa0ff4b2f
    Explore at:
    Dataset updated
    Dec 3, 2020
    Dataset authored and provided by
    Society for Conservation GIS
    Description

    Stamp Out COVID-19

    "An apple a day keeps the doctor away." - Linda Angulo Lopez, December 3, 2020
    https://theconversation.com/coronavirus-where-do-new-viruses-come-from-136105

    SNAP participation rates were explored and analysed in ArcGIS Pro; the results can help decision makers set up further SNAP-D initiatives. In the USA, foods are stored in every state and U.S. territory and may be used by state agencies or local disaster relief organizations to provide food to shelters or people in need.

    US Food Stamp Program has been Extended

    The Supplemental Nutrition Assistance Program (SNAP) is a state-organized food stamp program in the USA, put in place to help individuals and families during this exceptional time. State agencies may request to operate a Disaster Supplemental Nutrition Assistance Program (D-SNAP). Almost all states have set up food relief programs in response to COVID-19.

    SNAP Participation Analysis

    Initial results of yearly participation rates by geography show statistically significant trends. To get acquainted with the results, explore the 3D time cube map: Visualize A Space Time Cube in 3D - https://arcg.is/1q8LLP

    netCDF Results

    WORKFLOW: A space-time cube was generated as a netCDF structure with the ArcGIS Pro space-time mining tool Create a Space Time Cube from Defined Locations. Other tools were then used to incorporate the spatial and temporal aspects of the SNAP county participation rate feature to reveal and render statistically significant trends about nutrition assistance in the USA.

    Hot Spot Analysis

    Explore the results in 2D or 3D.

    2D Hot Spots - https://arcg.is/1Pu5WH0
    WORKFLOW: Hot spot analysis with the Hot Spot Analysis tool shows various trends across the USA; for instance, the southeastern states have a mixture of consecutive, intensifying, and oscillating hot spots.

    3D Hot Spots - https://arcg.is/1b41T4
    These trends over time are expanded in the 3D map; by inspecting the stacked columns you can see the trends over time that give rise to the overall hot spot results. Not all counties have significant trends; these are symbolized as Never Significant in the space-time cubes.

    Space-Time Pattern Mining Analysis

    The north-central areas of the USA have mostly diminishing cold spots.

    2D Space-Time Mining - https://arcg.is/1PKPj0
    WORKFLOW: Analysis with the Emerging Hot Spot Analysis tool shows various trends across the USA; for instance, the southeastern states have a mixture of consecutive, intensifying, and oscillating hot spots.

    3D Space-Time Mining - https://arcg.is/01fTWf
    The results show that the USA has counties with persistently malnourished populations that depend on food aid. In addition to planning for consistent hot-hot spot areas, areas with oscillating hot-cold and/or cold-hot spots can be identified for further analysis to mitigate the upward trend in food insecurity in the USA since 2009, which has become even worse since the outbreak of the COVID-19 pandemic.

    After Notes:

    (i) The Johns Hopkins University has an interactive dashboard of the evolution of the COVID-19 pandemic: Coronavirus COVID-19 (2019-nCoV).

    (ii) Since March 2020, in response to COVID-19, SNAP has extended its benefits to help people in need. Food relief is coordinated within states and by local and voluntary organizations to provide nutrition assistance to those most affected by a disaster or emergency. Visit SNAP's interactive dashboard. Food relief has been extended; reach out to your state SNAP office if you are in need.

    (iii) Follow these steps to build an ArcGIS Pro StoryMap:
    Step 1: [Get Data] [Open an ArcGIS Pro Project] [Run a Hot Spot Analysis] [Review analysis parameters] [Interpret the results] [Run an Outlier Analysis] [Interpret the results]
    Step 2: [Open the Space-Time Pattern Mining 2 Map] [Create a space-time cube] [Visualize a space-time cube in 2D] [Visualize a space-time cube in 3D] [Run a Local Outlier Analysis] [Visualize a Local Outlier Analysis in 3D]
    Step 3: [Communicate Analysis] [Identify your Audience & Takeaways] [Create an Outline] [Find Images] [Prepare Maps & Scenes] [Create a New Story] [Add Story Elements] [Add Maps & Scenes] [Review the Story] [Publish & Share]

    A submission for the Esri MOOC Spatial Data Science: The New Frontier in Analytics - Linda Angulo Lopez, with thanks to Lauren Bennett, Shannon Kalisky, Flora Vale, Alberto Nieto, Atma Mani, Kevin Johnston, Orhun Aydin, Ankita Bakshi, Vinay Viswambharan, Jennifer Bell & Nick Giner.
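    The hot spot workflows above are built on the Getis-Ord Gi* statistic. A minimal numpy sketch of Gi* for a toy line of locations with a binary, self-inclusive neighbourhood (the values are synthetic, not the SNAP participation rates):

```python
import numpy as np

def getis_ord_gi_star(x, w):
    """Getis-Ord Gi* z-scores; w is a binary weights matrix that includes self."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    s = np.sqrt((x ** 2).mean() - xbar ** 2)
    wsum = w.sum(axis=1)
    num = w @ x - xbar * wsum
    den = s * np.sqrt((n * (w ** 2).sum(axis=1) - wsum ** 2) / (n - 1))
    return num / den

# 9 locations on a line; neighbourhood = self plus adjacent locations
n = 9
w = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i, i + 1):
        if 0 <= j < n:
            w[i, j] = 1.0

values = np.array([1, 1, 1, 9, 9, 9, 1, 1, 1])
gi = getis_ord_gi_star(values, w)  # peaks at the centre of the high cluster
```

    Large positive Gi* z-scores mark hot spots and large negative ones cold spots; the Emerging Hot Spot Analysis tool additionally classifies how these scores trend through the time slices of the cube.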

  14. Forestry and Biodiversity monitoring in Lithuania with hyperspectral camera...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 14, 2023
    Cite
    ART21 (2023). Forestry and Biodiversity monitoring in Lithuania with hyperspectral camera and UAV [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8333788
    Explore at:
    Dataset updated
    Sep 14, 2023
    Dataset authored and provided by
    ART21
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Lithuania
    Description

    Acquisition dates: to be updated.

    Location: Scots pine and mixed forest in Lithuania

    Camera data:

    Spectral Range 400 – 1000 nm

    Spectral sampling 2.68 nm

    Spectral resolution 5.5 nm

    Fore lens focal length 15 mm

    Field of view 38 deg

    Spectral bands 224

    Spatial pixels 1024

    Flight altitude: 70 m

    Spatial resolution: 0.05 m/pixel

    The dataset consists of pine tree forest hyperspectral imaging data acquired with a UAV on several dates.

    The data from each UAV flight are given as a separate dataset.

    Each dataset consists of raw and processed hyperspectral imaging data. The raw data include calibration images of white reference and dark background, raw hyperspectral images, and information on the UAV flight path.

    The SPECIM CaliGeoPRO software was used to process raw images into hyperspectral data cubes, which are provided in the ENVI standard format.

    The data from each flight are provided via a separate hyperlink to the storage.

    Zip file structure (folders):
    - calibration - holds the radiometric calibration ENVI-type file (raster of size 1x1024)
    - capture - raw camera capture data, navigation files, log file
    - metadata, results - config and empty folder
    - out - holds the generated ENVI data cube raster file
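    The white and dark reference images in the calibration data support the standard flat-field reflectance conversion used to produce the cubes in the out folder; a generic sketch (the actual SPECIM/CaliGeoPRO processing chain may differ, and the cube dimensions below are illustrative):

```python
import numpy as np

def to_reflectance(raw, dark, white):
    """Empirical reflectance: (raw - dark) / (white - dark), per pixel and band."""
    denom = np.clip(white.astype(float) - dark, 1e-9, None)
    return (raw.astype(float) - dark) / denom

# Toy push-broom cube: 4 scan lines x 1024 spatial pixels x 224 bands
dark = np.full((1024, 224), 100.0)    # dark background reference
white = np.full((1024, 224), 4000.0)  # white reference
raw = np.full((4, 1024, 224), 2050.0)
refl = to_reflectance(raw, dark, white)  # 0.5 everywhere in this toy case
```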

    Download:

    https://icaerus-data-1.s3.eu-central-1.amazonaws.com/Uzkresti_miskai_new_fl7_20230510_151005.zip

  15. Cube-DB

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Jul 22, 2011
    Cite
    (2011). Cube-DB [Dataset]. http://identifiers.org/RRID:SCR_013233
    Explore at:
    Dataset updated
    Jul 22, 2011
    Description

    Cube-DB is a database of pre-evaluated conservation and specialization scores for residues in paralogous proteins belonging to multi-member families of human proteins. Protein family classification follows (largely) the classification suggested by the HUGO Gene Nomenclature Committee. Sets of orthologous protein sequences were generated by a mutual-best-hit strategy using the full vertebrate genomes available in Ensembl. The scores, described on the documentation page, are assigned to each individual residue in a protein and presented in the form of a table (HTML or downloadable XLS formats) and mapped, when appropriate, onto the related structure (Jmol, PyMOL, Chimera).
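    The mutual-best-hit strategy pairs a human protein with a protein from another genome only when each is the other's top-scoring hit. A minimal sketch over toy similarity scores (the identifiers and scores are hypothetical, not Cube-DB data):

```python
def mutual_best_hits(scores_ab, scores_ba):
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items()}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items()}
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}

# Toy alignment scores: human proteins vs. proteins of another vertebrate genome
human_vs_other = {"h1": {"m1": 90, "m2": 10}, "h2": {"m1": 5, "m2": 80}}
other_vs_human = {"m1": {"h1": 88, "h2": 3}, "m2": {"h1": 12, "h2": 75}}
orthologs = mutual_best_hits(human_vs_other, other_vs_human)
```

    Reciprocity filters out one-directional hits, which is why the strategy yields comparable taxonomical samples across paralogues.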

  16. The Geography of Oxia Planum 02 CASSIS Data

    • datasetcatalog.nlm.nih.gov
    • ordo.open.ac.uk
    Updated Sep 10, 2021
    Cite
    Vago, Jorge L.; Orgel, Csilla; Hauber, Ernst; Davis, Joel; Fawdon, Peter; Nass, Andrea; Sefton-Nash, Elliot; Adeli, Solmaz; Volat, Matthieu; Grindrod, Peter; Balme, Matt; Parks-Bowen, Adam; Le Deit, Laetitia; Quantin-Nataf, Cathy; Frigeri, Alessandro; Thomas, Nick; Cremonese, Gabriele; Loizeau, Damien (2021). The Geography of Oxia Planum 02 CASSIS Data [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000822866
    Explore at:
    Dataset updated
    Sep 10, 2021
    Authors
    Vago, Jorge L.; Orgel, Csilla; Hauber, Ernst; Davis, Joel; Fawdon, Peter; Nass, Andrea; Sefton-Nash, Elliot; Adeli, Solmaz; Volat, Matthieu; Grindrod, Peter; Balme, Matt; Parks-Bowen, Adam; Le Deit, Laetitia; Quantin-Nataf, Cathy; Frigeri, Alessandro; Thomas, Nick; Cremonese, Gabriele; Loizeau, Damien
    Description

    This data set consists of the CaSSIS RGB data products acquired over the ExoMars Rover landing site in Oxia Planum. These data sets have been georeferenced and registered to the basemap mosaic as part of the scientific categorization of the landing site and are being made available to the scientific community to further the scientific investigation of the landing site. Additionally, data for all CaSSIS data cubes in the Oxia Planum region (as of 18/8/2021) will be available through the European Space Agency Guest Storage Facility (https://www.cosmos.esa.int/web/psa/psa_gsf). Each directory contains the data cube synthetic RGB image outputs from the ISIS pipeline at the University of Bern. This data set has not been georeferenced to the Oxia Planum basemap, but each directory contains supporting map projection information.

    Contents

    This data set contains 3 directories:
    02_a - A synthetic RGB mosaic of CaSSIS data. These data are provided with an equirectangular projection centred at 335.45°E.
    02_b - Individual synthetic RGB images that were georeferenced to the OxiaBase map and used in the mosaic (02_a). These data are provided with an equirectangular projection centred at 335.45°E.
    02_c - 6 CaSSIS data cubes that were used in the scientific reconciliation of the Oxia Planum HiRISE mapping project.

    Additionally, all the data cubes collected at Oxia Planum will be made available through the ESA Guest Storage Facility (www.cosmos.esa.int/web/psa/psa_gsf).

    Guide to individual files

    02_a_CASSIS_georeferenced_sRGB_mosaic
    The mosaic is named: OP = Oxia Planum, sRGB = synthetic RGB, Mosaic = indicates that this is a mosaicked data set, 4m = pixel size, April2021 = the date when the mosaic was made. We may make updates as more CaSSIS data becomes available.

    File name - Description:
    CASSIS_OP_sRGB_mosaic_4m_april2021.tfw - World file
    CASSIS_OP_sRGB_mosaic_4m_april2021.tif - Image data (open this in GIS with the other supporting files in the same directory)
    CASSIS_OP_sRGB_mosaic_4m_april2021.tif.aux.xml - Auxiliary symbology statistics
    CASSIS_OP_sRGB_mosaic_4m_april2021.tif.ovr - Image overviews

    02_b_CASSIS_georeferenced_sRGB
    Naming convention: each CaSSIS image is named "MYww_xxxxxx_yyy_z" where MYww = Mars year (e.g., MY35), xxxxxx = orbit number (e.g., 009394), yyy = distance around the orbit from 0-359, z = image mode (0 = single acquisition, 1 = stereo pair image 1, 2 = stereo pair image 2), RGB = synthetic RGB data, _r = georeferenced to the base layer and rectified.

    File name (example) - Description:
    MY34_001934_162_0_RGB_r.TFw - World file
    MY34_001934_162_0_RGB_r.TIF - Image data (open this in GIS with the other supporting files in the same directory)
    MY34_001934_162_0_RGB_r.TIF.aux.xml - Auxiliary symbology statistics
    MY34_001934_162_0_RGB_r.TIF.ovr - Image overviews

    These data are provided with the following projection: Equirectangular_Mars_Oxia_Planum, Projection = Equidistant_Cylindrical, Datum = D_Mars_2000 Spheroid, Central meridian = 335.45.

    CaSSIS Data

    The CaSSIS RGB products of this data set are standard products created by the University of Bern and the University of Arizona, available to the CaSSIS science team. The Colour and Stereo Surface Imaging System (CaSSIS; Thomas et al., 2017) instrument on the ESA Trace Gas Orbiter (TGO) continues to observe the ExoMars landing site (Figure 1). CaSSIS collects data with four filters (infrared (IR), 950 nm; near-infrared (NIR), 850 nm; broad-transmission panchromatic (PAN), 650 nm; and BLUE-GREEN, 475 nm), chosen to provide the camera with a limited multispectral capability sensitive to a variety of minerals (Tornabene et al., 2017). CaSSIS has a swath width of ~9 km and a rotation mechanism to permit stereo acquisitions. We use CaSSIS 3- or 4-band cubes for our scientific investigation of Oxia Planum. A mosaic of synthetic RGB products is presented on the main map. Synthetic RGB products use a combination of PAN and BLUE-GREEN filter images whereby: the Red channel is the PAN filter mosaic; the Green channel is a combination of a low-pass filter of the BLUE-GREEN and a high-pass filter of the PAN, incorporating colour information from BLUE-GREEN and spatial information from PAN; and the Blue channel is a combination of PAN and BLUE-GREEN such that each pixel has a value of (2*BLU - 0.3*PAN). Each channel is individually contrast-enhanced to form the final product. As TGO operates in a non-sun-synchronous orbit, surface overflights repeat every 36 days spanning a range of local times and seasons (Table 2); individual images do not necessarily have appropriate viewing angles, lighting and atmospheric conditions conducive to the creation of a consistent mosaic data set. We will continue to update the database of georeferenced images as more appropriate images are collected by TGO.

    Georeferencing

    Georeferencing and registration of the CaSSIS RGB data used an initial set of manual tie points to seed the automatic generation of additional points. The CTX mosaic and CaSSIS data were rectified using the spline transformation, which optimizes for local accuracy but not global accuracy (Esri, 2020). This method provided good results for images with a range of viewing angles and accounts well for the local adjustments needed for abrupt elevation changes.

    Acknowledgment

    The authors wish to thank the CaSSIS spacecraft and instrument engineering teams. CaSSIS is a project of the University of Bern and funded through the Swiss Space Office via ESA's PRODEX programme. The instrument hardware development was also supported by the Italian Space Agency (ASI) (ASI-INAF agreement no. I/2020-17-HH.0), INAF/Astronomical Observatory of Padova, and the Space Research Center (CBK) in Warsaw. Support from SGF (Budapest), the University of Arizona (Lunar and Planetary Lab.) and NASA is also gratefully acknowledged. Operations support from the UK Space Agency under grant ST/R003025/1 is also acknowledged.
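    The synthetic RGB recipe (R = PAN; G = low-pass of BLUE-GREEN plus high-pass of PAN; B = 2*BLU - 0.3*PAN; per-channel contrast stretch) can be sketched with a Gaussian filter standing in for the unspecified low/high-pass kernels; the kernel width and image sizes are assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synthetic_rgb(pan, blue_green, sigma=2.0):
    """Compose a synthetic RGB image per the described recipe; sigma is assumed."""
    red = pan
    # low-pass of BLUE-GREEN carries colour; high-pass of PAN carries detail
    green = gaussian_filter(blue_green, sigma) + (pan - gaussian_filter(pan, sigma))
    blue = 2.0 * blue_green - 0.3 * pan
    rgb = np.stack([red, green, blue], axis=-1)
    mn = rgb.min(axis=(0, 1), keepdims=True)  # per-channel contrast stretch to [0, 1]
    mx = rgb.max(axis=(0, 1), keepdims=True)
    return (rgb - mn) / np.where(mx > mn, mx - mn, 1.0)

rng = np.random.default_rng(1)
pan = rng.random((32, 32))
blue_green = rng.random((32, 32))
rgb = synthetic_rgb(pan, blue_green)
```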

  17. Hyperspectral Images from Neolithic Stone Age Rock Paintings: Halsvuori...

    • zenodo.org
    • data.niaid.nih.gov
    png, zip
    Updated Sep 12, 2024
    Cite
    Anna-Maria Raita-Hakola; Ilkka Pölönen; Samuli Rahkonen (2024). Hyperspectral Images from Neolithic Stone Age Rock Paintings: Halsvuori Finland [Dataset]. http://doi.org/10.5281/zenodo.12544151
    Explore at:
    Available download formats: zip, png
    Dataset updated
    Sep 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anna-Maria Raita-Hakola; Ilkka Pölönen; Samuli Rahkonen
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2019
    Area covered
    Finland
    Description

    This data release contains 7 hyperspectral images from Neolithic Stone Age rock paintings. Each HS image is in ENVI format, stored in a .zip file containing capture, metadata and results folders. The capture folder has the white and dark current references and the spectral raw data. The metadata folder contains meta-information, and the results folder contains the computed reflectance images. In the root of every folder there is a .png image, captured simultaneously with the spectral data. For preview, the PNGs are also uploaded separately. The white rectangle visualises the area in which the spectral cube is recorded. The original ground truth of these images is described in the text below, and there are images of all of the mentioned figures. A spectral analysis may reveal hidden features and give a basis for new interpretations, as in our IEEE IGARSS article below.

    HS images were captured using Specim IQ imager in August 2019 in the town of Jyväskylä, located in the middle of Finland. The name of the rock painting site is Halsvuori. According to literature, the Halsvuori has six figures: Two humans in front poses with raised hands. One human has bent legs; the other one’s legs are faded. Both human figures hold small animals in one of their hands; the animal figures represent small game animals [1]. Two stains, probably hand shapes, are one and a half meters away from the humans. All figures are described as small, and the visibility varies; some are recognisable and almost intact, and some are worn, fragmented or scattered areas of paint [2, 1]. The authenticity of the hand stains being as old as the human figures is not clear [3]. Halsvuori paintings are hidden in the woods, near a small pond [3].

    IMPORTANT! While this dataset is open to use, please also cite our original article:

    Raita-Hakola, A.-M., Rahkonen, S., Pölönen, I. (2024, July). Revealing hidden art: Authenticating and unveiling neolithic rock paintings through advanced hyperspectral imaging techniques. In IGARSS 2024 IEEE International Geoscience and Remote Sensing Symposium. IEEE.
    The code of the analysis: Pölönen, I., & Raita-Hakola, A.-M. (2024). Hyperspectral Images from Neolithic Stone Age Rock Paintings: Halsvuori Finland - Source Code. Zenodo. https://doi.org/10.5281/zenodo.10844005. Most of the included reference literature is written in Finnish; the authors can help with it if needed.

  18. Tools for descriptive statistical and visual analyses of hierarchical...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jul 19, 2023
    Cite
    Sumana Kalyanasundaram; Yohan Lefol; Sveinung Gundersen; Torbjørn Rognes; Lene Alsøe; Hilde Loge Nilsen; Eivind Hovig; Geir Kjetil Sandve; Diana Domanska (2023). Tools for descriptive statistical and visual analyses of hierarchical genomic tracks. [Dataset]. http://doi.org/10.1371/journal.pone.0286330.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Sumana Kalyanasundaram; Yohan Lefol; Sveinung Gundersen; Torbjørn Rognes; Lene Alsøe; Hilde Loge Nilsen; Eivind Hovig; Geir Kjetil Sandve; Diana Domanska
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Tools for descriptive statistical and visual analyses of hierarchical genomic tracks.

  19. MUSE HUDF survey I, Section 4: data and reproduction pipeline for photometry...

    • zenodo.org
    • data.europa.eu
    application/gzip
    Updated Jan 24, 2020
    Cite
    Mohammad Akhlaghi; Roland Bacon (2020). MUSE HUDF survey I, Section 4: data and reproduction pipeline for photometry and astrometry [Dataset]. http://doi.org/10.5281/zenodo.1163746
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mohammad Akhlaghi; Roland Bacon
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) - https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Necessary data and reproduction pipeline for Section 4 of "The MUSE Hubble Ultra Deep Field Survey: I. Survey description, data reduction and source detection", Bacon et al. (2017), Astronomy & Astrophysics, 608, A1. The purpose of this section in the paper is to show the photometric and astrometric precision of the processed MUSE 3D data cubes discussed in the paper (pseudo-broad-band images created from the cubes) in comparison with broad-band images of the Hubble Space Telescope (HST).

    This repository on Zenodo contains all the necessary input data, software and reproduction pipeline (containing the scripts, configuration files and settings to exactly reproduce the results in Section 4 of the paper). Below is a description of the contents:

    • gnuastro-0.2.51-bc56.tar.gz: The version of GNU Astronomy Utilities (Gnuastro) that is necessary for this pipeline. Gnuastro is a large collection of programs for astronomical data analysis on the command-line (and in scripts). Note that the reproduction pipeline only works with Gnuastro version 0.2.51, it will complain and abort if another version is installed.

      IMPORTANT NOTE: Since version 0.2.51 of Gnuastro was released, CFITSIO (one of Gnuastro's dependencies) has added a dependency for the cURL library (to read https URLs). Therefore, to install Gnuastro 0.2.51, please install CFITSIO version 3.41 or earlier.

    • gnuastro-dependencies.tar.gz: Software libraries necessary to build Gnuastro as it is used here. With these, a working C compiler is enough (currently only tested in a GNU/Linux environment) to exactly reproduce the results (tables).

    • hst-acs-images.tar.gz: Necessary images from HST's eXtreme Deep Field survey archives. These images are not necessary to run the reproduction pipeline (they will be downloaded from the HST archives if not present). They are stored here for the self-sufficiency of this repository and faster download: in this lossless compressed format, they are roughly 1/3rd the volume of the same files in HST archives.

    • hst-acs-throughputs.tar.gz: The throughputs of the HST Advanced Camera for Surveys (ACS) filters necessary in this study. These are also available from the HST archives and are kept here for reasons similar to those above.

    • muse-pseudo-broadband-images.tar.gz: Pseudo-broad-band images generated from the MUSE 3D data cube. These images are only released in this repository. However, it isn't necessary to download them manually to run the reproduction pipeline; the scripts will download them from Zenodo automatically.

    • reproduce-v1-4-gaafdb04.tar.gz: The reproduction pipeline (version 1-4-gaafdb04) that produces the results (tables) plotted in the paper. The full Git version-controlled history of this repository is available on git-cral.univ-lyon1.fr and gitlab.com. We recommend cloning from the Git repository when it is available; this tarball is kept here in case those servers are unreachable or Git is no longer in common use. Please see the README file in this repository for instructions on how to run the reproduction pipeline and exactly reproduce the results. The pipeline will download all the necessary data if they aren't already present on the system (typically it is only necessary to install the required version of Gnuastro first).

    The Creative Commons Attribution-NonCommercial 4.0 license mentioned on the Zenodo webpage applies only to files that don't carry an explicit copyright notice within them. The copyright of other files (mainly scripts and software) is stated within them (all are free licenses).

    For any issues with the pipeline/processing, please contact Mohammad Akhlaghi.

  20. Cube db

    • bioregistry.io
    Updated Jul 9, 2022
    Cite
    (2022). Cube db [Dataset]. https://bioregistry.io/cubedb
    Explore at:
    Dataset updated
    Jul 9, 2022
    Description

    Cube-DB is a database of pre-evaluated results for detection of functional divergence in human/vertebrate protein families. It analyzes comparable taxonomical samples for all paralogues under consideration, storing functional specialisation at the level of residues. The data are presented as a table of per-residue scores, and mapped onto related structures where available.

