License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset shows how the Eurostat data cube in the original publication is modelled in QB4OLAP.
This data is based on statistical data about asylum applications to the European Union, provided by Eurostat on
http://ec.europa.eu/eurostat/web/products-datasets/-/migr_asyappctzm
Further data has been integrated from: https://github.com/lorenae/qb4olap/tree/master/examples
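As a minimal illustration of how such a QB4OLAP cube can be inspected programmatically (a sketch only: the Turtle file name is hypothetical, and the query assumes the standard qb and qb4o vocabularies):

import rdflib

g = rdflib.Graph()
g.parse("migr_asyappctzm_qb4olap.ttl", format="turtle")  # hypothetical file name

# List the dimensions, measures and levels attached to each data structure
# definition of the cube.
query = """
PREFIX qb:   <http://purl.org/linked-data/cube#>
PREFIX qb4o: <http://purl.org/qb4olap/cubes#>
SELECT ?dsd ?component WHERE {
  ?dsd a qb:DataStructureDefinition ;
       qb:component ?spec .
  ?spec (qb:dimension|qb:measure|qb4o:level) ?component .
}
"""
for dsd, component in g.query(query):
    print(dsd, component)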
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data, models and code used in the paper "Towards an AI Cube: Enriching Geospatial Data Cube with AI Inference Capabilities".
Workforce Information Cubes for NASA, sourced from NASA's personnel/payroll system, provide data about who is working where and on what. They include records for every civil service employee in NASA, snapshots of workforce composition as of certain dates, and data on personnel transactions such as hires, losses and promotions. Updates occur every two weeks.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The 30m seamless annual leaf-on Landsat composites from 1985 to 2024 were generated using a comprehensive framework designed to ensure high-quality, consistent data across decades. Starting with preprocessed Level-2 surface reflectance images from multiple Landsat sensors, the dataset is restricted to the Leaf-On season, with rigorous cloud and shadow masking applied based on quality assessment bands. To maintain consistency across sensors, spectral harmonization is conducted, followed by annual composite generation using the medoid method to capture peak vegetation conditions. The resulting composites are structured into a spatially consistent data cube, facilitating efficient analysis and monitoring of vegetation dynamics over time.
The band naming convention follows Landsat TM standards, with bands designated as Blue (B1), Green (B2), Red (B3), NIR (B4), SWIR1 (B5), and SWIR2 (B7). Both qualitative and quantitative evaluations were conducted to validate the data quality. Here, we provide 2023 image data covering southwestern forest regions of China as a sample for testing. For access to the full dataset, please visit Google Earth Engine at this link, and Earth Engine App (Landsat Yearly Composite Viewer) at this link.
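The medoid rule can be illustrated with a short NumPy sketch (illustrative only, not the authors' implementation): for each pixel, the composite keeps the observation whose summed spectral distance to all other clear observations of that pixel in the year is smallest.

import numpy as np

def medoid_composite(stack):
    # stack: (T, B, H, W) array of T clear observations of B bands.
    # Keep, per pixel, the observation minimizing the summed Euclidean
    # spectral distance to all other observations of that pixel.
    # The pairwise tensor is O(T^2); fine for a sketch on small tiles.
    diff = stack[:, None] - stack[None, :]        # (T, T, B, H, W)
    dist = np.sqrt((diff ** 2).sum(axis=2))       # (T, T, H, W)
    idx = dist.sum(axis=1).argmin(axis=0)         # (H, W) medoid index
    return np.take_along_axis(stack, idx[None, None], axis=0).squeeze(0)

# Tiny smoke test: 4 observations, 6 bands, 8x8 pixels.
stack = np.random.default_rng(0).normal(size=(4, 6, 8, 8))
print(medoid_composite(stack).shape)              # (6, 8, 8)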
The dataset has now been updated to include data up to 2024.
Data citation: Cai, Y., Li, X., Zhu, P., Nie, S., Wang, C., Liu, X., & Chen, Y. (2025). China Earth Observation Data Cube: The 30m Seamless Annual Leaf-On Landsat Composites from 1985 to 2023. Journal of Remote Sensing. DOI: 10.34133/remotesensing.0698
For data-related inquiries, please contact Dr. Yaotong Cai at caiyt33@mail2.sysu.edu.cn.
Earth Observation Data Cube generated from Landsat Level-2 product over Brazil extension. This dataset is provided in Cloud Optimized GeoTIFF (COG) file format. The dataset is processed with 30 meters of spatial resolution, reprojected and cropped to BDC_MD grid Version 2 (BDC_MD V2), considering a temporal compositing function of 16 days using the Least Cloud Cover First (LCF) best pixel approach.
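A rough sketch of the LCF best-pixel idea (an illustration under assumed inputs, not the BDC production code): scenes in the compositing window are visited from least to most cloudy, and each pixel keeps the first clear observation encountered.

import numpy as np

def lcf_composite(scenes, cloud_cover, clear_mask):
    # scenes: (T, H, W) single-band reflectance within one 16-day window
    # cloud_cover: (T,) per-scene cloud percentage
    # clear_mask: (T, H, W) boolean, True where the pixel is clear
    out = np.full(scenes.shape[1:], np.nan)
    filled = np.zeros(scenes.shape[1:], dtype=bool)
    for t in np.argsort(cloud_cover):             # least cloudy scene first
        take = clear_mask[t] & ~filled
        out[take] = scenes[t][take]
        filled |= take
    return out                                    # NaN where never clear

rng = np.random.default_rng(1)
scenes = rng.random((3, 4, 4))
clear = rng.random((3, 4, 4)) > 0.3
print(lcf_composite(scenes, np.array([12.0, 3.5, 40.0]), clear))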
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The SeasFire Cube is a scientific datacube for seasonal fire forecasting around the globe. Apart from seasonal fire forecasting, which is the aim of the SeasFire project, the datacube can be used for several other tasks. For example, it can be used to model teleconnections and memory effects in the earth system. Additionally, it can be used to model emissions from wildfires and the evolution of wildfire regimes.
It was created in the context of the SeasFire project, which deals with "Earth System Deep Learning for Seasonal Fire Forecasting" and is funded by the European Space Agency (ESA) under the ESA Future EO-1 Science for Society Call.
It contains 21 years of data (2001-2021) at an 8-day temporal resolution and a 0.25-degree grid resolution. It offers a diverse range of seasonal fire drivers, spanning atmospheric and climatological variables, vegetation variables, socioeconomic indicators, and wildfire-related target variables such as burned areas, fire radiative power, and wildfire-related CO2 emissions.
| Feature | Value |
|---|---|
| Spatial Coverage | Global |
| Temporal Coverage | 2001 to 2021 |
| Spatial Resolution | 0.25 deg x 0.25 deg |
| Temporal Resolution | 8 days |
| Number of Variables | 54 |
| Tutorial Link | |
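As an illustration of working with a cube of this shape, a minimal xarray sketch (the Zarr store name, the variable name gwis_ba, and the coordinate conventions are assumptions, not taken from this description):

import xarray as xr

# Hypothetical store and variable names; check the actual cube metadata.
ds = xr.open_zarr("seasfire_cube.zarr")
ba = (
    ds["gwis_ba"]                                  # burned areas (assumed name)
    .sel(time="2020-08-01", method="nearest")      # nearest 8-day time step
    .sel(latitude=slice(70, 35),                   # descending latitude assumed
         longitude=slice(-10, 40))                 # a European window
)
print(ba.shape)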
Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring/ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube for finding the top-k answers, and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches.
Citation: B. Ding, B. Zhao, C. X. Lin, J. Han, C. Zhai, A. N. Srivastava, and N. C. Oza, "Efficient Keyword-Based Search for Top-K Cells in Text Cube," IEEE Transactions on Knowledge and Data Engineering, 2011.
Earth Observation Data Cube generated from CBERS-4/WFI and CBERS-4A/WFI Level-4 SR products over Brazil extension. This dataset is provided in Cloud Optimized GeoTIFF (COG) file format. The dataset is processed with 64 meters of spatial resolution, reprojected and cropped to BDC_LG grid Version 2 (BDC_LG V2), considering a temporal compositing function of 8 days using the Least Cloud Cover First (LCF) best pixel approach.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Excel file contains crosswalks among different metadata schemas that can be used for the description of data cubes in the areas of Marine Science, Earth Sciences and Climate Research. These data cubes commonly contain observations of some variables for some feature of interest, taken by Earth Observation systems (e.g., satellites) or collected in situ.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Earth Observation Data Cube generated from CBERS-4/MUX Level-4 SR product over Brazil extension. This dataset is provided in Cloud Optimized GeoTIFF (COG) file format. The dataset is processed with 20 meters of spatial resolution, reprojected and cropped to BDC_MD grid Version 2 (BDC_MD V2), considering a temporal compositing function of 2 months using the Least Cloud Cover First (LCF) best pixel approach.
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
This dataset accompanies the IEEE Access publication "Segmented 3D Lung Cube Dataset and Dual-Model Framework for COVID-19 Severity Prediction" (DOI: 10.1109/ACCESS.2024.3501234).
The dataset comprises pre-processed and segmented 3D CT lung volumes derived from the publicly available STOIC Database, containing CT scans from 2,000 patients. These volumetric images were generated using a detailed 10-step pipeline involving intensity windowing, image resampling, lung extraction, and cube alignment. The dataset is curated to support deep learning-based severity prediction of COVID-19, particularly for critical outcomes such as intubation or death within one month.
The repository contains three files:
- allcts.npy: a NumPy file with the segmented 3D chest CT lung cubes from the 2,000 patients; shape [2000, 128, 64, 128]; dtype uint8 (converted from float32 for compression); per-patient metadata in reference.csv.
- allmasks.npz: a compressed NumPy file with binary segmentation masks (lungs only) for each of the 2,000 CT volumes; shape [2000, 128, 64, 128]; per-patient metadata in reference.csv.
- reference.csv: a CSV file that mirrors the original metadata from the STOIC challenge dataset.
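A minimal loading sketch in Python, assuming the three files sit in the working directory (the key naming inside the .npz archive is not documented here, so the code inspects it first):

import numpy as np
import pandas as pd

cts = np.load("allcts.npy", mmap_mode="r")      # (2000, 128, 64, 128) uint8
masks = np.load("allmasks.npz")                 # compressed masks archive
meta = pd.read_csv("reference.csv")             # per-patient metadata

print(cts.shape, cts.dtype)
print(masks.files)                              # inspect archive key(s) first

# Apply the lung mask to patient 0 (loads the full mask array; a sketch only).
volume0 = cts[0] * (masks[masks.files[0]][0] > 0)
print(volume0.shape)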
This dataset is suitable for training and evaluating 3D deep learning models for tasks such as COVID-19 severity prediction (e.g., intubation or death within one month).
If you use this dataset in your research, please cite:
M. A. Khan, A. Shaukat, Z. Mustansar, and M. U. Akram, "Segmented 3D Lung Cube Dataset and Dual-Model Framework for COVID-19 Severity Prediction," IEEE Access, pp. 1–1, Jan. 2024. DOI: 10.1109/ACCESS.2024.3501234
Keyword Search in Text Cube: Finding Top-k Relevant Cells
Bolin Ding, Yintao Yu, Bo Zhao, Cindy Xide Lin, Jiawei Han, and ChengXiang Zhai
Abstract. We study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (e.g., a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword query, our goal is to find the top-k most relevant cells (ranked according to the relevance scores of cell documents w.r.t. the given query) in the text cube. We define a keyword-based query language and apply an IR-style relevance model for scoring and ranking cell documents in the text cube. We propose two efficient approaches to find the top-k answers. The proposed approaches support a general class of IR-style relevance scoring formulas that satisfy certain basic and common properties. One approach uses more time for pre-processing and less time for answering online queries; the other is more efficient in pre-processing but consumes more time for online queries. Experimental studies on the ASRS dataset are conducted to verify the efficiency and effectiveness of the proposed approaches.
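To make the problem concrete, here is a naive baseline (not the paper's optimized algorithms): it materializes every cell, scores each cell document by total query-term frequency, and keeps the top-k. The row schema and scoring function are illustrative only.

from collections import Counter, defaultdict
from heapq import nlargest
from itertools import combinations

def topk_cells(rows, dims, query_terms, k=5):
    # Materialize every cell: a cell fixes values for a subset of dimensions,
    # and its cell document is the concatenation of all matching documents.
    cells = defaultdict(list)
    for attrs, text in rows:
        for r in range(1, len(dims) + 1):
            for subset in combinations(range(len(dims)), r):
                key = tuple((dims[i], attrs[i]) for i in subset)
                cells[key].append(text)
    # Score a cell document by the total frequency of the query terms.
    def score(docs):
        tf = Counter(w for d in docs for w in d.lower().split())
        return sum(tf[t] for t in query_terms)
    return nlargest(k, ((score(docs), cell) for cell, docs in cells.items()))

rows = [(("2001", "NY"), "runway incursion at night"),
        (("2001", "CA"), "engine failure on approach"),
        (("2002", "NY"), "night landing after runway confusion")]
print(topk_cells(rows, ("year", "state"), ["runway", "night"], k=3))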
This set of quarterly cubes provides employee population data for the new Ethnicity and Race Indicator (ERI). The numbers reflect the actual number of employees as of a specific point in time. The following workforce characteristics are available for analysis: Agency, State/Country, Age (5-year interval), Education Level, Ethnicity and Race Indicator (ERI), Length of Service (5-year interval), GS & Equivalent Grade, Occupation, Occupation Category, Pay Plan & Grade, Salary Level ($10,000 interval), STEM Occupations, Supervisory Status, Type of Appointment, Work Schedule, Work Status, Employment, Average Salary, Average Length of Service. Diversity cubes will be available for the most recent eight quarters and the five previous end-of-fiscal-year (September) files.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Earth Observation (EO) has been recognised as a key data source for supporting the United Nations Sustainable Development Goals (SDGs). Advances in data availability and analytical capabilities have provided a wide range of users access to global coverage analysis-ready data (ARD). However, ARD does not provide the information required by national agencies tasked with coordinating the implementation of SDGs. Reliable, standardised, scalable mapping of land cover and its change over time and space facilitates informed decision making, providing cohesive methods for target setting and reporting of SDGs. The aim of this study was to implement a global framework for classifying land cover. The Food and Agriculture Organisation's Land Cover Classification System (FAO LCCS) provides a global land cover taxonomy suitable to comprehensively support SDG target setting and reporting. We present a fully implemented FAO LCCS optimised for EO data: Living Earth, an open-source software package that can be readily applied using existing national EO infrastructure and satellite data. We resolve several semantic challenges of LCCS for consistent EO implementation, including modifications to environmental descriptors, inter-dependency within the modular-hierarchical framework, and increased flexibility associated with limited data availability. To ensure easy adoption of Living Earth for SDG reporting, we identified key environmental descriptors to provide resource allocation recommendations for generating routinely retrieved input parameters. Living Earth provides an optimal platform for global adoption of EO4SDGs, ensuring a transparent methodology that allows monitoring to be standardised for all countries.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is part of the LUIcube, a global dataset on land-use at 30 arcsecond spatial resolution. The LUIcube includes information on area, the change in NPP due to land conversions (HANPPluc), the harvested NPP (including losses, HANPPharv), and the NPP remaining in ecosystems after harvest (NPPeco) for 32 land-use classes in annual time-steps from 1992 to 2020. A detailed description of the LUIcube is available in the accompanying publication.
The layers of land-use areas are provided in square kilometers (km²) per grid cell. All NPP flows are provided in tC/yr per grid cell. Adding HANPPharv to NPPeco results in the actual NPP available before harvest (NPPact=NPPeco+HANPPharv), and adding HANPPluc to NPPact results in the potential NPP available in the hypothetical absence of land use (NPPpot=NPPact+HANPPluc) for the given land-use class. Area-intensive values (in gC/m²/yr) can be calculated by dividing the NPP flows by the area of the respective land-use class per grid cell. HANPP in % of NPPpot can be calculated by summing up HANPPharv and HANPPluc and dividing it by NPPpot. Areas and NPP flows of land-use classes can be aggregated to calculate their overall HANPP.
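A small worked example of this bookkeeping (illustrative numbers only, not LUIcube values):

import numpy as np

# One land-use class on three grid cells; NPP flows in tC/yr, areas in km2.
npp_eco    = np.array([120.0, 80.0, 45.0])   # NPP remaining after harvest
hanpp_harv = np.array([ 30.0, 10.0,  5.0])   # harvested NPP incl. losses
hanpp_luc  = np.array([ 50.0, 20.0, 10.0])   # NPP change from land conversion
area_km2   = np.array([  2.0,  1.5,  1.0])

npp_act = npp_eco + hanpp_harv               # actual NPP before harvest
npp_pot = npp_act + hanpp_luc                # potential NPP without land use
hanpp_pct = 100 * (hanpp_harv + hanpp_luc) / npp_pot

# Area-intensive values in gC/m2/yr: tC -> gC is *1e6 and km2 -> m2 is *1e6,
# so the two factors cancel and a plain division suffices.
npp_act_gcm2 = npp_act / area_km2
print(npp_pot, hanpp_pct, npp_act_gcm2)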
This Zenodo repository provides data on the following land-use classes: unused productive wilderness areas (WILD-core); productive wilderness areas that are sporadically used at very low intensity (WILD-periphery); unused unproductive wilderness areas (WILD-nps); forestry areas, mainly coniferous (FO-con); forestry areas, mainly non-coniferous (FO-ncon); and settlements, urban areas and infrastructure (BU-builtup).
This dataset was created by Sabil Shrestha.
Overview of the experiment
We conducted this experiment to collect a dataset of hyperspectral data-cubes of wastewater samples, along with reference laboratory analyses of various wastewater pollutants. The goal was to train data-driven models to predict pollution levels in a sample using hyperspectral data-cubes. For ten days, we collected samples from four wastewater treatment facilities around Melbourne, Australia: three urban wastewater treatment facilities and one stormwater treatment facility. We conducted the sampling between 04/08/2024 and 15/08/2024. Once sampled, we analysed the wastewater in the laboratory for reference physical and chemical pollutants and acquired hyperspectral images. To extend the dataset, we also created combinations of stormwater and wastewater samples for which we measured a hyperspectral data-cube and some reference pollutants. This repository also includes background information about data pre-processing and validation.
Repository organization: How to use the data?
The repository is organized into numbered folders. Most folders contain a readme.md file in Markdown format, explaining their contents. All data are stored in non-proprietary formats: CSV for most files, except for hyperspectral acquisitions, which are in ENVI format (compatible with Python). Raw data are kept in their original format, sometimes lacking metadata such as units or column descriptions; this information is provided in the corresponding readme.md files. Pre-processed data, however, have consistent column names, including units. Jupyter notebooks are included to pre-process and validate the data.
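For instance, a single acquisition can be opened with the spectral Python package (a sketch; the header file name is hypothetical):

import spectral

img = spectral.open_image("sample_acquisition.hdr")   # hypothetical file name
cube = img.load()                                     # (rows, cols, bands) array
print(cube.shape)
# Band-center wavelengths, if recorded in the ENVI header:
if img.bands.centers:
    print(img.bands.centers[:5])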
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset, created by the University of Sydney, includes time-series digital soil map products of soil organic carbon (SOC) between January 1990 and December 2020 for the Regional Forest Agreement regions of eastern NSW. Modelling was completed using a data cube platform incorporating a machine-learning space-time framework and geospatial technologies. Products provide estimates of SOC concentrations and associated trends through time. The important covariates required to drive this spatio-temporal modelling were identified using the Recursive Feature Elimination (RFE) algorithm, drawing on a range of predictors that vary in space, in time, and in both space and time.
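As a rough illustration of RFE-based covariate selection (the estimator, covariates and target below are stand-ins, not the study's configuration):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                       # stand-in space-time covariates
y = X[:, 0] * 2 + X[:, 3] + rng.normal(scale=0.1, size=500)  # stand-in SOC target

# Recursively drop the least important covariates until five remain.
selector = RFE(RandomForestRegressor(n_estimators=100, random_state=0),
               n_features_to_select=5)
selector.fit(X, y)
print(np.flatnonzero(selector.support_))             # indices of retained covariates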
A full description of the digital soil maps and methods is presented in: Moyce MC, Gray JM, Wilson BR, Jenkins BR, Young MA, Ugbaje SU, Bishop TFA, Yang X, Henderson LE, Milford HB, Tulau MJ, 2021. Determining baselines, drivers and trends of soil health and stability in New South Wales forests: NSW Forest Monitoring & Improvement Program, Final report v1.1 for NSW Natural Resources Commission by NSW Department of Planning, Industry and Environment and University of Sydney.
The metadata's data packages section includes project scripts and code, the final project report, and an external Cloudstor link to download the predicted SOC map products.
This repository contains a synthetic, temporal data set that was generated by the authors by sampling values from the Gaussian distribution. The dataset contains eight nontemporal dimensions, a temporal dimension, and a numerical measure attribute. The data set was generated according to the scheme and procedure detailed in this source paper: Kaufmann, M., Fischer, P.M., May, N., Tonder, A., Kossmann, D. (2014). TPC-BiH: A Benchmark for Bitemporal Databases. In: Performance Characterization and Benchmarking. TPCTC 2013. Lecture Notes in Computer Science, vol 8391. Springer, Cham.
Synthetic temporal dataset for temporal trend analysis and retrieval
https://doi.org/10.5061/dryad.q573n5trf
The data set can be used for analyzing and locating temporal trends of interest, where a temporal trend is generated by selecting the desired values of the nontemporal dimensions, and then selecting the corresponding values of the temporal dimension and the numerical measure attribute. Locating temporal trends of interest, e.g., unusual trends, is a common task in many applications and domains. It can also be of interest to understand which nontemporal dimensions are associated with the temporal trends of interest. To this end, the data set can be used for analyzing and locating temporal trends in the data cube induced by the data set, e.g., retrieving outlier temporal trends using an outlier detector.
We generated the synthetic temporal data set [1], which contains up to 8 nontemporal dimensions, one temporal dimension, and a numerical measure attribute.
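A sketch of that retrieval pattern with pandas, under assumed file and column names (dim1..dim8, time, measure are illustrative, not the dataset's documented schema):

import pandas as pd

df = pd.read_csv("synthetic_temporal_dataset.csv")   # hypothetical file name

# Fix one value per nontemporal dimension to select a single temporal trend.
fixed = {f"dim{i}": 0 for i in range(1, 9)}
mask = pd.Series(True, index=df.index)
for col, val in fixed.items():
    mask &= df[col] == val

# The trend is the (time, measure) series of the matching rows.
trend = df.loc[mask].sort_values("time")[["time", "measure"]]
print(trend.head())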
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset has been collected by Edge Impulse and used extensively to design the FOMO (Faster Objects, More Objects) object detection architecture. See FOMO documentation or the announcement blog post.
The dataset is composed of 70 images including:
- 32 blue cubes
- 32 green cubes
- 30 red cubes
- 28 yellow cubes
Download link: cubes on a conveyor belt dataset in Edge Impulse Object Detection format.
You can also retrieve this dataset from this Edge Impulse public project.
Data exported from an object detection project in the Edge Impulse Studio is exported in this format, see below to understand the format.
To import this data into a new Edge Impulse project, use the Edge Impulse CLI:
edge-impulse-uploader --clean --info-file info.labels
The Edge Impulse object detection acquisition format provides a simple and intuitive way to store images and associated bounding box labels. Folders containing data in this format will take the following structure:
.
├── testing
│ ├── bounding_boxes.labels
│ ├── cubes.23im33f2.jpg
│ ├── cubes.23j3rclu.jpg
│ ├── cubes.23j4jeee.jpg
│ ...
│ └── cubes.23j4k0rk.jpg
└── training
├── bounding_boxes.labels
├── blue.23ijdngd.jpg
├── combo.23ijkgsd.jpg
├── cubes.23il4pon.jpg
    ├── cubes.23im28tb.jpg
...
└── yellow.23ijdp4o.jpg
2 directories, 73 files
The subdirectories contain image files in JPEG or PNG format. Each image file represents a sample and is associated with its respective bounding box labels in the bounding_boxes.labels file.
The bounding_boxes.labels file in each subdirectory provides detailed information about the labeled objects and their corresponding bounding boxes. The file follows a JSON format, with the following structure:
- version: Indicates the version of the label format.
- files: A list of objects, where each object represents an image and its associated labels. Each object has:
  - path: The path or file name of the image.
  - category: Indicates whether the image belongs to the training or testing set.
  - label: Provides information about the labeled objects.
    - type: Specifies the type of label (e.g., a single label).
    - label: The actual label or class name of the object.
  - metadata: Additional metadata associated with the image, such as the site where it was collected, the timestamp or any useful information.
  - boundingBoxes: A list of objects, where each object represents a bounding box for an object within the image.
    - label: The label or class name of the object within the bounding box.
    - x, y: The coordinates of the top-left corner of the bounding box.
    - width, height: The width and height of the bounding box.

bounding_boxes.labels example:
{
"version": 1,
"files": [
{
"path": "cubes.23im33f2.jpg",
"category": "testing",
"label": {
"type": "label",
"label": "cubes"
},
"metadata": {
"version": "2023-1234-LAB"
},
"boundingBoxes": [
{
"label": "green",
"x": 105,
"y": 201,
"width": 91,
"height": 90
},
{
"label": "blue",
"x": 283,
"y": 233,
"width": 86,
"height": 87
}
]
},
{
"path": "cubes.23j3rclu.jpg",
"category": "testing",
"label": {
"type": "label",
"label": "cubes"
},
"metadata": {
"version": "2023-4567-PROD"
},
"boundingBoxes": [
{
"label": "red",
...
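Since bounding_boxes.labels is plain JSON, any JSON parser can read it; a minimal Python sketch under the folder layout shown above:

import json

# Read one labels file and print every bounding box.
with open("testing/bounding_boxes.labels") as f:
    labels = json.load(f)

print("format version:", labels["version"])
for entry in labels["files"]:
    for box in entry["boundingBoxes"]:
        print(entry["path"], box["label"],
              (box["x"], box["y"], box["width"], box["height"]))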