Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This record contains data that accompanies the Cloud-native geospatial data cube workflows with open-source tools tutorial. This data pertains to tutorial 2, which demonstrates working with Sentinel-1 RTC imagery processed by Alaska Satellite Facility's Hybrid Pluggable Processing Pipeline (HyP3). Users have the option to follow the tutorial on their own machine using the entire dataset (103 scenes, 47 GB) or a subset of the dataset (5 scenes, ~ 2.2 GB). Both are contained in this record.
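As a rough illustration of the kind of open-source workflow the tutorial covers, the sketch below stacks downloaded RTC GeoTIFF scenes into a time-indexed xarray cube. It is not taken from the tutorial itself: the directory name, file pattern, and date-parsing logic are assumptions and should be adapted to the actual HyP3 output layout.

```python
# Hedged sketch: stack HyP3 RTC GeoTIFF scenes into a time-indexed xarray cube.
# Assumes one VV backscatter GeoTIFF per scene with an acquisition date (YYYYMMDD)
# somewhere in the filename; adjust the glob pattern and parsing to the real layout.
import glob
import re

import pandas as pd
import rioxarray
import xarray as xr

paths = sorted(glob.glob("asf_rtc_data/*_VV.tif"))  # hypothetical directory/pattern

def acquisition_date(path: str) -> pd.Timestamp:
    """Pull the first YYYYMMDD-looking token out of the filename."""
    match = re.search(r"(\d{8})", path)
    return pd.to_datetime(match.group(1))

scenes = []
for path in paths:
    da = rioxarray.open_rasterio(path, chunks={"x": 1024, "y": 1024}).squeeze("band", drop=True)
    scenes.append(da.expand_dims(time=[acquisition_date(path)]))

cube = xr.concat(scenes, dim="time").sortby("time")
print(cube)
```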
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The emerging global trend of satellite operators producing analysis-ready data, combined with open-source tools for managing and exploiting this data, is leading to more and more countries using Earth observation data to drive progress against key national and international development agendas. This paper provides examples from Australia, Mexico, Switzerland and Tanzania of how the Open Data Cube technology has been combined with analysis-ready data to provide new insights and support better policy making across issues as diverse as water resource management, urbanization and environmental-economic accounting.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset shows how the Eurostat data cube in the original publication is modelled in QB4OLAP.
The data is based on statistical data about asylum applications to the European Union, provided by Eurostat at http://ec.europa.eu/eurostat/web/products-datasets/-/migr_asyappctzm
Further data has been integrated from: https://github.com/lorenae/qb4olap/tree/master/examples
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Excel file contains crosswalks among different metadata schemas that can be used to describe data cubes in the areas of Marine Science, Earth Sciences and Climate Research. These data cubes commonly contain observations of variables on some feature of interest, acquired by Earth observation systems (e.g., satellites) or through in-situ observations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here, we present the ARCO (analysis-ready and cloud-optimized) Landsat-based Spectral Indices data cube. Available at 30 m resolution from 2000 to 2022, it includes multiple spectral indices and multi-tier predictors (bimonthly, annual, and long-term) for continental Europe, including Ukraine, the UK, and Turkey (excluding Svalbard). The data cube covers a broad range of indices, each providing unique insight into a different aspect of the land surface, including surface reflectance, vegetation, water, soil and crops. All data layers are cloud-masked and then gap-filled, ready for analysis, modeling, and mapping applications. Technical details:
Considering the data volume, only bimonthly data layers for the years 2000 and 2022 are uploaded. However, all annual and long-term layers are available. For the full data cube, please visit this catalog. Due to Zenodo's storage limits, the data layers are stored in different buckets. Use the identifier-navigation list below to access the bucket of your interest and download the corresponding layers.
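As a hedged illustration of working with these cloud-optimized layers, the snippet below streams a spatial subset of one COG over HTTP with rioxarray rather than downloading the whole file. The bucket URL and file name are placeholders, not actual object paths; substitute a real layer path from the catalog or bucket listing.

```python
# Hedged sketch: read a spatial subset of one cloud-optimized GeoTIFF directly
# from object storage. The URL is a placeholder; substitute a real layer path.
import rioxarray

url = "https://example-bucket.example.org/ndvi_30m_bimonthly_20220101_20220228.tif"  # placeholder

ndvi = rioxarray.open_rasterio(url, chunks=True)  # lazy, chunked (dask-backed) read
# Clip to a window expressed in the layer's CRS units (placeholder coordinates).
subset = ndvi.rio.clip_box(minx=4.0e6, miny=2.9e6, maxx=4.1e6, maxy=3.0e6)
print(subset.mean().compute())
```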
This data cube includes 4 tiers of data, depending on the extent of processing along the temporal scale:
To ensure consistency and ease of use across the data layers, we follow the standard AI4SoilHealth and Open-Earth-Monitor file-naming convention. The convention uses 10 fields that describe important properties of the data, so users can search files, prepare data analyses, etc., without needing to open them. The fields are:
Please cite this dataset using the DOI: [10.5281/zenodo.10776891], which represents all versions of this dataset. This ensures your citation remains up to date with the latest version.
If you discover a bug, artifact, or inconsistency, or if you have a question, please raise a GitHub issue!
On this landing page of the Time-series of Landsat-based Spectral Indices (EU, 30m) data cube, the four long-term spectral-index trend layers are stored, as Zenodo doesn't allow empty buckets. This page therefore serves not only as the landing page for the entire dataset but also as the bucket for the long-term spectral-index trends.
Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0
Data and models used in the manuscript "Towards an AI Cube: Enriching Geospatial Data Cube with AI Inference Capabilities".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The SeasFire Cube is a scientific datacube for seasonal fire forecasting around the globe. Apart from seasonal fire forecasting, which is the aim of the SeasFire project, the datacube can be used for several other tasks. For example, it can be used to model teleconnections and memory effects in the earth system. Additionally, it can be used to model emissions from wildfires and the evolution of wildfire regimes.
It has been created in the context of the SeasFire project, which deals with "Earth System Deep Learning for Seasonal Fire Forecasting" and is funded by the European Space Agency (ESA) under the ESA Future EO-1 Science for Society Call.
It contains 21 years of data (2001-2021) at an 8-day temporal resolution on a 0.25-degree grid. It covers a diverse range of seasonal fire drivers, spanning atmospheric and climatological variables, vegetation variables, socioeconomic data, and target variables related to wildfires such as burned areas, fire radiative power, and wildfire-related CO2 emissions.
Datacube properties

| Feature | Value |
|---|---|
| Spatial Coverage | Global |
| Temporal Coverage | 2001 to 2021 |
| Spatial Resolution | 0.25 deg x 0.25 deg |
| Temporal Resolution | 8 days |
| Number of Variables | 54 |
| Tutorial Link | https://github.com/SeasFire/seasfire-datacube |
| Full name | DataArray name | Unit | Contact * |
|---|---|---|---|
| Dataset: ERA5 Meteo Reanalysis Data | | | |
| Mean sea level pressure | mslp | Pa | NOA |
| Total precipitation | tp | m | MPI |
| Relative humidity | rel_hum | % | MPI |
| Vapor Pressure Deficit | vpd | hPa | MPI |
| Sea Surface Temperature | sst | K | MPI |
| Skin temperature | skt | K | MPI |
| Wind speed at 10 meters | ws10 | m s-1 | MPI |
| Temperature at 2 meters - Mean | t2m_mean | K | MPI |
| Temperature at 2 meters - Min | t2m_min | K | MPI |
| Temperature at 2 meters - Max | t2m_max | K | MPI |
| Surface net solar radiation | ssr | MJ m-2 | MPI |
| Surface solar radiation downwards | ssrd | MJ m-2 | MPI |
| Volumetric soil water level 1 | swvl1 | m3/m3 | MPI |
| Volumetric soil water level 2 | swvl2 | m3/m3 | MPI |
| Volumetric soil water level 3 | swvl3 | m3/m3 | MPI |
| Volumetric soil water level 4 | swvl4 | m3/m3 | MPI |
| Land-Sea mask | lsm | 0-1 | NOA |
| Dataset: Copernicus CEMS | | | |
| Drought Code Maximum | drought_code_max | unitless | NOA |
| Drought Code Average | drought_code_mean | unitless | NOA |
| Fire Weather Index Maximum | fwi_max | unitless | NOA |
| Fire Weather Index Average | fwi_mean | unitless | NOA |
| Dataset: CAMS Global Fire Assimilation System (GFAS) | | | |
| Carbon dioxide emissions from wildfires | cams_co2fire | kg/m² | NOA |
| Fire radiative power | cams_frpfire | W/m² | NOA |
| Dataset: FireCCI - European Space Agency's Climate Change Initiative | | | |
| Burned Areas from Fire Climate Change Initiative (FCCI) | fcci_ba | ha | NOA |
| Valid mask of FCCI burned areas | fcci_ba_valid_mask | 0-1 | NOA |
| Fraction of burnable area | fcci_fraction_of_burnable_area | % | NOA |
| Number of patches | fcci_number_of_patches | N | NOA |
| Fraction of observed area | fcci_fraction_of_observed_area | % | NOA |
| Dataset: NASA MODIS MOD11C1, MOD13C1, MCD15A2 | | | |
| Land Surface temperature at day | lst_day | K | MPI |
| Leaf Area Index | lai | m²/m² | MPI |
| Normalized Difference Vegetation Index | ndvi | unitless | MPI |
| Dataset: NASA SEDAC Gridded Population of the World (GPW), v4 | | | |
| Population density | pop_dens | persons per square kilometer | NOA |
| Dataset: Global Fire Emissions Database (GFED) | | | |
| Burned Areas from GFED (large fires only) | gfed_ba | ha | MPI |
| Valid mask of GFED burned areas | gfed_ba_valid_mask | 0-1 | NOA |
| GFED basis regions | gfed_region | N | NOA |
| Dataset: Global Wildfire Information System (GWIS) | | | |
| Burned Areas from GWIS | gwis_ba | ha | NOA |
| Valid mask of GWIS burned areas | gwis_ba_valid_mask | 0-1 | NOA |
| Dataset: NOAA Climate Indices | | | |
| Arctic Oscillation Index | oci_ao | unitless | NOA |
| Western Pacific Index | oci_wp | unitless | NOA |
| Pacific North American Index | oci_pna | unitless | NOA |
| North Atlantic Oscillation | oci_nao | unitless | NOA |
| Southern Oscillation Index | oci_soi | unitless | NOA |
| Global Mean Land/Ocean Temperature | oci_gmsst | unitless | NOA |
| Pacific Decadal Oscillation | oci_pdo | unitless | NOA |
| Eastern Asia/Western Russia | oci_ea | unitless | NOA |
| East Pacific/North Pacific Oscillation | oci_epo | unitless | NOA |
| Nino 3.4 Anomaly | oci_nino_34_anom | unitless | NOA |
| Bivariate ENSO Timeseries | oci_censo | unitless | NOA |
| Dataset: ESA CCI | | | |
| Land Cover Class 0 - No data | lccs_class_0 | % | NOA |
| Land Cover Class 1 - Agriculture | lccs_class_1 | % | NOA |
| Land Cover Class 2 - Forest | lccs_class_2 | % | NOA |
| Land Cover Class 3 - Grassland | lccs_class_3 | % | NOA |
| Land Cover Class 4 - Wetlands | lccs_class_4 | % | NOA |
| Land Cover Class 5 - Settlement | lccs_class_5 | % | NOA |
| Land Cover Class 6 - Shrubland | lccs_class_6 | % | NOA |
| Land Cover Class 7 - Sparse vegetation, bare areas, permanent snow and ice | lccs_class_7 | % | NOA |
| Land Cover Class 8 - Water Bodies | lccs_class_8 | % | NOA |
| Dataset: Biomes | | | |
| Dataset: Calculated | | | |
| Grid Area in square meters | area | m² | NOA |
* The datacube specifications (temporal and spatial resolution, chunk size) have been set up by the Max Planck Institute (MPI) team. For the variables whose contact is MPI, Lazaro Alonso (lalonso bgc-jena.mpg.de) led the effort to collect and process them; for the variables whose contact is NOA, Ilektra Karasante (ile.karasante noa.gr) led the effort to collect and process them.
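A hedged sketch of how the variables listed in the table above could be explored with xarray once a copy of the cube is available. The Zarr store path and the dimension names (time, latitude, longitude) are assumptions; see the tutorial repository at https://github.com/SeasFire/seasfire-datacube for the actual access instructions.

```python
# Hedged sketch: open the SeasFire datacube with xarray and pull a few of the
# variables listed above. The store path is a placeholder.
import xarray as xr

ds = xr.open_zarr("seasfire_cube.zarr", consolidated=True)  # placeholder path

# 8-day, 0.25-degree global fields: mean sea level pressure, mean FWI, GFED burned area
fields = ds[["mslp", "fwi_mean", "gfed_ba"]]

# Example: mean Fire Weather Index over the 2020 fire season for a rough European window.
summer_fwi = fields["fwi_mean"].sel(
    time=slice("2020-06-01", "2020-09-30"),
    latitude=slice(70, 35),   # latitude axis orientation is an assumption
    longitude=slice(-10, 40),
).mean("time")
print(summer_fwi)
```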
This dataset is meant to be used to develop models for next-day fire hazard forecasting in Greece. It contains data from 2009 to 2020 on a daily 1 km x 1 km grid.
Check the Jupyter notebook for an example showing how to access the dataset.
Earth Observation Data Cube generated from Landsat Level-2 product over Brazil extension. This dataset is provided in Cloud Optimized GeoTIFF (COG) file format. The dataset is processed with 30 meters of spatial resolution, reprojected and cropped to BDC_MD grid Version 2 (BDC_MD V2), considering a temporal compositing function of 16 days using the Least Cloud Cover First (LCF) best pixel approach.
Workforce Information Cubes for NASA, sourced from NASA's personnel/payroll system, provide data about who is working where and on what. They include records for every civil service employee in NASA, snapshots of workforce composition as of certain dates, and data on personnel transactions such as hires, losses and promotions. Updates occur every 2 weeks.
Cube++ is a novel dataset for the color constancy problem that builds on the Cube+ dataset. It includes 4890 images of different scenes under various conditions. For calculating the ground truth illumination, a calibration object with known surface colors was placed in every scene.
Earth Observation Data Cube generated from CBERS-4/WFI and CBERS-4A/WFI Level-4 SR products over Brazil extension. This dataset is provided in Cloud Optimized GeoTIFF (COG) file format. The dataset is processed with 64 meters of spatial resolution, reprojected and cropped to BDC_LG grid Version 2 (BDC_LG V2), considering a temporal compositing function of 8 days using the Least Cloud Cover First (LCF) best pixel approach.
Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring/ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube to find the top-k answers, and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches. Citation: B. Ding, B. Zhao, C. X. Lin, J. Han, C. Zhai, A. N. Srivastava, and N. C. Oza, "Efficient Keyword-Based Search for Top-K Cells in Text Cube," IEEE Transactions on Knowledge and Data Engineering, 2011.
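To make the cell-ranking problem concrete, here is a hedged toy baseline, not one of the four algorithms proposed in the paper: enumerate the cells of a tiny text cube, concatenate each cell's documents, score them with a simple term-frequency relevance function, and keep the top-k. The example data, dimensions, and scoring formula are illustrative assumptions only.

```python
# Hedged toy baseline for top-k cell search in a text cube (brute force, not the
# paper's inverted-index / sorted-scan / dynamic-programming / search-space-ordering
# algorithms). A cell fixes a subset of dimensions; unfixed dimensions are aggregated.
from collections import Counter
from itertools import combinations
import heapq
import math

rows = [  # (dimension values, document text) -- toy multidimensional text database
    ({"airline": "A", "year": "2010"}, "engine failure during climb"),
    ({"airline": "A", "year": "2011"}, "smoke in cabin, engine shut down"),
    ({"airline": "B", "year": "2010"}, "runway incursion at night"),
]
dims = ["airline", "year"]

def cells():
    """Yield every cell: each subset of dimensions fixed to values seen in the data."""
    for r in range(len(dims) + 1):
        for fixed in combinations(dims, r):
            for values in {tuple(row[0][d] for d in fixed) for row in rows}:
                yield dict(zip(fixed, values))

def cell_document(cell):
    """The cell document is the concatenation of all documents matching the cell."""
    return " ".join(doc for attrs, doc in rows
                    if all(attrs[d] == v for d, v in cell.items()))

def score(query, text):
    """Simple TF relevance with length normalization (stand-in for an IR model)."""
    tf = Counter(text.split())
    length = sum(tf.values()) or 1
    return sum(tf[term] for term in query.split()) / math.sqrt(length)

def top_k(query, k=2):
    return heapq.nlargest(k, ((score(query, cell_document(c)), c) for c in cells()),
                          key=lambda pair: pair[0])

print(top_k("engine failure"))
```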
This set of fiscal year trend cubes provides access to separation data. This data set provides the number of personnel actions (Transfer-Outs and Separations from the Federal Service) that have taken place within a Fiscal Year. The scope of this data set includes all data elements used in the creation of the FedScope Separations Cube (http://www.fedscope.opm.gov/). The following workforce characteristics are available for analysis: Separation, Date, Agency, Age (5 year interval), Gender, GS & Equivalent Grade, Length of Service (5 year interval), State/Country, Occupation, Occupation Category, Pay Plan & Grade, Salary Level ($10,000 interval), Type of Appointment, Work Schedule, Count, Average Salary, and Average Length of Service. The OPM Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) is the source for all FedScope data. Data is processed on a quarterly basis (i.e. March, June, September and December).
This dataset includes 720 directional B-format RIRs, i.e. first-order Ambisonic room impulse responses, measured at 30 receiver positions with 1 m spacing in an equidistant grid (4xm) with 24 hemispherical source positions each. The measurements were carried out at the IEM CUBE using Soundfield ST450 MKII microphones. The data is saved according to the SOFA convention (https://www.sofaconventions.org/mediawiki/index.php). The SOFA Matlab/Octave API is available at https://github.com/sofacoustics/API_MO. Unfortunately, the used SOFA convention (MultiPerspectiveAmbisonicRIR) was never integrated into the official SOFA conventions. However, it can be found at: https://github.com/jdemuynke/API_MO/tree/master/API_MO/conventions.
The dataset can be found here: https://phaidra.kug.ac.at/view/o:104435.
Contact: kaspar.mueller@cerence.com; zotter@iem.at
As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. Although online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand, probabilistic topic models are among the most effective approaches to latent topic analysis and mining on text data. In this paper, we study a new data model called topic cube to combine OLAP with probabilistic topic modeling and enable OLAP on the dimension of text data in a multidimensional text database. Topic cube extends the traditional data cube to cope with a topic hierarchy and stores probabilistic content measures of text documents learned through a probabilistic topic model. To materialize topic cubes efficiently, we propose two heuristic aggregations to speed up the iterative Expectation-Maximization (EM) algorithm for estimating topic models by leveraging the models learned on component data cells to choose a good starting point for iteration. Experimental results show that these heuristic aggregations are much faster than the baseline method of computing each topic cube from scratch. We also discuss some potential uses of topic cube and show sample experimental results.
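The core of the heuristic aggregation idea can be sketched as follows: rather than starting EM from a random initialization, warm-start the parent cell's topic model from a document-count-weighted average of its child cells' already-estimated topic-word distributions. The sketch below is an illustration of that initialization step only, under the assumption that child models are simple topic-by-word probability arrays; it is not the paper's full topic-cube materialization procedure.

```python
# Hedged sketch of the warm-start heuristic: initialize EM for a parent cell by
# averaging its child cells' topic-word distributions, weighted by document counts.
# Illustration only; full materialization would then run PLSA/LDA EM from this start.
import numpy as np

def aggregate_child_models(child_models, child_doc_counts):
    """child_models: list of (K topics x V words) probability arrays;
    child_doc_counts: number of documents in each child cell."""
    weights = np.asarray(child_doc_counts, dtype=float)
    weights /= weights.sum()
    init = sum(w * m for w, m in zip(weights, child_models))
    # Renormalize so each topic's word distribution sums to 1.
    return init / init.sum(axis=1, keepdims=True)

# Toy example: two child cells, 2 topics over a 3-word vocabulary.
child_a = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
child_b = np.array([[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]])
start = aggregate_child_models([child_a, child_b], child_doc_counts=[30, 10])
print(start)  # use as the starting point for the parent cell's EM iterations
```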
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ecosystem change maps based on the GlobES ecosystem Data Cube. The layer "GlobES_changeType_19920101-20180101_1km.tif" depicts types of changes in ecosystem dominance registered between 1992 and 2018. Each change type and its respective grid identifier are described in "changeType_legend.csv". "GlobES_transitions_19920101-20180101_1km.tif" presents the two leading ecosystem types responsible for those changes: the first ecosystem reflects the initial condition in 1992, and the second the condition in 2018.
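A hedged sketch of combining the change-type raster with its CSV legend to summarize how much area falls into each change type. The legend column names ("id", "change_type") are assumptions; check changeType_legend.csv for the actual headers.

```python
# Hedged sketch: count pixels per ecosystem change type and label them with the
# legend. Column names in the legend are assumptions.
import numpy as np
import pandas as pd
import rasterio

legend = pd.read_csv("changeType_legend.csv")

with rasterio.open("GlobES_changeType_19920101-20180101_1km.tif") as src:
    data = src.read(1)
    valid = data[data != src.nodata] if src.nodata is not None else data.ravel()

ids, counts = np.unique(valid, return_counts=True)
summary = pd.DataFrame({"id": ids, "n_pixels": counts}).merge(legend, on="id", how="left")
print(summary.sort_values("n_pixels", ascending=False))
```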
Earth Observation Data Cube generated from CBERS-4/MUX Level-4 SR product over Brazil extension. This dataset is provided in Cloud Optimized GeoTIFF (COG) file format. The dataset is processed with 20 meters of spatial resolution, reprojected and cropped to BDC_MD grid Version 2 (BDC_MD V2), considering a temporal compositing function of 2 months using the Least Cloud Cover First (LCF) best pixel approach.
Keyword Search in Text Cube: Finding Top-k Relevant Cells. Bolin Ding, Yintao Yu, Bo Zhao, Cindy Xide Lin, Jiawei Han, and ChengXiang Zhai. Abstract: We study the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (e.g., a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword query, our goal is to find the top-k most relevant cells (ranked according to the relevance scores of cell documents w.r.t. the given query) in the text cube. We define a keyword-based query language and apply an IR-style relevance model for scoring and ranking cell documents in the text cube. We propose two efficient approaches to find the top-k answers. The proposed approaches support a general class of IR-style relevance scoring formulas that satisfy certain basic and common properties. One of them uses more time for pre-processing and less time for answering online queries; the other is more efficient in pre-processing but consumes more time for online queries. Experimental studies on the ASRS dataset are conducted to verify the efficiency and effectiveness of the proposed approaches.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Note: To visualize the data in the viewer, zoom into the area of interest. The National Air Photo Library (NAPL) of Natural Resources Canada archives over 6 million aerial photographs covering all of Canada, some of which date back to the 1920s. This collection includes Time Series of aerial orthophoto mosaics over a selection of major cities or targeted areas that allow the observation of various changes that occur over time in those selected regions. These mosaics are disseminated through the Data Cube Platform implemented by NRCan using geospatial big data management technologies. These technologies enable the rapid and efficient visualization of high-resolution geospatial data and allow for the rapid generation of dynamically derived products. The data is available as Cloud Optimized GeoTIFF (COG) for direct access and as Web Map Services (WMS) or Web Coverage Services (WCS) with a temporal dimension for consumption in Web or GIS applications. The NAPL mosaics are made from the best spatial resolution available for each time period, which means that the orthophotos composing a NAPL Time Series are not necessarily coregistered. For this dataset, the spatial resolutions are: 100 cm for the year 1932 and 50 cm for the year 1950. The NAPL indexes and stores federal aerial photography for Canada, and maintains a comprehensive historical archive and public reference centre. The Earth Observation Data Management System (EODMS) online application allows clients to search and retrieve metadata for over 3 million of the 6 million air photos. It also enables public and government users to search and order raw Government of Canada Earth Observation images and archived products managed by NRCan, such as aerial photos and satellite imagery. To access air photos, you can visit the EODMS web site: https://eodms-sgdot.nrcan-rncan.gc.ca/index-en.html
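A hedged example of how a time-enabled WMS layer like these mosaics could be queried from Python with OWSLib. The service URL, layer name, bounding box, and TIME value are placeholders, not actual NRCan identifiers; the real endpoints are published through NRCan's Data Cube Platform.

```python
# Hedged sketch: request one year of a time-enabled orthophoto mosaic layer via WMS.
# All identifiers below are placeholders.
from owslib.wms import WebMapService

wms = WebMapService("https://datacube.example.gc.ca/ows", version="1.3.0")  # placeholder URL

img = wms.getmap(
    layers=["napl-mosaic"],                        # placeholder layer name
    srs="EPSG:3857",
    bbox=(-8240000, 5680000, -8200000, 5720000),   # placeholder extent (Web Mercator)
    size=(1024, 1024),
    format="image/png",
    time="1950",                                   # temporal dimension exposed by the service
)
with open("napl_1950.png", "wb") as f:
    f.write(img.read())
```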