Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT The considerable volume of data generated by sensors in the field contains systematic errors; thus, it is extremely important to exclude these errors to ensure mapping quality. The objective of this research was to develop and test a methodology to identify and exclude outliers in high-density spatial data sets, and to determine whether the developed filter process could help decrease the nugget effect and improve the spatial variability characterization of densely sampled data. We created a filter composed of a global, anisotropic, and an anisotropic local analysis of data, which considered the respective neighborhood values. For that purpose, we used the median as the main statistical parameter to classify a given spatial point in the data set, taking into account its neighbors within a radius. The filter was tested using raw data sets of corn yield, soil electrical conductivity (ECa), and the sensor vegetation index (SVI) in sugarcane. The results showed an improvement in the accuracy of spatial variability within the data sets. The methodology reduced RMSE by 85 %, 97 %, and 79 % in corn yield, soil ECa, and SVI respectively, compared to interpolation errors of raw data sets. The filter excluded the local outliers, which considerably reduced the nugget effects and thereby the estimation error of the interpolated data. The methodology proposed in this work performed better in removing outlier data than two other methodologies from the literature.
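The local step of such a filter can be illustrated with a minimal sketch. It only covers a median-based neighborhood check; the global and anisotropic steps of the published filter are not reproduced, and the function name, the `tolerance` fraction, and the sample readings are assumptions made for illustration.

```python
import math
from statistics import median


def local_median_filter(points, radius, tolerance):
    """Keep points whose value is within `tolerance` (a fraction) of the
    median of their neighbors inside `radius`.

    points: list of (x, y, value) tuples. The threshold rule is an
    illustrative assumption, not the published criterion.
    """
    kept = []
    for i, (x, y, value) in enumerate(points):
        # Collect the values of all other points within the search radius.
        neighbors = [
            v for j, (nx, ny, v) in enumerate(points)
            if j != i and math.hypot(nx - x, ny - y) <= radius
        ]
        if not neighbors:
            kept.append((x, y, value))  # nothing to compare against
            continue
        med = median(neighbors)
        if abs(value - med) <= tolerance * abs(med):
            kept.append((x, y, value))
    return kept


# Hypothetical yield readings on a small grid; one value is far too high.
readings = [
    (0, 0, 10.0), (1, 0, 10.5), (0, 1, 9.8),
    (1, 1, 55.0), (2, 0, 10.2), (2, 1, 10.1),
]
filtered = local_median_filter(readings, radius=1.5, tolerance=0.25)
```

Because the median is robust to a single extreme neighbor, the points surrounding the outlier are kept while the outlier itself is dropped. With very few neighbors this robustness breaks down, so the radius should be chosen to include several points.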
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We share the complete aerosol optical depth dataset with high spatial (1x1 km^2) and temporal (daily) resolution and the Beijing 1954 projection (https://epsg.io/2412) for mainland China (2015-2018). The original aerosol optical depth images are from Multi-Angle Implementation of Atmospheric Correction Aerosol Optical Depth (MAIAC AOD) (https://lpdaac.usgs.gov/products/mcd19a2v006/), with a similar spatiotemporal resolution and the sinusoidal projection (https://en.wikipedia.org/wiki/Sinusoidal_projection). After projection conversion, eighteen tiles of MAIAC AOD were merged to obtain a large image of AOD covering the entire area of mainland China. Due to clouds and high surface reflectance, each original MAIAC AOD image usually has many missing values, and the average missing percentage of each AOD image may exceed 60%. Such a high percentage of missing values severely limits the applicability of the original MAIAC AOD product. We used full residual deep networks (Li et al., 2020, https://ieeexplore.ieee.org/document/9186306) to impute the daily missing MAIAC AOD, thus obtaining a complete (no missing values) high-resolution AOD data product covering mainland China. The covariates used in imputation included coordinates, elevation, MERRA2 coarse-resolution PBLH and AOD variables, cloud fraction, high-resolution meteorological variables (air pressure, air temperature, relative humidity and wind speed) and/or a time index. Ground monitoring data were used to generate the high-resolution meteorological variables to ensure the reliability of interpolation. Overall, our daily imputation models achieved an average training R^2 of 0.90 with a range of 0.75 to 0.97 (average RMSE: 0.075, with a range of 0.026 to 0.32) and an average test R^2 of 0.90 with a range of 0.75 to 0.97 (average RMSE: 0.075, with a range of 0.026 to 0.32).
With almost no difference between training metrics and test metrics, the high test R^2 and low test RMSE show the reliability of the AOD imputation. In the evaluation using ground AOD data from the monitoring stations of the Aerosol Robotic Network (AERONET) in mainland China, our method obtained an R^2 of 0.78 and an RMSE of 0.27, which further illustrates the reliability of the method. This database contains four datasets:
- Daily complete high-resolution AOD image dataset for mainland China from January 1, 2015 to December 31, 2018. The archived resources contain 1461 images stored in 1461 files, and 3 summary Excel files.
- The table “CHN_AOD_INFO.xlsx”, describing the properties of the 1461 images, including projection, training R^2 and RMSE, testing R^2 and RMSE, and the minimum, mean, median and maximum of the predicted AOD.
- The table “Model_and_Accuracy_of_Meteorological_Elements.xlsx”, describing the statistics of performance metrics in the interpolation of the high-resolution meteorological dataset.
- The table “Evaluation_Using_AERONET_AOD.xlsx”, showing the evaluation results against AERONET, including R^2, RMSE, and the monitoring information used in this study.
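The missing-value percentage quoted for the original MAIAC AOD images can be computed per image and then averaged. The sketch below assumes missing cells are stored as NaN; the tiny grids and the function name are made up for illustration and do not come from the dataset.

```python
import math


def missing_percentage(grid):
    """Percentage of cells in a 2-D grid (list of rows) that are NaN."""
    cells = [v for row in grid for v in row]
    missing = sum(1 for v in cells if math.isnan(v))
    return 100.0 * missing / len(cells)


nan = float("nan")
# Two hypothetical daily AOD grids with cloud-induced gaps.
day_1 = [[0.3, nan, nan], [nan, 0.5, nan]]
day_2 = [[nan, nan, 0.2], [0.4, 0.1, nan]]

per_image = [missing_percentage(g) for g in (day_1, day_2)]
average_missing = sum(per_image) / len(per_image)
```

Running the same function on an imputed image is a quick way to confirm that no gaps remain.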
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present the China Impervious Surface Cover dataset for 2020 and 2022 (CISC2020, CISC2022) at 30-meter spatial resolution. The dataset covers mainland China and adjacent regions where input 2-meter satellite imagery was available. Generation involved a regionally adaptive deep learning strategy integrating high-resolution (2m) imagery, Landsat annual composites, and SRTM elevation data. High-quality training and validation samples were produced using entropy-guided stratified sampling and subsequent expert visual interpretation. The dataset is provided as Cloud Optimized GeoTIFFs (COG) using an Albers projection. Accuracy assessment using independent, expert-interpreted validation points confirms high overall accuracy (Spatially Averaged F1-score > 0.93 for the impervious surface class). This dataset offers reliable, spatially explicit impervious surface information for China, suitable for applications including urban dynamics analysis, environmental monitoring, and regional planning.
The impervious surface products are named cisc_2020.tif and cisc_2022.tif. In addition to impervious surfaces, the model's outputs for the other three classes (water bodies, vegetation, and bare land/other) are provided for user reference. However, the reliability of these auxiliary classes is not guaranteed. Pixel values represent the following classes: 0 = bare land/other, 1 = impervious surface, 2 = vegetation, and 3 = water body.
The impervious surface products cover the mainland area of China. Areas outside of China, or areas within China lacking sufficient 2-meter imagery coverage, are regarded as invalid pixels and assigned value 0. To maximize information content, the data were not clipped to coastlines or administrative boundaries. We also provide masks indicating the valid data extent for each year (2020 and 2022). These files are named valid_mask_2020.tif and valid_mask_2022.tif.
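Because value 0 is used both for the bare land/other class and for invalid pixels, the valid-data mask is needed to tell the two apart. The following sketch works on plain nested lists rather than on the GeoTIFFs themselves, and it assumes that the mask stores 1 for valid pixels; that encoding and the function name are assumptions, so check the mask files before relying on it.

```python
CLASS_NAMES = {
    0: "bare land/other",
    1: "impervious surface",
    2: "vegetation",
    3: "water body",
}


def impervious_fraction(classes, valid_mask):
    """Share of valid pixels labelled impervious (class 1).

    classes, valid_mask: 2-D lists of equal shape. A mask value of 1 is
    assumed to mean "valid"; this encoding is not stated in the source.
    """
    valid = 0
    impervious = 0
    for class_row, mask_row in zip(classes, valid_mask):
        for cls, ok in zip(class_row, mask_row):
            if ok == 1:
                valid += 1
                if cls == 1:
                    impervious += 1
    return impervious / valid if valid else float("nan")


# Hypothetical 2 x 3 window: the top-left 0 is invalid, the other 0 is bare land.
classes = [[0, 1, 1], [0, 2, 3]]
mask = [[0, 1, 1], [1, 1, 1]]
fraction = impervious_fraction(classes, mask)
```

Here two of the five valid pixels are impervious. Without the mask, the invalid pixel would be counted as bare land and the fraction would be biased low.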
To facilitate better use of the product, we also provide a shapefile indicating the acquisition date of the 2m source data used for each location (see source_date_dissovle.zip).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset, at a resolution of 0.5 arcminute (~1 km), was spatially downscaled from CRU TS v4.02 using the Delta downscaling method and contains monthly precipitation from 1901.1 to 2017.12. The dataset covers the mainland area of China. It was evaluated against 496 national weather stations across China, and the evaluation indicated that the downscaled dataset is reliable for investigations related to climate change across China.
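The general idea of Delta downscaling for precipitation can be sketched as follows: the coarse monthly value is expressed as a ratio to the coarse long-term climatology, that ratio is brought to the fine grid, and it is multiplied by a high-resolution climatology. This is a simplified illustration, not the providers' implementation. Nearest-neighbor repetition stands in for the spatial interpolation step, the handling of zero climatology is an assumption, and all values are made up.

```python
def upsample_nearest(grid, factor):
    """Repeat each coarse cell factor x factor times.

    A stand-in for the spatial interpolation used in practice.
    """
    fine = []
    for row in grid:
        fine_row = [v for v in row for _ in range(factor)]
        fine.extend([list(fine_row) for _ in range(factor)])
    return fine


def delta_downscale_precip(coarse_month, coarse_clim, fine_clim, factor):
    """Ratio-based Delta downscaling sketch for one month of precipitation."""
    # Anomaly as a ratio; cells with zero climatology fall back to 1.0
    # (an assumption made here to avoid division by zero).
    ratio = [
        [m / c if c > 0 else 1.0 for m, c in zip(m_row, c_row)]
        for m_row, c_row in zip(coarse_month, coarse_clim)
    ]
    fine_ratio = upsample_nearest(ratio, factor)
    return [
        [r * f for r, f in zip(r_row, f_row)]
        for r_row, f_row in zip(fine_ratio, fine_clim)
    ]


coarse_month = [[60.0]]                      # one coarse cell, mm/month
coarse_clim = [[50.0]]                       # its long-term mean for that month
fine_clim = [[40.0, 55.0], [45.0, 60.0]]     # high-resolution climatology
downscaled = delta_downscale_precip(coarse_month, coarse_clim, fine_clim, factor=2)
```

The month is 20 % wetter than the coarse climatology, so every fine cell is scaled to 1.2 times its own climatological value, which preserves the fine-scale spatial pattern.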
The data can also be downloaded from the Loess Plateau Scientific Data Center (http://loess.geodata.cn/), a Chinese website. It publishes the updated historical dataset and future downscaled monthly precipitation under multiple SSP scenarios and GCMs, with 1 km spatial resolution.
The dataset is updated yearly. The dataset currently covers the period from 1901.1 to 2020.12.
The future 1 km dataset for 2021-2100 has been published.
The data provider recommends the following publication as the reference: Peng Shouzhang, Ding Yongxia, Liu Wenzhao, Li Zhi. 1 km monthly temperature and precipitation dataset for China from 1901 to 2017. Earth System Science Data, 2019, 11, 1931–1946, https://doi.org/10.5194/essd-11-1931-2019.
This study presents a comprehensive comparison of gridded datasets for the Great Salt Lake (GSL) basin, focusing on precipitation and temperature as the main inputs for hydrological balances. The evaluated gridded datasets include PRISM, DAYMET, GRIDMET, NLDAS-2, and CONUS404, with in-situ data used for assessing alignment and accuracy. Key metrics such as Nash-Sutcliffe Efficiency (NSE), Kling-Gupta Efficiency (KGE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Correlation Coefficient (CC) were employed to evaluate gridded dataset performance. Spatial and temporal accuracy analyses were conducted across different GSL basin regions to understand variations in accuracy. DAYMET emerged as the leading dataset for precipitation across most metrics, demonstrating consistent performance. For temperature, GRIDMET and PRISM ranked higher, indicating better representation of temperature patterns in the GSL basin. Spatial analysis revealed variability in accuracy for both temperature and precipitation data, emphasizing the importance of selecting suitable datasets for different regions to enhance overall accuracy. The insights from this study can inform environmental forecasting and water resource management in the GSL basin, assisting researchers and decision-makers in choosing reliable gridded datasets for hydrological studies.
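The five metrics named above have standard definitions, sketched here in plain Python. The KGE shown is the original formulation (correlation, variability ratio, and bias ratio); the study does not say which variant it used, so that choice is an assumption, as are the sample values.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def rmse(obs, sim):
    return math.sqrt(_mean([(s - o) ** 2 for o, s in zip(obs, sim)]))


def mae(obs, sim):
    return _mean([abs(s - o) for o, s in zip(obs, sim)])


def cc(obs, sim):
    """Pearson correlation coefficient."""
    mo, ms = _mean(obs), _mean(sim)
    cov = _mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    return cov / (_std(obs) * _std(sim))


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 matches the observed mean."""
    mo = _mean(obs)
    residual = sum((s - o) ** 2 for o, s in zip(obs, sim))
    return 1.0 - residual / sum((o - mo) ** 2 for o in obs)


def kge(obs, sim):
    """Kling-Gupta Efficiency (original formulation)."""
    r = cc(obs, sim)
    alpha = _std(sim) / _std(obs)    # variability ratio
    beta = _mean(sim) / _mean(obs)   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical station observations and a gridded product with a +1 bias.
station = [1.0, 2.0, 3.0, 4.0]
gridded = [2.0, 3.0, 4.0, 5.0]
scores = {
    name: fn(station, gridded)
    for name, fn in [("NSE", nse), ("KGE", kge), ("RMSE", rmse), ("MAE", mae), ("CC", cc)]
}
```

In this example the correlation is perfect while NSE and KGE are penalized by the constant bias, which shows why several metrics are reported side by side.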
Potential Applications of the Dataset:
Geospatial Information: Precise geographical coordinates for each Walgreens store, enabling accurate mapping and spatial analysis. State-wise and city-wise breakdown of store locations for a comprehensive overview.
Store Details: Store addresses, including street name, city, state, and zip code, facilitating easy identification and location-based analysis. Contact information, such as phone numbers, providing a direct link to store management.
Operational Attributes: Store opening and closing hours, aiding businesses in strategic planning and market analysis. Services and amenities available at each location, offering insights into the diverse offerings of Walgreens stores.
Historical Data: Historical data on store openings and closures, providing a timeline perspective on Walgreens' expansion and market presence.
Demographic Insights: Demographic information of the areas surrounding each store, empowering users to understand the local customer base.
Comprehensive and Up-to-Date: Regularly updated to ensure the dataset reflects the latest information on Walgreens store locations and attributes. Detailed data quality checks and verification processes for accuracy and reliability.
The dataset is structured in a flexible format, allowing users to tailor their queries and analyses based on specific criteria and preferences.
This dataset is a dedicated inversion result dataset for forest aboveground biomass (AGB) in Daxing District, Beijing. Its core data source is high-resolution satellite imagery from Gaofen-7, and it aims to provide accurate spatial distribution data of forest AGB for regional forest carbon stock monitoring, carbon dynamics research, and carbon storage capacity assessment. It can also serve as basic data support for forest ecosystem management, remote sensing inversion model validation, and related work. The dataset covers the entire forest ecosystem of Daxing District, Beijing. The AGB inversion results are presented as raster data in TIFF format, with a spatial resolution consistent with that of the Gaofen-7 satellite imagery, which can clearly reflect the spatial heterogeneity of forest AGB in the region. These inversion results are generated by modeling multi-dimensional remote sensing features. In the early stage, key features were extracted from the preprocessed Gaofen-7 satellite data, including texture features (calculated from the Gray-Level Co-occurrence Matrix) that reflect forest structure, visible spectral vegetation indices that characterize vegetation growth status, and the original three-band RGB spectral information, laying a solid feature foundation for accurate inversion. During the modeling process, three single machine learning algorithms (Random Forest (RF), Gradient Boosting Tree (GBT), and XGBoost) were compared, and the Stacking ensemble learning method was adopted to optimize model performance. Finally, the inversion results of the Stacking ensemble model were selected as the core content of the dataset. Verified by five-fold cross-validation, the coefficient of determination (R²) of this core result reaches 0.6229, the root mean square error (RMSE) is 57.34 Mg/ha, and the mean absolute error (MAE) is 39.99 Mg/ha.
Compared with the best-performing single algorithm (XGBoost, R²=0.5852), its R² is 6.44% higher in relative terms, and it effectively mitigates the overestimation and underestimation of AGB that are common in traditional modeling. The reliability and accuracy of the data have been strictly verified, and the dataset can meet the needs of regional-scale forest AGB-related research and applications.
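The 6.44% figure follows from the two R² values when read as a relative improvement over the XGBoost baseline. A short check, with a helper name chosen here for illustration:

```python
def relative_improvement(baseline, improved):
    """Improvement of `improved` over `baseline`, in percent of the baseline."""
    return (improved - baseline) / baseline * 100.0


r2_xgboost = 0.5852   # best single algorithm, from the description
r2_stacking = 0.6229  # Stacking ensemble, from the description

gain = relative_improvement(r2_xgboost, r2_stacking)
print(f"{gain:.2f}%")  # prints 6.44%
```

The absolute difference in R² is 0.0377, so the percentage should not be read as a gain of 6.44 points of explained variance.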
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data Update Notice
We are pleased to announce that the GlobPOP dataset for the years 2021-2022 has undergone a comprehensive quality check and has now been updated accordingly. The update follows the established methodology, which ensures high precision and reliability, and allows for more comprehensive time-series analysis. The updated GlobPOP dataset remains available in GeoTIFF format for easy integration into your existing workflows.
To reflect these updates, our interactive web application has also been refreshed. Users can now explore the updated national population time-series curves from 1990 to 2022 for countries of interest and compare them with census data. The application can be accessed via the same link: https://globpop.shinyapps.io/GlobPOP/. Thank you for your continued support of GlobPOP; we hope that the updated data will further enhance your research and policy analysis endeavors.
If you encounter any issues, please contact us via email at lulingliu@mail.bnu.edu.cn.
Introduction
Continuously monitoring global population spatial dynamics is essential for implementing effective policies related to sustainable development, in areas such as epidemiology, urban planning, and global inequality.
Here, we present GlobPOP, a new continuous global gridded population product with a spatial resolution of 30 arcseconds covering 1990 to 2022. Our data-fusion framework is based on cluster analysis and statistical learning approaches and fuses five existing products (Global Human Settlements Layer Population (GHS-POP), Global Rural Urban Mapping Project (GRUMP), Gridded Population of the World Version 4 (GPWv4), LandScan Population datasets, and WorldPop datasets) into a new continuous global gridded population product (GlobPOP). The temporal and spatial validation results demonstrate that the GlobPOP dataset is highly accurate.
With the GlobPOP dataset available in both population count and population density formats, researchers and policymakers can use it to conduct time-series analysis of population and explore the spatial patterns of population development at various scales, ranging from national to city level.
Data description
The product is produced at 30 arc-second resolution (approximately 1 km at the equator) and is made available in GeoTIFF format. There are two population formats: 'Count' (population count per grid cell) and 'Density' (population count per square kilometer in each grid cell).
Each GeoTIFF filename has 5 fields that are separated by an underscore "_". A filename extension follows these fields. The fields are described below with the example filename:
GlobPOP_Count_30arc_1990_I32
Field 1: GlobPOP (Global gridded population)
Field 2: Pixel unit is population "Count" or population "Density"
Field 3: Spatial resolution is 30 arc seconds
Field 4: Year "1990"
Field 5: Data type is I32 (Int32) or F32 (Float32)
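The naming convention above can be parsed with a few lines of Python. This is a minimal sketch: the class and function names are hypothetical, and the `.tif` extension in the usage line is an assumption, since the description only says that an extension follows the fields.

```python
from typing import NamedTuple


class GlobPopName(NamedTuple):
    product: str
    unit: str        # "Count" or "Density"
    resolution: str  # e.g. "30arc"
    year: int
    dtype: str       # "I32" or "F32"


def parse_globpop_filename(filename):
    """Split a GlobPOP filename into its five underscore-separated fields."""
    # Drop the extension, if any, before splitting.
    stem = filename.rsplit(".", 1)[0] if "." in filename else filename
    fields = stem.split("_")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {filename!r}")
    product, unit, resolution, year, dtype = fields
    if unit not in ("Count", "Density"):
        raise ValueError(f"unexpected pixel unit: {unit!r}")
    if dtype not in ("I32", "F32"):
        raise ValueError(f"unexpected data type: {dtype!r}")
    return GlobPopName(product, unit, resolution, int(year), dtype)


# The ".tif" extension is assumed for illustration.
info = parse_globpop_filename("GlobPOP_Count_30arc_1990_I32.tif")
```

Parsing the year into an integer makes it straightforward to sort files into a time series or to select a range of years.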
More information
The paper describing this dataset has been published in Scientific Data; please refer to it for detailed information:
Liu, L., Cao, X., Li, S. et al. A 31-year (1990–2020) global gridded population dataset generated by cluster analysis and statistical learning. Sci Data 11, 124 (2024). https://doi.org/10.1038/s41597-024-02913-0.
The fully reproducible code is publicly available on GitHub: https://github.com/lulingliu/GlobPOP.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global geospatial data catalog platform market size reached USD 2.17 billion in 2024, reflecting a robust surge in demand for advanced data management solutions across multiple industries. The market is projected to expand at a CAGR of 18.4% from 2025 to 2033, with the total market value forecasted to hit USD 10.22 billion by 2033. This impressive growth is primarily fueled by the increasing adoption of spatial data analytics, integration of artificial intelligence in geospatial solutions, and the expanding need for real-time data-driven decision-making in sectors such as urban planning, environmental monitoring, and disaster management.
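For readers who want to relate a compound annual growth rate to start and end values, the two standard formulas are sketched below. The report does not state its base year or the number of compounding periods it used, so the number of years is left as a parameter, and the usage values are round numbers chosen for illustration rather than figures from the report.

```python
def project_value(start_value, annual_rate, years):
    """Value after compounding `annual_rate` (e.g. 0.10 for 10 %) for `years`."""
    return start_value * (1.0 + annual_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that links the start and end values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Illustrative round numbers, not taken from the report.
future = project_value(100.0, 0.10, 2)
rate = implied_cagr(100.0, 121.0, 2)
```

The two functions are inverses of each other for a fixed number of years, so either can be used to cross-check a stated forecast once the compounding period is known.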
One of the primary growth drivers for the geospatial data catalog platform market is the exponential rise in the volume and variety of geospatial data generated by satellites, drones, IoT devices, and mobile sensors. As organizations strive to harness the power of big data, the need for robust platforms that can organize, catalog, and facilitate seamless access to spatial datasets has become paramount. These platforms not only enable efficient data discovery and retrieval but also support interoperability between disparate data sources, thereby enhancing the quality and speed of spatial analysis. The proliferation of smart city initiatives and the growing emphasis on sustainable urban development have further accelerated the adoption of geospatial data catalog solutions, as governments and private enterprises increasingly recognize the value of location intelligence for resource optimization and policy planning.
Another significant factor propelling the market is the integration of advanced analytics and artificial intelligence within geospatial data catalog platforms. Modern solutions are now equipped with machine learning algorithms that automate metadata tagging, anomaly detection, and predictive modeling, making it easier for users to extract actionable insights from vast and complex spatial datasets. This technological evolution has not only improved the accuracy and reliability of geospatial analytics but has also democratized access to sophisticated spatial tools, enabling a broader range of end-users—including non-technical professionals—to leverage geospatial intelligence for operational and strategic purposes. The ongoing digital transformation across industries, coupled with the rising demand for real-time situational awareness, is expected to sustain the momentum of market growth throughout the forecast period.
Furthermore, the increasing focus on data governance, compliance, and security is shaping the evolution of the geospatial data catalog platform market. As organizations handle sensitive location data, compliance with data privacy regulations such as GDPR and CCPA has become a top priority. Modern platforms are thus being designed with robust security features, including user authentication, access controls, and audit trails, to ensure the integrity and confidentiality of spatial datasets. This trend is particularly pronounced in sectors such as defense, intelligence, and utilities, where the stakes for data breaches are exceptionally high. The convergence of geospatial data management with enterprise data governance frameworks is anticipated to unlock new opportunities for market players while addressing the growing concerns around data privacy and security.
From a regional perspective, North America continues to dominate the geospatial data catalog platform market, accounting for the largest revenue share in 2024, followed closely by Europe and the Asia Pacific. The region’s leadership is attributed to the early adoption of advanced geospatial technologies, significant investments in smart infrastructure, and the presence of major industry players. Meanwhile, the Asia Pacific region is emerging as the fastest-growing market, driven by rapid urbanization, government-led digitalization initiatives, and increasing investments in geospatial infrastructure. Latin America and the Middle East & Africa are also witnessing steady growth, supported by expanding applications in agriculture, resource management, and disaster response. This regional diversification underscores the global relevance and transformative potential of geospatial data catalog platforms across diverse economic landscapes.
The component segment of the geospatial data catalog pla
These data can be used in a geographic information system (GIS) for any number of purposes such as assessing wildlife habitat, water quality, pesticide runoff, land use change, etc. The State data sets are provided with a 300 meter buffer beyond the State border to facilitate combining the State files into larger regions. The user must have a firm understanding of how the datasets were compiled and the resulting limitations of these data. The National Land Cover Dataset was compiled from Landsat satellite TM imagery (circa 1992) with a spatial resolution of 30 meters and supplemented by various ancillary data (where available). The analysis and interpretation of the satellite imagery was conducted using very large, sometimes multi-state image mosaics (i.e. up to 18 Landsat scenes). Using a relatively small number of aerial photographs for 'ground truth', the thematic interpretations were necessarily conducted from a spatially-broad perspective. Furthermore, the accuracy assessments (see below) correspond to 'federal regions' which are groupings of contiguous States. Thus, the reliability of the data is greatest at the State or multi-State level. The statistical accuracy of the data is known only for the region.
Important Caution Advisory: With this in mind, users are cautioned to carefully scrutinize the data to see if they are of sufficient reliability before attempting to use the dataset for larger-scale or local analyses. This evaluation must be made remembering that the NLCD represents conditions in the early 1990s.
The New York portion of the NLCD was created as part of land cover mapping activities for Federal Region II, which includes the states of New York and New Jersey. The NLCD classification contains 21 different land cover categories with a spatial resolution of 30 meters. The NLCD was produced as a cooperative effort between the U.S. Geological Survey (USGS) and the U.S. Environmental Protection Agency (US EPA) to produce a consistent land cover data layer for the conterminous U.S. using early 1990s Landsat thematic mapper (TM) data purchased by the Multi-resolution Land Characterization (MRLC) Consortium. The MRLC Consortium is a partnership of federal agencies that produce or use land cover data. Partners include the USGS (National Mapping, Biological Resources, and Water Resources Divisions), US EPA, the U.S. Forest Service, and the National Oceanic and Atmospheric Administration.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Tibetan Plateau is a unique natural geographic unit with the highest average altitude in the world, known as the world's "Third Pole". It is extremely sensitive to global climate change, has a fragile ecological environment, and is an important ecological security barrier for China and for Asia as a whole. Vegetation cover is an important indicator of climate change and of the ecological environment, and its spatial and temporal distribution patterns and trends are important for assessing the regional ecological environment. In this study, based on the GIMMS NDVI3g and MOD13Q1 NDVI datasets, monthly maximum values were composited in Python using ArcPy. Using the GDAL and sklearn packages, Savitzky-Golay filtering was then applied to remove noise from the monthly NDVI data, a regression analysis was performed over the overlapping years of the two datasets, and the 250 m resolution NDVI record was extended on that basis. The result is an integrated monthly NDVI time series dataset at 250 m resolution for the Tibetan Plateau covering 1981–2020. To ensure accuracy and reliability, the dataset was quality-controlled by several means, including quality control of the data sources, consistency analysis, SG filtering, month-by-month and image-by-image fitting, and a confidence test of the data products. This dataset reflects the spatial and temporal changes of NDVI on the Tibetan Plateau from 1981 to 2020, and can be used to improve the spatial and temporal resolution of long time series data for the study of vegetation dynamics and spatial patterns of the Tibetan Plateau, as well as for ecological and environmental monitoring.
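Two of the processing steps described above, monthly maximum-value compositing and regression over the overlapping years, can be illustrated for a single pixel. This is a sketch under stated assumptions: the description does not specify the regression model, so a per-pixel straight-line fit is assumed here, Savitzky-Golay smoothing is omitted, and all NDVI values are made up.

```python
def monthly_max_composite(observations):
    """observations: iterable of (month, ndvi) pairs -> {month: maximum ndvi}."""
    composite = {}
    for month, ndvi in observations:
        if month not in composite or ndvi > composite[month]:
            composite[month] = ndvi
    return composite


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Several acquisitions per month collapse to one value per month.
composite = monthly_max_composite([(1, 0.20), (1, 0.30), (2, 0.50), (2, 0.40)])

# Hypothetical values of one pixel in the years covered by both sensors.
gimms_overlap = [0.20, 0.40, 0.60]
modis_overlap = [0.25, 0.45, 0.65]
slope, intercept = fit_line(gimms_overlap, modis_overlap)

# Estimate a MODIS-like value for a GIMMS observation from before the overlap.
estimated = slope * 0.30 + intercept
```

Taking the monthly maximum reduces cloud and atmospheric contamination, which tends to lower NDVI, and the fitted relationship is what allows the coarser, longer record to extend the finer one back in time.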
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With recent technological advancements, quantitative analysis has become an increasingly important area within professional sports. However, the manual process of collecting data on relevant match events like passes, goals and tackles comes with considerable costs and limited consistency across providers, affecting both research and practice. In football, while automatic detection of events from positional data of the players and the ball could alleviate these issues, it is not entirely clear what accuracy current state-of-the-art methods realistically achieve because there is a lack of high-quality validations on realistic and diverse data sets. This paper adds context to existing research by validating a two-step rule-based pass and shot detection algorithm on four different data sets using a comprehensive validation routine that accounts for the temporal, hierarchical and imbalanced nature of the task. Our evaluation shows that pass and shot detection performance is highly dependent on the specifics of the data set. In accordance with previous studies, we achieve F-scores of up to 0.92 for passes, but only when there is an inherent dependency between event and positional data. We find a significantly lower accuracy with F-scores of 0.71 for passes and 0.65 for shots if event and positional data are independent. This result, together with a critical evaluation of existing methodologies, suggests that the accuracy of current football event detection algorithms operating on positional data is currently overestimated. Further analysis reveals that the temporal extraction of passes and shots from positional data poses the main challenge for rule-based approaches. Our results further indicate that the classification of plays into shots and passes is a relatively straightforward task, achieving F-scores between 0.83 and 0.91 for rule-based classifiers and up to 0.95 for machine learning classifiers.
We show that there exist simple classifiers that accurately differentiate shots from passes in different data sets using a low number of human-understandable rules. Operating on basic spatial features, our classifiers provide a simple, objective event definition that can be used as a foundation for more reliable event-based match analysis.
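To make the role of temporal matching in such F-scores concrete, the sketch below matches detected event timestamps to reference timestamps within a tolerance window and derives precision, recall, and the F-score. It is not the paper's validation routine: the greedy one-to-one matching, the tolerance value, and the timestamps are assumptions for illustration.

```python
def match_events(detected, reference, tolerance):
    """Greedy one-to-one matching of timestamps within +/- tolerance seconds.

    Returns (true positives, false positives, false negatives).
    """
    unmatched = sorted(reference)
    tp = 0
    for t in sorted(detected):
        candidates = [r for r in unmatched if abs(r - t) <= tolerance]
        if candidates:
            closest = min(candidates, key=lambda r: abs(r - t))
            unmatched.remove(closest)  # each reference event is used once
            tp += 1
    fp = len(detected) - tp
    fn = len(unmatched)
    return tp, fp, fn


def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall (F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical pass timestamps in seconds.
detected = [1.0, 5.2, 9.0, 14.0]
reference = [1.1, 5.0, 12.0]
tp, fp, fn = match_events(detected, reference, tolerance=0.5)
score = f_score(tp, fp, fn)
```

The result depends strongly on the tolerance: a wider window turns near misses into matches, which is one reason reported F-scores are hard to compare across studies.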
Ta (near-surface air temperature) is an important physical parameter that reflects climate change. In order to obtain daily Ta data (Tmax, Tmin, and Tavg) with high spatial and temporal resolution in China, we fully analyzed the advantages and disadvantages of various existing data (reanalysis, remote sensing, and in situ data). Different Ta reconstruction models were constructed for different weather conditions, and we further improved data accuracy by building correction equations for different regions. Finally, a dataset of daily temperature (Tmax, Tmin, and Tavg) in China from 1979 to 2018 was obtained with a spatial resolution of 0.1°. For Tmax, validation using in situ data shows that the root mean square error (RMSE) ranges from 0.86 °C to 1.78 °C, the mean absolute error (MAE) varies from 0.63 °C to 1.40 °C, and the Pearson coefficient (R2) ranges from 0.96 to 0.99. For Tmin, RMSE ranges from 0.78 °C to 2.09 °C, the MAE varies from 0.58 °C to 1.61 °C, and the R2 ranges from 0.95 to 0.99. For Tavg, RMSE ranges from 0.35 °C to 1.00 °C, the MAE varies from 0.27 °C to 0.68 °C, and the R2 ranges from 0.99 to 1.00. Furthermore, a variety of evaluation indicators were used to analyze the temporal and spatial variation trends of Ta, and the Tavg increase was more than 0.0 °C/a, which is consistent with the general global warming trend. In conclusion, this dataset has a high spatial resolution and reliable accuracy, which fills the previous gap in temperature values (Tmax, Tmin, and Tavg) at high spatial resolution. This dataset also provides key parameters for the study of climate change, especially high-temperature drought and low-temperature chilling damage.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Robot Vision Dataset Services for Space market size reached USD 1.21 billion in 2024, driven by the rapid adoption of AI-driven visual analytics in space missions. The market is projected to grow at a robust CAGR of 18.7% from 2025 to 2033, reaching a forecasted value of USD 6.27 billion by 2033. This remarkable growth is fueled by increasing investments in space exploration, advancements in autonomous robotics, and the critical need for high-quality, annotated datasets to enable reliable and accurate machine vision in complex extraterrestrial environments.
The primary growth factor for the Robot Vision Dataset Services for Space market is the exponential rise in demand for autonomous space robotics and spacecraft. As missions become increasingly complex—ranging from satellite maintenance to planetary exploration—there is a heightened need for robust, annotated datasets that can train AI models to interpret and act on visual information in real time. The integration of deep learning and computer vision technologies into space robotics has amplified the requirement for diverse, high-resolution datasets that can simulate the unpredictable conditions of space, such as varied lighting, terrain, and object recognition scenarios. As a result, space agencies and commercial space enterprises are investing heavily in dataset services that support the development of reliable and intelligent robotic systems.
Another significant driver is the proliferation of commercial space activities and the entry of private players into satellite launches, orbital servicing, and extraterrestrial mining. These commercial entities are leveraging robot vision dataset services to accelerate the development and deployment of autonomous systems that can perform complex tasks without human intervention. The need for precision in navigation, object detection, and manipulation in the harsh space environment necessitates the use of meticulously curated and validated datasets. Additionally, the rise of NewSpace companies and the ongoing miniaturization of satellites have further expanded the scope of applications for robot vision datasets, fostering a competitive ecosystem that encourages innovation and service improvement.
Technological advancements in imaging sensors, multispectral and hyperspectral data acquisition, and cloud-based data processing have also contributed to the market’s robust growth. The ability to capture, annotate, and preprocess vast amounts of data in various formats—including image, video, and spectral data—has enabled service providers to offer highly customized solutions for specific mission requirements. Furthermore, the increasing collaboration between space agencies, research institutions, and commercial vendors has led to the establishment of shared data repositories and open-source initiatives, enhancing the accessibility and quality of robot vision datasets. These collaborative efforts are expected to further accelerate market growth and drive innovation in the coming years.
From a regional perspective, North America currently dominates the Robot Vision Dataset Services for Space market, owing to the presence of leading space agencies such as NASA, a vibrant commercial space sector, and a strong ecosystem of AI and machine vision technology providers. Europe and Asia Pacific are also witnessing substantial growth, fueled by increased government investments in space research and the emergence of regional commercial space ventures. The Middle East & Africa and Latin America, while still nascent, are expected to experience accelerated growth over the forecast period as regional governments and private players increase their focus on space technologies and autonomous robotics.
The service type segment of the Robot Vision Dataset Services for Space market is comprised of dataset collection, annotation, preprocessing, validation, and other ancillary services. Dataset collection forms the foundational layer, involving the gathering of raw visual data from a variety of sources such as satellites, rovers, and space telescopes. Given the complexity of space environments, this process requires sophisticated hardware and software integration to ensure data accuracy and completeness. Service providers are leveraging advanced imaging technologies and remote sensing equipment to capture high-resolution images
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NSW Features of Interest Category - Place Point
Please Note: WGS 84 = GDA94 service. This dataset has a spatial reference of [WGS 84 = GDA94] and can NOT be easily consumed into GDA2020 environments. A similar service with a ‘multiCRS’ suffix is available which can support GDA2020, GDA94 and WGS84 = GDA2020 environments. In due course, and allowing time for user feedback and testing, it is intended that these original services will adopt the new multiCRS functionality.
Place point is a point feature class within the Features of Interest Category. There is no overall accuracy reported in the database; however, the accuracy of the individual feature instances of each feature class can be found in the database tables. The currency of the feature instances in this dataset can be found in the “feature reliability date” or “attribute reliability date” attributes. All feature instances in this class are attributed with a planimetric accuracy value. It is expected that 90% of well-defined points with the same planimetric accuracy are within 0.5mm of that map scale. Depending on the capture source, capture method, digital update and control point upgrade, every feature instance reported has a positional accuracy within the range of 1m - 100m.
Place Points included in the layer include:
City - A centre of population, commerce and culture with all essential services; a town of significant size and importance, generally accorded the legal right to call itself a city under either the Local Government Act, the Crown Lands Act or other instruments. This point feature dataset is part of Spatial Services Defined Administrative Data Sets. City data points are positioned within the cadastral parcel in which they are located.
Locality - A bounded area within the landscape that has a rural character. Locality data points are positioned within the cadastral parcel in which they are located.
Region - A region is a relatively large tract of land distinguished by certain common characteristics, natural or cultural. Natural unifying features could include same drainage basin, similar landforms, or climatic conditions, a special flora or fauna, or the like. This point feature dataset is part of Spatial Services Defined Administrative Data Sets. Region data points are positioned within the cadastral parcel in which they are located.
Rural Place - A place, site or precinct in a rural landscape, generally of small extent, the name of which is in current use. This point feature dataset is part of Spatial Services Defined Administrative Data Sets. Rural place data points are positioned within the cadastral parcel in which they are located.
Suburb - A gazetted boundary of a suburb or locality area as defined by the Geographical Names Board of NSW. This point feature dataset is part of Spatial Services Defined Administrative Data Sets. Suburb data points are positioned within the cadastral parcel in which they are located.
Town - A commercial nucleus offering a wide range of services and a large number of shops, often several of the same type. Depending on size, the residential area can be relatively compact or (in addition) dispersed in clusters on the periphery. This point feature dataset is part of Spatial Services Defined Administrative Data Sets. Town data points are positioned within the cadastral parcel in which they are located.
Urban Place - A place, site or precinct in an urban landscape, the name of which is in current use, but the limits of which have not been defined under the address locality program. This point feature dataset is part of Spatial Services Defined Administrative Data Sets. Urban place data points are positioned within the cadastral parcel in which they are located.
Village - A cohesive populated place in a rural landscape, which may provide a limited range of services to the local area. Residential subdivisions are in urban lot sizes. This point feature dataset is part of Spatial Services Defined Administrative Data Sets. Village data points are positioned within the cadastral parcel in which they are located.
Metadata
Type: Esri Feature Service
Update Frequency: As required
Contact Details: Contact us via the Spatial Services Customer Hub
Relationship to Themes and Datasets: Features of Interest Category of the Foundation Spatial Data Framework (FSDF)
Accuracy: The dataset maintains a positional relationship to, and alignment with, a range of themes from the NSW FSDF including transport, imagery, positioning, water and land cover. This dataset was captured by digitising the best available cadastral mapping at a variety of scales and accuracies, ranging from 1:500 to 1:250 000 according to the National Mapping Council of Australia, Standards of Map Accuracy (1975). Therefore, the position of the feature instance will be within 0.5mm at map scale for 90% of the well-defined points. That is, 1:500 = 0.25m, 1:2000 = 1m, 1:4000 = 2m, 1:25000 = 12.5m, 1:50000 = 25m and 1:100000 = 50m. A program of positional upgrade (accuracy improvement) is currently underway.
Spatial Reference System (dataset): Geocentric Datum of Australia 1994 (GDA94), Australian Height Datum (AHD)
Spatial Reference System (web service): EPSG 4326: WGS84 Geographic 2D
WGS84 Equivalent To: GDA94
Spatial Extent: Full state
Standards and Specifications: Open Geospatial Consortium (OGC) implemented and compatible for consumption by common GIS platforms. Available as either cache or non-cache, depending on client use or requirement.
Distributors: Service Delivery, DCS Spatial Services, 346 Panorama Ave, Bathurst NSW 2795
Dataset Producers and Contributors: Administrative Spatial Programs, DCS Spatial Services, 346 Panorama Ave, Bathurst NSW 2795
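The scale-to-ground-distance figures quoted in the accuracy statement follow from multiplying the 0.5mm map tolerance by the scale denominator. The short sketch below reproduces that arithmetic; the function name `ground_accuracy_m` is a hypothetical helper introduced only for illustration and is not part of the service.

```python
# Ground tolerance implied by a 0.5 mm tolerance at map scale.
MAP_TOLERANCE_MM = 0.5  # taken from the accuracy statement


def ground_accuracy_m(scale_denominator: int) -> float:
    """Ground distance in metres covered by 0.5 mm at 1:scale_denominator."""
    return MAP_TOLERANCE_MM * scale_denominator / 1000.0  # mm to m


# Scales listed in the accuracy statement.
for denominator in (500, 2000, 4000, 25000, 50000, 100000):
    print(f"1:{denominator} = {ground_accuracy_m(denominator)} m")
```

The same function can be applied to any other capture scale in the stated 1:500 to 1:250 000 range.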
The National Land Cover Dataset was compiled from Landsat satellite TM imagery (circa 1992) with a spatial resolution of 30 meters and supplemented by various ancillary data (where available). The analysis and interpretation of the satellite imagery was conducted using very large, sometimes multi-state image mosaics (i.e. up to 18 Landsat scenes). Using a relatively small number of aerial photographs for 'ground truth', the thematic interpretations were necessarily conducted from a spatially broad perspective. Furthermore, the accuracy assessments (see below) correspond to 'federal regions', which are groupings of contiguous states. Thus, the reliability of the data is greatest at the state or multi-state level. The statistical accuracy of the data is known only for the region.
Important Caution Advisory
With this in mind, users are cautioned to carefully scrutinize the data to see if they are of sufficient reliability before attempting to use the dataset for larger-scale or local analyses. This evaluation must be made remembering that the NLCD represents conditions in the early 1990s. The New Hampshire portion of the NLCD was created as part of land cover mapping activities for Federal Region I, which includes the States of Connecticut, Maine, Vermont, Rhode Island, New Hampshire, and Massachusetts. The NLCD classification contains 21 different land cover categories with a spatial resolution of 30 meters. The NLCD was produced as a cooperative effort between the U.S. Geological Survey (USGS) and the U.S. Environmental Protection Agency (USEPA) to produce a consistent land cover data layer for the conterminous U.S. using early 1990s Landsat thematic mapper (TM) data purchased by the Multi-resolution Land Characterization (MRLC) Consortium. The MRLC Consortium is a partnership of federal agencies that produce or use land cover data. Partners include the USGS (National Mapping, Biological Resources, and Water Resources Divisions), USEPA, the U.S. Forest Service, and the National Oceanic and Atmospheric Administration. The original NLCD grid was projected into UTM Zone 19 and was clipped to a box surrounding the USDA Forest Service, Hubbard Brook Experimental Forest.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"We believe that by accounting for the inherent uncertainty in the system during each measurement, the relationship between cause and effect can be assessed more accurately, potentially reducing the duration of research."
Short description
This dataset was created as part of a research project investigating the efficiency and learning mechanisms of a Bayesian adaptive search algorithm supported by the Imprecision Entropy Indicator (IEI) as a novel method. It includes detailed statistical results, posterior probability values, and the weighted averages of IEI across multiple simulations aimed at target localization within a defined spatial environment. Control experiments, including random search, random walk, and genetic algorithm-based approaches, were also performed to benchmark the system's performance and validate its reliability.
The task involved locating a target area centered at (100; 100) within a radius of 10 units (Research_area.png), inside a circular search space with a radius of 100 units. The search process continued until 1,000 successful target hits were achieved.
To benchmark the algorithm's performance and validate its reliability, control experiments were conducted using alternative search strategies, including random search, random walk, and genetic algorithm-based approaches. These control datasets serve as baselines, enabling comprehensive comparisons of efficiency, randomness, and convergence behavior across search methods, thereby demonstrating the effectiveness of our novel approach.
Uploaded files
The first dataset contains the average IEI values, generated by randomly simulating 300 x 1 hits for 10 bins per quadrant (4 quadrants in total) using the Python programming language, and calculating the corresponding IEI values. This resulted in a total of 4 x 10 x 300 x 1 = 12,000 data points. The summary of the IEI values by quadrant and bin is provided in the file results_1_300.csv. The calculation of IEI values for averages is based on likelihood, using an absolute difference-based approach for the likelihood probability computation. IEI_Likelihood_Based_Data.zip
The weighted IEI average values for likelihood calculation (Bayes formula) are provided in the file Weighted_IEI_Average_08_01_2025.xlsx
This dataset contains the results of a simulated target search experiment using Bayesian posterior updates and Imprecision Entropy Indicators (IEI). Each row represents a hit during the search process, including metrics such as Shannon entropy (H), Gini index (G), average distance, angular deviation, and calculated IEI values. The dataset also includes bin-specific posterior probability updates and likelihood calculations for each iteration. The simulation explores adaptive learning and posterior penalization strategies to optimize the search efficiency. Our Bayesian adaptive searching system source code (search algorithm, 1000 target searches): IEI_Self_Learning_08_01_2025.py. This dataset contains the results of 1,000 iterations of a successful target search simulation. The simulation runs until the target is successfully located for each iteration. The dataset includes three further main outputs: a) Results files (results{iteration_number}.csv): Details of each hit during the search process, including entropy measures, Gini index, average distance and angle, Imprecision Entropy Indicators (IEI), coordinates, and the bin number of the hit. b) Posterior updates (Pbin_all_steps_{iter_number}.csv): Tracks the posterior probability updates for all bins during the search process across multiple steps. c) Likelihood analysis (likelihood_analysis_{iteration_number}.csv): Contains the calculated likelihood values for each bin at every step, based on the difference between the measured IEI and pre-defined IEI bin averages. IEI_Self_Learning_08_01_2025.py
Based on the mentioned Python source code (see point 3, Bayesian adaptive searching method with IEI values), we performed 1,000 successful target searches, and the outputs were saved in the Self_learning_model_test_output.zip file.
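To make the idea of a per-bin posterior update concrete, here is a minimal sketch of Bayes' rule applied over bins. It is not the authors' code from IEI_Self_Learning_08_01_2025.py: the likelihood form `1 / (1 + |measured - average|)` is an assumption chosen only because the description mentions an absolute-difference-based likelihood, and the bin averages and measured value are made-up illustrative numbers.

```python
def likelihoods(measured_iei, bin_averages):
    """ASSUMED likelihood: larger when |measured IEI - bin average| is smaller."""
    return [1.0 / (1.0 + abs(measured_iei - avg)) for avg in bin_averages]


def update_posterior(prior, measured_iei, bin_averages):
    """Bayes' rule over bins: posterior is prior times likelihood, renormalised."""
    weights = [p * l for p, l in zip(prior, likelihoods(measured_iei, bin_averages))]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values for illustration only (not taken from the dataset).
bin_averages = [0.2, 0.5, 0.8, 1.1]
prior = [0.25] * 4  # uniform prior over four bins
posterior = update_posterior(prior, measured_iei=0.75, bin_averages=bin_averages)

# Index of the bin that the updated posterior favours most.
best_bin = max(range(len(posterior)), key=posterior.__getitem__)
print(posterior, best_bin)
```

Feeding each new posterior back in as the next prior gives a step-by-step update of the kind the Pbin_all_steps files are described as tracking.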
Bayesian Search (IEI) from different quadrant. This dataset contains the results of Bayesian adaptive target search simulations, including various outputs that represent the performance and analysis of the search algorithm. The dataset includes: a) Heatmaps (Heatmap_I_Quadrant, Heatmap_II_Quadrant, Heatmap_III_Quadrant, Heatmap_IV_Quadrant): These heatmaps represent the search results and the paths taken from each quadrant during the simulations. They indicate how frequently the system selected each bin during the search process. b) Posterior Distributions (All_posteriors, Probability_distribution_posteriors_values, CDF_posteriors_values): Generated based on posterior values, these files track the posterior probability updates, including cumulative distribution functions (CDF) and probability distributions. c) Macro Summary (summary_csv_macro): This file aggregates metrics and key statistics from the simulation. It summarizes the results from the individual results.csv files. d) Heatmap Searching Method Documentation (Bayesian_Heatmap_Searching_Method_05_12_2024): This document visualizes the search algorithm's path, showing how frequently each bin was selected during the 1,000 successful target searches. e) One-Way ANOVA Analysis (Anova_analyze_dataset, One_way_Anova_analysis_results): This includes the database and SPSS calculations used to examine whether the starting quadrant influences the number of search steps required. The analysis was conducted at a 5% significance level, followed by a Games-Howell post hoc test [43] to identify which target-surrounding quadrants differed significantly in terms of the number of search steps. Results were saved in the Self_learning_model_test_results.zip
This dataset contains randomly generated sequences of bin selections (1-40) from a control search algorithm (random search) used to benchmark the performance of Bayesian-based methods. The process iteratively generates random numbers until a stopping condition is met (reaching target bins 1, 11, 21, or 31). This dataset serves as a baseline for analyzing the efficiency, randomness, and convergence of non-adaptive search strategies. The dataset includes the following: a) The Python source code of the random search algorithm. b) A file (summary_random_search.csv) containing the results of 1000 successful target hits. c) A heatmap visualizing the frequency of search steps for each bin, providing insight into the distribution of steps across the bins. Random_search.zip
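The random search control can be sketched in a few lines. This is an illustrative reconstruction from the description above, not the script shipped in Random_search.zip; the uniform draw over bins 1 to 40 and the fixed seed are assumptions.

```python
import random

TARGET_BINS = {1, 11, 21, 31}  # stopping bins named in the description
N_BINS = 40


def random_search(rng: random.Random) -> list[int]:
    """Draw bins uniformly from 1..40 until a target bin is drawn."""
    path = []
    while True:
        bin_number = rng.randint(1, N_BINS)  # inclusive on both ends
        path.append(bin_number)
        if bin_number in TARGET_BINS:
            return path


rng = random.Random(0)  # fixed seed, assumed here for reproducibility
paths = [random_search(rng) for _ in range(1000)]
step_counts = [len(p) for p in paths]
mean_steps = sum(step_counts) / len(step_counts)
print(f"mean steps to target: {mean_steps:.2f}")
```

Under the uniform-draw assumption, 4 of the 40 bins are targets, so each draw succeeds with probability 0.1 and the theoretical mean is 10 steps.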
This dataset contains the results of a random walk search algorithm, designed as a control mechanism to benchmark adaptive search strategies (Bayesian-based methods). The random walk operates within a defined space of 40 bins, where each bin has a set of neighboring bins. The search begins from a randomly chosen starting bin and proceeds iteratively, moving to a randomly selected neighboring bin, until one of the stopping conditions is met (bins 1, 11, 21, or 31). The dataset provides detailed records of 1,000 random walk iterations, with the following key components: a) Individual Iteration Results: Each iteration's search path is saved in a separate CSV file (random_walk_results_.csv), listing the sequence of steps taken and the corresponding bin at each step. b) Summary File: A combined summary of all iterations is available in random_walk_results_summary.csv, which aggregates the step-by-step data for all 1,000 random walks. c) Heatmap Visualization: A heatmap file is included to illustrate the frequency distribution of steps across bins, highlighting the relative visit frequencies of each bin during the random walks. d) Python Source Code: The Python script used to generate the random walk dataset is provided, allowing reproducibility and customization for further experiments. Random_walk.zip
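A random walk over bins can be sketched similarly. The description does not say which bins count as neighbours, so the ring layout below (each bin adjacent to the previous and next bin, wrapping from 40 to 1) is purely an assumption for illustration; the script in Random_walk.zip may define neighbourhoods differently.

```python
import random

TARGET_BINS = {1, 11, 21, 31}  # stopping bins named in the description
N_BINS = 40


def ring_neighbors(bin_number: int) -> list[int]:
    """ASSUMPTION: bins form a ring, so each bin has exactly two neighbours."""
    left = (bin_number - 2) % N_BINS + 1
    right = bin_number % N_BINS + 1
    return [left, right]


def random_walk(rng: random.Random) -> list[int]:
    """Start at a random bin and step to random neighbours until a target is hit."""
    current = rng.randint(1, N_BINS)
    path = [current]
    while current not in TARGET_BINS:
        current = rng.choice(ring_neighbors(current))
        path.append(current)
    return path


walk = random_walk(random.Random(1))  # fixed seed, assumed for reproducibility
print(len(walk), walk)
```

Swapping `ring_neighbors` for a different adjacency function is enough to test other bin layouts.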
This dataset contains the results of a genetic search algorithm implemented as a control method to benchmark adaptive Bayesian-based search strategies. The algorithm operates in a 40-bin search space with predefined target bins (1, 11, 21, 31) and evolves solutions through random initialization, selection, crossover, and mutation over 1000 successful runs. Dataset Components: a) Run Results: Individual run data is stored in separate files (genetic_algorithm_run_.csv), detailing: Generation: The generation number. Fitness: The fitness score of the solution. Steps: The path length in bins. Solution: The sequence of bins visited. b) Summary File: summary.csv consolidates the best solutions from all runs, including their fitness scores, path lengths, and sequences. c) All Steps File: summary_all_steps.csv records all bins visited during the runs for distribution analysis. d) A heatmap was also generated for the genetic search algorithm, illustrating the frequency of bins chosen during the search process as a representation of the search pathways. Genetic_search_algorithm.zip
Technical Information
The dataset files have been compressed into a standard ZIP archive using Total Commander (version 9.50). The ZIP format ensures compatibility across various operating systems and tools.
The XLSX files were created using Microsoft Excel Standard 2019 (Version 1808, Build 10416.20027)
The Python program was developed using Visual Studio Code (Version 1.96.2, user setup), with the following environment details: Commit fabd6a6b30b49f79a7aba0f2ad9df9b399473380f, built on 2024-12-19. The Electron version is 32.6, and the runtime environment includes Chromium 128.0.6263.186, Node.js 20.18.1, and V8 12.8.374.38-electron.0. The operating system is Windows NT x64 10.0.19045.
The statistical analysis included in this dataset was partially conducted using IBM SPSS Statistics, Version 29.0.1.0
The CSV files in this dataset were created following European standards, using a semicolon (;) as the delimiter instead of a comma, encoded in UTF-8 to ensure compatibility with a wide
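Because the files use a semicolon delimiter, a reader that assumes commas will load each row as a single column. The sketch below shows one way to parse such files with Python's standard `csv` module; the column names in the sample are hypothetical, since the actual headers are not listed in this description.

```python
import csv
import io


def read_semicolon_csv(text_stream):
    """Parse semicolon-delimited CSV text into a list of dict rows."""
    return list(csv.DictReader(text_stream, delimiter=";"))


# Hypothetical two-column sample; the real column names are not given above.
sample = io.StringIO("bin;steps\n1;12\n11;7\n")
rows = read_semicolon_csv(sample)
print(rows)

# For a file on disk, open it with the stated UTF-8 encoding, for example:
# with open("results_1_300.csv", encoding="utf-8", newline="") as f:
#     rows = read_semicolon_csv(f)
```

All values come back as strings, so numeric columns need an explicit conversion after loading.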
The protected area shapefile was compiled by The Nature Conservancy (TNC) with substantial submissions and assistance from 21 different governments in the insular Caribbean, sometimes with multiple representatives within a government. The data represents a core dataset in conservation for the region. The Nature Conservancy works to keep this file as up to date as possible and uses it heavily in representing what is protected across the insular Caribbean. The origins of the data are from the World Database on Protected Areas (WDPA). The WDPA is the most inclusive spatial dataset on marine and terrestrial protected areas globally. Since 1981, UNEP-WCMC, through its Protected Areas Program, has been gathering aspatial (tabular) information and making it available to the global community. In the late 1980s, the WDPA started to include spatial data, and in 2003 the WDPA formally incorporated the UN list of protected areas. Today, the WDPA is a joint project of UNEP and IUCN, produced by UNEP-WCMC and the IUCN World Commission on Protected Areas, and works with governments and collaborating NGOs, like TNC. Although TNC began working in the insular Caribbean in 1974, its work was largely site based and limited in regional scope depending on previously existing projects or donations. This changed with the Conservation Assessment of the Insular Caribbean (Huggins et al. 2007). Protected areas were a key component in the conservation assessment, thus the need for a spatial (GIS) dataset of boundaries along with tabular attributes for all protected areas across the insular Caribbean. The Conservation Assessment of the Insular Caribbean primarily used the WDPA Consortium 2003 version, with additional local information. With the launch of the Caribbean Challenge in 2008 and the pledges made by participating countries, the need for an accurate and reliable spatial dataset of existing and proposed protected areas materialized.
Protected area representation is most useful when it accurately reflects a changing dynamic. In particular, the areas of research that would benefit greatly from a dynamic dataset are MPA cluster analysis related to international connectivity and deeper MPAs that meet the conservation needs of pelagic biodiversity on the high seas (Game et al. 2009). But maintaining this accuracy requires appropriate effort and capacity not yet observed. As intended, the WDPA represents the authority and foundation of protected areas for the many uses of this information. However, the current model for updating the WDPA seems problematic for the insular Caribbean given the discrepancies between this dataset and the WDPA's current version. Along with the Caribbean Marine Protected Area Managers network and forum (CaMPAM) and TNC, these three data depositories have contributed in parts to the insular Caribbean protected area GIS dataset, but better collaboration must be formed to ensure timely data flows and that all representations of the insular Caribbean protected area dataset are the same.
The majority of Caribbean governments lack the necessary resources to track and update protected area boundaries and attributes. These databases require continual custodial stewardship as new data becomes available and updates are needed. Consequently, strategic partnerships where resources and talent can be shared are necessary to fill the technical and resource capacity gaps that exist for island governments. We recommend using a regional approach via partner collaboration, and national expert review and validation, to sustain the protected area dataset. This process will facilitate regular updates to the World Database on Protected Areas (WDPA). To date, the associated Caribbean regional protected area dataset represents the most accurate baseline on which country-level MPA statistics can be reported. As efforts to fulfill commitments to regional and global biodiversity goals continue to ramp up (e.g. Caribbean Challenge Initiative, CBD Aichi Biodiversity Targets), this data will serve as an important baseline and resource to Caribbean governments and regional entities seeking to assess current levels of marine protection and understand the remaining gap that needs to be filled.
The protected areas in this file are locations which receive some sort of protection or are designated as a particular managed area under the law related to natural, ecological and/or cultural values. Overlap does occur in this dataset.
All attributes have associated definitions in the fields section of the metadata. However, notable fields include:
SOURCE - Main source of the particular data for respective record.
CF - Confidence in record accuracy from TNC's perspective
PROTDATE - Established date of protected area
COUNTRY - Country or government name
MOD_DATE - Date that the record in this dataset was added or updated
EDITOR - Name of the editor performing the updates or modifications to this data
STATUS - Status of the area as designated or proposed
WDPA_ID - Protected area ID as assigned in the WDPA
CAMPAM_ID - Protected area ID as assigned in the CaMPAM database
ON_WATER - Used to designate what is a marine protected area (MPA) and what is a terrestrial protected area. "MPA"s were primarily designated by applying the IUCN MPA definition to the insular Caribbean protected area GIS shapefile. Essentially, any boundary (i.e. “area… which has been reserved by law or other effective means to protect part or all of the enclosed environment”) which overlaps the shoreline representing “intertidal or subtidal terrain, together with its overlying water and associated flora, fauna, historical, and cultural features” was initially selected as an MPA. This preliminary output is refined by the intent of the protected area, determined through local knowledge, the protected area name and/or legal definition.
In some cases, although a boundary might include “intertidal or subtidal terrain,” the true intent of the law or means to protect the environment excluded the “marine” section.
SBIS_ID - Corresponds to ID number for streaming data into The Bahamas Spatial Biodiversity Information System
Notable Comments by Government:
USVI - includes lands owned by TNC which are split out by parcel. This means that this dataset does not represent a one-off list of protected areas by record unless these are removed and merged. Also includes Areas of Particular Concern.
References
Game, Edward T., Hedley S. Grantham, Alistair J. Hobday, Robert L. Pressey, Amanda T. Lombard, Lynnath E. Beckley, Kristina Gjerde, Rodrigo Bustamante, Hugh P. Possingham, and Anthony J. Richardson. 2009. “Pelagic Protected Areas: The Missing Dimension in Ocean Conservation.” Trends in Ecology & Evolution 24 (7): 360 – 369. doi:http://dx.doi.org/10.1016/j.tree.2009.01.011.
Huggins, A.E., S. Keel, P. Kramer, F. Núñez, S. Schill, R. Jeo, A. Chatwin, K. Thurlow, M. McPherson, M. Libby, R. Tingey, M. Palmer and R. Seybert. 2007. Biodiversity Conservation Assessment of the Insular Caribbean Using the Caribbean Decision Support System, Technical Report, The Nature Conservancy. See more at: http://www.conservationgateway.org/ConservationByGeography/NorthAmerica/Caribbean/science/planning
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accurate long-term temperature and precipitation estimates at high spatial and temporal resolutions are vital for a wide variety of climatological studies. We have produced a new, publicly available, daily, gridded maximum temperature, minimum temperature, and precipitation dataset for China with a high spatial resolution of 1 km and over a long-term period (1961 to 2019). It has been named the HRLT. The daily gridded data were interpolated using comprehensive statistical analyses, which included machine learning, the generalized additive model, and thin plate splines. It is based on the 0.5° × 0.5° grid dataset from the China Meteorological Administration, together with covariates for elevation, aspect, slope, topographic wetness index, latitude, and longitude. The accuracy of the HRLT daily dataset was assessed using observation data from meteorological stations. The maximum and minimum temperature estimates were more accurate than the precipitation estimates. For maximum temperature, the mean absolute error (MAE), root mean square error (RMSE), Pearson's correlation coefficient (Cor), coefficient of determination after adjustment (R²), and Nash-Sutcliffe modeling efficiency (NSE) were 1.07 °C, 1.62 °C, 0.99, 0.98, and 0.98, respectively. For minimum temperature, the MAE, RMSE, Cor, R², and NSE were 1.08 °C, 1.53 °C, 0.99, 0.99, and 0.99, respectively. For precipitation, the MAE, RMSE, Cor, R², and NSE were 1.30 mm, 4.78 mm, 0.84, 0.71, and 0.70, respectively. The accuracy of the HRLT was compared with that of three other existing datasets; it was either greater, especially for precipitation, or comparable, while offering higher spatial resolution and a longer time period. In summary, the HRLT dataset, which has a high spatial resolution, covers a longer period of time and has reliable accuracy, is suitable for future environmental analyses, especially of the effects of extreme weather.
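For readers who want to reproduce this kind of validation against station data, the sketch below computes MAE, RMSE, Pearson's correlation, and Nash-Sutcliffe efficiency using their standard textbook definitions. It is not the HRLT authors' evaluation code, the adjusted R² is omitted, and the observation and estimate values are invented for illustration.

```python
import math


def mae(obs, est):
    """Mean absolute error."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)


def rmse(obs, est):
    """Root mean square error."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def pearson(obs, est):
    """Pearson's correlation coefficient."""
    mo, me = sum(obs) / len(obs), sum(est) / len(est)
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_e = sum((e - me) ** 2 for e in est)
    return cov / math.sqrt(var_o * var_e)


def nse(obs, est):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mo = sum(obs) / len(obs)
    sse = sum((o - e) ** 2 for o, e in zip(obs, est))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)


# Hypothetical station observations and gridded estimates (degrees C).
observed = [10.0, 12.0, 14.0, 16.0]
estimated = [11.0, 12.0, 13.0, 16.0]
print(mae(observed, estimated), rmse(observed, estimated))
print(pearson(observed, estimated), nse(observed, estimated))
```

An NSE of 1 means a perfect match, while 0 means the estimates are no better than the mean of the observations.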
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The monthly air temperature in 1153 stations and precipitation in 1202 stations in China and neighboring countries were collected to construct a monthly climate dataset in China on 0.025 ° resolution (approximately 2.5 km) named LZU0025 dataset designed by Lanzhou University (LZU), using a partial thin plate smoothing method embedded in the ANUSPLIN software. The accuracy of the LZU0025 was evaluated from analyzing three aspects: 1) Diagnostic statistics from surface fitting model in the period of 1951-2011, and results show low mean square root of generalized cross validation (RTGCV) for monthly air temperature surface (1.1 °C) and monthly precipitation surface (2 mm1/2) which interpolated the square root of itself. This indicate exact surface fitting models. 2) Error statistics based on 265 withheld stations data in the period of 1951-2011, and results show that predicted values closely tracked true values with mean absolute error (MAE) of 0.6 °C and 4 mm and standard deviation of mean error (STD) of 1.3 °C and 5 mm, and monthly STDs presented consistent change with RTGCV varying. 3) Comparisons to other datasets through two ways, one was to compare three indices namely the standard deviation, mean and time trend derived from all datasets to referenced dataset released by the China Meteorological Administration (CMA) in the Taylor diagrams, the other was to compare LZU0025 to the Camp Tibet dataset on mountainous remote area. Taylor diagrams displayed the standard deviation derived from LZU had higher correlation with that induced from CMA (Pearson correlation R=0.76 for air temperature case and R=0.96 for precipitation case). The standard deviation for this index derived from LZU was more close to that induced from CMA, and the centered normalized root-mean-square difference for this index derived from LZU and CMA was lower. 
The same superior performance of LZU was found when comparing the mean and time trend indices derived from LZU with those derived from other datasets. LZU0025 correlated highly with the Camp dataset for air temperature, although the correlation for precipitation was insignificant at a few stations. Based on these comprehensive analyses, LZU0025 was concluded to be a reliable dataset.
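A Taylor diagram summarizes three quantities for a test field against a reference field: the two standard deviations, the Pearson correlation, and the centered root-mean-square difference. The sketch below shows how these could be computed for one index; it is an illustration only, not the procedure used for LZU0025, and the function name `taylor_statistics` and its argument names are hypothetical.

```python
import numpy as np

def taylor_statistics(ref, test):
    """Return the statistics a Taylor diagram displays for test against ref."""
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)

    sd_ref = ref.std()     # population standard deviation (ddof=0)
    sd_test = test.std()
    r = np.corrcoef(ref, test)[0, 1]  # Pearson correlation

    # Centered RMS difference: remove each field's mean before differencing.
    diff = (test - test.mean()) - (ref - ref.mean())
    crmsd = np.sqrt(np.mean(diff ** 2))

    return {
        "sd_ref": sd_ref,
        "sd_test": sd_test,
        "r": r,
        "crmsd": crmsd,
        # Normalizing by the reference standard deviation lets different
        # variables share one diagram.
        "crmsd_norm": crmsd / sd_ref,
    }

# Synthetic values for illustration, not data from LZU0025 or CMA.
rng = np.random.default_rng(0)
reference = rng.normal(size=50)
candidate = 0.8 * reference + rng.normal(scale=0.3, size=50)
stats = taylor_statistics(reference, candidate)
```

The three quantities are linked by crmsd² = sd_test² + sd_ref² − 2 · sd_test · sd_ref · r, which is why a dataset with a higher correlation and a standard deviation closer to the reference also shows a lower centered root-mean-square difference.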