Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
import h5py
import numpy as np

index = 0  # adjust face index
datafile = h5py.File('dataset.hdf5', 'r')

# the first 256*256 values are the amplitude image
image = datafile['dataset_0'][index][:256*256]
threshold = 20  # dynamic range in dB

# scale the amplitude image logarithmically
image[np.isnan(image)] = 0
image = 20 * np.log10(image)  # zero amplitudes become -inf and are clipped to 0 below
vmax = np.max(image)
vmin = vmax - threshold
image = (image - vmin) / (vmax - vmin)
image[image < 0] = 0
image = image.reshape((256, 256))

# the depth image is between 0.22 m and 0.58 m
image_depth = datafile['dataset_0'][index][-256*256:]
image_depth = image_depth.reshape((256, 256))
image_depth[image == 0] = 0.58  # ignore pixels that are ignored in the amplitude image
image_depth = np.nan_to_num(image_depth, nan=0.58)
image_depth[image_depth == 0] = 0.58
image_depth = (image_depth - 0.22) / (0.58 - 0.22)

# load other data (set start_index and end_index according to the data that shall be loaded)
data = datafile['dataset_0'][index][start_index:end_index]
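As a quick sanity check, the two normalized images can be displayed side by side. This is a minimal sketch that assumes matplotlib is available and that the snippet above has already been run; it is not part of the original loading code.

# visualize the normalized amplitude and depth images (assumes the snippet above was run)
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.imshow(image, cmap='gray', vmin=0, vmax=1)
ax1.set_title(f'Amplitude (normalized, {threshold} dB range)')
ax2.imshow(image_depth, cmap='viridis', vmin=0, vmax=1)
ax2.set_title('Depth (normalized, 0.22-0.58 m)')
plt.show()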
We would like to thank Rohde & Schwarz GmbH & Co. KG (Munich, Germany) for providing the radar imaging devices and the technical support that made this study possible.
Attribution-NoDerivs 4.0 (CC BY-ND 4.0)https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
Shall we play together? Red light, green light! 😜 This dataset is for preparing for the practical portion of the Big Data Analysis Engineer certification exam (빅데이터 분석기사). If you write better code, please share it 🎉 (Both Python and R are welcome.)
Classification (advanced variation of a 3rd-exam question): https://www.kaggle.com/code/agileteam/3rd-type2-3-2-baseline
Type 1 (question types from the 3rd exam)
Type 1 mock exam 2 (advanced): https://www.kaggle.com/code/agileteam/mock-exam2-type1-1-2
Check the problems and code in the Tasks tab.
[2nd exam question types] Type 1 P: https://www.kaggle.com/agileteam/tutorial-t1-2-python R: https://www.kaggle.com/limmyoungjin/tutorial-t1-2-r-2
Official sample problem (Type 1) P: https://www.kaggle.com/agileteam/tutorial-t1-python R: https://www.kaggle.com/limmyoungjin/tutorial-t1-r
T1-1. Outlier (IQR) / #outliers #IQR (see the sketch after this list) P: https://www.kaggle.com/agileteam/py-t1-1-iqr-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-1-iqr-expected-questions-2
T1-2. Outlier (age) / #outliers #fractional-ages P: https://www.kaggle.com/agileteam/py-t1-2-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-2-expected-questions-2
T1-3. Missing data / #missing-values #drop #median #mean P: https://www.kaggle.com/agileteam/py-t1-3-map-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-3-expected-questions-2
T1-4. Skewness and Kurtosis (Log Scale) / #skewness #kurtosis #log-scale P: https://www.kaggle.com/agileteam/py-t1-4-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-4-expected-questions-2
T1-5. Standard deviation / #standard-deviation P: https://www.kaggle.com/agileteam/py-t1-5-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-5-expected-questions-2
T1-6. Groupby Sum / #missing-values #conditions P: https://www.kaggle.com/agileteam/py-t1-6-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-6-expected-questions-2
T1-7. Replace / #replace-values #conditions #max P: https://www.kaggle.com/agileteam/py-t1-7-2-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-7-2-expected-questions-2
T1-8. Cumulative Sum / #cumulative-sum #missing-values #interpolation P: https://www.kaggle.com/agileteam/py-t1-8-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-8-expected-questions-2
T1-9. Standardization / #standardization #median P: https://www.kaggle.com/agileteam/py-t1-9-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-9-expected-questions-2
T1-10. Yeo-Johnson and Box–Cox / #yeo-johnson #box-cox #missing-values #mode P: https://www.kaggle.com/agileteam/py-t1-10-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-10-expected-questions-2
T1-11. Min-max scaling / #scaling #top-bottom-values P: https://www.kaggle.com/agileteam/py-t1-11-min-max-5-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-11-min-max-5-expected-questions-2
T1-12. top10-bottom10 / #grouping #sorting #top-bottom-values P: https://www.kaggle.com/agileteam/py-t1-12-10-10-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-12-10-expected-questions-2
T1-13. Correlation / #correlation P: https://www.kaggle.com/agileteam/py-t1-13-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-13-expected-questions-2
T1-14. Multi Index & Groupby / #multi-index #sorting #reset-index #top-values P: https://www.kaggle.com/agileteam/py-t1-14-2-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-14-2-expected-question-2
T1-15. Slicing & Condition / #slicing #missing-values #median #conditions P: https://www.kaggle.com/agileteam/py-t1-15-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-15-expected-question-2
T1-16. Variance / #variance #diff-around-missing-values P: https://www.kaggle.com/agileteam/py-t1-16-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-16-expected-question-2
T1-17. Time-Series1 / #time-series #datetime P: https://www.kaggle.com/agileteam/py-t1-17-1-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-17-1-expected-question-2
T1-18. Time-Series2 / #weekend #weekday #comparison #time-series P: https://www.kaggle.com/agileteam/py-t1-18-2-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-18-2-expected-question-2
T1-19. Time-Series3 (monthly total) / #monthly #totals #comparison #value-replacement
P: https://www.kaggle.com/agileteam/py-t1-19-3-expected-question
R: https://www.kaggle.com/limmyoungjin/r-t1-19-3-expected-question-2
T1-20. Combining Data / #merging #joining / matching each customer with a compatible type
P: https://www.kaggle.com/agileteam/py-t1-20-expected-question
R: https://www.kaggle.com/limmyoungjin/r-t1-20-expected-question-2
T1-21. Binning Data / #binning #bucketing P: https://www.kaggle.com/agileteam/py-t1-21-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-21-expected-question-2
T1-22. Time-Series4 (Weekly data) / #weekly #sum P: https://www.kaggle.com/agileteam/t1-22-time-series4-weekly-data R: https://www.kaggle.com/limmyoungjin/r-t1-22-time-series4-weekly-data-2
T1-23. Drop Duplicates / #drop-duplicates #missing-values #fill-with-10th-value P: https://www.kaggle.com/agileteam/t1-23-drop-duplicates R: https://www.kaggle.com/limmyoungjin/r-t1-23-drop-duplicates-2
T1-24. Time-Series5 (Lagged Feature) / #lagged-data #conditions P: https://www.kaggle.com/agileteam/t1-24-time-series5-lagged-feature R: https://www.kaggle.com/limmyoungjin/r-t1-24-time-series5-2
[MOCK EXAM1] TYPE1 / Type 1 mock exam P: https://www.kaggle.com/agileteam/mock-exam1-type1-1-tutorial R: https://www.kaggle.com/limmyoungjin/mock-exam1-type1-1
[MOCK EXAM2] TYPE1 / Type 1 mock exam 2 P: https://www.kaggle.com/code/agileteam/mock-exam2-type1-1-2
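To illustrate what a typical Type 1 task looks like, here is a minimal pandas sketch of the T1-1 pattern (counting IQR outliers in a column). The file name and column name are hypothetical placeholders, not taken from any specific exam dataset.

# minimal sketch of an IQR outlier check (hypothetical data.csv and column 'age')
import pandas as pd
df = pd.read_csv('data.csv')                      # placeholder file name
q1, q3 = df['age'].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df['age'] < lower) | (df['age'] > upper)]
print(len(outliers))                              # e.g. the single number a Type 1 task asks for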
Check the problems and code in the Tasks tab - [3rd exam question type, Type 2]: travel insurance package product (the data has been modified to be a bit harder) P: https://www.kaggle.com/code/agileteam/3rd-type2-3-2-baseline
[2nd exam question type, Type 2]: E-Commerce Shipping Data P: https://www.kaggle.com/agileteam/tutorial-t2-2-python R: https://www.kaggle.com/limmyoungjin/tutorial-t2-2-r
T2. Exercise / sample problem: one year of department-store customer data (official dataq example) P: https://www.kaggle.com/agileteam/t2-exercise-tutorial-baseline
T2-1. Titanic (Classification) (see the baseline sketch after this list) P: https://www.kaggle.com/agileteam/t2-1-titanic-simple-baseline R: https://www.kaggle.com/limmyoungjin/r-t2-1-titanic
T2-2. Pima Indians Diabetes (Classification) / diabetes prediction P: https://www.kaggle.com/agileteam/t2-2-pima-indians-diabetes R: https://www.kaggle.com/limmyoungjin/r-t2-2-pima-indians-diabetes
T2-3. Adult Census Income (Classification) / adult income prediction P: https://www.kaggle.com/agileteam/t2-3-adult-census-income-tutorial R: https://www.kaggle.com/limmyoungjin/r-t2-3-adult-census-income
T2-4. House Prices (Regression) / house price prediction / RMSE P: https://www.kaggle.com/code/blighpark/t2-4-house-prices-regression R: https://www.kaggle.com/limmyoungjin/r-t2-4-house-prices
T2-5. Insurance Forecast (Regression) P: https://www.kaggle.com/agileteam/insurance-starter-tutorial R: https://www.kaggle.com/limmyoungjin/r-t2-5-insurance-prediction
T2-6. Bike-sharing-demand (Regression) / bike demand prediction / RMSLE P: R: https://www.kaggle.com/limmyoungjin/r-t2-6-bike-sharing-demand
[MOCK EXAM1] TYPE2. HR-DATA / Type 2 mock exam P: https://www.kaggle.com/agileteam/mock-exam-t2-exam-template (template only) https://www.kaggle.com/agileteam/mock-exam-t2-starter-tutorial (for absolute beginners) https://www.kaggle.com/agileteam/mock-exam-t2-baseline-tutorial (baseline)
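For the Type 2 (modeling) tasks, a minimal baseline usually has the same shape regardless of the dataset. The sketch below assumes hypothetical train.csv / test.csv files with an 'id' column and a binary 'target' column; these names are placeholders, not the layout of any specific exam dataset.

# minimal Type 2 baseline sketch (hypothetical train.csv / test.csv with 'id' and binary 'target')
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# naive one-hot encoding, aligning test columns to the training columns
X = pd.get_dummies(train.drop(columns=['id', 'target']))
X_test = pd.get_dummies(test.drop(columns=['id'])).reindex(columns=X.columns, fill_value=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X.fillna(0), train['target'])

pred = model.predict_proba(X_test.fillna(0))[:, 1]
pd.DataFrame({'id': test['id'], 'target': pred}).to_csv('result.csv', index=False)  # placeholder output name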
Weeks before the exam | Type (editor) | Problems |
---|---|---|
6 weeks out | Type 1 (notebook) | T1-1~5 |
5 weeks out | Type 1 (notebook) | T1-6~9, T1 EQ (past exam questions) |
4 weeks out | Type 1 (script), Type 2 (notebook) | T1-10~13, T1.Ex, T2EQ, T2-1 |
3 weeks out | Type 1 (script), Type 2 (notebook) | T1-14~19, T2-2~3 |
2 weeks out | Type 1 (script), Type 2 (script) | T1-20~21, T2-4~6, review |
1 week out | Type 1, Type 2 (script), short answer | T1-22~24, mock exams, review, exam environment trial, short answers |
This is the dataset used in this book: https://github.com/ageron/handson-ml/tree/master/datasets/housing to illustrate a sample end-to-end ML project workflow (pipeline). It is a great book - I highly recommend it!
The data is based on the 1990 California census.
"This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto). Luís Torgo obtained it from the StatLib repository (which is closed now). The dataset may also be downloaded from StatLib mirrors.
The following is the description from the book author:
This dataset appeared in a 1997 paper titled Sparse Spatial Autoregressions by Pace, R. Kelley and Ronald Barry, published in the Statistics and Probability Letters journal. They built it using the 1990 California census data. It contains one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).
The dataset in this directory is almost identical to the original, with two differences: 207 values were randomly removed from the total_bedrooms column, so we can discuss what to do with missing data. An additional categorical attribute called ocean_proximity was added, indicating (very roughly) whether each block group is near the ocean, near the Bay area, inland or on an island. This allows discussing what to do with categorical data. Note that the block groups are called "districts" in the Jupyter notebooks, simply because in some contexts the name "block group" was confusing."
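Since the two modifications exist specifically so that missing values and a categorical feature can be practiced on, a minimal pandas preprocessing sketch is shown below. It assumes the housing.csv file from the repository linked above; the imputation and encoding choices are illustrative, not the book's prescribed pipeline.

# minimal preprocessing sketch for the two modifications described above
# (assumes housing.csv from the handson-ml repository)
import pandas as pd

housing = pd.read_csv('housing.csv')

# 207 missing values in total_bedrooms: impute with the column median
housing['total_bedrooms'] = housing['total_bedrooms'].fillna(housing['total_bedrooms'].median())

# ocean_proximity is categorical: one-hot encode it
housing = pd.get_dummies(housing, columns=['ocean_proximity'])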
http://www.dcc.fc.up.pt/%7Eltorgo/Regression/cal_housing.html
This is a dataset obtained from the StatLib repository. Here is the included description:
"We collected information on the variables using all the block groups in California from the 1990 Cens us. In this sample a block group on average includes 1425.5 individuals living in a geographically co mpact area. Naturally, the geographical area included varies inversely with the population density. W e computed distances among the centroids of each block group as measured in latitude and longitude. W e excluded all the block groups reporting zero entries for the independent and dependent variables. T he final data contained 20,640 observations on 9 variables. The dependent variable is ln(median house value)."
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘California Housing Data (1990)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harrywang/housing on 12 November 2021.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study investigated the relationship between psychological skills and fitness levels among elite taekwondo athletes. A total of ten Iranian male elite taekwondo athletes (mean age 20.6±2 years, BMI 18.78±0.62 kg/m2, and body fat 8.87±1.46%) participated in the study. The Sports Emotional Intelligence Questionnaire, Sports Success Scale, Sport Mental Toughness Questionnaire, and Mindfulness Inventory for Sport were used to assess psychological factors. The Wingate test was used to determine anaerobic power, and the Bruce test to determine aerobic fitness. Descriptive statistics and Spearman rank correlation coefficients were utilised to examine relationships between subscales. Statistically significant correlations were recorded between the evaluation of feelings (EI scale) and VO2peak (ml/kg/min) (r = -0.70, p = 0.0235), between social skills (EI scale) and relative peak power (W/kg) (r = 0.84, p = 0.0026), between optimism (EI scale) and VO2peak (ml/kg/min) (r = -0.70, p = 0.0252), between optimism (EI scale) and HR-MAX (r = -0.75, p = 0.0123), and, finally, between control (mental toughness scale) and relative peak power (W/kg) (r = 0.67, p = 0.0360). These findings demonstrate relationships between psychological factors and good anaerobic and aerobic capabilities. Finally, the study also showed that elite taekwondo athletes have high mental performance abilities that are interrelated with anaerobic and aerobic performance.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Monthly Aggregated NEX-GDDP Ensemble Climate Projections: Historical (1985–2005) and RCP 4.5 and RCP 8.5 (2006–2080)
This dataset is a monthly-scale aggregation of the NEX-GDDP: NASA Earth Exchange Global Daily Downscaled Climate Projections, processed using Google Earth Engine (Gorelick et al. 2017). The native delivery on Google Earth Engine is at the daily timescale for each individual CMIP5 GCM. This dataset was created to facilitate use of NEX-GDDP and to reduce processing times for projects that need an ensemble model at a coarser temporal resolution. The aggregated data have been made available in Google Earth Engine via 'users/cartoscience/GCM_NASA-NEX-GDDP/NEX-GDDP-PRODUCT-ID_Ensemble-Monthly_YEAR' (see the code below on how to access them), and all 171 GeoTIFFs have been uploaded to this dataverse entry.
Relevant links:
https://www.nasa.gov/nex
https://www.nccs.nasa.gov/services/data-collections/land-based-products/nex-gddp
https://esgf.nccs.nasa.gov/esgdoc/NEX-GDDP_Tech_Note_v0.pdf
https://developers.google.com/earth-engine/datasets/catalog/NASA_NEX-GDDP
https://journals.ametsoc.org/view/journals/bams/93/4/bams-d-11-00094.1.xml
https://rd.springer.com/article/10.1007/s10584-011-0156-z#page-1
The dataset can be accessed within Google Earth Engine using the following code:
var histYears = ee.List.sequence(1985, 2005).getInfo()
var rcpYears = ee.List.sequence(2006, 2080).getInfo()
var path1 = 'users/cartoscience/GCM_NASA-NEX-GDDP/NEX-GDDP-'
var path2 = '_Ensemble-Monthly_'
var product
product = 'Hist'
var hist = ee.ImageCollection(
  histYears.map(function(y) { return ee.Image(path1+product+path2+y) })
)
product = 'RCP45'
var rcp45 = ee.ImageCollection(
  rcpYears.map(function(y) { return ee.Image(path1+product+path2+y) })
)
product = 'RCP85'
var rcp85 = ee.ImageCollection(
  rcpYears.map(function(y) { return ee.Image(path1+product+path2+y) })
)
print(
  'Hist (1985–2005)', hist,
  'RCP45 (2006–2080)', rcp45,
  'RCP85 (2006–2080)', rcp85
)
var first = hist.first()
var tMin = first.select('tasmin_1')
var tMax = first.select('tasmax_1')
var tMean = first.select('tmean_1')
var pSum = first.select('pr_1')
Map.addLayer(tMin, {min: -10, max: 40}, 'Average min temperature Jan 1985 (Hist)', false)
Map.addLayer(tMax, {min: 10, max: 40}, 'Average max temperature Jan 1985 (Hist)', false)
Map.addLayer(tMean, {min: 10, max: 40}, 'Average temperature Jan 1985 (Hist)', false)
Map.addLayer(pSum, {min: 10, max: 500}, 'Accumulated rainfall Jan 1985 (Hist)', true)
https://code.earthengine.google.com/5bfd9741274679dded7a95d1b57ca51d
Ensemble average based on the following models: ACCESS1-0, BNU-ESM, CCSM4, CESM1-BGC, CNRM-CM5, CSIRO-Mk3-6-0, CanESM2, GFDL-CM3, GFDL-ESM2G, GFDL-ESM2M, IPSL-CM5A-LR, IPSL-CM5A-MR, MIROC-ESM-CHEM, MIROC-ESM, MIROC5, MPI-ESM-LR, MPI-ESM-MR, MRI-CGCM3, NorESM1-M, bcc-csm1-1, inmcm4
Each annual GeoTIFF contains 48 bands (4 variables across 12 months):
Temperature: monthly mean (tasmin, tasmax, tmean)
Precipitation: monthly sum (pr)
Bands 1–48 correspond with: tasmin_1, tasmax_1, tmean_1, pr_1, tasmin_2, tasmax_2, tmean_2, pr_2, tasmin_3, tasmax_3, tmean_3, pr_3, tasmin_4, tasmax_4, tmean_4, pr_4, tasmin_5, tasmax_5, tmean_5, pr_5, tasmin_6, tasmax_6, tmean_6, pr_6, tasmin_7, tasmax_7, tmean_7, pr_7, tasmin_8, tasmax_8, tmean_8, pr_8, tasmin_9, tasmax_9, tmean_9, pr_9, tasmin_10, tasmax_10, tmean_10, pr_10, tasmin_11, tasmax_11, tmean_11, pr_11, tasmin_12, tasmax_12, tmean_12, pr_12
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D. and Moore, R., 2017. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, pp. 18–27.
Project information: SEAGUL: Southeast Asia Globalization, Urbanization, Land and Environment Changes http://seagul.info/ https://lcluc.umd.edu/projects/divergent-local-responses-globalization-urbanization-land-transition-and-environmental
This project was made possible by the NASA Land-Cover/Land-Use Change Program (Grant #: 80NSSC20K0740).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
<<< This dataset is not released yet. Release date: 1st September, 2025. >>>
The Semantic Segmentation Map Dataset (Semap) contains 1,439 manually annotated map samples. Specifically, the dataset compiles 356 image patches from the Historical City Maps Semantic Segmentation Dataset (HCMSSD, [1]), 78 samples extracted from 19th century European cadastres [2–4], three from Paris city atlases [5], and 1,002 newly annotated samples, drawn from the Aggregated Dataset on the History of Cartography (ADHOC Images, [6]).
Additionally, it comprises 12,122 synthetically generated image samples and related labels.
Both datasets are part of R. Petitpierre's PhD thesis [7]. Extensive details on the annotation and synthetic generation procedures are provided in that work.
To come soon.
Number of semantic classes: 5 + background
Number of manually annotated image samples: 1,439
Number of synthetically generated samples: 12,122
Image sample size:
min: 768 × 768 pixels
max: 1000 × 1000 pixels
For any mention of this dataset, please cite:
@misc{semap_petitpierre_2025,
author = {Petitpierre, R{\'{e}}mi and Gomez Donoso, Damien and Kriesel, Ben},
title = {{Semantic Segmentation Map Dataset (Semap)}},
year = {2025},
publisher = {EPFL},
url = {https://doi.org/10.5281/zenodo.16164782}
}

@phdthesis{studying_maps_petitpierre_2025,
author = {Petitpierre, R{\'{e}}mi},
title = {{Studying Maps at Scale: A Digital Investigation of Cartography and the Evolution of Figuration}},
year = {2025},
school = {EPFL}}
Rémi PETITPIERRE - remi.petitpierre@epfl.ch - ORCID - Github - Scholar - ResearchGate
80% of the data were annotated by RP. The remainder were annotated by DGD and BK, two master's students at EPFL, Switzerland. The students were paid for their work from public funding and were offered the opportunity to be associated with the publication of the data.
This project is licensed under the CC BY 4.0 License.
We do not assume any liability for the use of this dataset.
iSDAsoil dataset: soil extractable Sulphur (S), log-transformed, predicted at 30 m resolution for the 0–20 and 20–50 cm depth intervals. Data have been projected in the WGS84 coordinate system and compiled as Cloud-Optimized GeoTIFFs (COG). Predictions were generated using multi-scale Ensemble Machine Learning with 250 m (MODIS, PROBA-V, climatic variables and similar) and 30 m (DTM derivatives, Landsat, Sentinel-2 and similar) resolution covariates. For model training we use a pan-African compilation of soil samples and profiles (iSDA points, AfSPDB, and other national and regional soil datasets).
Cite as: Hengl, T., Miller, M.A.E., Križan, J. et al. African soil properties and nutrients mapped at 30 m spatial resolution using two-scale ensemble machine learning. Sci Rep 11, 6130 (2021). https://doi.org/10.1038/s41598-021-85639-y
To open the maps in QGIS and/or compute with them directly, please use the Cloud-Optimized GeoTIFF version.
Layer description:
sol_log.s_mehlich3_m_30m_*..*cm_2001..2017_v0.13_wgs84.tif = predicted soil extractable Sulphur, mean value
sol_log.s_mehlich3_md_30m_*..*cm_2001..2017_v0.13_wgs84.tif = predicted soil extractable Sulphur, model (prediction) errors
Model errors were derived using bootstrapping: md is derived as the standard deviation of individual learners from 5-fold cross-validation (using spatial blocking). The model 5-fold cross-validation (mlr::makeStackedLearner) for this variable indicates:
Variable: log.s_mehlich3
R-square: 0.548
Fitted values sd: 0.423
RMSE: 0.384

Random forest model:
Call:
stats::lm(formula = f, data = d)

Residuals:
    Min      1Q  Median      3Q     Max
-2.5729 -0.2102 -0.0264  0.1694  5.0049

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.459208   4.154229   0.351    0.725
regr.ranger    0.937179   0.016167  57.967  < 2e-16 ***
regr.xgboost   0.002587   0.016252   0.159    0.874
regr.cubist    0.145396   0.010890  13.351  < 2e-16 ***
regr.nnet     -0.672062   1.796642  -0.374    0.708
regr.cvglmnet -0.045157   0.011256  -4.012 6.04e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3841 on 37530 degrees of freedom
Multiple R-squared: 0.5481, Adjusted R-squared: 0.548
F-statistic: 9103 on 5 and 37530 DF, p-value: < 2.2e-16
To back-transform values (y) to ppm use the following formula: ppm = expm1( y / 10 )
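As a small illustration, the back-transform can be applied directly to an array of predicted values with numpy. Reading the layer with rasterio is just one possible approach, and the concrete file name below is an assumption based on the naming pattern given above (0–20 cm depth interval).

# back-transform the log-scaled predictions to ppm
import numpy as np
import rasterio

# assumed file name following the pattern above (0-20 cm mean-value layer)
with rasterio.open('sol_log.s_mehlich3_m_30m_0..20cm_2001..2017_v0.13_wgs84.tif') as src:
    y = src.read(1).astype('float32')  # log-transformed predictions

ppm = np.expm1(y / 10.0)  # ppm = expm1(y / 10)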
To submit an issue or request support please visit https://isda-africa.com/isdasoil