8 datasets found
  1. SynthRad-Faces: Synthetic Radar Dataset of Human Faces

    • zenodo.org
    bin
    Updated Jan 21, 2025
    Cite
    Valentin Braeutigam (2025). SynthRad-Faces: Synthetic Radar Dataset of Human Faces [Dataset]. http://doi.org/10.5281/zenodo.14264739
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 21, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Valentin Braeutigam
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Radar Image Dataset

    Dataset Structure

    `dataset.hdf` contains the dataset of 10,000 synthetic radar images with the corresponding parameters.
    The data for each instance is saved at the following indices:
    [000000 - 065536] : radar amplitude image (unscaled)
    [065536 - 065540] : radar image bounding box [x_min, x_max, y_min, y_max]
    [065540 - 065739] : shape parameters (199 parameters)
    [065739 - 065938] : color parameters (199 parameters)
    [065938 - 066038] : expression parameters (100 parameters)
    [066038 - 066045] : pose (scaling_factor, rotation(roll, pitch, yaw), translation(x, y, z))
    [066045 - 066061] : transformation matrix to radar coordinate system
    [066061 - 066067] : synthetic radar parameters (scaling factor, carrier frequency, delta frequency, number antennas, number samples, material factor, antenna size)
    [066067 - 131603] : radar depth image (unscaled)

    Face Model parameters

    We used the face12 mask of the Basel Face Model 2019 (contained in the file model2019_face12.h5) to sample the faces. Registration for the face model is available here: https://faces.dmi.unibas.ch/bfm/bfm2019.html. The Scalismo Faces framework (https://github.com/unibas-gravis/scalismo-faces) can be used to generate the face meshes from the shape, (color), and expression parameters; the meshes can additionally be transformed by applying the pose.

    Load Data

    One can load and scale the image data with the following Python code:

    import h5py
    import numpy as np

    index = 0  # adjust face index
    datafile = h5py.File('dataset.hdf5', 'r')

    # load and logarithmically scale the amplitude image
    image = datafile['dataset_0'][index][:256*256]
    threshold = 20  # in dB
    image[np.isnan(image)] = 0
    image = 20 * np.log10(image)  # zero-amplitude pixels become -inf and are clipped below
    max_val = np.max(image)
    min_val = max_val - threshold
    image = (image - min_val) / (max_val - min_val)
    image[image < 0] = 0
    image = image.reshape((256, 256))

    # the depth image is between 0.22 m and 0.58 m
    image_depth = datafile['dataset_0'][index][-256*256:]
    image_depth = image_depth.reshape((256, 256))
    image_depth[image == 0] = 0.58  # ignore pixels that are ignored in the amplitude image
    image_depth = np.nan_to_num(image_depth, nan=0.58)
    image_depth[image_depth == 0] = 0.58
    image_depth = (image_depth - 0.22) / (0.58 - 0.22)

    # load other data (set start_index and end_index according to the data to be loaded)
    data = datafile['dataset_0'][index][start_index:end_index]
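
    For the remaining parameter blocks, the slice boundaries follow directly from the index table above; for example:

    # example: slice out individual parameter blocks using the indices listed above
    shape_params = datafile['dataset_0'][index][65540:65739]       # 199 shape parameters
    expression_params = datafile['dataset_0'][index][65938:66038]  # 100 expression parameters
    pose = datafile['dataset_0'][index][66038:66045]               # scale, rotation (roll, pitch, yaw), translation (x, y, z)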


    Acknowledgments

    We would like to thank Rohde & Schwarz GmbH & Co. KG (Munich, Germany) for providing the radar imaging devices and technical support that made this study possible.

  2. Big Data Certification KR

    • kaggle.com
    zip
    Updated Nov 29, 2021
    Cite
    KIM TAE HEON (2021). Big Data Certification KR [Dataset]. https://www.kaggle.com/agileteam/bigdatacertificationkr
    Explore at:
    Available download formats: zip (15840 bytes)
    Dataset updated
    Nov 29, 2021
    Authors
    KIM TAE HEON
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    A practice playground for the Big Data Analysis Engineer certification (practical exam)

    Shall we play together? "The mugunghwa flower has bloomed" 😜 This is a dataset for preparing for the practical portion of the Big Data Analysis Engineer certification. If you come up with better code, please share it 🎉 (Both Python and R are welcome.)

    Question types from the 4th exam

    Question types from the 3rd exam and advanced study materials

    🆕 New problems added June 2022

    🎁 Introductory course for the practical exam now open 🎁

    • https://class101.page.link/tp9k
    • A course for beginners has been opened 👍
    • It covers only what you need for the practical exam: Python, pandas, machine learning, mock problems (Task 1 and Task 2), and practical tips 🎉
    • If you already have machine-learning experience you probably don't need it; it is recommended for true beginners who need explanations before tackling the mock problems.

    📌 Task 1 practice problems (P: Python, R)

    See the Tasks tab for problems and code

    📌 Task 2 practice problems

    See the Tasks tab for problems and code - [3rd exam type, Task 2]: travel insurance package product (the data was made slightly harder) P: https://www.kaggle.com/code/agileteam/3rd-type2-3-2-baseline

    📌 6-week study course (see the table below)

    Week | Type (editor) | Problem numbers
    6 weeks out | Task 1 (notebook) | T1-1~5
    5 weeks out | Task 1 (notebook) | T1-6~9, T1 EQ (past exams)
    4 weeks out | Task 1 (script), Task 2 (notebook) | T1-10~13, T1.Ex, T2EQ, T2-1
    3 weeks out | Task 1 (script), Task 2 (notebook) | T1-14~19, T2-2~3
    2 weeks out | Task 1 (script), Task 2 (script) | T1-20~21, T2-4~6, review
    1 week out | Task 1, Task 2 (script), short answers | T1-22~24, mock exam, review, exam-environment practice, short answers

    📌 Machine-learning tutorial for beginners (selected from notebooks shared by the community 👍)

    - https://www.kaggle.com/ohseokkim/t2-2-pima-indians-diabetes Author: @ohseokkim 😆

  3. California Housing Data (1990)

    • kaggle.com
    Updated May 10, 2018
    Cite
    The citation is currently not available for this dataset.
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    May 10, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Harry Wang
    Area covered
    California
    Description

    Source

    This is the dataset used in this book (https://github.com/ageron/handson-ml/tree/master/datasets/housing) to illustrate a sample end-to-end ML project workflow (pipeline). It is a great book - I highly recommend it!

    The data is based on the 1990 California Census.

    About the Data (from the book):

    "This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto). Luís Torgo obtained it from the StatLib repository (which is closed now). The dataset may also be downloaded from StatLib mirrors.

    The following is the description from the book author:

    This dataset appeared in a 1997 paper titled Sparse Spatial Autoregressions by Pace, R. Kelley and Ronald Barry, published in the Statistics and Probability Letters journal. They built it using the 1990 California census data. It contains one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

    The dataset in this directory is almost identical to the original, with two differences: 207 values were randomly removed from the total_bedrooms column, so we can discuss what to do with missing data. An additional categorical attribute called ocean_proximity was added, indicating (very roughly) whether each block group is near the ocean, near the Bay area, inland or on an island. This allows discussing what to do with categorical data. Note that the block groups are called "districts" in the Jupyter notebooks, simply because in some contexts the name "block group" was confusing."

    About the Data (From Luís Torgo page):

    http://www.dcc.fc.up.pt/%7Eltorgo/Regression/cal_housing.html

    This is a dataset obtained from the StatLib repository. Here is the included description:

    "We collected information on the variables using all the block groups in California from the 1990 Cens us. In this sample a block group on average includes 1425.5 individuals living in a geographically co mpact area. Naturally, the geographical area included varies inversely with the population density. W e computed distances among the centroids of each block group as measured in latitude and longitude. W e excluded all the block groups reporting zero entries for the independent and dependent variables. T he final data contained 20,640 observations on 9 variables. The dependent variable is ln(median house value)."

    End-to-End ML Project Steps (Chapter 2 of the book)

    1. Look at the big picture
    2. Get the data
    3. Discover and visualize the data to gain insights
    4. Prepare the data for Machine Learning algorithms
    5. Select a model and train it
    6. Fine-tune your model
    7. Present your solution
    8. Launch, monitor, and maintain your system

    The 10-Step Machine Learning Project Workflow (My Version)

    1. Define the business objective
    2. Make sense of the data from a high level
      • data types (number, text, object, etc.)
      • continuous/discrete
      • basic stats (min, max, std, median, etc.) using boxplot
      • frequency via histogram
      • scales and distributions of different features
    3. Create the training and test sets using proper sampling methods, e.g., random vs. stratified (see the sketch after this list)
    4. Correlation analysis (pair-wise and attribute combinations)
    5. Data cleaning (missing data, outliers, data errors)
    6. Data transformation via pipelines (categorical text to number using one hot encoding, feature scaling via normalization/standardization, feature combinations)
    7. Train and cross validate different models and select the most promising one (Linear Regression, Decision Tree, and Random Forest were tried in this tutorial)
    8. Fine-tune the model by trying different combinations of hyperparameters
    9. Evaluate the model with the best estimators on the test set
    10. Launch, monitor, and refresh the model and system
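
    As a brief illustration of step 3, the sketch below contrasts random and stratified splits with scikit-learn; the income-category binning is a hypothetical example (column names follow the housing.csv file in the repository linked above), not something prescribed by this dataset:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    housing = pd.read_csv("housing.csv")  # hypothetical local copy of the dataset

    # Bin median_income into 5 categories so the split can preserve its distribution.
    housing["income_cat"] = pd.cut(housing["median_income"],
                                   bins=[0.0, 1.5, 3.0, 4.5, 6.0, float("inf")],
                                   labels=[1, 2, 3, 4, 5])

    # Purely random split.
    train_rand, test_rand = train_test_split(housing, test_size=0.2, random_state=42)

    # Stratified split: the test set mirrors the income_cat proportions.
    train_strat, test_strat = train_test_split(housing, test_size=0.2, random_state=42,
                                               stratify=housing["income_cat"])

    print(test_strat["income_cat"].value_counts(normalize=True))
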
  4. ‘California Housing Data (1990)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 12, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘California Housing Data (1990)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-california-housing-data-1990-a0c5/b7389540/?iid=007-628&v=presentation
    Explore at:
    Dataset updated
    Nov 12, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    California
    Description

    Analysis of ‘California Housing Data (1990)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harrywang/housing on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    Source

    This is the dataset used in this book (https://github.com/ageron/handson-ml/tree/master/datasets/housing) to illustrate a sample end-to-end ML project workflow (pipeline). It is a great book - I highly recommend it!

    The data is based on the 1990 California Census.

    About the Data (from the book):

    "This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto). Luís Torgo obtained it from the StatLib repository (which is closed now). The dataset may also be downloaded from StatLib mirrors.

    The following is the description from the book author:

    This dataset appeared in a 1997 paper titled Sparse Spatial Autoregressions by Pace, R. Kelley and Ronald Barry, published in the Statistics and Probability Letters journal. They built it using the 1990 California census data. It contains one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

    The dataset in this directory is almost identical to the original, with two differences: 207 values were randomly removed from the total_bedrooms column, so we can discuss what to do with missing data. An additional categorical attribute called ocean_proximity was added, indicating (very roughly) whether each block group is near the ocean, near the Bay area, inland or on an island. This allows discussing what to do with categorical data. Note that the block groups are called "districts" in the Jupyter notebooks, simply because in some contexts the name "block group" was confusing."

    About the Data (From Luís Torgo page):

    http://www.dcc.fc.up.pt/%7Eltorgo/Regression/cal_housing.html

    This is a dataset obtained from the StatLib repository. Here is the included description:

    "We collected information on the variables using all the block groups in California from the 1990 Cens us. In this sample a block group on average includes 1425.5 individuals living in a geographically co mpact area. Naturally, the geographical area included varies inversely with the population density. W e computed distances among the centroids of each block group as measured in latitude and longitude. W e excluded all the block groups reporting zero entries for the independent and dependent variables. T he final data contained 20,640 observations on 9 variables. The dependent variable is ln(median house value)."

    End-to-End ML Project Steps (Chapter 2 of the book)

    1. Look at the big picture
    2. Get the data
    3. Discover and visualize the data to gain insights
    4. Prepare the data for Machine Learning algorithms
    5. Select a model and train it
    6. Fine-tune your model
    7. Present your solution
    8. Launch, monitor, and maintain your system

    The 10-Step Machine Learning Project Workflow (My Version)

    1. Define the business objective
    2. Make sense of the data from a high level
      • data types (number, text, object, etc.)
      • continuous/discrete
      • basic stats (min, max, std, median, etc.) using boxplot
      • frequency via histogram
      • scales and distributions of different features
    3. Create the training and test sets using proper sampling methods, e.g., random vs. stratified
    4. Correlation analysis (pair-wise and attribute combinations)
    5. Data cleaning (missing data, outliers, data errors)
    6. Data transformation via pipelines (categorical text to number using one hot encoding, feature scaling via normalization/standardization, feature combinations)
    7. Train and cross validate different models and select the most promising one (Linear Regression, Decision Tree, and Random Forest were tried in this tutorial)
    8. Fine-tune the model by trying different combinations of hyperparameters
    9. Evaluate the model with the best estimators on the test set
    10. Launch, monitor, and refresh the model and system

    --- Original source retains full ownership of the source dataset ---

  5. Data from: S1 Dataset -

    • figshare.com
    xlsx
    Updated Jul 7, 2023
    Cite
    Maghsoud Nabilpour; Mohammad Hossein Samanipour; Nicola Luigi Bragazzi; Monoem Haddad; Tomás Herrera-Valenzuela; Dan Tao; Julien S. Baker; Jožef Šimenko (2023). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0288227.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 7, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Maghsoud Nabilpour; Mohammad Hossein Samanipour; Nicola Luigi Bragazzi; Monoem Haddad; Tomás Herrera-Valenzuela; Dan Tao; Julien S. Baker; Jožef Šimenko
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study investigated the relationship between psychological skills and fitness levels among elite taekwondo athletes. A total of ten Iranian male elite taekwondo athletes (mean age of 20.6±2 years, BMI 18.78±0.62 kg/m2, and fat percentage of 8.87±1.46%) participated in the study. The Sports Emotional Intelligence Questionnaire, Sports Success Scale, Sport Mental Toughness Questionnaire, and Mindfulness Inventory for Sport were used to assess psychological factors. The Wingate test was used to determine anaerobic power, and the Bruce test to determine aerobic fitness. Descriptive statistics and Spearman rank correlation coefficients were utilised to examine any relationships between subscales. Statistically significant correlations were recorded between the evaluation of feelings (EI scale) and VO2peak (ml/kg/min) (r = -0.70, p = 0.0235) and between social skills (EI scale) and relative peak power (W/kg) (r = 0.84, p = 0.0026). Also, between optimism (EI scale) and VO2peak (ml/kg/min) (r = -0.70, p = 0.0252) and between optimism (EI scale) and HR-MAX (r = -0.75, p = 0.0123); and, finally, between control (mental toughness scale) and relative peak power (W/kg) (r = 0.67, p = 0.0360). These findings demonstrate relationships between psychological factors and the advantages of good anaerobic and aerobic capabilities. Finally, the study also demonstrated that elite taekwondo athletes have high mental performance abilities that are interrelated with anaerobic and aerobic performance.
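
    As a minimal illustration of the correlation analysis described above, the sketch below runs a Spearman rank correlation with SciPy on synthetic placeholder numbers (not the athletes' data):

    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    ei_feelings = rng.normal(20, 3, size=10)                      # hypothetical questionnaire scores
    vo2peak = 55 - 0.8 * ei_feelings + rng.normal(0, 2, size=10)  # hypothetical VO2peak values (ml/kg/min)

    rho, p_value = spearmanr(ei_feelings, vo2peak)
    print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")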

  6. Monthly Aggregated NEX-GDDP Ensemble Climate Projections: Historical...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Dec 12, 2021
    Cite
    Brad Peter; Joseph Messina; Nishani Moragoda (2021). Monthly Aggregated NEX-GDDP Ensemble Climate Projections: Historical (1985–2005) and RCP 4.5 and RCP 8.5 (2006–2080) [Dataset]. http://doi.org/10.7910/DVN/ZNEJMS
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Dec 12, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    Brad Peter; Joseph Messina; Nishani Moragoda
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Monthly Aggregated NEX-GDDP Ensemble Climate Projections: Historical (1985–2005) and RCP 4.5 and RCP 8.5 (2006–2080)

    This dataset is a monthly-scale aggregation of NEX-GDDP, the NASA Earth Exchange Global Daily Downscaled Climate Projections, processed using Google Earth Engine (Gorelick 2017). The native delivery on Google Earth Engine is at the daily timescale for each individual CMIP5 GCM model. This dataset was created to facilitate use of NEX-GDDP and reduce processing times for projects that seek an ensemble model with a coarser temporal resolution. The aggregated data have been made available in Google Earth Engine via 'users/cartoscience/GCM_NASA-NEX-GDDP/NEX-GDDP-PRODUCT-ID_Ensemble-Monthly_YEAR' (see code below on how to access), and all 171 GeoTIFFs have been uploaded to this dataverse entry.

    Relevant links:
    https://www.nasa.gov/nex
    https://www.nccs.nasa.gov/services/data-collections/land-based-products/nex-gddp
    https://esgf.nccs.nasa.gov/esgdoc/NEX-GDDP_Tech_Note_v0.pdf
    https://developers.google.com/earth-engine/datasets/catalog/NASA_NEX-GDDP
    https://journals.ametsoc.org/view/journals/bams/93/4/bams-d-11-00094.1.xml
    https://rd.springer.com/article/10.1007/s10584-011-0156-z#page-1

    The dataset can be accessed within Google Earth Engine using the following code:

    var histYears = ee.List.sequence(1985,2005).getInfo()
    var rcpYears = ee.List.sequence(2006,2080).getInfo()
    var path1 = 'users/cartoscience/GCM_NASA-NEX-GDDP/NEX-GDDP-'
    var path2 = '_Ensemble-Monthly_'
    var product
    product = 'Hist'
    var hist = ee.ImageCollection(histYears.map(function(y) { return ee.Image(path1+product+path2+y) }))
    product = 'RCP45'
    var rcp45 = ee.ImageCollection(rcpYears.map(function(y) { return ee.Image(path1+product+path2+y) }))
    product = 'RCP85'
    var rcp85 = ee.ImageCollection(rcpYears.map(function(y) { return ee.Image(path1+product+path2+y) }))
    print('Hist (1985–2005)', hist, 'RCP45 (2006–2080)', rcp45, 'RCP85 (2006–2080)', rcp85)
    var first = hist.first()
    var tMin = first.select('tasmin_1')
    var tMax = first.select('tasmax_1')
    var tMean = first.select('tmean_1')
    var pSum = first.select('pr_1')
    Map.addLayer(tMin, {min: -10, max: 40}, 'Average min temperature Jan 1985 (Hist)', false)
    Map.addLayer(tMax, {min: 10, max: 40}, 'Average max temperature Jan 1985 (Hist)', false)
    Map.addLayer(tMean, {min: 10, max: 40}, 'Average temperature Jan 1985 (Hist)', false)
    Map.addLayer(pSum, {min: 10, max: 500}, 'Accumulated rainfall Jan 1985 (Hist)', true)

    https://code.earthengine.google.com/5bfd9741274679dded7a95d1b57ca51d

    Ensemble average based on the following models: ACCESS1-0, BNU-ESM, CCSM4, CESM1-BGC, CNRM-CM5, CSIRO-Mk3-6-0, CanESM2, GFDL-CM3, GFDL-ESM2G, GFDL-ESM2M, IPSL-CM5A-LR, IPSL-CM5A-MR, MIROC-ESM-CHEM, MIROC-ESM, MIROC5, MPI-ESM-LR, MPI-ESM-MR, MRI-CGCM3, NorESM1-M, bcc-csm1-1, inmcm4

    Each annual GeoTIFF contains 48 bands (4 variables across 12 months):
    Temperature: monthly mean (tasmin, tasmax, tmean)
    Precipitation: monthly sum (pr)
    Bands 1–48 correspond with: tasmin_1, tasmax_1, tmean_1, pr_1, tasmin_2, tasmax_2, tmean_2, pr_2, tasmin_3, tasmax_3, tmean_3, pr_3, tasmin_4, tasmax_4, tmean_4, pr_4, tasmin_5, tasmax_5, tmean_5, pr_5, tasmin_6, tasmax_6, tmean_6, pr_6, tasmin_7, tasmax_7, tmean_7, pr_7, tasmin_8, tasmax_8, tmean_8, pr_8, tasmin_9, tasmax_9, tmean_9, pr_9, tasmin_10, tasmax_10, tmean_10, pr_10, tasmin_11, tasmax_11, tmean_11, pr_11, tasmin_12, tasmax_12, tmean_12, pr_12

    *Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D. and Moore, R., 2017. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, pp. 18–27.

    Project information: SEAGUL: Southeast Asia Globalization, Urbanization, Land and Environment Changes
    http://seagul.info/
    https://lcluc.umd.edu/projects/divergent-local-responses-globalization-urbanization-land-transition-and-environmental
    This project was made possible by the NASA Land-Cover/Land-Use Change Program (Grant #: 80NSSC20K0740)
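
    As a convenience (not part of the original release), here is a minimal Python sketch that maps a variable and month to the band name and 1-based band index implied by the ordering above:

    # Band layout helper: each annual GeoTIFF holds 48 bands ordered as
    # (tasmin, tasmax, tmean, pr) for months 1..12, as listed above.
    VARIABLES = ["tasmin", "tasmax", "tmean", "pr"]

    def band_name(variable, month):
        """Return the band name, e.g. band_name('pr', 3) -> 'pr_3'."""
        if variable not in VARIABLES or not 1 <= month <= 12:
            raise ValueError("variable must be one of VARIABLES and month in 1..12")
        return f"{variable}_{month}"

    def band_index(variable, month):
        """Return the 1-based band index within the 48-band GeoTIFF."""
        return (month - 1) * len(VARIABLES) + VARIABLES.index(variable) + 1

    print(band_name("tmean", 7), band_index("tmean", 7))  # tmean_7 27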

  7. Semantic Segmentation Map Dataset (Semap)

    • zenodo.org
    Updated Sep 1, 2025
    Cite
    Remi Petitpierre; Damien Gomez Donoso; Ben Kriesel (2025). Semantic Segmentation Map Dataset (Semap) [Dataset]. http://doi.org/10.5281/zenodo.16164782
    Explore at:
    Dataset updated
    Sep 1, 2025
    Dataset provided by
    EPFL
    Authors
    Remi Petitpierre; Damien Gomez Donoso; Ben Kriesel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Sep 1, 2025
    Description

    <<< This dataset is not released yet. Release date: 1st September, 2025. >>>

    The Semantic Segmentation Map Dataset (Semap) contains 1,439 manually annotated map samples. Specifically, the dataset compiles 356 image patches from the Historical City Maps Semantic Segmentation Dataset (HCMSSD, [1]), 78 samples extracted from 19th century European cadastres [2–4], three from Paris city atlases [5], and 1,002 newly annotated samples, drawn from the Aggregated Dataset on the History of Cartography (ADHOC Images, [6]).

    Additionally, it comprises 12,122 synthetically generated image samples and related labels.

    Both datasets are part of R. Petitpierre's PhD thesis [7]. Extensive details on the annotation and synthetic generation procedures are provided in that work.

    Organization of the data

    To come soon.

    Descriptive statistics

    Number of semantic classes: 5 + background
    Number of manually annotated image samples: 1,439
    Number of synthetically generated samples: 12,122
    Image sample size:
    min: 768 × 768 pixels
    max: 1000 × 1000 pixels

    Use and Citation

    For any mention of this dataset, please cite:

    @misc{semap_petitpierre_2025,
    author = {Petitpierre, R{\'{e}}mi and Gomez Donoso, Damien and Kriesel, Ben},
    title = {{Semantic Segmentation Map Dataset (Semap)}},
    year = {2025},
    publisher = {EPFL},
    url = {https://doi.org/10.5281/zenodo.16164782}}


    @phdthesis{studying_maps_petitpierre_2025,
    author = {Petitpierre, R{\'{e}}mi},
    title = {{Studying Maps at Scale: A Digital Investigation of Cartography and the Evolution of Figuration}},
    year = {2025},
    school = {EPFL}}

    Corresponding author

    Rémi PETITPIERRE - remi.petitpierre@epfl.ch - ORCID - Github - Scholar - ResearchGate

    Work ethics

    80% of the data were annotated by RP. The remainder were annotated by DGD and BK, two master's students from EPFL, Switzerland. The students were paid for their work using public funding, and were offered the possibility to be associated with the publication of the data.

    License

    This project is licensed under the CC BY 4.0 License.

    Liability

    We do not assume any liability for the use of this dataset.

    References

    1. Petitpierre, R. (2021). Historical City Maps Semantic Segmentation Dataset. V1.0. https://doi.org/10.5281/zenodo.5513639
    2. di Lenardo I, Barman R, Pardini F, et al. (2021) Une approche computationnelle du cadastre napoléonien de Venise. Humanités numériques 3.
    3. Petitpierre R, Rappo L and di Lenardo I (2023) Recartographier l’espace napoléonien. In: Humanistica 2023, Genève, Switzerland, June 2023. Géographie. Association francophone des humanités numériques. Available at: https://hal.science/hal-04109214.
    4. Li S, Cerioni A, Herny C, et al. (2024) Vectorization of historical cadastral plans from the 1850s in the Canton of Geneva. Geneva, Switzerland: Swiss Territorial Data Lab. Available at: https://tech.stdl.ch/PROJ-CADMAP/.
    5. Chazalon J, Carlinet E, Chen Y, et al. (2021) ICDAR 2021 Competition on Historical Map Segmentation. arXiv:2105.13265 [cs].
    6. To come soon
    7. Petitpierre R (2025) Studying Maps at Scale: A Digital Investigation of Cartography and the Evolution of Figuration. PhD thesis. École Polytechnique Fédérale de Lausanne.
  8. iSDAsoil: soil extractable Sulphur for Africa predicted at 30 m resolution...

    • kenya.lsc-hubs.org
    • lschub.kalro.org
    • +2more
    Updated Feb 5, 2024
    Cite
    (2024). iSDAsoil: soil extractable Sulphur for Africa predicted at 30 m resolution at 0-20 and 20-50 cm depths [Dataset]. https://kenya.lsc-hubs.org/cat/collections/metadata:main/items/10.5281-zenodo.4091142
    Explore at:
    Dataset updated
    Feb 5, 2024
    Description

    iSDAsoil dataset: soil extractable Sulphur (S), log-transformed, predicted at 30 m resolution for 0–20 and 20–50 cm depth intervals. Data have been projected in the WGS84 coordinate system and compiled as COG. Predictions have been generated using multi-scale Ensemble Machine Learning with 250 m (MODIS, PROBA-V, climatic variables and similar) and 30 m (DTM derivatives, Landsat, Sentinel-2 and similar) resolution covariates. For model training we use a pan-African compilation of soil samples and profiles (iSDA points, AfSPDB, and other national and regional soil datasets).

    Cite as: Hengl, T., Miller, M.A.E., Križan, J. et al. African soil properties and nutrients mapped at 30 m spatial resolution using two-scale ensemble machine learning. Sci Rep 11, 6130 (2021). https://doi.org/10.1038/s41598-021-85639-y

    To open the maps in QGIS and/or directly compute with them, please use the Cloud-Optimized GeoTIFF version.

    Layer description:
    sol_log.s_mehlich3_m_30m_*..*cm_2001..2017_v0.13_wgs84.tif = predicted soil extractable Sulphur, mean value
    sol_log.s_mehlich3_md_30m_*..*cm_2001..2017_v0.13_wgs84.tif = predicted soil extractable Sulphur, model (prediction) errors

    Model errors were derived using bootstrapping: md is derived as the standard deviation of individual learners from 5-fold cross-validation (using spatial blocking). The model 5-fold cross-validation (mlr::makeStackedLearner) for this variable indicates:

    Variable: log.s_mehlich3
    R-square: 0.548
    Fitted values sd: 0.423
    RMSE: 0.384

    Random forest model:
    Call: stats::lm(formula = f, data = d)
    Residuals:
        Min      1Q  Median      3Q     Max
    -2.5729 -0.2102 -0.0264  0.1694  5.0049
    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
    (Intercept)    1.459208   4.154229   0.351    0.725
    regr.ranger    0.937179   0.016167  57.967  < 2e-16 ***
    regr.xgboost   0.002587   0.016252   0.159    0.874
    regr.cubist    0.145396   0.010890  13.351  < 2e-16 ***
    regr.nnet     -0.672062   1.796642  -0.374    0.708
    regr.cvglmnet -0.045157   0.011256  -4.012 6.04e-05 ***
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 0.3841 on 37530 degrees of freedom
    Multiple R-squared: 0.5481, Adjusted R-squared: 0.548
    F-statistic: 9103 on 5 and 37530 DF, p-value: < 2.2e-16
    To back-transform values (y) to ppm use the following formula:
    ppm = expm1( y / 10 )
    To submit an issue or request support please visit https://isda-africa.com/isdasoil
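
    A minimal Python sketch of the back-transformation above (NumPy only; the input values are illustrative):

    import numpy as np

    def to_ppm(y):
        """Back-transform log-transformed predictions to ppm: ppm = expm1(y / 10)."""
        return np.expm1(np.asarray(y, dtype=float) / 10.0)

    print(to_ppm([0, 23, 40]))  # approximately [0.0, 8.97, 53.60]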
