8 datasets found
  1. SynthRad-Faces: Synthetic Radar Dataset of Human Faces

    • zenodo.org
    bin
    Updated Jan 21, 2025
    Cite
    Valentin Braeutigam (2025). SynthRad-Faces: Synthetic Radar Dataset of Human Faces [Dataset]. http://doi.org/10.5281/zenodo.14264739
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 21, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Valentin Braeutigam
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Radar Image Dataset

    Dataset Structure

    `dataset.hdf` contains the dataset of 10,000 synthetic radar images with the corresponding parameters.
    The data for each instance is saved at the following indices:
    [000000 - 065536] : radar amplitude image (unscaled)
    [065536 - 065540] : radar image bounding box [x_min, x_max, y_min, y_max]
    [065540 - 065739] : shape parameters (199 parameters)
    [065739 - 065938] : color parameters (199 parameters)
    [065938 - 066038] : expression parameters (100 parameters)
    [066038 - 066045] : pose (scaling_factor, rotation(roll, pitch, yaw), translation(x, y, z))
    [066045 - 066061] : transformation matrix to radar coordinate system
    [066061 - 066067] : synthetic radar parameters (scaling factor, carrier frequency, delta frequency, number antennas, number samples, material factor, antenna size)
    [066067 - 131603] : radar depth image (unscaled)

    Face Model parameters

    We used the face12 mask of the Basel Face Model 2019 (contained in the file model2019_face12.h5) to sample the faces. Registration for the face model is available here: https://faces.dmi.unibas.ch/bfm/bfm2019.html. The Scalismo Faces framework (https://github.com/unibas-gravis/scalismo-faces) can be used to generate the face meshes from the shape, (color), and expression parameters; the meshes can additionally be transformed by applying the pose.

    Load Data

    One can load and scale the image data with the following Python code:

    import h5py
    import numpy as np

    index = 0  # adjust face index
    datafile = h5py.File('dataset.hdf5', 'r')

    # load and logarithmically scale the amplitude image
    image = datafile['dataset_0'][index][:256*256]
    threshold = 20  # in dB
    image[np.isnan(image)] = 0
    image = 20 * np.log10(image)  # zero-amplitude pixels become -inf and are clipped below
    max_val = np.max(image)
    min_val = max_val - threshold
    image = (image - min_val) / (max_val - min_val)
    image[image < 0] = 0
    image = image.reshape((256, 256))

    # the depth image is between 0.22 m and 0.58 m
    image_depth = datafile['dataset_0'][index][-256*256:]
    image_depth = image_depth.reshape((256, 256))
    image_depth[image == 0] = 0.58  # ignore pixels that are ignored in the amplitude image
    image_depth = np.nan_to_num(image_depth, nan=0.58)
    image_depth[image_depth == 0] = 0.58
    image_depth = (image_depth - 0.22) / (0.58 - 0.22)

    # load other data (set start_index and end_index according to the data to be loaded)
    data = datafile['dataset_0'][index][start_index:end_index]
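
    For the remaining parameter blocks, the slice boundaries follow directly from the index table above; for example:

    # example: slice out individual parameter blocks using the indices listed above
    shape_params = datafile['dataset_0'][index][65540:65739]       # 199 shape parameters
    expression_params = datafile['dataset_0'][index][65938:66038]  # 100 expression parameters
    pose = datafile['dataset_0'][index][66038:66045]               # scale, rotation (roll, pitch, yaw), translation (x, y, z)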


    Acknowledgments

    We would like to thank Rohde & Schwarz GmbH & Co. KG (Munich, Germany) for providing the radar imaging devices and technical support that made this study possible.

  2. Big Data Certification KR

    • kaggle.com
    zip
    Updated Nov 29, 2021
    Cite
    KIM TAE HEON (2021). Big Data Certification KR [Dataset]. https://www.kaggle.com/agileteam/bigdatacertificationkr
    Explore at:
    Available download formats: zip (15840 bytes)
    Dataset updated
    Nov 29, 2021
    Authors
    KIM TAE HEON
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    A practice playground for the Big Data Analysis Engineer certification (practical exam)

    Shall we play together? "The mugunghwa flower has bloomed" 😜 This is a dataset for preparing for the practical portion of the Big Data Analysis Engineer certification. If you come up with better code, please share it 🎉 (Both Python and R are welcome.)

    Question types from the 4th exam

    Question types from the 3rd exam and advanced study materials

    🆕 New problems added June 2022

    🎁 Introductory course for the practical exam now open 🎁

    • https://class101.page.link/tp9k
    • A course for beginners has been opened 👍
    • It covers only what you need for the practical exam: Python, pandas, machine learning, mock problems (Task 1 and Task 2), and practical tips 🎉
    • If you already have machine-learning experience you probably don't need it; it is recommended for true beginners who need explanations before tackling the mock problems.

    📌 Task 1 practice problems (P: Python, R)

    See the Tasks tab for problems and code

    📌 Task 2 practice problems

    See the Tasks tab for problems and code - [3rd exam type, Task 2]: travel insurance package product (the data was made slightly harder) P: https://www.kaggle.com/code/agileteam/3rd-type2-3-2-baseline

    📌 6-week study course (see the table below)

    Week | Type (editor) | Problem numbers
    6 weeks out | Task 1 (notebook) | T1-1~5
    5 weeks out | Task 1 (notebook) | T1-6~9, T1 EQ (past exams)
    4 weeks out | Task 1 (script), Task 2 (notebook) | T1-10~13, T1.Ex, T2EQ, T2-1
    3 weeks out | Task 1 (script), Task 2 (notebook) | T1-14~19, T2-2~3
    2 weeks out | Task 1 (script), Task 2 (script) | T1-20~21, T2-4~6, review
    1 week out | Task 1, Task 2 (script), short answers | T1-22~24, mock exam, review, exam-environment practice, short answers

    📌 Machine-learning tutorial for beginners (selected from notebooks shared by the community 👍)

    - https://www.kaggle.com/ohseokkim/t2-2-pima-indians-diabetes Author: @ohseokkim 😆

  3. California Housing Data (1990)

    • kaggle.com
    Updated May 10, 2018
    Cite
    The citation is currently not available for this dataset.
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    May 10, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Harry Wang
    Area covered
    California
    Description

    Source

    This is the dataset used in this book (https://github.com/ageron/handson-ml/tree/master/datasets/housing) to illustrate a sample end-to-end ML project workflow (pipeline). It is a great book - I highly recommend it!

    The data is based on the 1990 California Census.

    About the Data (from the book):

    "This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto). Luís Torgo obtained it from the StatLib repository (which is closed now). The dataset may also be downloaded from StatLib mirrors.

    The following is the description from the book author:

    This dataset appeared in a 1997 paper titled Sparse Spatial Autoregressions by Pace, R. Kelley and Ronald Barry, published in the Statistics and Probability Letters journal. They built it using the 1990 California census data. It contains one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

    The dataset in this directory is almost identical to the original, with two differences: 207 values were randomly removed from the total_bedrooms column, so we can discuss what to do with missing data. An additional categorical attribute called ocean_proximity was added, indicating (very roughly) whether each block group is near the ocean, near the Bay area, inland or on an island. This allows discussing what to do with categorical data. Note that the block groups are called "districts" in the Jupyter notebooks, simply because in some contexts the name "block group" was confusing."

    About the Data (From Luís Torgo page):

    http://www.dcc.fc.up.pt/%7Eltorgo/Regression/cal_housing.html

    This is a dataset obtained from the StatLib repository. Here is the included description:

    "We collected information on the variables using all the block groups in California from the 1990 Cens us. In this sample a block group on average includes 1425.5 individuals living in a geographically co mpact area. Naturally, the geographical area included varies inversely with the population density. W e computed distances among the centroids of each block group as measured in latitude and longitude. W e excluded all the block groups reporting zero entries for the independent and dependent variables. T he final data contained 20,640 observations on 9 variables. The dependent variable is ln(median house value)."

    End-to-End ML Project Steps (Chapter 2 of the book)

    1. Look at the big picture
    2. Get the data
    3. Discover and visualize the data to gain insights
    4. Prepare the data for Machine Learning algorithms
    5. Select a model and train it
    6. Fine-tune your model
    7. Present your solution
    8. Launch, monitor, and maintain your system

    The 10-Step Machine Learning Project Workflow (My Version)

    1. Define the business objective
    2. Make sense of the data from a high level
      • data types (number, text, object, etc.)
      • continuous/discrete
      • basic stats (min, max, std, median, etc.) using boxplot
      • frequency via histogram
      • scales and distributions of different features
    3. Create the training and test sets using proper sampling methods, e.g., random vs. stratified (see the sketch after this list)
    4. Correlation analysis (pair-wise and attribute combinations)
    5. Data cleaning (missing data, outliers, data errors)
    6. Data transformation via pipelines (categorical text to number using one hot encoding, feature scaling via normalization/standardization, feature combinations)
    7. Train and cross validate different models and select the most promising one (Linear Regression, Decision Tree, and Random Forest were tried in this tutorial)
    8. Fine-tune the model by trying different combinations of hyperparameters
    9. Evaluate the model with the best estimators on the test set
    10. Launch, monitor, and refresh the model and system
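
    As a brief illustration of step 3, the sketch below contrasts random and stratified splits with scikit-learn; the income-category binning is a hypothetical example (column names follow the housing.csv file in the repository linked above), not something prescribed by this dataset:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    housing = pd.read_csv("housing.csv")  # hypothetical local copy of the dataset

    # Bin median_income into 5 categories so the split can preserve its distribution.
    housing["income_cat"] = pd.cut(housing["median_income"],
                                   bins=[0.0, 1.5, 3.0, 4.5, 6.0, float("inf")],
                                   labels=[1, 2, 3, 4, 5])

    # Purely random split.
    train_rand, test_rand = train_test_split(housing, test_size=0.2, random_state=42)

    # Stratified split: the test set mirrors the income_cat proportions.
    train_strat, test_strat = train_test_split(housing, test_size=0.2, random_state=42,
                                               stratify=housing["income_cat"])

    print(test_strat["income_cat"].value_counts(normalize=True))
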
  4. ‘California Housing Data (1990)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 12, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘California Housing Data (1990)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-california-housing-data-1990-a0c5/b7389540/?iid=007-628&v=presentation
    Explore at:
    Dataset updated
    Nov 12, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    California
    Description

    Analysis of ‘California Housing Data (1990)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harrywang/housing on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    Source

    This is the dataset used in this book (https://github.com/ageron/handson-ml/tree/master/datasets/housing) to illustrate a sample end-to-end ML project workflow (pipeline). It is a great book - I highly recommend it!

    The data is based on the 1990 California Census.

    About the Data (from the book):

    "This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto). Luís Torgo obtained it from the StatLib repository (which is closed now). The dataset may also be downloaded from StatLib mirrors.

    The following is the description from the book author:

    This dataset appeared in a 1997 paper titled Sparse Spatial Autoregressions by Pace, R. Kelley and Ronald Barry, published in the Statistics and Probability Letters journal. They built it using the 1990 California census data. It contains one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

    The dataset in this directory is almost identical to the original, with two differences: 207 values were randomly removed from the total_bedrooms column, so we can discuss what to do with missing data. An additional categorical attribute called ocean_proximity was added, indicating (very roughly) whether each block group is near the ocean, near the Bay area, inland or on an island. This allows discussing what to do with categorical data. Note that the block groups are called "districts" in the Jupyter notebooks, simply because in some contexts the name "block group" was confusing."

    About the Data (From Luís Torgo page):

    http://www.dcc.fc.up.pt/%7Eltorgo/Regression/cal_housing.html

    This is a dataset obtained from the StatLib repository. Here is the included description:

    "We collected information on the variables using all the block groups in California from the 1990 Cens us. In this sample a block group on average includes 1425.5 individuals living in a geographically co mpact area. Naturally, the geographical area included varies inversely with the population density. W e computed distances among the centroids of each block group as measured in latitude and longitude. W e excluded all the block groups reporting zero entries for the independent and dependent variables. T he final data contained 20,640 observations on 9 variables. The dependent variable is ln(median house value)."

    End-to-End ML Project Steps (Chapter 2 of the book)

    1. Look at the big picture
    2. Get the data
    3. Discover and visualize the data to gain insights
    4. Prepare the data for Machine Learning algorithms
    5. Select a model and train it
    6. Fine-tune your model
    7. Present your solution
    8. Launch, monitor, and maintain your system

    The 10-Step Machine Learning Project Workflow (My Version)

    1. Define the business objective
    2. Make sense of the data from a high level
      • data types (number, text, object, etc.)
      • continuous/discrete
      • basic stats (min, max, std, median, etc.) using boxplot
      • frequency via histogram
      • scales and distributions of different features
    3. Create the training and test sets using proper sampling methods, e.g., random vs. stratified
    4. Correlation analysis (pair-wise and attribute combinations)
    5. Data cleaning (missing data, outliers, data errors)
    6. Data transformation via pipelines (categorical text to number using one hot encoding, feature scaling via normalization/standardization, feature combinations)
    7. Train and cross validate different models and select the most promising one (Linear Regression, Decision Tree, and Random Forest were tried in this tutorial)
    8. Fine-tune the model by trying different combinations of hyperparameters
    9. Evaluate the model with the best estimators on the test set
    10. Launch, monitor, and refresh the model and system

    --- Original source retains full ownership of the source dataset ---

  5. Data from: S1 Dataset -

    • figshare.com
    xlsx
    Updated Jul 7, 2023
    Cite
    Maghsoud Nabilpour; Mohammad Hossein Samanipour; Nicola Luigi Bragazzi; Monoem Haddad; Tomás Herrera-Valenzuela; Dan Tao; Julien S. Baker; Jožef Šimenko (2023). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0288227.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 7, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Maghsoud Nabilpour; Mohammad Hossein Samanipour; Nicola Luigi Bragazzi; Monoem Haddad; Tomás Herrera-Valenzuela; Dan Tao; Julien S. Baker; Jožef Šimenko
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study investigated the relationship between psychological skills and fitness levels among elite taekwondo athletes. A total of ten Iranian male elite taekwondo athletes (mean age of 20.6±2 years, BMI 18.78±0.62 kg/m2, and fat percentage of 8.87±1.46%) participated in the study. The Sports Emotional Intelligence Questionnaire, Sports Success Scale, Sport Mental Toughness Questionnaire, and Mindfulness Inventory for Sport were used to assess psychological factors. The Wingate test was used to determine anaerobic power, and the Bruce test to determine aerobic fitness. Descriptive statistics and Spearman rank correlation coefficients were utilised to examine any relationships between subscales. Statistically significant correlations were recorded between the evaluation of feelings (EI scale) and VO2peak (ml/kg/min) (r = -0.70, p = 0.0235) and between social skills (EI scale) and relative peak power (W/kg) (r = 0.84, p = 0.0026). Also, between optimism (EI scale) and VO2peak (ml/kg/min) (r = -0.70, p = 0.0252) and between optimism (EI scale) and HR-MAX (r = -0.75, p = 0.0123); and, finally, between control (mental toughness scale) and relative peak power (W/kg) (r = 0.67, p = 0.0360). These findings demonstrate relationships between psychological factors and the advantages of good anaerobic and aerobic capabilities. Finally, the study also demonstrated that elite taekwondo athletes have high mental performance abilities that are interrelated with anaerobic and aerobic performance.
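
    As a minimal illustration of the correlation analysis described above, the sketch below runs a Spearman rank correlation with SciPy on synthetic placeholder numbers (not the athletes' data):

    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    ei_feelings = rng.normal(20, 3, size=10)                      # hypothetical questionnaire scores
    vo2peak = 55 - 0.8 * ei_feelings + rng.normal(0, 2, size=10)  # hypothetical VO2peak values (ml/kg/min)

    rho, p_value = spearmanr(ei_feelings, vo2peak)
    print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")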

  6. Monthly Aggregated NEX-GDDP Ensemble Climate Projections: Historical...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Dec 12, 2021
    Cite
    Brad Peter; Joseph Messina; Nishani Moragoda (2021). Monthly Aggregated NEX-GDDP Ensemble Climate Projections: Historical (1985–2005) and RCP 4.5 and RCP 8.5 (2006–2080) [Dataset]. http://doi.org/10.7910/DVN/ZNEJMS
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Dec 12, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    Brad Peter; Joseph Messina; Nishani Moragoda
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Monthly Aggregated NEX-GDDP Ensemble Climate Projections: Historical (1985–2005) and RCP 4.5 and RCP 8.5 (2006–2080)

    This dataset is a monthly-scale aggregation of NEX-GDDP, the NASA Earth Exchange Global Daily Downscaled Climate Projections, processed using Google Earth Engine (Gorelick 2017). The native delivery on Google Earth Engine is at the daily timescale for each individual CMIP5 GCM model. This dataset was created to facilitate use of NEX-GDDP and reduce processing times for projects that seek an ensemble model with a coarser temporal resolution. The aggregated data have been made available in Google Earth Engine via 'users/cartoscience/GCM_NASA-NEX-GDDP/NEX-GDDP-PRODUCT-ID_Ensemble-Monthly_YEAR' (see code below on how to access), and all 171 GeoTIFFs have been uploaded to this dataverse entry.

    Relevant links:
    https://www.nasa.gov/nex
    https://www.nccs.nasa.gov/services/data-collections/land-based-products/nex-gddp
    https://esgf.nccs.nasa.gov/esgdoc/NEX-GDDP_Tech_Note_v0.pdf
    https://developers.google.com/earth-engine/datasets/catalog/NASA_NEX-GDDP
    https://journals.ametsoc.org/view/journals/bams/93/4/bams-d-11-00094.1.xml
    https://rd.springer.com/article/10.1007/s10584-011-0156-z#page-1

    The dataset can be accessed within Google Earth Engine using the following code:

    var histYears = ee.List.sequence(1985,2005).getInfo()
    var rcpYears = ee.List.sequence(2006,2080).getInfo()
    var path1 = 'users/cartoscience/GCM_NASA-NEX-GDDP/NEX-GDDP-'
    var path2 = '_Ensemble-Monthly_'
    var product
    product = 'Hist'
    var hist = ee.ImageCollection(histYears.map(function(y) { return ee.Image(path1+product+path2+y) }))
    product = 'RCP45'
    var rcp45 = ee.ImageCollection(rcpYears.map(function(y) { return ee.Image(path1+product+path2+y) }))
    product = 'RCP85'
    var rcp85 = ee.ImageCollection(rcpYears.map(function(y) { return ee.Image(path1+product+path2+y) }))
    print('Hist (1985–2005)', hist, 'RCP45 (2006–2080)', rcp45, 'RCP85 (2006–2080)', rcp85)
    var first = hist.first()
    var tMin = first.select('tasmin_1')
    var tMax = first.select('tasmax_1')
    var tMean = first.select('tmean_1')
    var pSum = first.select('pr_1')
    Map.addLayer(tMin, {min: -10, max: 40}, 'Average min temperature Jan 1985 (Hist)', false)
    Map.addLayer(tMax, {min: 10, max: 40}, 'Average max temperature Jan 1985 (Hist)', false)
    Map.addLayer(tMean, {min: 10, max: 40}, 'Average temperature Jan 1985 (Hist)', false)
    Map.addLayer(pSum, {min: 10, max: 500}, 'Accumulated rainfall Jan 1985 (Hist)', true)

    https://code.earthengine.google.com/5bfd9741274679dded7a95d1b57ca51d

    Ensemble average based on the following models: ACCESS1-0, BNU-ESM, CCSM4, CESM1-BGC, CNRM-CM5, CSIRO-Mk3-6-0, CanESM2, GFDL-CM3, GFDL-ESM2G, GFDL-ESM2M, IPSL-CM5A-LR, IPSL-CM5A-MR, MIROC-ESM-CHEM, MIROC-ESM, MIROC5, MPI-ESM-LR, MPI-ESM-MR, MRI-CGCM3, NorESM1-M, bcc-csm1-1, inmcm4

    Each annual GeoTIFF contains 48 bands (4 variables across 12 months):
    Temperature: monthly mean (tasmin, tasmax, tmean)
    Precipitation: monthly sum (pr)
    Bands 1–48 correspond with: tasmin_1, tasmax_1, tmean_1, pr_1, tasmin_2, tasmax_2, tmean_2, pr_2, tasmin_3, tasmax_3, tmean_3, pr_3, tasmin_4, tasmax_4, tmean_4, pr_4, tasmin_5, tasmax_5, tmean_5, pr_5, tasmin_6, tasmax_6, tmean_6, pr_6, tasmin_7, tasmax_7, tmean_7, pr_7, tasmin_8, tasmax_8, tmean_8, pr_8, tasmin_9, tasmax_9, tmean_9, pr_9, tasmin_10, tasmax_10, tmean_10, pr_10, tasmin_11, tasmax_11, tmean_11, pr_11, tasmin_12, tasmax_12, tmean_12, pr_12

    *Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D. and Moore, R., 2017. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, pp. 18–27.

    Project information: SEAGUL: Southeast Asia Globalization, Urbanization, Land and Environment Changes
    http://seagul.info/
    https://lcluc.umd.edu/projects/divergent-local-responses-globalization-urbanization-land-transition-and-environmental
    This project was made possible by the NASA Land-Cover/Land-Use Change Program (Grant #: 80NSSC20K0740)
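
    As a convenience (not part of the original release), here is a minimal Python sketch that maps a variable and month to the band name and 1-based band index implied by the ordering above:

    # Band layout helper: each annual GeoTIFF holds 48 bands ordered as
    # (tasmin, tasmax, tmean, pr) for months 1..12, as listed above.
    VARIABLES = ["tasmin", "tasmax", "tmean", "pr"]

    def band_name(variable, month):
        """Return the band name, e.g. band_name('pr', 3) -> 'pr_3'."""
        if variable not in VARIABLES or not 1 <= month <= 12:
            raise ValueError("variable must be one of VARIABLES and month in 1..12")
        return f"{variable}_{month}"

    def band_index(variable, month):
        """Return the 1-based band index within the 48-band GeoTIFF."""
        return (month - 1) * len(VARIABLES) + VARIABLES.index(variable) + 1

    print(band_name("tmean", 7), band_index("tmean", 7))  # tmean_7 27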

  7. Semantic Segmentation Map Dataset (Semap)

    • zenodo.org
    Updated Sep 1, 2025
    Cite
    Remi Petitpierre; Damien Gomez Donoso; Ben Kriesel (2025). Semantic Segmentation Map Dataset (Semap) [Dataset]. http://doi.org/10.5281/zenodo.16164782
    Explore at:
    Dataset updated
    Sep 1, 2025
    Dataset provided by
    EPFL
    Authors
    Remi Petitpierre; Damien Gomez Donoso; Ben Kriesel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Sep 1, 2025
    Description

    <<< This dataset is not released yet. Release date: 1st September, 2025. >>>

    The Semantic Segmentation Map Dataset (Semap) contains 1,439 manually annotated map samples. Specifically, the dataset compiles 356 image patches from the Historical City Maps Semantic Segmentation Dataset (HCMSSD, [1]), 78 samples extracted from 19th century European cadastres [2–4], three from Paris city atlases [5], and 1,002 newly annotated samples, drawn from the Aggregated Dataset on the History of Cartography (ADHOC Images, [6]).

    Additionally, it comprises 12,122 synthetically generated image samples and related labels.

    Both datasets are part of R. Petitpierre's PhD thesis [7]. Extensive details on the annotation and synthetic generation procedures are provided in that work.

    Organization of the data

    To come soon.

    Descriptive statistics

    Number of semantic classes: 5 + background
    Number of manually annotated image samples: 1,439
    Number of synthetically generated samples: 12,122
    Image sample size:
    min: 768 × 768 pixels
    max: 1000 × 1000 pixels

    Use and Citation

    For any mention of this dataset, please cite:

    @misc{semap_petitpierre_2025,
    author = {Petitpierre, R{\'{e}}mi and Gomez Donoso, Damien and Kriesel, Ben},
    title = {{Semantic Segmentation Map Dataset (Semap)}},
    year = {2025},
    publisher = {EPFL},
    url = {https://doi.org/10.5281/zenodo.16164782}}


    @phdthesis{studying_maps_petitpierre_2025,
    author = {Petitpierre, R{\'{e}}mi},
    title = {{Studying Maps at Scale: A Digital Investigation of Cartography and the Evolution of Figuration}},
    year = {2025},
    school = {EPFL}}

    Corresponding author

    Rémi PETITPIERRE - remi.petitpierre@epfl.ch - ORCID - Github - Scholar - ResearchGate

    Work ethics

    80% of the data were annotated by RP. The remainder were annotated by DGD and BK, two master's students from EPFL, Switzerland. The students were paid for their work using public funding, and were offered the possibility to be associated with the publication of the data.

    License

    This project is licensed under the CC BY 4.0 License.

    Liability

    We do not assume any liability for the use of this dataset.

    References

    1. Petitpierre, R. (2021). Historical City Maps Semantic Segmentation Dataset. V1.0. https://doi.org/10.5281/zenodo.5513639
    2. di Lenardo I, Barman R, Pardini F, et al. (2021) Une approche computationnelle du cadastre napoléonien de Venise. Humanités numériques 3.
    3. Petitpierre R, Rappo L and di Lenardo I (2023) Recartographier l’espace napoléonien. In: Humanistica 2023, Genève, Switzerland, June 2023. Géographie. Association francophone des humanités numériques. Available at: https://hal.science/hal-04109214.
    4. Li S, Cerioni A, Herny C, et al. (2024) Vectorization of historical cadastral plans from the 1850s in the Canton of Geneva. Geneva, Switzerland: Swiss Territorial Data Lab. Available at: https://tech.stdl.ch/PROJ-CADMAP/.
    5. Chazalon J, Carlinet E, Chen Y, et al. (2021) ICDAR 2021 Competition on Historical Map Segmentation. arXiv:2105.13265 [cs].
    6. To come soon
    7. Petitpierre R (2025) Studying Maps at Scale: A Digital Investigation of Cartography and the Evolution of Figuration. PhD thesis. École Polytechnique Fédérale de Lausanne.
  8. iSDAsoil: soil extractable Sulphur for Africa predicted at 30 m resolution...

    • kenya.lsc-hubs.org
    • lschub.kalro.org
    • +2more
    Updated Feb 5, 2024
    Cite
    (2024). iSDAsoil: soil extractable Sulphur for Africa predicted at 30 m resolution at 0-20 and 20-50 cm depths [Dataset]. https://kenya.lsc-hubs.org/cat/collections/metadata:main/items/10.5281-zenodo.4091142
    Explore at:
    Dataset updated
    Feb 5, 2024
    Description

    iSDAsoil dataset: soil extractable Sulphur (S), log-transformed, predicted at 30 m resolution for 0–20 and 20–50 cm depth intervals. Data have been projected in the WGS84 coordinate system and compiled as COG. Predictions have been generated using multi-scale Ensemble Machine Learning with 250 m (MODIS, PROBA-V, climatic variables and similar) and 30 m (DTM derivatives, Landsat, Sentinel-2 and similar) resolution covariates. For model training we use a pan-African compilation of soil samples and profiles (iSDA points, AfSPDB, and other national and regional soil datasets).

    Cite as: Hengl, T., Miller, M.A.E., Križan, J. et al. African soil properties and nutrients mapped at 30 m spatial resolution using two-scale ensemble machine learning. Sci Rep 11, 6130 (2021). https://doi.org/10.1038/s41598-021-85639-y

    To open the maps in QGIS and/or directly compute with them, please use the Cloud-Optimized GeoTIFF version.

    Layer description:
    sol_log.s_mehlich3_m_30m_*..*cm_2001..2017_v0.13_wgs84.tif = predicted soil extractable Sulphur, mean value
    sol_log.s_mehlich3_md_30m_*..*cm_2001..2017_v0.13_wgs84.tif = predicted soil extractable Sulphur, model (prediction) errors

    Model errors were derived using bootstrapping: md is derived as the standard deviation of individual learners from 5-fold cross-validation (using spatial blocking). The model 5-fold cross-validation (mlr::makeStackedLearner) for this variable indicates:

    Variable: log.s_mehlich3
    R-square: 0.548
    Fitted values sd: 0.423
    RMSE: 0.384

    Random forest model:
    Call: stats::lm(formula = f, data = d)
    Residuals:
        Min      1Q  Median      3Q     Max
    -2.5729 -0.2102 -0.0264  0.1694  5.0049
    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
    (Intercept)    1.459208   4.154229   0.351    0.725
    regr.ranger    0.937179   0.016167  57.967  < 2e-16 ***
    regr.xgboost   0.002587   0.016252   0.159    0.874
    regr.cubist    0.145396   0.010890  13.351  < 2e-16 ***
    regr.nnet     -0.672062   1.796642  -0.374    0.708
    regr.cvglmnet -0.045157   0.011256  -4.012 6.04e-05 ***
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 0.3841 on 37530 degrees of freedom
    Multiple R-squared: 0.5481, Adjusted R-squared: 0.548
    F-statistic: 9103 on 5 and 37530 DF, p-value: < 2.2e-16
    To back-transform values (y) to ppm use the following formula:
    ppm = expm1( y / 10 )
    To submit an issue or request support please visit https://isda-africa.com/isdasoil
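
    A minimal Python sketch of the back-transformation above (NumPy only; the input values are illustrative):

    import numpy as np

    def to_ppm(y):
        """Back-transform log-transformed predictions to ppm: ppm = expm1(y / 10)."""
        return np.expm1(np.asarray(y, dtype=float) / 10.0)

    print(to_ppm([0, 23, 40]))  # approximately [0.0, 8.97, 53.60]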
