59 datasets found
  1. NoW Benchmark Dataset

    • paperswithcode.com
    Updated Sep 20, 2020
    Cite
    Soubhik Sanyal; Timo Bolkart; Haiwen Feng; Michael J. Black (2020). NoW Benchmark Dataset [Dataset]. https://paperswithcode.com/dataset/now-benchmark
    Explore at:
    Dataset updated
    Sep 20, 2020
    Authors
    Soubhik Sanyal; Timo Bolkart; Haiwen Feng; Michael J. Black
    Description

    The goal of this benchmark is to introduce a standard evaluation metric to measure the accuracy and robustness of 3D face reconstruction methods under variations in viewing angle, lighting, and common occlusions.

    The dataset contains 2054 2D images of 100 subjects, captured with an iPhone X, and a separate 3D head scan for each subject. This head scan serves as ground truth for the evaluation. The subjects are selected to contain variations in age, BMI, and sex (55 female, 45 male).

  2. Dataset of books called The challenge today

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called The challenge today [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=The+challenge+today
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is The challenge today. It features 7 columns including author, publication date, language, and book publisher.

  3. BPI Challenge 2017

    • data.4tu.nl
    • figshare.com
    zip
    Updated Feb 6, 2017
    Cite
    Boudewijn van Dongen (2017). BPI Challenge 2017 [Dataset]. http://doi.org/10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 6, 2017
    Dataset provided by
    Eindhoven University of Technology
    Authors
    Boudewijn van Dongen
    License

    https://doi.org/10.4121/resource:terms_of_use

    Time period covered
    Jan 1, 2016 - Feb 1, 2017
    Description

    This event log pertains to a loan application process of a Dutch financial institute. The data contains all applications filed through an online system in 2016 and their subsequent events until February 1st 2017, 15:11.

    The company providing the data and the process under consideration is the same as doi:10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f. However, the system supporting the process has changed in the meantime. In particular, the system now allows for multiple offers per application. These offers can be tracked through their IDs in the log.

  4. The Top Challenges of Today’s CMO

    • thegood.com
    html
    Updated Jun 14, 2021
    Cite
    The Good (2021). The Top Challenges of Today’s CMO [Dataset]. https://thegood.com/insights/top-challenges-cmo/
    Explore at:
    Available download formats: html
    Dataset updated
    Jun 14, 2021
    Dataset authored and provided by
    The Good
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The digital revolution changed the way we work, the places we work, and the tools we use at work. From the janitor’s closet to the C-suite, the once believable promise of shorter workweeks and increased leisure time morphed into information overload and more stress than ever. Nobody knows digital headaches better than a company’s Chief […]

  5. llmail-inject-challenge

    • huggingface.co
    Cite
    Microsoft, llmail-inject-challenge [Dataset]. https://huggingface.co/datasets/microsoft/llmail-inject-challenge
    Explore at:
    Dataset authored and provided by
    Microsoft (http://microsoft.com/)
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Summary

    This dataset contains a large number of attack prompts collected as part of the now-closed LLMail-Inject: Adaptive Prompt Injection Challenge. We first describe the details of the challenge, and then provide documentation of the dataset. For the accompanying code, check out: https://github.com/microsoft/llmail-inject-challenge.

      Citation
    

    @article{abdelnabi2025, title = {LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/llmail-inject-challenge.

  6. LHC Olympics 2020 Dataset

    • paperswithcode.com
    Cite
    LHC Olympics 2020 Dataset [Dataset]. https://paperswithcode.com/dataset/lhc-olympics-2020
    Explore at:
    Description

    These are the official datasets for the LHC Olympics 2020 Anomaly Detection Challenge. Each "black box" contains 1M events meant to be representative of actual LHC data. These events may include signal(s) and the challenge consists of finding these signals using the method of your choice. We have uploaded a total of THREE black boxes to be used for the challenge.

    In addition, we include a background sample of 1M events meant to aid in the challenge. The background sample consists of QCD dijet events simulated using Pythia8 and Delphes 3.4.1. Be warned that both the physics and the detector modeling for this simulation may not exactly reflect the "data" in the black boxes. For both background and black box data, events are selected using a single fat-jet (R=1) trigger with a pT threshold of 1.2 TeV.

    These events are stored as pandas dataframes saved to compressed h5 format. For each event, all reconstructed particles are assumed to be massless and are recorded in detector coordinates (pT, eta, phi). More detailed information such as particle charge is not included. Events are zero padded to constant size arrays of 700 particles. The array format is therefore (Nevents=1M, 2100).
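    A minimal sketch of turning that flat array format back into per-particle coordinates. A tiny zero-filled frame stands in for a real black-box file here, and the filename in the comment is hypothetical:

```python
import numpy as np
import pandas as pd

# Illustrative sketch, not the official reader: a tiny zero-filled frame
# stands in for a real black-box file, which would be loaded with e.g.
# df = pd.read_hdf("BlackBox1.h5")  # hypothetical filename
n_events, n_particles = 2, 700
df = pd.DataFrame(np.zeros((n_events, n_particles * 3)))

# Each row is one flat event of 700 particles x (pT, eta, phi),
# i.e. the (Nevents, 2100) array format described above.
events = df.to_numpy().reshape(-1, n_particles, 3)
pt, eta, phi = events[..., 0], events[..., 1], events[..., 2]
```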

    For more information, including a complete description of the challenge and an example Jupyter notebook illustrating how to read and process the events, see the official LHC Olympics 2020 webpage here.

    UPDATE: November 23, 2020

    Now that the challenge is over, we have uploaded the solutions to Black Boxes 1 and 3. They are simple ASCII files (events_LHCO2020_BlackBox1.masterkey and events_LHCO2020_BlackBox3.masterkey) where each line is the truth label -- 0 for background and 1 (and 2 in the case of BB3) for signal -- of each event in the corresponding h5 files (same ordering). For more information about the solutions, please visit the LHCO2020 webpage.
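    A minimal sketch of parsing such a masterkey file, using a toy file written on the spot in place of the real ones named above:

```python
from pathlib import Path

# Toy masterkey file in the format described above: one integer truth label
# per line (0 = background; 1, plus 2 in the case of BB3, = signal), in the
# same order as the events in the corresponding h5 file.
Path("demo.masterkey").write_text("0\n1\n0\n2\n")

labels = [int(line) for line in Path("demo.masterkey").read_text().split()]
n_signal = sum(1 for lbl in labels if lbl > 0)
```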

    UPDATE: February 11, 2021

    We have uploaded the Delphes detector cards and Pythia command files used to produce the Black Box datasets.

  7. Smoking NLP Challenge Data

    • scicrunch.org
    • neuinfo.org
    Updated Mar 7, 2024
    Cite
    (2024). Smoking NLP Challenge Data [Dataset]. http://identifiers.org/RRID:SCR_008644
    Explore at:
    Dataset updated
    Mar 7, 2024
    Description

    The data for the smoking challenge consisted exclusively of discharge summaries from Partners HealthCare, which were preprocessed, converted into XML format, and separated into training and test sets. i2b2 is a data warehouse containing clinical data on over 150k patients, including outpatient diagnoses (DX), lab results, medications, and inpatient procedures; ETL processes were authored to pull data from EMR and finance systems. The institutional review boards of Partners HealthCare approved the challenge and the data preparation process.

    The data were annotated by pulmonologists, who classified patients into Past Smokers, Current Smokers, Smokers, Non-smokers, and Unknown; second-hand smokers were considered non-smokers. Other institutions involved include the Massachusetts Institute of Technology and the State University of New York at Albany.

    i2b2 is a passionate advocate for the potential of existing clinical information to yield insights that can directly impact healthcare improvement. In its many use cases (Driving Biology Projects), it has become increasingly obvious that the value locked in unstructured text is essential to the success of its mission. To enhance the ability of natural language processing (NLP) tools to prise increasingly fine-grained information from clinical records, i2b2 has previously provided sets of fully de-identified notes from the Research Patient Data Repository at Partners HealthCare for a series of NLP Challenges organized by Dr. Ozlem Uzuner. We are pleased to now make those notes available to the community for general research purposes. At this time we are releasing the notes (~1,000) from the first i2b2 Challenge as i2b2 NLP Research Data Set #1. A similar set of notes from the second i2b2 Challenge will be released on the one-year anniversary of that Challenge (November 2010).

  8. world_model_tokenized_data

    • huggingface.co
    Updated Jun 20, 2024
    Cite
    1X (2024). world_model_tokenized_data [Dataset]. https://huggingface.co/datasets/1x-technologies/world_model_tokenized_data
    Explore at:
    Dataset updated
    Jun 20, 2024
    Dataset authored and provided by
    1X
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    World
    Description

    1X World Model Compression Challenge Dataset

    This repository hosts the dataset for the 1X World Model Compression Challenge. Download with:

    huggingface-cli download 1x-technologies/worldmodel --repo-type dataset --local-dir data

      Updates Since v1.1
    

    • Train/Val v2.0 (~100 hours), replacing v1.1
    • Test v2.0 dataset for the Compression Challenge
    • Faces blurred for privacy
    • New raw video dataset (CC-BY-NC-SA 4.0) at worldmodel_raw_data
    • Example scripts now split into: cosmos_video_decoder.py —… See the full description on the dataset page: https://huggingface.co/datasets/1x-technologies/world_model_tokenized_data.

  9. How Attribution Modeling Helps Overcome Big Data Challenges

    • thegood.com
    html
    Updated Apr 13, 2021
    Cite
    The Good (2021). How Attribution Modeling Helps Overcome Big Data Challenges [Dataset]. https://thegood.com/insights/attribution-modeling/
    Explore at:
    Available download formats: html
    Dataset updated
    Apr 13, 2021
    Dataset authored and provided by
    The Good
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Today’s marketing leaders face an unusual problem: too much information. Technology now allows marketers to get an up-close look at every point along the customer journey, but making sense of that data to support growth and show marketing’s return on investment is a constant challenge. Whether your goal is to provide a stronger defense for your marketing […]

  10. MARIO: Monitoring Age-related Macular Degeneration Progression In Optical...

    • zenodo.org
    bin
    Updated Apr 25, 2025
    Cite
    Gwenolé Quellec; Gwenolé Quellec; Rachid Zeghlache; Rachid Zeghlache (2025). MARIO: Monitoring Age-related Macular Degeneration Progression In Optical Coherence Tomography [Dataset]. http://doi.org/10.5281/zenodo.15270469
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gwenolé Quellec; Gwenolé Quellec; Rachid Zeghlache; Rachid Zeghlache
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    This dataset was created for the MARIO challenge, held as a satellite event of the MICCAI 2024 conference.

    Context

    Age-related Macular Degeneration (AMD) is a progressive degeneration of the macula, the central part of the retina, affecting nearly 196 million people worldwide. It can appear from the age of 50, and more frequently from the age of 65 onwards, causing a significant weakening of visual capacities without destroying them. It is a complex and multifactorial pathology in which genetic and environmental risk factors are intertwined. Advanced stages of the disease (atrophy and neovascularization) affect nearly 20% of patients: they are the leading cause of severe visual impairment and blindness in developed countries. Since their introduction in 2007, anti-vascular endothelial growth factor (anti-VEGF) treatments have proven their ability to slow disease progression and even improve visual function in neovascular forms of AMD. This effectiveness is optimized by ensuring a short time between the diagnosis of the pathology and the start of treatment, as well as by performing regular checks and retreating as soon as necessary. It is now widely accepted that the indication for anti-VEGF treatments is based on the presence of exudative signs (subretinal and intraretinal fluid, intraretinal hyperreflective spots, etc.) visible on optical coherence tomography (OCT), a 3-D imaging modality.

    Work on AI for AMD prediction has mainly focused on the first onset of the early/intermediate (iAMD), atrophic (GA), and neovascular (nAMD) stages; there is no current work on predicting the progression of AMD under the close monitoring of patients on an anti-VEGF treatment plan. Therefore, being able to reliably detect an evolution in neovascular activity by monitoring exudative signs is crucial for the correct implementation of anti-VEGF treatment strategies, which are now individualized.

    Objectives

    The objective of the MARIO dataset, and of the associated challenge, is to evaluate existing and new algorithms to recognize the evolution of neovascular activity in OCT scans of patients with exudative AMD, for the purpose of improving the planning of anti-VEGF treatments.

    Two tasks have been proposed:

    • The first task focuses on pairs of 2D slices (B-scans) from two consecutive OCT acquisitions. The goal is to classify the evolution between these two slices (before and after), which clinicians typically examine side by side on their screens.
    • The second task focuses on the 2D slice level. The goal is to predict the evolution over the next 3 months under close monitoring for patients enrolled in an anti-VEGF treatment plan.

    See details on the MARIO challenge webpage: https://youvenz.github.io/MARIO_challenge.github.io/

  11. Spotify Million Playlist: Recsys Challenge 2018 Dataset

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Apr 9, 2022
    Cite
    AIcrowd (2022). Spotify Million Playlist: Recsys Challenge 2018 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6425592
    Explore at:
    Dataset updated
    Apr 9, 2022
    Dataset authored and provided by
    AIcrowd
    Description

    Spotify Million Playlist Dataset Challenge

    Summary

    The Spotify Million Playlist Dataset Challenge consists of a dataset and evaluation to enable research in music recommendations. It is a continuation of the RecSys Challenge 2018, which ran from January to July 2018. The dataset contains 1,000,000 playlists, including playlist titles and track titles, created by users on the Spotify platform between January 2010 and October 2017. The evaluation task is automatic playlist continuation: given a seed playlist title and/or initial set of tracks in a playlist, to predict the subsequent tracks in that playlist. This is an open-ended challenge intended to encourage research in music recommendations, and no prizes will be awarded (other than bragging rights).

    Background

    Playlists like Today’s Top Hits and RapCaviar have millions of loyal followers, while Discover Weekly and Daily Mix are just a couple of our personalized playlists made especially to match your unique musical tastes.

    Our users love playlists too. In fact, the Digital Music Alliance, in their 2018 Annual Music Report, state that 54% of consumers say that playlists are replacing albums in their listening habits.

    But our users don’t love just listening to playlists, they also love creating them. To date, over 4 billion playlists have been created and shared by Spotify users. People create playlists for all sorts of reasons: some playlists group together music categorically (e.g., by genre, artist, year, or city), by mood, theme, or occasion (e.g., romantic, sad, holiday), or for a particular purpose (e.g., focus, workout). Some playlists are even made to land a dream job, or to send a message to someone special.

    The other thing we love here at Spotify is playlist research. By learning from the playlists that people create, we can learn all sorts of things about the deep relationship between people and music. Why do certain songs go together? What is the difference between “Beach Vibes” and “Forest Vibes”? And what words do people use to describe which playlists?

    By learning more about the nature of playlists, we may also be able to suggest other tracks that a listener would enjoy in the context of a given playlist. This can make playlist creation easier, and ultimately help people find more of the music they love.

    Dataset

    To enable this type of research at scale, in 2018 we sponsored the RecSys Challenge 2018, which introduced the Million Playlist Dataset (MPD) to the research community. Sampled from the over 4 billion public playlists on Spotify, this dataset of 1 million playlists consists of over 2 million unique tracks by nearly 300,000 artists, and represents the largest public dataset of music playlists in the world. The dataset includes public playlists created by US Spotify users between January 2010 and November 2017. The challenge ran from January to July 2018, and received 1,467 submissions from 410 teams. A summary of the challenge and the top scoring submissions was published in the ACM Transactions on Intelligent Systems and Technology.

    In September 2020, we re-released the dataset as an open-ended challenge on AIcrowd.com. The dataset can now be downloaded by registered participants from the Resources page.

    Each playlist in the MPD contains a playlist title, the track list (including track IDs and metadata), and other metadata fields (last edit time, number of playlist edits, and more). All data is anonymized to protect user privacy. Playlists are sampled with some randomization, are manually filtered for playlist quality and to remove offensive content, and have some dithering and fictitious tracks added to them. As such, the dataset is not representative of the true distribution of playlists on the Spotify platform, and must not be interpreted as such in any research or analysis performed on the dataset.
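    As a rough illustration of that structure, the snippet below parses a hypothetical miniature playlist record (field names such as "name", "tracks", and "track_name" are assumptions; the authoritative schema is documented on the AIcrowd resources page) and extracts the seed information used for playlist continuation:

```python
import json

# Hypothetical miniature of one playlist record; field names are assumed,
# not taken from the official schema.
slice_json = json.loads("""
{"playlists": [
  {"name": "Beach Vibes",
   "num_edits": 3,
   "tracks": [{"track_name": "Song A", "artist_name": "Artist X"}]}
]}
""")

# Seed information for automatic playlist continuation: the title and/or
# the initial tracks of each playlist.
seeds = [(p["name"], [t["track_name"] for t in p["tracks"]])
         for p in slice_json["playlists"]]
```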

    Dataset Contains

    1000 examples of each scenario:

    • Title only (no tracks)
    • Title and first track
    • Title and first 5 tracks
    • First 5 tracks only
    • Title and first 10 tracks
    • First 10 tracks only
    • Title and first 25 tracks
    • Title and 25 random tracks
    • Title and first 100 tracks
    • Title and 100 random tracks

    Download Link

    Full details: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge
    Download link: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge/dataset_files

  12. Raw AI4Arctic Sea Ice Challenge Test Dataset

    • data.dtu.dk
    pdf
    Updated Jul 12, 2023
    Cite
    Jørgen Buus-Hinkler; Tore Wulf; Andreas Rønne Stokholm; Anton Korosov; Roberto Saldo; Leif Toudal Pedersen; David Arthurs; Rune Solberg; Nicolas Longépé; Matilde Brandt Kreiner (2023). Raw AI4Arctic Sea Ice Challenge Test Dataset [Dataset]. http://doi.org/10.11583/DTU.21762848.v2
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jul 12, 2023
    Dataset provided by
    Technical University of Denmark
    Authors
    Jørgen Buus-Hinkler; Tore Wulf; Andreas Rønne Stokholm; Anton Korosov; Roberto Saldo; Leif Toudal Pedersen; David Arthurs; Rune Solberg; Nicolas Longépé; Matilde Brandt Kreiner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The AI4Arctic Sea Ice Challenge Datasets were produced for the AI4EO sea ice competition initiated by the European Space Agency (ESA) ɸ-lab. The purpose of the competition is to develop deep learning models that automatically produce sea ice charts, including sea ice concentration, stage of development, and floe size (form) information. The training datasets contain Sentinel-1 active microwave Synthetic Aperture Radar (SAR) data and corresponding passive MicroWave Radiometer (MWR) data from the AMSR2 satellite sensor. While SAR data has ambiguities between open water and sea ice, it has a high spatial resolution, whereas MWR data has good contrast between open water and ice. However, the coarse resolution of the AMSR2 MWR observations introduces a new set of obstacles, e.g. land spill-over, which can lead to erroneous sea ice predictions along coastlines adjacent to open water.

    Label data in the challenge datasets are ice charts that have been produced by the Greenland ice service at the Danish Meteorological Institute (DMI) and the Canadian Ice Service (CIS) for the safety of navigation. The challenge datasets also contain other auxiliary data, such as the distance to land and numerical weather prediction model data. The scenes cover the period from January 8, 2018 to December 21, 2021.

    Two versions of the dataset exist, 'raw' and 'ready-to-train', each with a corresponding test dataset. The versions consist of the same 513 training and 20 test scenes (the latter without label data). The 'ready-to-train' version has been further prepared for model training: data downsampled from 40 to 80 m pixel spacing, standard scaled, ice charts converted (sea ice concentration, stage of development, and floe size), NaN values removed, masks aligned, etc. This is the test data for the 'raw' version; no reference data is included.
    Further details are described in the common manual published together with the datasets, "AI4Arctic_challenge-dataset-manual". Code with a get-started toolkit for the 'ready-to-train' dataset: https://github.com/astokholm/AI4ArcticSeaIceChallenge
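    As a rough illustration of the 40 m to 80 m downsampling step, here as simple 2x2 block averaging on a toy array (the official preprocessing lives in the linked toolkit and may differ):

```python
import numpy as np

# Toy single-channel SAR scene at 40 m pixel spacing.
sar_40m = np.arange(16, dtype=float).reshape(4, 4)

# 2x2 block averaging halves the resolution to 80 m pixel spacing.
h, w = sar_40m.shape
sar_80m = sar_40m.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```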

    Version 2 updates the files and now contains the ice charts (previously absent), as the AutoICE Challenge has been finalised.

    A quick video overview of the challenge is available at: https://youtu.be/iuXIeLPyKfg
    This item is part of the Collection https://doi.org/10.11583/DTU.c.6244065

  13. Micronesia Challenge: Protected Areas Network

    • catalog.data.gov
    • data.ioos.us
    Updated Jan 26, 2025
    Cite
    The Nature Conservancy (TNC) Pacific Islands (Point of Contact) (2025). Micronesia Challenge: Protected Areas Network [Dataset]. https://catalog.data.gov/dataset/micronesia-challenge-protected-areas-network
    Explore at:
    Dataset updated
    Jan 26, 2025
    Dataset provided by
    The Nature Conservancy (http://www.nature.org/)
    Area covered
    Micronesia
    Description

    Boundaries of all known protected areas within Micronesia as compiled by the Micronesia Challenge, a commitment launched in 2006 by Micronesian governments to strike a critical balance between the need to use their natural resources today and the need to sustain those resources for future generations. Five Micronesian governments--the Republic of Palau, the Federated States of Micronesia (FSM), the Republic of the Marshall Islands (RMI), the U.S. Territory of Guam, and the Commonwealth of the Northern Mariana Islands (CNMI)--have committed to "effectively conserve at least 30 percent of the near-shore marine resources and 20 percent of the terrestrial resources across Micronesia by 2020." This region-wide initiative evolved from local conservation projects across Micronesia and is now a large-scale partnership between governments, nonprofit and community leaders, and multinational agencies and donors. Partners include NOAA, The Nature Conservancy (TNC), Conservation International, and others. For further information, please see: http://www.micronesiachallenge.org

  14. Data from: Solubility Challenge Revisited after Ten Years, with Multilab...

    • acs.figshare.com
    xlsx
    Updated Jun 4, 2023
    Cite
    Antonio Llinas; Alex Avdeef (2023). Solubility Challenge Revisited after Ten Years, with Multilab Shake-Flask Data, Using Tight (SD ∼ 0.17 log) and Loose (SD ∼ 0.62 log) Test Sets [Dataset]. http://doi.org/10.1021/acs.jcim.9b00345.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    ACS Publications
    Authors
    Antonio Llinas; Alex Avdeef
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Ten years ago we issued, in conjunction with the Journal of Chemical Information and Modeling, an open prediction challenge to the cheminformatics community. Would they be able to predict the intrinsic solubilities of 32 druglike compounds using only a high-precision set of 100 compounds as a training set? The “Solubility Challenge” was a widely recognized success and spurred many discussions about the prediction methods and the quality of data. Regardless of the obvious limitations of the challenge, the conclusions were somewhat unexpected. Despite contestants employing the entire spectrum of approaches then available to predict aqueous solubility, and having an extremely tight data set at their disposal, it was not possible to identify the best methods for predicting aqueous solubility; a variety of methods and combinations all performed equally well (or badly). Several authors have since suggested that it is not the poor quality of the solubility data which limits the accuracy of the predictions, but the deficient methods used. Now, ten years after the original Solubility Challenge, we revisit it and challenge the community to a new test with a much larger database with estimates of interlaboratory reproducibility.

  15. HAM10000 Lesion Segmentations

    • kaggle.com
    Updated Jul 2, 2020
    Cite
    chdlr (2020). HAM10000 Lesion Segmentations [Dataset]. https://www.kaggle.com/tschandl/ham10000-lesion-segmentations/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 2, 2020
    Dataset provided by
    Kaggle
    Authors
    chdlr
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Context

    Dermatoscopic images usually depict a single skin lesion, but large-scale datasets with segmentations of the affected areas have not been available until now. Challenge segmentation data often suffered from being either too coarse or too noisy. This dataset provides 10015 binary segmentation masks based on FCN-created segmentations and hand-drawn lines, which together with the HAM10000 diagnosis metadata can be used for object detection or semantic segmentation.

    Content

    This dataset contains binary segmentation masks as PNG files for all HAM10000 dataset images. Each mask segments the lesion area as evaluated by a single dermatologist (me). The masks were initialized with an FCN lesion segmentation model; afterwards I went through all of them and either approved them, or corrected / redrew them with the free-hand selection tool in FIJI.
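    A minimal sketch of working with one such mask, using a toy array in place of a real PNG (loading via PIL.Image.open is a suggestion, not part of this dataset's tooling):

```python
import numpy as np

# Toy binary mask standing in for one PNG file; real masks would be
# loaded with e.g. np.asarray(PIL.Image.open("mask.png")).
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255  # non-zero pixels mark the lesion area

# Fraction of the image covered by the lesion.
lesion_fraction = float((mask > 0).mean())
```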

    You can find the HAM10000 dataset images at the following places:
    • Harvard Dataverse: https://doi.org/10.7910/DVN/DBW86T
    • ISIC Archive Gallery: https://www.isic-archive.com
    • Kaggle Dataset Kernel (downsampled): https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000

    Acknowledgements

    If you use this data, please cite/refer to the publication I made these segmentation masks for...

    ...and the original source of the images:

  16. DataSheet_1_Outlook for CRISPR-based tuberculosis assays now in their...

    • frontiersin.figshare.com
    bin
    Updated Aug 4, 2023
    Cite
    Zhen Huang; Guoliang Zhang; Christopher J. Lyon; Tony Y. Hu; Shuihua Lu (2023). DataSheet_1_Outlook for CRISPR-based tuberculosis assays now in their infancy.docx [Dataset]. http://doi.org/10.3389/fimmu.2023.1172035.s001
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhen Huang; Guoliang Zhang; Christopher J. Lyon; Tony Y. Hu; Shuihua Lu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Tuberculosis (TB) remains a major underdiagnosed public health threat worldwide, responsible for more than 10 million cases and one million deaths annually. TB diagnosis has become more rapid with the development and adoption of molecular tests, but remains challenging. CRISPR-based assays have emerged as a promising alternative, but there has not been a critical review of this area. Here, we systematically review these approaches to assess their diagnostic potential and issues with the development and clinical evaluation of proposed CRISPR-based TB assays. Based on these observations, we propose constructive suggestions to improve sample pretreatment, method development, clinical validation, and accessibility of these assays to streamline future assay development and validation studies.

  17. Challenges / Lesson Learnt – SDG Reporting

    • pacific-data.sprep.org
    • png-data.sprep.org
    pdf
    Updated Dec 2, 2025
    Cite
    PNG Conservation and Environment Protection Authority (2025). Challenges / Lesson Learnt – SDG Reporting [Dataset]. https://pacific-data.sprep.org/dataset/challenges-lesson-learnt-sdg-reporting
    Explore at:
    pdf
    Available download formats
    Dataset updated
    Dec 2, 2025
    Dataset provided by
    PNG Conservation and Environment Protection Authority
    License

    https://pacific-data.sprep.org/resource/shared-data-license-agreementhttps://pacific-data.sprep.org/resource/shared-data-license-agreement

    Area covered
    Papua New Guinea
    Description

    For more than a decade, reporting on the MDGs, and now the SDGs, has remained a challenge.

    1. Extraction of data and overlaying issues
    2. Internal CEPA databases

  18.

    Data from: Fast Calorimeter Simulation Challenge 2022 - Dataset 3

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 8, 2022
    Cite
    Kasieczka, Gregor (2022). Fast Calorimeter Simulation Challenge 2022 - Dataset 3 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6366323
    Explore at:
    Dataset updated
    Apr 8, 2022
    Dataset provided by
    Krause, Claudius
    Zaborowska, Anna
    Salamani, Dalila
    Faucci Giannelli, Michele
    Kasieczka, Gregor
    Nachman, Ben
    Shih, David
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is dataset 3 of the “Fast Calorimeter Simulation Challenge 2022”. It consists of four files with 50k GEANT4-simulated showers each, of electrons with energies sampled from a log-uniform distribution ranging from 1 GeV to 1 TeV. The detector geometry is similar to dataset 2, but has a much higher granularity: each of the 45 layers now has 18 radial and 50 angular bins, totalling 18×50×45 = 40500 voxels. This dataset was produced using the Par04 Geant4 example.

    dataset_3_1.hdf5 and dataset_3_2.hdf5 should be used for training, dataset_3_3.hdf5 and dataset_3_4.hdf5 can be used as reference in the evaluation.

    More details, in particular helper scripts to parse the data and calculate and visualize basic high-level physics features, are available at https://calochallenge.github.io/homepage/
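    The layer/bin geometry quoted above can be sketched as a flat-index mapping over the 45×50×18 voxel grid. This is an illustrative sketch only: the (layer, angular, radial) ordering below is an assumption for demonstration, not the challenge's documented storage convention.

```python
# Sketch of the dataset-3 voxel layout described above: 45 layers,
# each with 18 radial and 50 angular bins.
# NOTE: the (layer, angular, radial) ordering is an assumption for
# illustration, not the challenge's documented convention.

N_LAYERS, N_ANGULAR, N_RADIAL = 45, 50, 18

def flat_voxel_index(layer: int, angular: int, radial: int) -> int:
    """Map a (layer, angular, radial) bin triple to a flat voxel index."""
    assert 0 <= layer < N_LAYERS
    assert 0 <= angular < N_ANGULAR
    assert 0 <= radial < N_RADIAL
    return (layer * N_ANGULAR + angular) * N_RADIAL + radial

# Total voxel count matches the 18x50x45 = 40500 quoted in the description.
TOTAL_VOXELS = N_LAYERS * N_ANGULAR * N_RADIAL
```

    The helper scripts linked above define the authoritative binning; this sketch only makes the arithmetic of the quoted voxel count concrete.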

  19.

    BPI Challenge 2017 - Offer log

    • figshare.com
    • data.4tu.nl
    • +2more
    txt
    Updated Feb 22, 2021
    Cite
    Boudewijn van Dongen (2021). BPI Challenge 2017 - Offer log [Dataset]. http://doi.org/10.4121/12705737.v2
    Explore at:
    txt
    Available download formats
    Dataset updated
    Feb 22, 2021
    Dataset provided by
    4TU.ResearchData
    Authors
    Boudewijn van Dongen
    License

    https://doi.org/10.4121/resource:terms_of_usehttps://doi.org/10.4121/resource:terms_of_use

    Description

    This event log pertains to a loan application process of a Dutch financial institute. The data contains all offers made for an accepted application in the event log 10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b. All of the events in this log are also in the BPI Challenge 2017 event log (10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b). This subset is provided for convenience and the IDs are persistent between the two datasets. Parent item: BPI Challenge 2017 This event log pertains to a loan application process of a Dutch financial institute. The data contains all applications filed through an online system in 2016 and their subsequent events until February 1st 2017, 15:11.

    The company providing the data and the process under consideration is the same as doi:10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f. However, the system supporting the process has changed in the meantime. In particular, the system now allows for multiple offers per application. These offers can be tracked through their IDs in the log.
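    Because the IDs are persistent between the offer log and the parent application log, offers can be grouped back onto their applications. The sketch below illustrates that one-to-many relationship; the field names ("application_id", "offer_id") and sample records are hypothetical, not the log's actual attribute names.

```python
# Hypothetical sketch of linking the offer log to the parent application
# log via shared, persistent IDs, as described above. Field names and
# records are illustrative only.

applications = [
    {"application_id": "A1", "outcome": "accepted"},
    {"application_id": "A2", "outcome": "accepted"},
]
offers = [
    {"offer_id": "O1", "application_id": "A1"},
    {"offer_id": "O2", "application_id": "A1"},  # multiple offers per application
    {"offer_id": "O3", "application_id": "A2"},
]

def offers_per_application(apps, offs):
    """Group offer IDs by their parent application ID."""
    grouped = {a["application_id"]: [] for a in apps}
    for o in offs:
        grouped[o["application_id"]].append(o["offer_id"])
    return grouped
```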

  20.

    GECCO Industrial Challenge 2018 Dataset: A water quality dataset for the...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 19, 2024
    Cite
    Thomas Bartz-Beielstein (2024). GECCO Industrial Challenge 2018 Dataset: A water quality dataset for the 'Internet of Things: Online Anomaly Detection for Drinking Water Quality' competition at the Genetic and Evolutionary Computation Conference 2018, Kyoto, Japan. [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3884397
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Rebolledo, Margarita
    Moritz, Steffen
    Chandrasekaran, Sowmya
    Rehbach, Frederik
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of the 'Internet of Things: Online Anomaly Detection for Drinking Water Quality' competition hosted at The Genetic and Evolutionary Computation Conference (GECCO) July 15th-19th 2018, Kyoto, Japan

    The task of the competition was to develop an anomaly detection algorithm for a water and environmental dataset.

    Included in zenodo:

    • dataset of water quality data

    • additional material and descriptions provided for the competition

    The competition was organized by:

    F. Rehbach, M. Rebolledo, S. Moritz, S. Chandrasekaran, T. Bartz-Beielstein (TH Köln)

    The dataset was provided by:

    Thüringer Fernwasserversorgung and IMProvT research project

    GECCO Industrial Challenge: 'Internet of Things: Online Anomaly Detection for Drinking Water Quality'

    Description:

    For the 7th time in GECCO history, the SPOTSeven Lab is hosting an industrial challenge in cooperation with various industry partners. This year's challenge, based on the 2017 challenge, is held in cooperation with "Thüringer Fernwasserversorgung", which provides the real-world data set. The task of this year's competition is to develop an anomaly detection algorithm for the water and environmental dataset. Early identification of anomalies in water quality data is a challenging task: it is important to identify true undesirable variations while keeping false alarm rates very low. In addition to the competition, for the first time in GECCO history all participants can submit 2-page algorithm descriptions for the GECCO Companion. It is therefore now possible to create publications, in a procedure similar to the Late Breaking Abstracts (LBAs), directly through competition participation.
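    The task described above (flag true undesirable variations while keeping false alarms low) can be sketched with a simple online detector. The rolling z-score baseline below is purely illustrative, with arbitrary window and threshold values; it is not a method used by or endorsed by the competition.

```python
from collections import deque
import math

# Minimal online anomaly detector in the spirit of the task above.
# Illustrative baseline only; window size and threshold are arbitrary.

class RollingZScoreDetector:
    def __init__(self, window: int = 50, threshold: float = 4.0):
        self.window = deque(maxlen=window)  # recent presumed-normal values
        self.threshold = threshold          # z-score above which we alarm

    def update(self, value: float) -> bool:
        """Return True if `value` looks anomalous against the recent window."""
        is_anomaly = False
        if len(self.window) >= 10:  # wait for a minimal history
            mean = sum(self.window) / len(self.window)
            var = sum((x - mean) ** 2 for x in self.window) / len(self.window)
            std = math.sqrt(var)
            if std > 0 and abs(value - mean) / std > self.threshold:
                is_anomaly = True
        if not is_anomaly:          # only learn from presumed-normal points
            self.window.append(value)
        return is_anomaly
```

    Keeping anomalous points out of the rolling window is one simple way to trade off sensitivity against false alarms; the accepted competition entries used considerably more sophisticated approaches.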

    Accepted Competition Entry Abstracts:

    • Online Anomaly Detection for Drinking Water Quality Using a Multi-objective Machine Learning Approach (Victor Henrique Alves Ribeiro and Gilberto Reynoso Meza, Pontifical Catholic University of Parana)

    • Anomaly Detection for Drinking Water Quality via Deep BiLSTM Ensemble (Xingguo Chen, Fan Feng, Jikai Wu, and Wenyu Liu, Nanjing University of Posts and Telecommunications and Nanjing University)

    • Automatic vs. Manual Feature Engineering for Anomaly Detection of Drinking-Water Quality (Valerie Aenne Nicola Fehst, idatase GmbH)

    Official webpage:

    http://www.spotseven.de/gecco/gecco-challenge/gecco-challenge-2018/
