59 datasets found
  1. NoW Benchmark Dataset

    • paperswithcode.com
    Updated Sep 20, 2020
    Cite
    Soubhik Sanyal; Timo Bolkart; Haiwen Feng; Michael J. Black (2020). NoW Benchmark Dataset [Dataset]. https://paperswithcode.com/dataset/now-benchmark
    Explore at:
    Dataset updated
    Sep 20, 2020
    Authors
    Soubhik Sanyal; Timo Bolkart; Haiwen Feng; Michael J. Black
    Description

    The goal of this benchmark is to introduce a standard evaluation metric to measure the accuracy and robustness of 3D face reconstruction methods under variations in viewing angle, lighting, and common occlusions.

    The dataset contains 2054 2D images of 100 subjects, captured with an iPhone X, and a separate 3D head scan for each subject. This head scan serves as ground truth for the evaluation. The subjects are selected to contain variations in age, BMI, and sex (55 female, 45 male).

  2. Dataset of books called The challenge today

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called The challenge today [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=The+challenge+today
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is The challenge today. It features 7 columns including author, publication date, language, and book publisher.

  3. BPI Challenge 2017

    • data.4tu.nl
    • figshare.com
    zip
    Updated Feb 6, 2017
    Cite
    Boudewijn van Dongen (2017). BPI Challenge 2017 [Dataset]. http://doi.org/10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 6, 2017
    Dataset provided by
    Eindhoven University of Technology
    Authors
    Boudewijn van Dongen
    License

    https://doi.org/10.4121/resource:terms_of_use

    Time period covered
    Jan 1, 2016 - Feb 1, 2017
    Description

    This event log pertains to a loan application process of a Dutch financial institute. The data contains all applications filed through an online system in 2016 and their subsequent events until February 1st 2017, 15:11.

    The company providing the data and the process under consideration is the same as doi:10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f. However, the system supporting the process has changed in the meantime. In particular, the system now allows for multiple offers per application. These offers can be tracked through their IDs in the log.

  4. The Top Challenges of Today’s CMO

    • thegood.com
    html
    Updated Jun 14, 2021
    Cite
    The Good (2021). The Top Challenges of Today’s CMO [Dataset]. https://thegood.com/insights/top-challenges-cmo/
    Explore at:
    Available download formats: html
    Dataset updated
    Jun 14, 2021
    Dataset authored and provided by
    The Good
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The digital revolution changed the way we work, the places we work, and the tools we use at work. From the janitor’s closet to the C-suite, the once believable promise of shorter workweeks and increased leisure time morphed into information overload and more stress than ever. Nobody knows digital headaches better than a company’s Chief […]

  5. llmail-inject-challenge

    • huggingface.co
    Cite
    Microsoft, llmail-inject-challenge [Dataset]. https://huggingface.co/datasets/microsoft/llmail-inject-challenge
    Explore at:
    Dataset authored and provided by
    Microsoft (http://microsoft.com/)
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Summary

    This dataset contains a large number of attack prompts collected as part of the now-closed LLMail-Inject: Adaptive Prompt Injection Challenge. We first describe the details of the challenge, and then provide documentation of the dataset. For the accompanying code, check out: https://github.com/microsoft/llmail-inject-challenge.

      Citation
    

    @article{abdelnabi2025, title = {LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/llmail-inject-challenge.

  6. LHC Olympics 2020 Dataset

    • paperswithcode.com
    Cite
    LHC Olympics 2020 Dataset [Dataset]. https://paperswithcode.com/dataset/lhc-olympics-2020
    Explore at:
    Description

    These are the official datasets for the LHC Olympics 2020 Anomaly Detection Challenge. Each "black box" contains 1M events meant to be representative of actual LHC data. These events may include signal(s) and the challenge consists of finding these signals using the method of your choice. We have uploaded a total of THREE black boxes to be used for the challenge.

    In addition, we include a background sample of 1M events meant to aid in the challenge. The background sample consists of QCD dijet events simulated using Pythia8 and Delphes 3.4.1. Be warned that both the physics and the detector modeling for this simulation may not exactly reflect the "data" in the black boxes. For both background and black box data, events are selected using a single fat-jet (R=1) trigger with a pT threshold of 1.2 TeV.

    These events are stored as pandas dataframes saved to compressed h5 format. For each event, all reconstructed particles are assumed to be massless and are recorded in detector coordinates (pT, eta, phi). More detailed information such as particle charge is not included. Events are zero padded to constant size arrays of 700 particles. The array format is therefore (Nevents=1M, 2100).
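    A minimal sketch of turning that flat array format back into per-particle coordinates. A tiny zero-filled frame stands in for a real black-box file here, and the filename in the comment is hypothetical:

```python
import numpy as np
import pandas as pd

# Illustrative sketch, not the official reader: a tiny zero-filled frame
# stands in for a real black-box file, which would be loaded with e.g.
# df = pd.read_hdf("BlackBox1.h5")  # hypothetical filename
n_events, n_particles = 2, 700
df = pd.DataFrame(np.zeros((n_events, n_particles * 3)))

# Each row is one flat event of 700 particles x (pT, eta, phi),
# i.e. the (Nevents, 2100) array format described above.
events = df.to_numpy().reshape(-1, n_particles, 3)
pt, eta, phi = events[..., 0], events[..., 1], events[..., 2]
```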

    For more information, including a complete description of the challenge and an example Jupyter notebook illustrating how to read and process the events, see the official LHC Olympics 2020 webpage here.

    UPDATE: November 23, 2020

    Now that the challenge is over, we have uploaded the solutions to Black Boxes 1 and 3. They are simple ASCII files (events_LHCO2020_BlackBox1.masterkey and events_LHCO2020_BlackBox3.masterkey) where each line is the truth label -- 0 for background and 1 (and 2 in the case of BB3) for signal -- of each event in the corresponding h5 files (same ordering). For more information about the solutions, please visit the LHCO2020 webpage.
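    A minimal sketch of parsing such a masterkey file, using a toy file written on the spot in place of the real ones named above:

```python
from pathlib import Path

# Toy masterkey file in the format described above: one integer truth label
# per line (0 = background; 1, plus 2 in the case of BB3, = signal), in the
# same order as the events in the corresponding h5 file.
Path("demo.masterkey").write_text("0\n1\n0\n2\n")

labels = [int(line) for line in Path("demo.masterkey").read_text().split()]
n_signal = sum(1 for lbl in labels if lbl > 0)
```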

    UPDATE: February 11, 2021

    We have uploaded the Delphes detector cards and Pythia command files used to produce the Black Box datasets.

  7. Smoking NLP Challenge Data

    • scicrunch.org
    • neuinfo.org
    Updated Mar 7, 2024
    Cite
    (2024). Smoking NLP Challenge Data [Dataset]. http://identifiers.org/RRID:SCR_008644
    Explore at:
    Dataset updated
    Mar 7, 2024
    Description

    The data for the smoking challenge consisted exclusively of discharge summaries from Partners HealthCare, which were preprocessed, converted into XML format, and separated into training and test sets. i2b2 is a data warehouse containing clinical data on over 150k patients, including outpatient diagnoses (DX), lab results, medications, and inpatient procedures; ETL processes were authored to pull data from EMR and finance systems. The institutional review boards of Partners HealthCare approved the challenge and the data preparation process.

    The data were annotated by pulmonologists, who classified patients into Past Smokers, Current Smokers, Smokers, Non-smokers, and Unknown; second-hand smokers were considered non-smokers. Other institutions involved include the Massachusetts Institute of Technology and the State University of New York at Albany.

    i2b2 is a passionate advocate for the potential of existing clinical information to yield insights that can directly impact healthcare improvement. In its many use cases (Driving Biology Projects), it has become increasingly obvious that the value locked in unstructured text is essential to the success of its mission. To enhance the ability of natural language processing (NLP) tools to prise increasingly fine-grained information from clinical records, i2b2 has previously provided sets of fully de-identified notes from the Research Patient Data Repository at Partners HealthCare for a series of NLP Challenges organized by Dr. Ozlem Uzuner. We are pleased to now make those notes available to the community for general research purposes. At this time we are releasing the notes (~1,000) from the first i2b2 Challenge as i2b2 NLP Research Data Set #1. A similar set of notes from the second i2b2 Challenge will be released on the one-year anniversary of that Challenge (November 2010).

  8. world_model_tokenized_data

    • huggingface.co
    Updated Jun 20, 2024
    Cite
    1X (2024). world_model_tokenized_data [Dataset]. https://huggingface.co/datasets/1x-technologies/world_model_tokenized_data
    Explore at:
    Dataset updated
    Jun 20, 2024
    Dataset authored and provided by
    1X
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    World
    Description

    1X World Model Compression Challenge Dataset

    This repository hosts the dataset for the 1X World Model Compression Challenge. Download with:

    huggingface-cli download 1x-technologies/worldmodel --repo-type dataset --local-dir data

      Updates Since v1.1
    

    • Train/Val v2.0 (~100 hours), replacing v1.1
    • Test v2.0 dataset for the Compression Challenge
    • Faces blurred for privacy
    • New raw video dataset (CC-BY-NC-SA 4.0) at worldmodel_raw_data
    • Example scripts now split into: cosmos_video_decoder.py —… See the full description on the dataset page: https://huggingface.co/datasets/1x-technologies/world_model_tokenized_data.

  9. How Attribution Modeling Helps Overcome Big Data Challenges

    • thegood.com
    html
    Updated Apr 13, 2021
    Cite
    The Good (2021). How Attribution Modeling Helps Overcome Big Data Challenges [Dataset]. https://thegood.com/insights/attribution-modeling/
    Explore at:
    Available download formats: html
    Dataset updated
    Apr 13, 2021
    Dataset authored and provided by
    The Good
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Today’s marketing leaders face an unusual problem: too much information. Technology now allows marketers to get an up-close look at every point along the customer journey, but making sense of that data to support growth and show marketing’s return on investment is a constant challenge. Whether your goal is to provide a stronger defense for your marketing […]

  10. MARIO: Monitoring Age-related Macular Degeneration Progression In Optical...

    • zenodo.org
    bin
    Updated Apr 25, 2025
    Cite
    Gwenolé Quellec; Gwenolé Quellec; Rachid Zeghlache; Rachid Zeghlache (2025). MARIO: Monitoring Age-related Macular Degeneration Progression In Optical Coherence Tomography [Dataset]. http://doi.org/10.5281/zenodo.15270469
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gwenolé Quellec; Gwenolé Quellec; Rachid Zeghlache; Rachid Zeghlache
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    This dataset was created for the MARIO challenge, held as a satellite event of the MICCAI 2024 conference.

    Context

    Age-related Macular Degeneration (AMD) is a progressive degeneration of the macula, the central part of the retina, affecting nearly 196 million people worldwide. It can appear from the age of 50, and more frequently from the age of 65 onwards, causing a significant weakening of visual capacities without destroying them. It is a complex and multifactorial pathology in which genetic and environmental risk factors are intertwined. Advanced stages of the disease (atrophy and neovascularization) affect nearly 20% of patients: they are the leading cause of severe visual impairment and blindness in developed countries. Since their introduction in 2007, anti-vascular endothelial growth factor (anti-VEGF) treatments have proven their ability to slow disease progression and even improve visual function in neovascular forms of AMD. This effectiveness is optimized by ensuring a short time between the diagnosis of the pathology and the start of treatment, as well as by performing regular checks and retreating as soon as necessary. It is now widely accepted that the indication for anti-VEGF treatments is based on the presence of exudative signs (subretinal and intraretinal fluid, intraretinal hyperreflective spots, etc.) visible on optical coherence tomography (OCT), a 3-D imaging modality.

    Work on AI for AMD prediction has mainly focused on the first onset of the early/intermediate (iAMD), atrophic (GA), and neovascular (nAMD) stages; there is no current work on predicting the progression of AMD under the close monitoring of patients on an anti-VEGF treatment plan. Therefore, being able to reliably detect an evolution in neovascular activity by monitoring exudative signs is crucial for the correct implementation of anti-VEGF treatment strategies, which are now individualized.

    Objectives

    The objective of the MARIO dataset, and of the associated challenge, is to evaluate existing and new algorithms to recognize the evolution of neovascular activity in OCT scans of patients with exudative AMD, for the purpose of improving the planning of anti-VEGF treatments.

    Two tasks have been proposed:

    • The first task focuses on pairs of 2D slices (B-scans) from two consecutive OCT acquisitions. The goal is to classify the evolution between these two slices (before and after), which clinicians typically examine side by side on their screens.
    • The second task focuses on the 2D slice level. The goal is to predict the evolution over the next 3 months under close monitoring for patients enrolled in an anti-VEGF treatment plan.

    See details on the MARIO challenge webpage: https://youvenz.github.io/MARIO_challenge.github.io/

  11. Spotify Million Playlist: Recsys Challenge 2018 Dataset

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Apr 9, 2022
    Cite
    AIcrowd (2022). Spotify Million Playlist: Recsys Challenge 2018 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6425592
    Explore at:
    Dataset updated
    Apr 9, 2022
    Dataset authored and provided by
    AIcrowd
    Description

    Spotify Million Playlist Dataset Challenge

    Summary

    The Spotify Million Playlist Dataset Challenge consists of a dataset and evaluation to enable research in music recommendations. It is a continuation of the RecSys Challenge 2018, which ran from January to July 2018. The dataset contains 1,000,000 playlists, including playlist titles and track titles, created by users on the Spotify platform between January 2010 and October 2017. The evaluation task is automatic playlist continuation: given a seed playlist title and/or initial set of tracks in a playlist, to predict the subsequent tracks in that playlist. This is an open-ended challenge intended to encourage research in music recommendations, and no prizes will be awarded (other than bragging rights).

    Background

    Playlists like Today’s Top Hits and RapCaviar have millions of loyal followers, while Discover Weekly and Daily Mix are just a couple of our personalized playlists made especially to match your unique musical tastes.

    Our users love playlists too. In fact, the Digital Music Alliance, in their 2018 Annual Music Report, state that 54% of consumers say that playlists are replacing albums in their listening habits.

    But our users don’t love just listening to playlists, they also love creating them. To date, over 4 billion playlists have been created and shared by Spotify users. People create playlists for all sorts of reasons: some playlists group together music categorically (e.g., by genre, artist, year, or city), by mood, theme, or occasion (e.g., romantic, sad, holiday), or for a particular purpose (e.g., focus, workout). Some playlists are even made to land a dream job, or to send a message to someone special.

    The other thing we love here at Spotify is playlist research. By learning from the playlists that people create, we can learn all sorts of things about the deep relationship between people and music. Why do certain songs go together? What is the difference between “Beach Vibes” and “Forest Vibes”? And what words do people use to describe which playlists?

    By learning more about the nature of playlists, we may also be able to suggest other tracks that a listener would enjoy in the context of a given playlist. This can make playlist creation easier, and ultimately help people find more of the music they love.

    Dataset

    To enable this type of research at scale, in 2018 we sponsored the RecSys Challenge 2018, which introduced the Million Playlist Dataset (MPD) to the research community. Sampled from the over 4 billion public playlists on Spotify, this dataset of 1 million playlists consists of over 2 million unique tracks by nearly 300,000 artists, and represents the largest public dataset of music playlists in the world. The dataset includes public playlists created by US Spotify users between January 2010 and November 2017. The challenge ran from January to July 2018, and received 1,467 submissions from 410 teams. A summary of the challenge and the top scoring submissions was published in the ACM Transactions on Intelligent Systems and Technology.

    In September 2020, we re-released the dataset as an open-ended challenge on AIcrowd.com. The dataset can now be downloaded by registered participants from the Resources page.

    Each playlist in the MPD contains a playlist title, the track list (including track IDs and metadata), and other metadata fields (last edit time, number of playlist edits, and more). All data is anonymized to protect user privacy. Playlists are sampled with some randomization, are manually filtered for playlist quality and to remove offensive content, and have some dithering and fictitious tracks added to them. As such, the dataset is not representative of the true distribution of playlists on the Spotify platform, and must not be interpreted as such in any research or analysis performed on the dataset.
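    As a rough illustration of that structure, the snippet below parses a hypothetical miniature playlist record (field names such as "name", "tracks", and "track_name" are assumptions; the authoritative schema is documented on the AIcrowd resources page) and extracts the seed information used for playlist continuation:

```python
import json

# Hypothetical miniature of one playlist record; field names are assumed,
# not taken from the official schema.
slice_json = json.loads("""
{"playlists": [
  {"name": "Beach Vibes",
   "num_edits": 3,
   "tracks": [{"track_name": "Song A", "artist_name": "Artist X"}]}
]}
""")

# Seed information for automatic playlist continuation: the title and/or
# the initial tracks of each playlist.
seeds = [(p["name"], [t["track_name"] for t in p["tracks"]])
         for p in slice_json["playlists"]]
```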

    Dataset Contains

    1000 examples of each scenario:

    • Title only (no tracks)
    • Title and first track
    • Title and first 5 tracks
    • First 5 tracks only
    • Title and first 10 tracks
    • First 10 tracks only
    • Title and first 25 tracks
    • Title and 25 random tracks
    • Title and first 100 tracks
    • Title and 100 random tracks

    Download Link

    Full details: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge
    Download link: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge/dataset_files

  12. Raw AI4Arctic Sea Ice Challenge Test Dataset

    • data.dtu.dk
    pdf
    Updated Jul 12, 2023
    Cite
    Jørgen Buus-Hinkler; Tore Wulf; Andreas Rønne Stokholm; Anton Korosov; Roberto Saldo; Leif Toudal Pedersen; David Arthurs; Rune Solberg; Nicolas Longépé; Matilde Brandt Kreiner (2023). Raw AI4Arctic Sea Ice Challenge Test Dataset [Dataset]. http://doi.org/10.11583/DTU.21762848.v2
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jul 12, 2023
    Dataset provided by
    Technical University of Denmark
    Authors
    Jørgen Buus-Hinkler; Tore Wulf; Andreas Rønne Stokholm; Anton Korosov; Roberto Saldo; Leif Toudal Pedersen; David Arthurs; Rune Solberg; Nicolas Longépé; Matilde Brandt Kreiner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The AI4Arctic Sea Ice Challenge Datasets were produced for the AI4EO sea ice competition initiated by the European Space Agency (ESA) ɸ-lab. The purpose of the competition is to develop deep learning models that automatically produce sea ice charts, including sea ice concentration, stage of development, and floe size (form) information. The training datasets contain Sentinel-1 active microwave Synthetic Aperture Radar (SAR) data and corresponding passive MicroWave Radiometer (MWR) data from the AMSR2 satellite sensor. While SAR data has ambiguities between open water and sea ice, it has a high spatial resolution, whereas MWR data has good contrast between open water and ice. However, the coarse resolution of the AMSR2 MWR observations introduces a new set of obstacles, e.g. land spill-over, which can lead to erroneous sea ice predictions along coastlines adjacent to open water.

    Label data in the challenge datasets are ice charts that have been produced by the Greenland ice service at the Danish Meteorological Institute (DMI) and the Canadian Ice Service (CIS) for the safety of navigation. The challenge datasets also contain other auxiliary data, such as the distance to land and numerical weather prediction model data. The scenes cover the period from January 8, 2018 to December 21, 2021.

    Two versions of the dataset exist, 'raw' and 'ready-to-train', each with a corresponding test dataset. The versions consist of the same 513 training and 20 test scenes (the latter without label data). The 'ready-to-train' version has been further prepared for model training: data downsampled from 40 to 80 m pixel spacing, standard scaled, ice charts converted (sea ice concentration, stage of development, and floe size), NaN values removed, masks aligned, etc. This is the test data for the 'raw' version; no reference data is included.
    Further details are described in the common manual published together with the datasets, "AI4Arctic_challenge-dataset-manual". Code with a get-started toolkit for the 'ready-to-train' dataset: https://github.com/astokholm/AI4ArcticSeaIceChallenge
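    As a rough illustration of the 40 m to 80 m downsampling step, here as simple 2x2 block averaging on a toy array (the official preprocessing lives in the linked toolkit and may differ):

```python
import numpy as np

# Toy single-channel SAR scene at 40 m pixel spacing.
sar_40m = np.arange(16, dtype=float).reshape(4, 4)

# 2x2 block averaging halves the resolution to 80 m pixel spacing.
h, w = sar_40m.shape
sar_80m = sar_40m.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```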

    Version 2 updates the files and now contains the ice charts (previously absent), as the AutoICE Challenge has been finalised.

    A quick video overview of the challenge is available at: https://youtu.be/iuXIeLPyKfg
    This item is part of the Collection https://doi.org/10.11583/DTU.c.6244065

  13. Micronesia Challenge: Protected Areas Network

    • catalog.data.gov
    • data.ioos.us
    Updated Jan 26, 2025
    Cite
    The Nature Conservancy (TNC) Pacific Islands (Point of Contact) (2025). Micronesia Challenge: Protected Areas Network [Dataset]. https://catalog.data.gov/dataset/micronesia-challenge-protected-areas-network
    Explore at:
    Dataset updated
    Jan 26, 2025
    Dataset provided by
    The Nature Conservancy (http://www.nature.org/)
    Area covered
    Micronesia
    Description

    Boundaries of all known protected areas within Micronesia as compiled by the Micronesia Challenge, a commitment launched in 2006 by Micronesian governments to strike a critical balance between the need to use their natural resources today and the need to sustain those resources for future generations. Five Micronesian governments--the Republic of Palau, the Federated States of Micronesia (FSM), the Republic of the Marshall Islands (RMI), the U.S. Territory of Guam, and the Commonwealth of the Northern Mariana Islands (CNMI)--have committed to "effectively conserve at least 30 percent of the near-shore marine resources and 20 percent of the terrestrial resources across Micronesia by 2020." This region-wide initiative evolved from local conservation projects across Micronesia and is now a large-scale partnership between governments, nonprofit and community leaders, and multinational agencies and donors. Partners include NOAA, The Nature Conservancy (TNC), Conservation International, and others. For further information, please see: http://www.micronesiachallenge.org

  14. Data from: Solubility Challenge Revisited after Ten Years, with Multilab...

    • acs.figshare.com
    xlsx
    Updated Jun 4, 2023
    Cite
    Antonio Llinas; Alex Avdeef (2023). Solubility Challenge Revisited after Ten Years, with Multilab Shake-Flask Data, Using Tight (SD ∼ 0.17 log) and Loose (SD ∼ 0.62 log) Test Sets [Dataset]. http://doi.org/10.1021/acs.jcim.9b00345.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    ACS Publications
    Authors
    Antonio Llinas; Alex Avdeef
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Ten years ago we issued, in conjunction with the Journal of Chemical Information and Modeling, an open prediction challenge to the cheminformatics community. Would they be able to predict the intrinsic solubilities of 32 druglike compounds using only a high-precision set of 100 compounds as a training set? The “Solubility Challenge” was a widely recognized success and spurred many discussions about the prediction methods and the quality of data. Regardless of the obvious limitations of the challenge, the conclusions were somewhat unexpected. Despite contestants employing the entire spectrum of approaches then available to predict aqueous solubility, and having an extremely tight data set at their disposal, it was not possible to identify the best methods for predicting aqueous solubility; a variety of methods and combinations all performed equally well (or badly). Several authors have since suggested that it is not the poor quality of the solubility data which limits the accuracy of the predictions, but the deficient methods used. Now, ten years after the original Solubility Challenge, we revisit it and challenge the community to a new test with a much larger database with estimates of interlaboratory reproducibility.

  15. HAM10000 Lesion Segmentations

    • kaggle.com
    Updated Jul 2, 2020
    Cite
    chdlr (2020). HAM10000 Lesion Segmentations [Dataset]. https://www.kaggle.com/tschandl/ham10000-lesion-segmentations/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 2, 2020
    Dataset provided by
    Kaggle
    Authors
    chdlr
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Context

    Dermatoscopic images usually depict a single skin lesion, but large-scale datasets with segmentations of the affected areas have not been available until now. Challenge segmentation data often suffered from being either too coarse or too noisy. This dataset provides 10015 binary segmentation masks based on FCN-created segmentations and hand-drawn lines, which together with the HAM10000 diagnosis metadata can be used for object detection or semantic segmentation.

    Content

    This dataset contains binary segmentation masks as PNG files for all HAM10000 dataset images. Each mask segments the lesion area as evaluated by a single dermatologist (me). The masks were initialized with an FCN lesion segmentation model; afterwards I went through all of them and either approved them, or corrected / redrew them with the free-hand selection tool in FIJI.
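    A minimal sketch of working with one such mask, using a toy array in place of a real PNG (loading via PIL.Image.open is a suggestion, not part of this dataset's tooling):

```python
import numpy as np

# Toy binary mask standing in for one PNG file; real masks would be
# loaded with e.g. np.asarray(PIL.Image.open("mask.png")).
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255  # non-zero pixels mark the lesion area

# Fraction of the image covered by the lesion.
lesion_fraction = float((mask > 0).mean())
```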

    You can find the HAM10000 dataset images at the following places:
    • Harvard Dataverse: https://doi.org/10.7910/DVN/DBW86T
    • ISIC Archive Gallery: https://www.isic-archive.com
    • Kaggle Dataset Kernel (downsampled): https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000

    Acknowledgements

    If you use this data, please cite/refer to the publication I made these segmentation masks for...

    ...and the original source of the images:

  16. DataSheet_1_Outlook for CRISPR-based tuberculosis assays now in their...

    • frontiersin.figshare.com
    bin
    Updated Aug 4, 2023
    Cite
    Zhen Huang; Guoliang Zhang; Christopher J. Lyon; Tony Y. Hu; Shuihua Lu (2023). DataSheet_1_Outlook for CRISPR-based tuberculosis assays now in their infancy.docx [Dataset]. http://doi.org/10.3389/fimmu.2023.1172035.s001
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhen Huang; Guoliang Zhang; Christopher J. Lyon; Tony Y. Hu; Shuihua Lu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Tuberculosis (TB) remains a major underdiagnosed public health threat worldwide, responsible for more than 10 million cases and one million deaths annually. TB diagnosis has become more rapid with the development and adoption of molecular tests, but remains challenging. CRISPR-based assays have emerged as a promising alternative, but there has not been a critical review of this area. Here, we systematically review these approaches to assess their diagnostic potential and issues with the development and clinical evaluation of proposed CRISPR-based TB assays. Based on these observations, we propose constructive suggestions to improve sample pretreatment, method development, clinical validation, and accessibility of these assays to streamline future assay development and validation studies.

  17. Challenges / Lesson Learnt – SDG Reporting

    • pacific-data.sprep.org
    • png-data.sprep.org
    pdf
    Updated Dec 2, 2025
    Cite
    PNG Conservation and Environment Protection Authority (2025). Challenges / Lesson Learnt – SDG Reporting [Dataset]. https://pacific-data.sprep.org/dataset/challenges-lesson-learnt-sdg-reporting
    Explore at:
    pdf
    Available download formats
    Dataset updated
    Dec 2, 2025
    Dataset provided by
    PNG Conservation and Environment Protection Authority
    License

    https://pacific-data.sprep.org/resource/shared-data-license-agreementhttps://pacific-data.sprep.org/resource/shared-data-license-agreement

    Area covered
    Papua New Guinea
    Description

    For more than a decade, reporting on the MDGs, and now the SDGs, has remained a challenge.

    1. Extraction of data and overlaying issues
    2. Internal CEPA databases

  18.

    Data from: Fast Calorimeter Simulation Challenge 2022 - Dataset 3

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 8, 2022
    Cite
    Kasieczka, Gregor (2022). Fast Calorimeter Simulation Challenge 2022 - Dataset 3 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6366323
    Explore at:
    Dataset updated
    Apr 8, 2022
    Dataset provided by
    Krause, Claudius
    Zaborowska, Anna
    Salamani, Dalila
    Faucci Giannelli, Michele
    Kasieczka, Gregor
    Nachman, Ben
    Shih, David
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is dataset 3 of the “Fast Calorimeter Simulation Challenge 2022”. It consists of four files with 50k GEANT4-simulated showers each, of electrons with energies sampled from a log-uniform distribution ranging from 1 GeV to 1 TeV. The detector geometry is similar to dataset 2, but has a much higher granularity: each of the 45 layers now has 18 radial and 50 angular bins, totalling 18×50×45 = 40500 voxels. This dataset was produced using the Par04 Geant4 example.

    dataset_3_1.hdf5 and dataset_3_2.hdf5 should be used for training, dataset_3_3.hdf5 and dataset_3_4.hdf5 can be used as reference in the evaluation.

    More details, in particular helper scripts to parse the data and calculate and visualize basic high-level physics features, are available at https://calochallenge.github.io/homepage/
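    The layer/bin geometry quoted above can be sketched as a flat-index mapping over the 45×50×18 voxel grid. This is an illustrative sketch only: the (layer, angular, radial) ordering below is an assumption for demonstration, not the challenge's documented storage convention.

```python
# Sketch of the dataset-3 voxel layout described above: 45 layers,
# each with 18 radial and 50 angular bins.
# NOTE: the (layer, angular, radial) ordering is an assumption for
# illustration, not the challenge's documented convention.

N_LAYERS, N_ANGULAR, N_RADIAL = 45, 50, 18

def flat_voxel_index(layer: int, angular: int, radial: int) -> int:
    """Map a (layer, angular, radial) bin triple to a flat voxel index."""
    assert 0 <= layer < N_LAYERS
    assert 0 <= angular < N_ANGULAR
    assert 0 <= radial < N_RADIAL
    return (layer * N_ANGULAR + angular) * N_RADIAL + radial

# Total voxel count matches the 18x50x45 = 40500 quoted in the description.
TOTAL_VOXELS = N_LAYERS * N_ANGULAR * N_RADIAL
```

    The helper scripts linked above define the authoritative binning; this sketch only makes the arithmetic of the quoted voxel count concrete.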

  19.

    BPI Challenge 2017 - Offer log

    • figshare.com
    • data.4tu.nl
    • +2more
    txt
    Updated Feb 22, 2021
    Cite
    Boudewijn van Dongen (2021). BPI Challenge 2017 - Offer log [Dataset]. http://doi.org/10.4121/12705737.v2
    Explore at:
    txt
    Available download formats
    Dataset updated
    Feb 22, 2021
    Dataset provided by
    4TU.ResearchData
    Authors
    Boudewijn van Dongen
    License

    https://doi.org/10.4121/resource:terms_of_usehttps://doi.org/10.4121/resource:terms_of_use

    Description

    This event log pertains to a loan application process of a Dutch financial institute. The data contains all offers made for an accepted application in the event log 10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b. All of the events in this log are also in the BPI Challenge 2017 event log (10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b). This subset is provided for convenience and the IDs are persistent between the two datasets. Parent item: BPI Challenge 2017 This event log pertains to a loan application process of a Dutch financial institute. The data contains all applications filed through an online system in 2016 and their subsequent events until February 1st 2017, 15:11.

    The company providing the data and the process under consideration is the same as doi:10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f. However, the system supporting the process has changed in the meantime. In particular, the system now allows for multiple offers per application. These offers can be tracked through their IDs in the log.
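    Because the IDs are persistent between the offer log and the parent application log, offers can be grouped back onto their applications. The sketch below illustrates that one-to-many relationship; the field names ("application_id", "offer_id") and sample records are hypothetical, not the log's actual attribute names.

```python
# Hypothetical sketch of linking the offer log to the parent application
# log via shared, persistent IDs, as described above. Field names and
# records are illustrative only.

applications = [
    {"application_id": "A1", "outcome": "accepted"},
    {"application_id": "A2", "outcome": "accepted"},
]
offers = [
    {"offer_id": "O1", "application_id": "A1"},
    {"offer_id": "O2", "application_id": "A1"},  # multiple offers per application
    {"offer_id": "O3", "application_id": "A2"},
]

def offers_per_application(apps, offs):
    """Group offer IDs by their parent application ID."""
    grouped = {a["application_id"]: [] for a in apps}
    for o in offs:
        grouped[o["application_id"]].append(o["offer_id"])
    return grouped
```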

  20.

    GECCO Industrial Challenge 2018 Dataset: A water quality dataset for the...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 19, 2024
    Cite
    Thomas Bartz-Beielstein (2024). GECCO Industrial Challenge 2018 Dataset: A water quality dataset for the 'Internet of Things: Online Anomaly Detection for Drinking Water Quality' competition at the Genetic and Evolutionary Computation Conference 2018, Kyoto, Japan. [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3884397
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Rebolledo, Margarita
    Moritz, Steffen
    Chandrasekaran, Sowmya
    Rehbach, Frederik
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of the 'Internet of Things: Online Anomaly Detection for Drinking Water Quality' competition hosted at The Genetic and Evolutionary Computation Conference (GECCO) July 15th-19th 2018, Kyoto, Japan

    The task of the competition was to develop an anomaly detection algorithm for a water and environmental dataset.

    Included in zenodo:

    • dataset of water quality data

    • additional material and descriptions provided for the competition

    The competition was organized by:

    F. Rehbach, M. Rebolledo, S. Moritz, S. Chandrasekaran, T. Bartz-Beielstein (TH Köln)

    The dataset was provided by:

    Thüringer Fernwasserversorgung and IMProvT research project

    GECCO Industrial Challenge: 'Internet of Things: Online Anomaly Detection for Drinking Water Quality'

    Description:

    For the 7th time in GECCO history, the SPOTSeven Lab is hosting an industrial challenge in cooperation with various industry partners. This year's challenge, based on the 2017 challenge, is held in cooperation with "Thüringer Fernwasserversorgung", which provides the real-world data set. The task of this year's competition is to develop an anomaly detection algorithm for the water and environmental dataset. Early identification of anomalies in water quality data is a challenging task: it is important to identify true undesirable variations while keeping false alarm rates very low. In addition to the competition, for the first time in GECCO history all participants can submit 2-page algorithm descriptions for the GECCO Companion. It is therefore now possible to create publications, in a procedure similar to the Late Breaking Abstracts (LBAs), directly through competition participation.
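    The task described above (flag true undesirable variations while keeping false alarms low) can be sketched with a simple online detector. The rolling z-score baseline below is purely illustrative, with arbitrary window and threshold values; it is not a method used by or endorsed by the competition.

```python
from collections import deque
import math

# Minimal online anomaly detector in the spirit of the task above.
# Illustrative baseline only; window size and threshold are arbitrary.

class RollingZScoreDetector:
    def __init__(self, window: int = 50, threshold: float = 4.0):
        self.window = deque(maxlen=window)  # recent presumed-normal values
        self.threshold = threshold          # z-score above which we alarm

    def update(self, value: float) -> bool:
        """Return True if `value` looks anomalous against the recent window."""
        is_anomaly = False
        if len(self.window) >= 10:  # wait for a minimal history
            mean = sum(self.window) / len(self.window)
            var = sum((x - mean) ** 2 for x in self.window) / len(self.window)
            std = math.sqrt(var)
            if std > 0 and abs(value - mean) / std > self.threshold:
                is_anomaly = True
        if not is_anomaly:          # only learn from presumed-normal points
            self.window.append(value)
        return is_anomaly
```

    Keeping anomalous points out of the rolling window is one simple way to trade off sensitivity against false alarms; the accepted competition entries used considerably more sophisticated approaches.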

    Accepted Competition Entry Abstracts:

    • Online Anomaly Detection for Drinking Water Quality Using a Multi-objective Machine Learning Approach (Victor Henrique Alves Ribeiro and Gilberto Reynoso Meza, Pontifical Catholic University of Parana)

    • Anomaly Detection for Drinking Water Quality via Deep BiLSTM Ensemble (Xingguo Chen, Fan Feng, Jikai Wu, and Wenyu Liu, Nanjing University of Posts and Telecommunications and Nanjing University)

    • Automatic vs. Manual Feature Engineering for Anomaly Detection of Drinking-Water Quality (Valerie Aenne Nicola Fehst, idatase GmbH)

    Official webpage:

    http://www.spotseven.de/gecco/gecco-challenge/gecco-challenge-2018/
