3 datasets found

Smoking NLP Challenge Data
scicrunch.org
neuinfo.org
+2 more
Updated Mar 7, 2024
Cite
(2024). Smoking NLP Challenge Data [Dataset]. http://identifiers.org/RRID:SCR_008644
Unique identifier
https://identifiers.org/RRID:SCR_008644
Dataset updated
Mar 7, 2024
Description
The data for the smoking challenge consisted exclusively of discharge summaries from Partners HealthCare, which were preprocessed, converted into XML format, and separated into training and test sets. i2b2 is a data warehouse containing clinical data on over 150k patients, including outpatient DX, lab results, medications, and inpatient procedures. ETL processes were authored to pull data from EMR and finance systems. Institutional review boards of Partners HealthCare approved the challenge and the data preparation process. The data were annotated by pulmonologists, who classified patients as Past Smokers, Current Smokers, Smokers, Non-smokers, or Unknown. Second-hand smokers were considered non-smokers. Other institutions involved include the Massachusetts Institute of Technology and the State University of New York at Albany. i2b2 is a passionate advocate for the potential of existing clinical information to yield insights that can directly impact healthcare improvement. In our many use cases (Driving Biology Projects) it has become increasingly obvious that the value locked in unstructured text is essential to the success of our mission. In order to enhance the ability of natural language processing (NLP) tools to prise increasingly fine-grained information from clinical records, i2b2 has previously provided sets of fully deidentified notes from the Research Patient Data Repository at Partners HealthCare for a series of NLP Challenges organized by Dr. Ozlem Uzuner. We are pleased to now make those notes available to the community for general research purposes. At this time we are releasing the notes (~1,000) from the first i2b2 Challenge as i2b2 NLP Research Data Set #1. A similar set of notes from the Second i2b2 Challenge will be released on the one-year anniversary of that Challenge (November 2010).
Hand Washing Video Dataset Annotated According to the World Health...
data.niaid.nih.gov
zenodo.org
Updated Jan 3, 2022
Cite
Zemlanuhina, Olga (2022). Hand Washing Video Dataset Annotated According to the World Health Organization's Handwashing Guidelines - METC Subset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5808788
Dataset updated
Jan 3, 2022
Dataset provided by
Elsts, Atis
Sabelnikovs, Olegs
Melbārde-Kelmere, Agita
Lulla, Martins
Slavinska, Andreta
Rutkovskis, Aleksejs
Zemlanuhina, Olga
Ivanovs, Maksims
Vilde, Aija
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview: This is a lab-based dataset of videos recording volunteers (medical students) washing their hands as part of a hand-washing monitoring and feedback experiment. The dataset was collected in the Medical Education Technology Center (METC) of Riga Stradins University, Riga, Latvia. In total, 72 participants took part in the experiments, each washing their hands three times, in a randomized order, going through three different hand-washing feedback approaches (user interfaces of a mobile app). The data was annotated in real time by a human operator, in order to give the experiment participants real-time feedback on their performance. There are 212 hand washing episodes in total, each of which is annotated by a single person. The annotations classify the washing movements according to the World Health Organization's (WHO) guidelines by marking each frame in each video with a certain movement code.

This dataset is part of a series of three datasets, all following the same format:

https://zenodo.org/record/4537209 - data collected in Pauls Stradins Clinical University Hospital

https://zenodo.org/record/5808764 - data collected in Jurmala Hospital

https://zenodo.org/record/5808789 - data collected in the Medical Education Technology Center (METC) of Riga Stradins University

Note #1: We recommend that, when using this dataset for machine learning, allowances be made for the reaction speed of the human operator labeling the data. For example, the annotations can be expected to be incorrect for a short while after the person in the video switches washing movements.
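One way to make the allowance described in Note #1 is to drop the frames just before each annotated label change, since the operator's label lags the actual movement switch. The following is a minimal sketch under stated assumptions: the per-frame codes have already been read into a list, the frame rate is taken as 16 FPS, and the reaction time (`reaction_s`) is a hypothetical value to be tuned, not a figure given by the dataset authors. The function name `keep_mask` is illustrative only.

```python
from typing import List


def keep_mask(labels: List[int], fps: float = 16.0, reaction_s: float = 0.5) -> List[bool]:
    """Mark frames just before each annotated label change as unreliable.

    The operator reacts late, so the annotated change lags the real one;
    the frames immediately preceding it still carry the old label.
    reaction_s is an assumed reaction time, not a value from the dataset.
    """
    window = max(0, round(fps * reaction_s))  # frames to distrust per change
    keep = [True] * len(labels)
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            for j in range(max(0, i - window), i):
                keep[j] = False
    return keep


# Toy example: ten frames of code 1 followed by ten frames of code 2.
labels = [1] * 10 + [2] * 10
mask = keep_mask(labels, fps=16.0, reaction_s=0.25)  # window of 4 frames
trusted = [code for code, ok in zip(labels, mask) if ok]
```

Masking is only one option; shifting the labels earlier by the same window is an alternative that keeps all frames but assumes a constant delay.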

Application: The intention of this dataset is to serve as a basis for training machine learning classifiers for automated hand washing movement recognition and quality control.

Statistics:

Frame rate: ~16 FPS (slightly variable, as the videos are reconstructed from a sequence of JPG images taken at the maximum frame rate supported by the capturing devices).

Resolution: 640x480

Number of videos: 212

Number of annotation files: 212

Movement codes (in JSON files):

1: Hand washing movement — Palm to palm

2: Hand washing movement — Palm over dorsum, fingers interlaced

3: Hand washing movement — Palm to palm, fingers interlaced

4: Hand washing movement — Backs of fingers to opposing palm, fingers interlocked

5: Hand washing movement — Rotational rubbing of the thumb

6: Hand washing movement — Fingertips to palm

0: Other hand washing movement
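The movement codes above can be turned into a small lookup table, for example to summarize how long each movement lasts in an episode. This sketch assumes the per-frame codes have already been extracted from a JSON annotation file into a plain list (the JSON layout itself is not described in this listing, so no parsing is shown) and uses the approximate 16 FPS frame rate. The names `MOVEMENT_NAMES` and `seconds_per_movement` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Code-to-name table copied from the movement code list.
MOVEMENT_NAMES: Dict[int, str] = {
    0: "Other hand washing movement",
    1: "Palm to palm",
    2: "Palm over dorsum, fingers interlaced",
    3: "Palm to palm, fingers interlaced",
    4: "Backs of fingers to opposing palm, fingers interlocked",
    5: "Rotational rubbing of the thumb",
    6: "Fingertips to palm",
}


def seconds_per_movement(codes: Iterable[int], fps: float = 16.0) -> Dict[str, float]:
    """Approximate duration of each movement, given one code per frame."""
    counts = Counter(codes)
    unknown = set(counts) - set(MOVEMENT_NAMES)
    if unknown:
        raise ValueError(f"unexpected movement codes: {sorted(unknown)}")
    return {MOVEMENT_NAMES[c]: counts.get(c, 0) / fps for c in sorted(MOVEMENT_NAMES)}


# Toy episode: 2 s of palm-to-palm followed by 1 s of "other".
summary = seconds_per_movement([1] * 32 + [0] * 16, fps=16.0)
```

Because the frame rate is only approximately constant, the durations are estimates.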

Note #2: The original dataset of JPG images is available upon request. There are 13 annotation classes in the original dataset: for each of the six washing movements defined by the WHO, "correct" and "incorrect" execution is marked with two different labels. In this published dataset, all incorrect executions are marked with code 0, as "other" washing movement.

Acknowledgments: The dataset collection was funded by the Latvian Council of Science project: "Automated hand washing quality control and quality evaluation system with real-time feedback", No: lzp - Nr. 2020/2-0309.

References: For more detailed information, see this article, describing a similar dataset collected in a different project:

M. Lulla, A. Rutkovskis, A. Slavinska, A. Vilde, A. Gromova, M. Ivanovs, A. Skadins, R. Kadikis, A. Elsts. Hand-Washing Video Dataset Annotated According to the World Health Organization’s Hand-Washing Guidelines. Data. 2021; 6(4):38. https://doi.org/10.3390/data6040038

Contact information: atis.elsts@edi.lv
Multi-organ Abdominal CT Reference Standard Segmentations
zenodo.org
data.niaid.nih.gov
application/gzip, csv +1
Updated Jan 24, 2020
Cite
Eli Gibson; Francesco Giganti; Yipeng Hu; Ester Bonmati; Steve Bandula; Kurinchi Gurusamy; Brian Davidson; Stephen P. Pereira; Matthew J. Clarkson; Dean C. Barratt (2020). Multi-organ Abdominal CT Reference Standard Segmentations [Dataset]. http://doi.org/10.5281/zenodo.1169361
Explore at:
Available download formats: csv, application/gzip, html
Unique identifier
https://doi.org/10.5281/zenodo.1169361
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Eli Gibson; Francesco Giganti; Yipeng Hu; Ester Bonmati; Steve Bandula; Kurinchi Gurusamy; Brian Davidson; Stephen P. Pereira; Matthew J. Clarkson; Dean C. Barratt
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
DenseVNet Multi-organ Segmentation on Abdominal CT

This dataset includes the multi-organ abdominal CT reference segmentations publicly released in conjunction with the IEEE Transactions on Medical Imaging paper "Automatic Multi-organ Segmentation on Abdominal CT with Dense V-networks" [1].

The data comprises reference segmentations for 90 abdominal CT images delineating multiple organs: the spleen, left kidney, gallbladder, esophagus, liver, stomach, pancreas and duodenum.

The abdominal CT images and some of the reference segmentations were drawn from two data sets: The Cancer Imaging Archive (TCIA) Pancreas-CT data set [2-4] and the Beyond the Cranial Vault (BTCV) Abdomen data set [5-6]. The Pancreas-CT data set comprises abdominal CT acquired at the National Institutes of Health Clinical Center from pre-nephrectomy healthy kidney donors or patients with neither major abdominal pathologies nor pancreatic cancer lesions. Segmentations of the pancreas are included with this data set; images were manually labeled slice-by-slice by a medical student, and verified/modified by an experienced radiologist. The BTCV data set comprises abdominal CT acquired at the Vanderbilt University Medical Center from metastatic liver cancer patients or post-operative ventral hernia patients. Segmentations of the spleen, right and left kidney, gallbladder, esophagus, liver, stomach, aorta, inferior vena cava, portal vein and splenic vein, pancreas, right adrenal gland, and left adrenal gland are included in this data set; images were manually labeled by two experienced undergraduate students, and verified by a radiologist on a volumetric basis using the MIPAV software.

Segmentations that were not present in the original data sets were performed interactively using Matlab 2015b and ITK-SNAP 3.2 by an image research fellow under the supervision of a board-certified radiologist with 8 years of experience in gastrointestinal CT and MRI image interpretation. Segmentations that were present in the original data sets were edited to ensure a consistent segmentation protocol across the data set.

Terms of use

The terms of use of this data set include the terms of use of both the TCIA Pancreas-CT data set (see tabs for data links and terms of use) and the Beyond the Cranial Vault (BTCV) Abdomen data set (terms of use; after registration, you can access the data). If you use these reference segmentations, please cite the above manuscript and the references below. Because these data include manual segmentations of images from the Beyond the Cranial Vault challenge test data, they may not be used to develop submissions for the challenge.

References

[1] Gibson E, Giganti F, Hu Y, Bonmati E, Bandula S, Gurusamy K, Davidson B, Pereira SP, Clarkson MJ, Barratt DC. Automatic multi-organ segmentation on abdominal CT with dense v-networks. IEEE Transactions on Medical Imaging, 2018.

[2] Roth HR, Farag A, Turkbey EB, Lu L, Liu J, and Summers RM. (2016). Data From Pancreas-CT. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2016.tNB1kqBU

[3] Roth HR, Lu L, Farag A, Shin H-C, Liu J, Turkbey EB, Summers RM. DeepOrgan: Multi-level Deep Convolutional Networks for Automated Pancreas Segmentation. N. Navab et al. (Eds.): MICCAI 2015, Part I, LNCS 9349, pp. 556–564, 2015. http://arxiv.org/pdf/1506.06448.pdf

[4] Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. http://doi.org/10.1007/s10278-013-9622-7

[5] Xu Z, Lee CP, Heinrich MP, Modat M, Rueckert D, Ourselin S, Abramson RG, and Landman BA, "Evaluation of six registration methods for the human abdomen on clinically acquired CT," IEEE Trans. Biomed. Eng., vol. 63, no. 8, pp. 1563–1572, 2016. http://doi.org/10.1109/TBME.2016.2574816

[6] Landman BA, Xu Z, Iglesias JE, Styner M, Langerak TR, and Klein A, "MICCAI multi-atlas labeling beyond the cranial vault - workshop and challenge," 2015, https://doi.org/10.7303/syn3193805

File format

Labels are in NIfTI format with the following label definitions. Labels marked with * are only available in the BTCV data set.

1. spleen

2. right kidney*

3. left kidney

4. gallbladder

5. esophagus

6. liver

7. stomach

8. aorta*

9. inferior vena cava*

10. portal vein and splenic vein*

11. pancreas

12. right adrenal gland*

13. left adrenal gland*

14. duodenum
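The label definitions above lend themselves to a lookup table. The sketch below assumes that the integer label values in the NIfTI files follow the order of the list, starting at 1, with 0 as background; this numbering is an assumption inferred from the list order and should be checked against the files. Loading the NIfTI volume (for example with a NIfTI reader library) is left out; the function takes any flat iterable of voxel labels. The names `LABELS`, `BTCV_ONLY`, and `voxel_counts` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed numbering: list order starting at 1; 0 is treated as background.
LABELS: Dict[int, str] = {
    1: "spleen", 2: "right kidney", 3: "left kidney", 4: "gallbladder",
    5: "esophagus", 6: "liver", 7: "stomach", 8: "aorta",
    9: "inferior vena cava", 10: "portal vein and splenic vein",
    11: "pancreas", 12: "right adrenal gland", 13: "left adrenal gland",
    14: "duodenum",
}

# Labels marked with * in the list: only available for BTCV cases.
BTCV_ONLY = {2, 8, 9, 10, 12, 13}


def voxel_counts(flat_labels: Iterable[int]) -> Dict[str, int]:
    """Count voxels per organ, ignoring background and unknown values."""
    counts = Counter(flat_labels)
    return {LABELS[v]: n for v, n in sorted(counts.items()) if v in LABELS}


# Toy flattened label volume.
counts = voxel_counts([0, 0, 1, 1, 6, 14])
# Organs expected in every case: the eight not restricted to BTCV.
common_organs = [LABELS[v] for v in sorted(LABELS) if v not in BTCV_ONLY]
```

The eight organs in `common_organs` match the organs named in the dataset description.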

Subjects included in the dataset

The data comprises segmentation volumes for 90 cases, and the cropping coordinates (cropping.csv) used in the manuscript. The abdominal CT can be obtained from the links above. The reference standard segmentations may be incomplete outside of the specified cropping region. The cases are listed by their subject identifiers in their original data set:
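Because the reference segmentations may be incomplete outside the cropping region, it can help to restrict any comparison to that region. The sketch below is illustrative only: the column layout of cropping.csv is not described in this listing, so the `CropBox` fields, the half-open bounds convention, and the `volume[z][y][x]` indexing are all assumptions.

```python
from dataclasses import dataclass
from typing import List

Volume = List[List[List[int]]]  # assumed indexing: volume[z][y][x]


@dataclass(frozen=True)
class CropBox:
    """Hypothetical crop bounds, half-open: start inclusive, end exclusive."""
    x0: int
    x1: int
    y0: int
    y1: int
    z0: int
    z1: int


def crop(volume: Volume, box: CropBox) -> Volume:
    """Return the sub-volume inside the crop box."""
    return [
        [row[box.x0:box.x1] for row in plane[box.y0:box.y1]]
        for plane in volume[box.z0:box.z1]
    ]


# Toy 4x4x4 volume whose values encode their own coordinates (100*z + 10*y + x).
volume = [[[100 * z + 10 * y + x for x in range(4)] for y in range(4)] for z in range(4)]
cropped = crop(volume, CropBox(x0=1, x1=3, y0=0, y1=2, z0=2, z1=3))
```

Applying the same box to both the reference and a predicted segmentation keeps the evaluation inside the region where the reference is complete.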

#   Source   Data set               Subject ID
1   TCIA     Pancreas-CT            0002
2   TCIA     Pancreas-CT            0003
3   TCIA     Pancreas-CT            0004
4   TCIA     Pancreas-CT            0005
5   TCIA     Pancreas-CT            0006
6   TCIA     Pancreas-CT            0007
7   TCIA     Pancreas-CT            0008
8   TCIA     Pancreas-CT            0009
9   TCIA     Pancreas-CT            0010
10  TCIA     Pancreas-CT            0011
11  TCIA     Pancreas-CT            0012
12  TCIA     Pancreas-CT            0013
13  TCIA     Pancreas-CT            0014
14  TCIA     Pancreas-CT            0016
15  TCIA     Pancreas-CT            0017
16  TCIA     Pancreas-CT            0018
17  TCIA     Pancreas-CT            0019
18  TCIA     Pancreas-CT            0020
19  TCIA     Pancreas-CT            0021
20  TCIA     Pancreas-CT            0022
21  TCIA     Pancreas-CT            0024
22  TCIA     Pancreas-CT            0025
23  TCIA     Pancreas-CT            0026
24  TCIA     Pancreas-CT            0027
25  TCIA     Pancreas-CT            0028
26  TCIA     Pancreas-CT            0029
27  TCIA     Pancreas-CT            0030
28  TCIA     Pancreas-CT            0031
29  TCIA     Pancreas-CT            0032
30  TCIA     Pancreas-CT            0033
31  TCIA     Pancreas-CT            0034
32  TCIA     Pancreas-CT            0035
33  TCIA     Pancreas-CT            0038
34  TCIA     Pancreas-CT            0039
35  TCIA     Pancreas-CT            0040
36  TCIA     Pancreas-CT            0041
37  TCIA     Pancreas-CT            0042
38  TCIA     Pancreas-CT            0043
39  TCIA     Pancreas-CT            0044
40  TCIA     Pancreas-CT            0045
41  TCIA     Pancreas-CT            0046
42  TCIA     Pancreas-CT            0047
43  TCIA     Pancreas-CT            0048
44  Synapse  BeyondTheCranialVault  0001
45  Synapse  BeyondTheCranialVault  0002
46  Synapse  BeyondTheCranialVault  0003
47  Synapse  BeyondTheCranialVault  0004
48  Synapse  BeyondTheCranialVault  0005
49  Synapse  BeyondTheCranialVault  0006
50  Synapse  BeyondTheCranialVault  0007
51  Synapse  BeyondTheCranialVault  0008
52  Synapse  BeyondTheCranialVault  0009
53  Synapse  BeyondTheCranialVault  0010
54  Synapse  BeyondTheCranialVault  0021
55  Synapse  BeyondTheCranialVault  0022
56  Synapse  BeyondTheCranialVault  0023
57  Synapse  BeyondTheCranialVault  0024
58  Synapse  BeyondTheCranialVault  0025
59  Synapse  BeyondTheCranialVault  0026
60  Synapse  BeyondTheCranialVault  0027
61  Synapse  BeyondTheCranialVault  0028
62  Synapse  BeyondTheCranialVault  0029
63  Synapse  BeyondTheCranialVault  0030
64  Synapse  BeyondTheCranialVault  0031
65  Synapse  BeyondTheCranialVault  0032
66  Synapse  BeyondTheCranialVault  0033
67  Synapse  BeyondTheCranialVault  0034
68  Synapse  BeyondTheCranialVault  0035
69  Synapse  BeyondTheCranialVault  0036
70  Synapse  BeyondTheCranialVault  0037
71  Synapse  BeyondTheCranialVault  0038
72  Synapse  BeyondTheCranialVault  0039
73  Synapse  BeyondTheCranialVault  0040
74  Synapse  BeyondTheCranialVault  0061
75  Synapse  BeyondTheCranialVault  0062
76  Synapse  BeyondTheCranialVault  0063
77  Synapse  BeyondTheCranialVault  0064
78  Synapse  BeyondTheCranialVault  0065
79  Synapse  BeyondTheCranialVault  0066
80  Synapse  BeyondTheCranialVault  0067
81  Synapse  BeyondTheCranialVault  0068
82  Synapse  BeyondTheCranialVault  0069
83  Synapse  BeyondTheCranialVault  0070
84  Synapse  BeyondTheCranialVault  0074
85  Synapse  BeyondTheCranialVault  0075
86  Synapse  BeyondTheCranialVault  0076
87  Synapse  BeyondTheCranialVault  0077
88  Synapse  BeyondTheCranialVault  0078
89  Synapse  BeyondTheCranialVault  0079
90  Synapse  BeyondTheCranialVault  0080
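The case list above is made of a few contiguous runs of subject identifiers, so it can be regenerated programmatically, for instance to build download or file lists. This is a minimal sketch: the ranges are transcribed from the list, while the names `TCIA_RANGES`, `BTCV_RANGES`, and `build_cases` are hypothetical helpers, not part of the dataset.

```python
from typing import List, Tuple

# Inclusive runs of subject identifiers, transcribed from the case list.
TCIA_RANGES = [(2, 14), (16, 22), (24, 35), (38, 48)]
BTCV_RANGES = [(1, 10), (21, 40), (61, 70), (74, 80)]

Case = Tuple[int, str, str, str]  # (case number, source, data set, subject ID)


def build_cases() -> List[Case]:
    """Rebuild the 90-case list in its original order."""
    cases: List[Case] = []
    groups = (
        ("TCIA", "Pancreas-CT", TCIA_RANGES),
        ("Synapse", "BeyondTheCranialVault", BTCV_RANGES),
    )
    for source, dataset, ranges in groups:
        for first, last in ranges:
            for n in range(first, last + 1):
                cases.append((len(cases) + 1, source, dataset, f"{n:04d}"))
    return cases


cases = build_cases()
n_tcia = sum(1 for c in cases if c[1] == "TCIA")
n_btcv = len(cases) - n_tcia
```

The counts (43 Pancreas-CT cases and 47 BTCV cases) add up to the 90 cases stated in the description.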
Smoking NLP Challenge Data

RRID:SCR_008644, nif-0000-32739, Smoking NLP Challenge Data (RRID:SCR_008644), NLP Data Set #1C
