6 datasets found

MedQA-USMLE
kaggle.com
Updated Jul 22, 2023
Cite
Moaaz Tameer (2023). MedQA-USMLE [Dataset]. https://www.kaggle.com/datasets/moaaztameer/medqa-usmle/data
Dataset updated
Jul 22, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Moaaz Tameer
Description
(Taken directly from the GitHub repository.) This is the data for the paper: Jin, Di, et al. "What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams." arXiv preprint arXiv:2009.13081 (2020). If you would like to use the data, please cite the paper.

Data

The data, containing both the QAs and the textbooks, can be downloaded from this Google Drive folder. Details of the data are explained below:

For QAs, we have three sources: US, Mainland of China, and Taiwan District, each stored in its own folder. All QA files are in JSONL format, where each line is one data sample stored as a dict. The "XX_qbank.jsonl" files contain all data samples, and we also provide an official random split into train, dev, and test sets. The files in the "metamap" folders contain medical phrases extracted with the MetaMap tool.
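Because each QA file is JSON Lines (one dict per line), it can be read with Python's standard `json` module. The sketch below is a minimal loader, not part of the original release. It makes no assumption about which keys a sample contains, and it reads files as UTF-8 since the Mainland of China and Taiwan files contain Chinese text.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_jsonl(path: str | Path) -> list[dict[str, Any]]:
    """Read a JSON Lines file in which every non-empty line is one JSON object."""
    samples: list[dict[str, Any]] = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline
            sample = json.loads(line)
            if not isinstance(sample, dict):
                raise ValueError(f"{path}:{line_no}: expected a JSON object")
            samples.append(sample)
    return samples
```

For instance, `load_jsonl("US/train.jsonl")` (a hypothetical path; adjust it to the downloaded folder layout) returns a list of dicts, and inspecting `sorted(samples[0])` shows the actual field names before you rely on any of them.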

For QAs, we also include the "4_options" version for US and Mainland of China, since we reported results for 4 options in the paper.

For textbooks, we have two languages: English and simplified Chinese. For simplified Chinese, we provide two kinds of text splitting: one split by sentences and the other split by paragraphs.

MIT License

Copyright (c) 2022 Di Jin

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Patient Insights: 2.8Lakh Drug & Condition Reviews

kaggle.com

Updated Aug 18, 2024


Cite

Mukesh Kumar (2024). Patient Insights: 2.8Lakh Drug & Condition Reviews [Dataset]. http://doi.org/10.34740/kaggle/dsv/9196455

Unique identifier

https://doi.org/10.34740/kaggle/dsv/9196455

Dataset updated

Aug 18, 2024

Dataset provided by

Kaggle

Authors

Mukesh Kumar

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

Drug Information: The dataset likely includes the generic and brand names of medications, allowing researchers to analyze trends across different formulations.
Condition Specificity: While "Bladder Infection" might be a category, the data could contain more specific diagnoses like "Urinary Tract Infection (UTI)" or "Cystitis." This granularity allows for targeted analysis within conditions.
Sentiment Analysis: The text content of reviews can be analyzed to understand patient sentiment towards the medication. This goes beyond the rating by capturing positive experiences, concerns about side effects, and overall satisfaction.
Side Effect Reporting: Reviews often mention side effects experienced by patients. Analyzing this data can help identify common side effects and potential drug interactions.
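As a rough illustration of side-effect reporting, the sketch below counts how many reviews mention each term in a small vocabulary. It rests on two assumptions: the reviews are already available as plain strings (the column holding the review text is not documented here), and `SIDE_EFFECT_TERMS` is a hypothetical, hand-picked list. A real pharmacovigilance analysis would use a curated medical vocabulary and handle negation such as "no nausea".

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical, hand-picked vocabulary for illustration only.
SIDE_EFFECT_TERMS = ("nausea", "headache", "dizziness", "rash", "insomnia")


def count_side_effect_mentions(
    reviews: Iterable[str], terms: Iterable[str] = SIDE_EFFECT_TERMS
) -> Counter:
    """Count the number of reviews mentioning each term (whole word, case-insensitive)."""
    patterns = {t: re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms}
    counts: Counter = Counter()
    for text in reviews:
        for term, pattern in patterns.items():
            if pattern.search(text):
                counts[term] += 1  # at most once per review
    return counts
```

Counting each term at most once per review keeps a single review that repeats a word from dominating the tally; the selection-bias caveat under Limitations still applies to any counts obtained this way.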

Use Cases:

Comparative Effectiveness Research: By comparing patient experiences with different medications for the same condition, researchers can gain insights into their relative effectiveness and tolerability.
Patient-Centered Drug Development: Understanding patient perspectives on existing medications can inform the development of new drugs with improved side effect profiles and better patient experiences.
Pharmacovigilance: The dataset can be a valuable source of real-world data on medication safety, helping identify potential adverse effects that may not be captured in clinical trials.
Personalized Medicine: Analyzing patient reviews alongside their medical history could lead to the development of tools for personalized medicine, tailoring treatment plans based on individual responses to medications.
Natural Language Processing (NLP): Techniques like NLP can be used to extract insights from the text content. This could involve identifying patterns in patient experiences, summarizing common themes, or even building chatbots that answer patient questions about medications.

Limitations:

Data Accuracy: Patient reviews might not always be accurate or complete. Users might misreport side effects or have pre-existing biases.
Selection Bias: People with strong positive or negative experiences might be more likely to leave reviews, skewing the data towards extremes.
Anonymity: While anonymized, the data may not capture the full picture of a patient's medical history, which could influence their experience with a medication.

Overall, this patient review dataset offers a unique window into the real-world experiences of patients with various medications. By analyzing this data responsibly and considering its limitations, researchers and healthcare professionals can gain valuable insights to improve patient care and drug development.

Skin diseases image dataset
kaggle.com
zip
Updated Aug 16, 2021
Cite
Ismail Hossain (2021). Skin diseases image dataset [Dataset]. https://www.kaggle.com/datasets/ismailpromus/skin-diseases-image-dataset
Available download formats: zip (5,568,507,391 bytes)
Dataset updated
Aug 16, 2021
Authors
Ismail Hossain
Description
Dataset

This dataset was created by Ismail Hossain

Released under: Data files © Original Authors

Data from: Cuff-Less Blood Pressure Estimation
kaggle.com
paperswithcode.com
zip
Updated Jun 3, 2017
Cite
Mohammad Kachuee (2017). Cuff-Less Blood Pressure Estimation [Dataset]. https://www.kaggle.com/mkachuee/BloodPressureDataset
Available download formats: zip (4,938,221,622 bytes)
Dataset updated
Jun 3, 2017
Authors
Mohammad Kachuee
Description
Data Set Information:

The main goal of this data set is to provide clean and valid signals for designing cuff-less blood pressure estimation algorithms. The raw electrocardiogram (ECG), photoplethysmograph (PPG), and arterial blood pressure (ABP) signals were originally collected from physionet.org and then preprocessed and validated. (For more information about the process, please refer to our paper.)

Attribute Information:

The database consists of a cell array of matrices; each cell is one record part. In each matrix, each row corresponds to one signal channel:

1: PPG signal, Fs = 125 Hz; photoplethysmograph from fingertip

2: ABP signal, Fs = 125 Hz; invasive arterial blood pressure (mmHg)

3: ECG signal, Fs = 125 Hz; electrocardiogram from channel II
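The channel layout above maps naturally onto a 3 x N array per record. The sketch below assumes the records have already been loaded into NumPy; it splits a record into named channels and reports its duration at 125 Hz. The ABP maximum and minimum it returns are only a crude stand-in for systolic and diastolic pressure, not a beat-by-beat estimate.

```python
from __future__ import annotations

import numpy as np

FS_HZ = 125  # sampling rate stated for all three channels
CHANNELS = ("ppg", "abp", "ecg")  # row order given in the description


def split_record(record: np.ndarray) -> dict[str, np.ndarray]:
    """Split one 3 x N record matrix into named 1-D channel arrays."""
    arr = np.asarray(record, dtype=float)
    if arr.ndim != 2 or arr.shape[0] != len(CHANNELS):
        raise ValueError(f"expected shape (3, N), got {arr.shape}")
    return {name: arr[i] for i, name in enumerate(CHANNELS)}


def record_summary(record: np.ndarray) -> dict[str, float]:
    """Duration in seconds plus crude ABP extremes over the whole record."""
    abp = split_record(record)["abp"]  # invasive arterial pressure, mmHg
    return {
        "duration_s": abp.size / FS_HZ,
        # max/min over the full record: a rough proxy, not per-beat SBP/DBP
        "abp_max_mmHg": float(abp.max()),
        "abp_min_mmHg": float(abp.min()),
    }
```

With SciPy, a MATLAB cell array typically loads as a NumPy object array, so something like iterating over `scipy.io.loadmat(path)[name].ravel()` and calling `record_summary` on each element would walk the records. The `path` and variable `name` (for example `Part_1`) are assumptions about the downloaded files and should be checked against them.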

Note: the dataset is split into multiple parts to make it easier to load on machines with limited memory. Each cell is a record. A patient may have more than one record, and records cannot be attributed to individual patients; however, records of the same patient appear next to each other. N-fold cross-validation is suggested to reduce the chance of the training set being contaminated by test patients.
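Because records of the same patient sit next to each other but carry no patient label, one reasonable reading of the suggestion above is to cross-validate over contiguous blocks of records rather than shuffled ones, so that a patient can leak only across a fold boundary. The helper below is a minimal sketch under that assumption; `contiguous_folds` is a hypothetical name, and scikit-learn's `KFold(n_splits=k, shuffle=False)` produces the same blocks.

```python
from __future__ import annotations


def contiguous_folds(n_records: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split record indices 0..n-1 into k contiguous test blocks (no shuffling).

    The first n_records % k folds get one extra record, as in an
    unshuffled K-fold split.
    """
    if not 1 < k <= n_records:
        raise ValueError("need 1 < k <= n_records")
    base, extra = divmod(n_records, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_records))
        folds.append((train, test))
        start += size
    return folds
```

Keeping order intact is the whole point here: shuffling before splitting would scatter a patient's adjacent records across train and test folds.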

Relevant Papers:

M. Kachuee, M. M. Kiani, H. Mohammadzade, M. Shabany, Cuff-Less High-Accuracy Calibration-Free Blood Pressure Estimation Using Pulse Transit Time, IEEE International Symposium on Circuits and Systems (ISCAS'15), 2015.

A. Goldberger, L. Amaral, L. Glass, J. Hausdorff, P. Ivanov, R. Mark, J. Mietus, G. Moody, C. Peng and H. Stanley, "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals," Circulation, vol. 101, no. 23, pp. e215–e220, 2000.

Citation Request:

If you found this data set useful please cite the following:

M. Kachuee, M. M. Kiani, H. Mohammadzade, M. Shabany, Cuff-Less High-Accuracy Calibration-Free Blood Pressure Estimation Using Pulse Transit Time, IEEE International Symposium on Circuits and Systems (ISCAS'15), 2015.

M. Kachuee, M. M. Kiani, H. Mohammadzade, M. Shabany, Cuff-Less Blood Pressure Estimation Algorithms for Continuous Health-Care Monitoring, IEEE Transactions on Biomedical Engineering, 2016.
KJM ECoG - faces_basic
kaggle.com
zip
Updated Nov 28, 2019
Cite
Chadwick Boulay (2019). KJM ECoG - faces_basic [Dataset]. https://www.kaggle.com/datasets/cboulay/kjm-ecog-faces-basic
Available download formats: zip (2,832,591,791 bytes)
Dataset updated
Nov 28, 2019
Authors
Chadwick Boulay
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Original location: https://exhibits.stanford.edu/data/catalog/zk881ps0522

Electrophysiological data from implanted electrodes in the human brain are rare, and therefore scientific access to them has remained somewhat exclusive. Here we present a freely-available curated library of implanted electrocorticographic (ECoG) data and analyses for 16 benchmark behavioral experiments, with 204 individual datasets from 34 patients made with the same amplifiers (at the same sampling rate and filter settings). In every case, electrode positions have been carefully registered to brain anatomy. A large set of fully-commented analysis scripts to interpret these data using modern techniques is embedded in the library alongside the data. All data, anatomic correlations, and analysis files (MATLAB code) are in a common, intuitive file structure at https://searchworks.stanford.edu/view/zk881ps0522. The library may be used as course material or serve as a starter package for researchers early in their career or for established groups, to modify the analyses and re-apply them in new settings.

This dataset comprises preprocessed forms of the data from the "faces basic" experiment in that study.

Also see https://www.kaggle.com/cboulay/kjm-ecog-fingerflex
FSboard
kaggle.com
zip
Updated Feb 25, 2025
Cite
Google Research (2025). FSboard [Dataset]. https://www.kaggle.com/googleai/fsboard
Available download formats: zip (0 bytes)
Dataset updated
Feb 25, 2025
Dataset provided by
Google (http://google.com/)
Authors
Google Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary

FSboard is an American Sign Language fingerspelling dataset situated in a mobile text entry use case, collected from 147 paid and consenting Deaf signers using Pixel 4A selfie cameras in a variety of environments. At more than 3 million characters and more than 250 hours of video, FSboard is the largest fingerspelling recognition dataset to date by a factor of more than 10.

We previously hosted a Kaggle competition using MediaPipe Holistic landmarks for the FSboard data; this release now includes the underlying RGB videos and val/test sets.

See our paper for a more complete exposition of the dataset: FSboard: Over 3 million characters of ASL fingerspelling collected via smartphones

The dataset consists of several categories of synthetically generated phrases (examples in the table below, not real PII) recorded as video clips of ASL fingerspelling (example frames in the figure below, faces blurred here but not in the dataset).

Directory   Category            Example
"dmk"       MacKenzie phrases   prevailing wind from the east
"daun"      URLs                /dfinance/list.asp?id=418/
            Addresses           9841 gritt hill
            Phone Numbers       166-893-6320
            Names               mohammed kim

[Figure: example frames of ASL fingerspelling; faces blurred here but not in the dataset]

Responsible Use

While facial expressions are an essential component of sign language and are therefore included in the dataset, we ask that you blur the signers’ faces when publicizing examples. You should not attempt to reidentify the signers or use their likenesses to generate and publish other content (deepfakes). Please be culturally respectful of the Deaf/Hard of Hearing community in your use of the dataset and do not exaggerate the significance of improving ASL fingerspelling performance, which is only one small component of American Sign Language.

Landmarks

Landmarks were extracted using MediaPipe Holistic. They are provided as tf.train.SequenceExample entries in TFRecord files. A script is also included that converts these TFRecord files to Parquet files in a format similar to the one used in the previous Kaggle competition. Since each row in the Parquet files represents a single landmark frame, the script also produces a supplemental CSV file with video-level information.
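Since the Parquet conversion stores one landmark frame per row, a typical first step is to regroup rows into one array per video. The sketch below assumes a pandas DataFrame with a video identifier column and a frame-index column; the names `sequence_id` and `frame` are hypothetical defaults (check the conversion script's actual output), and every remaining column is treated as a numeric landmark coordinate.

```python
import numpy as np
import pandas as pd


def frames_to_sequences(
    frames: pd.DataFrame,
    id_col: str = "sequence_id",  # hypothetical column name
    frame_col: str = "frame",  # hypothetical column name
) -> dict:
    """Group one-row-per-frame landmark data into one (n_frames, n_features) array per video."""
    feature_cols = [c for c in frames.columns if c not in (id_col, frame_col)]
    sequences = {}
    for seq_id, group in frames.groupby(id_col, sort=False):
        ordered = group.sort_values(frame_col)  # restore temporal order
        sequences[seq_id] = ordered[feature_cols].to_numpy(dtype=np.float32)
    return sequences
```

If the identifier is stored as the DataFrame index rather than a column, call `frames.reset_index()` first; the resulting dict keys can then be matched against the video-level CSV.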

Sensitive Content Filtering

The synthetic URLs in the dataset were created by recombining parts of real URLs. As such, the full breadth of content available on the internet is represented. It is important not to infantilize the Deaf community, and therefore important to ensure that any application in this space is able to produce arbitrary output. Imagine the frustration when your keyboard r*****s to produce certain ducking words. However, it is also important to ensure that an application does not easily produce offensive, unintended content. To help people make sound decisions with this data, we ran a sensitive content filter and keyword searches on the phrases and manually reviewed the results to produce a boolean tag, "sensitiveContent", which is available in the JSON files. Please ensure that the Deaf community is involved in the creation of any applications targeted to them.
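The "sensitiveContent" tag makes it straightforward to build a filtered subset, for example for a public demo. The sketch below assumes the JSON files have already been parsed (e.g. with `json.load`) into a list of dicts that may each carry that boolean key; the `phrase` key used in the check is hypothetical. Records without the tag are dropped as well, a deliberately conservative choice.

```python
from __future__ import annotations

from typing import Any, Iterable


def drop_sensitive(
    records: Iterable[dict[str, Any]],
    flag: str = "sensitiveContent",
) -> list[dict[str, Any]]:
    """Keep only records explicitly tagged as not sensitive.

    A record whose flag is missing (or anything other than False) is
    dropped too, erring on the side of excluding unreviewed content.
    """
    return [rec for rec in records if rec.get(flag) is False]
```

Given the authors' point that applications should be able to produce arbitrary output, filtering like this suits display or demo subsets better than it suits restricting what a recognizer is trained on.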

Attribution

If you use FSboard in your work, please cite:

@misc{georg2024fsboard3millioncharacters,
  title={FSboard: Over 3 million characters of ASL fingerspelling collected via smartphones},
  author={Manfred Georg and Garrett Tanzer and Saad Hassan and Maximus Shengelia and Esha Uboweja and Sam Sepah and Sean Forbes and Thad Starner},
  year={2024},
  eprint={2407.15806},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.15806},
}
MedQA-USMLE (A Large-scale Open Domain Question Answering Dataset from Medical Exams): 180 scholarly articles cite this dataset, according to Google Scholar.
