MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for [Dataset Name]
Dataset Summary
The task of PubMedQA is to answer research questions with yes/no/maybe (e.g., "Do preoperative statins reduce atrial fibrillation after coronary artery bypass grafting?") using the corresponding abstracts.
Supported Tasks and Leaderboards
The official leaderboard is available at https://pubmedqa.github.io/. 500 questions in the pqa_labeled subset are used as the test set. They can be found at… See the full description on the dataset page: https://huggingface.co/datasets/qiaojin/PubMedQA.
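Since the task reduces to choosing one of three labels per question, scoring a set of predictions can be sketched as below. This is a minimal illustration, not the leaderboard's official scorer: the use of plain accuracy, the helper names `normalize_label` and `accuracy`, and the normalization rules are all assumptions made for this example.

```python
from typing import List, Optional

# The three answer labels named in the dataset summary.
LABELS = ("yes", "no", "maybe")


def normalize_label(text: str) -> Optional[str]:
    """Map a raw model output to one of the three labels, or None.

    Assumption: lowercasing and stripping whitespace and a trailing
    period is enough; real model outputs may need more careful parsing.
    """
    cleaned = text.strip().lower().rstrip(".")
    return cleaned if cleaned in LABELS else None


def accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of predictions whose normalized label matches the gold label.

    Assumption: accuracy is used here purely for illustration; an
    unparseable prediction counts as wrong.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set")
    correct = sum(normalize_label(p) == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

For instance, `accuracy(["Yes", "no.", "unsure", "maybe"], ["yes", "no", "maybe", "maybe"])` scores three of four predictions as correct, because "unsure" does not map to any label.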
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
PubMedQA - A Dataset for Biomedical Research Question Answering
Dataset Description
Links
Homepage: Github.io
Repository: Github
Paper: arXiv
Leaderboard: PapersWithCode
Contact (Original Authors): Qiao Jin (qiaojin.andy@gmail.com)
Contact (Curator): Artur Guimarães (artur.guimas@gmail.com)
Dataset Summary
The task of PubMedQA is to answer research questions with yes/no/maybe (e.g., Do preoperative statins reduce atrial fibrillation… See the full description on the dataset page: https://huggingface.co/datasets/araag2/PubMedQA.
reference: https://github.com/FreedomIntelligence/HuatuoGPT-o1/blob/main/evaluation/data/eval_data.json
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Source data: https://huggingface.co/datasets/HPAI-BSC/PubmedQA-Mixtral-CoT
Code used: https://github.com/LLMTeamAkiyama/0-data_prepare/tree/master/src/PubmedQA-Mixtral-CoT
Number of records: 206,962
Average token count: 586
Maximum token count: 1,922
Total token count: 121,366,170
File format: JSONL
Number of file splits: 3
Total file size: 532.2 MB
Processing steps:
Filtering by character count:
Records whose question column exceeds 6,000 characters are removed. Records whose response column exceeds 80,000 characters are removed.
Splitting the response:
The response column is split into a "thought" part, which describes the reasoning process, and an "answer" part, which is the final conclusion. The split uses Answer: or The answer… See the full description on the dataset page: https://huggingface.co/datasets/LLMTeamAkiyama/clean_pubmedqa_mixtral_cot.
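The two processing steps described for this dataset (length filtering, then splitting each response into thought and answer) can be sketched roughly as follows. This is not the curators' actual code, which lives in the linked repository. The function names are hypothetical, only the `Answer:` marker is used because the second marker is truncated in the description, and splitting at the last occurrence of the marker is an assumption.

```python
from typing import Optional, Tuple

# Thresholds stated in the dataset description.
MAX_QUESTION_CHARS = 6_000
MAX_RESPONSE_CHARS = 80_000

# Assumption: only this marker is fully visible in the description;
# the second one is truncated there and is deliberately not guessed.
ANSWER_MARKER = "Answer:"


def keep_record(record: dict) -> bool:
    """Return True unless the question or response exceeds its character limit."""
    return (
        len(record["question"]) <= MAX_QUESTION_CHARS
        and len(record["response"]) <= MAX_RESPONSE_CHARS
    )


def split_response(response: str) -> Optional[Tuple[str, str]]:
    """Split a response into (thought, answer) at the last marker occurrence.

    Returns None when the marker is absent. Using the last occurrence
    rather than the first is an assumption of this sketch.
    """
    thought, marker, answer = response.rpartition(ANSWER_MARKER)
    if not marker:
        return None
    return thought.strip(), answer.strip()
```

A record such as `{"question": "q", "response": "Reasoning here. Answer: yes"}` passes the filter, and `split_response` on its response returns `("Reasoning here.", "yes")`.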