Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Medical Doctors in the United States increased to 2.77 per 1000 people in 2019 from 2.74 per 1000 people in 2018. This dataset includes a chart with historical data for the United States Medical Doctors.
The National Ambulatory Medical Care Survey (NAMCS), conducted by the National Center for Health Statistics (NCHS), collects data on visits to physician offices to describe patterns of ambulatory care delivery in the United States. As part of NAMCS, the Physician Induction Interview collects information about practice characteristics at physician offices. Partway through the 2020 NAMCS, NCHS added questions to the Physician Induction Interview to assess physician experiences related to COVID-19 in office-based settings. The data include nationally representative estimates of experiences related to COVID-19 among office-based physicians in the United States, including: shortages of personal protective equipment (PPE) in the past 3 months; the ability to test for COVID-19 in the past 3 months; providers testing positive for COVID-19 in the past 3 months; turning away COVID-19 patients in the past 3 months; and telemedicine or telehealth technology use before and after March 2020. Estimates were derived from interviews with physicians in periods 3 and 4 of 2020 NAMCS and periods 1 through 4 of 2021 NAMCS, which occurred between December 15, 2020 and May 6, 2022. The data are considered preliminary, and the results may change with the final data release.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Medical Doctors in Slovenia increased to 3.30 per 1000 people in 2020 from 3.26 per 1000 people in 2019. This dataset includes a chart with historical data for Slovenia Medical Doctors.
These are the test and training data used for experiments presented in BioNLP 2017.
Licence: The data are intended only for research, educational and non-commercial purposes.
How to cite: If you use these data, please cite our contribution to BioNLP 2017 as follows:
Automatic classification of doctor-patient questions for a virtual patient record query task Leonardo Campillos-Llanos, Sophie Rosset, Pierre Zweigenbaum Proc. of BioNLP 2017, August 4 2017, Vancouver, Canada, pp. 333-341
Note that these data were manually collected from books aimed at medical consultation and clinical examination, as well as resources for medical translation. These sources also need to be referenced as follows:
Barbara Bates and Lynn S Bickley. 2014. Guide de l’examen clinique - Nouvelle édition 2014. Arnette - John Libbey Eurotext.
Claire Coudé, François-Xavier Coudé, and Kai Kassmann. 2011. Guide de conversation médicale - français-anglais-allemand. Lavoisier, Médecine Sciences Publications.
Owen Epstein, David Perkin, John Cookson, and David P. de Bono. 2015. Guide pratique de l’examen clinique. Elsevier Masson, Paris.
Félicie Pastore. 2015. How can I help you today? Guide de la consultation médicale et paramédicale en anglais. Ellipses, Paris.
UMVF/Medical English Portal UFR Médecine de Dijon (Last access: May 2017)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We generated this dataset to train a machine learning model for automatically generating psychiatric case notes from doctor-patient conversations. Since we didn't have access to real doctor-patient conversations, we used transcripts from two different sources to generate audio recordings of enacted conversations between a doctor and a patient. We employed eight students who worked in pairs to generate these recordings. Six of the transcripts that we used to produce these recordings were hand-written by Cheryl Bristow, and the rest were adapted from Alexander Street transcripts, which were generated from real doctor-patient conversations. Our study requires recording the doctor and the patient(s) in separate channels, which is the primary reason for generating our own audio recordings of the conversations.
We used the Google Cloud Speech-to-Text API to transcribe the enacted recordings. These newly generated transcripts are produced entirely by AI-powered automatic speech recognition, whereas the source transcripts are either hand-written or fine-tuned by human transcribers (transcripts from Alexander Street).
We provided the generated transcripts back to the students and asked them to write case notes. The students worked independently using software that we developed earlier for this purpose. The students had past experience of writing case notes, and we let them write case notes as they normally would, without any training or instructions from us.
NOTE: Audio recordings are not included in Zenodo due to large file size but they are available in the GitHub repository.
LukeGPT88/patient-doctor-text-classifier-eng-dataset-0523 dataset hosted on Hugging Face and contributed by the HF Datasets community
The Medicare Physician & Other Practitioners by Provider dataset provides information on use, payments, submitted charges and beneficiary demographic and health characteristics organized by National Provider Identifier (NPI). Note: This full dataset contains more records than most spreadsheet programs can handle, which will result in an incomplete load of data. Use of a database or statistical software is required.
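One way to follow the advice above is to stream the file into a database rather than open it in a spreadsheet. The sketch below is a minimal illustration using Python's built-in csv and sqlite3 modules. The table name, the column names (npi, provider_type, total_payment) and the three sample rows are hypothetical stand-ins, not the dataset's actual schema.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_file, conn, table="providers", batch_size=10_000):
    """Stream rows from an open CSV file into a SQLite table in batches."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first row is assumed to hold the column names
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'

    batch, total = [], 0
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            # Flush a full batch so memory use stays bounded.
            conn.executemany(insert_sql, batch)
            total += len(batch)
            batch = []
    if batch:
        conn.executemany(insert_sql, batch)
        total += len(batch)
    conn.commit()
    return total


# Tiny in-memory stand-in for the real file; column names are hypothetical.
sample = io.StringIO(
    "npi,provider_type,total_payment\n"
    "1,Cardiology,100.5\n"
    "2,Cardiology,50\n"
    "3,Dermatology,20\n"
)
conn = sqlite3.connect(":memory:")
rows_loaded = load_csv_into_sqlite(sample, conn, batch_size=2)

# Aggregate inside the database instead of loading every row into memory.
totals = conn.execute(
    "SELECT provider_type, SUM(CAST(total_payment AS REAL)) "
    "FROM providers GROUP BY provider_type ORDER BY provider_type"
).fetchall()
```

For the real file, the same function could be called with an open file handle and an on-disk database path in place of ":memory:".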
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Electronic health records (EHRs) are a rich source of information for medical research and public health monitoring. Information systems based on EHR data could also assist in patient care and hospital management. However, much of the data in EHRs is in the form of unstructured text, which is difficult to process for analysis. Natural language processing (NLP), a form of artificial intelligence, has the potential to enable automatic extraction of information from EHRs and several NLP tools adapted to the style of clinical writing have been developed for English and other major languages. In contrast, the development of NLP tools for less widely spoken languages such as Swedish has lagged behind. A major bottleneck in the development of NLP tools is the restricted access to EHRs due to legitimate patient privacy concerns. To overcome this issue we have generated a citizen science platform for collecting artificial Swedish EHRs with the help of Swedish physicians and medical students. These artificial EHRs describe imagined but plausible emergency care patients in a style that closely resembles EHRs used in emergency departments in Sweden. In the pilot phase, we collected a first batch of 50 artificial EHRs, which has passed review by an experienced Swedish emergency care physician. We make this dataset publicly available as OpenChart-SE corpus (version 1) under an open-source license for the NLP research community. The project is now open for general participation and Swedish physicians and medical students are invited to submit EHRs on the project website (https://github.com/Aitslab/openchart-se), where additional batches of quality-controlled EHRs will be released periodically.
Dataset content
OpenChart-SE, version 1 corpus (txt files and dataset.csv)
The OpenChart-SE corpus, version 1, contains 50 artificial EHRs (note that the numbering starts with 5, as 1-4 were test cases that were not suitable for publication). The EHRs are available in two formats: structured as a .csv file, and as separate text files for annotation. Note that flaws in the data were not cleaned up, so that the corpus simulates what could be encountered when working with data from different EHR systems. All charts have been checked for medical validity by a resident in Emergency Medicine at a Swedish hospital before publication.
Codebook.xlsx
The codebook contains information about each variable used. It is in XLSForm format, which can be re-used in several different applications for data collection.
suppl_data_1_openchart-se_form.pdf
OpenChart-SE mock emergency care EHR form.
suppl_data_3_openchart-se_dataexploration.ipynb
This Jupyter notebook contains the code and results from the analysis of the OpenChart-SE corpus.
More details about the project and information on the upcoming preprint accompanying the dataset can be found on the project website (https://github.com/Aitslab/openchart-se).
Medical Question Pairs (MQP) Dataset This repository contains a dataset of 3048 similar and dissimilar medical question pairs hand-generated and labeled by Curai's doctors. The dataset is described in detail in our paper.
Methodology We present our doctors with a list of 1524 patient-asked questions randomly sampled from the publicly available crawl of HealthTap. Each question results in one similar and one different pair through the following instructions provided to the labelers:
1. Rewrite the original question in a different way while maintaining the same intent. Restructure the syntax as much as possible and change medical details that would not impact your response, e.g. "I'm a 22-y-o female" could become "My 26 year old daughter".
2. Come up with a related but dissimilar question for which the answer to the original question would be WRONG OR IRRELEVANT. Use similar key words.
The first instruction generates a positive question pair (similar) and the second generates a negative question pair (different). With the above instructions, we intentionally frame the task such that positive question pairs can look very different by superficial metrics, and negative question pairs can conversely look very similar. This ensures that the task is not trivial.
Dataset format The dataset is formatted as dr_id, question_1, question_2, label. We used 11 different doctors for this task so dr_id ranges from 1 to 11. The label is 1 if the question pair is similar and 0 otherwise.
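The format described above can be read and sanity-checked with a few lines of standard-library Python. This is only a sketch: it assumes a comma-separated file with no header row, and the two sample rows are invented for illustration (only the question "Are fibroadenomas malignant?" comes from the dataset description).

```python
import csv
import io
from collections import Counter

# Column order as given in the dataset description.
FIELDS = ["dr_id", "question_1", "question_2", "label"]


def read_pairs(handle):
    """Parse question pairs and check dr_id and label against the documented ranges."""
    pairs = []
    for row in csv.DictReader(handle, fieldnames=FIELDS):
        dr_id = int(row["dr_id"])
        label = int(row["label"])
        if not 1 <= dr_id <= 11:
            raise ValueError(f"dr_id out of range: {dr_id}")
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label}")
        pairs.append((dr_id, row["question_1"], row["question_2"], label))
    return pairs


# Invented rows standing in for the real file (assumed to have no header row).
sample = io.StringIO(
    '1,"Are fibroadenomas malignant?","Can a fibroadenoma be cancerous?",1\n'
    '1,"Are fibroadenomas malignant?","How are fibroadenomas removed?",0\n'
)
pairs = read_pairs(sample)

# Count similar (1) versus dissimilar (0) pairs.
label_counts = Counter(pair[3] for pair in pairs)
```

Because each source question yields one similar and one dissimilar pair, the two label counts would be expected to match on the full file.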
Dataset statistics The final dataset contains 4567 unique questions. The minimum, maximum, median and average number of tokens in these questions are 4, 81, 20 and 22.675 respectively showing there is reasonable variance in the length of the questions. The shortest question is Are fibroadenomas malignant?
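Length statistics of this kind can be reproduced roughly as follows. The tokenizer used for the reported figures is not stated, so the regular expression here is an assumption: it counts punctuation marks as separate tokens, which gives 4 tokens for the shortest question and so agrees with the reported minimum.

```python
import re
import statistics

# Assumed tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def token_count(question):
    """Number of tokens in a question under the assumed tokenizer."""
    return len(TOKEN_RE.findall(question))


def length_summary(questions):
    """Minimum, maximum, median and mean token counts for a list of questions."""
    counts = [token_count(q) for q in questions]
    return {
        "min": min(counts),
        "max": max(counts),
        "median": statistics.median(counts),
        "mean": statistics.mean(counts),
    }


shortest = "Are fibroadenomas malignant?"
shortest_length = token_count(shortest)
```

Calling length_summary on the full list of unique questions would give figures comparable to those above, although a different tokenizer may shift them slightly.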
An off-the-shelf medical entity recognizer finds around 1000 unique medical entities in the questions. Some of the top entity mentions were: physician, pregnancy, pain, lasting weeks, menstruation, emotional state, cancer, visual function, headache, bleeding, fever, sexual intercourse
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Medical Doctors in Turkey increased to 2.18 per 1000 people in 2021 from 2.05 per 1000 people in 2020. This dataset includes a chart with historical data for Turkey Medical Doctors.
"Facilitate marketing campaigns with the healthcare email list from Infotanks Media that includes doctors, healthcare professionals, NPI numbers, physician specialties, and more. Buy targeted email lists of healthcare professionals and connect with doctors, specialists, and other healthcare professionals to promote your products and services. Hyper personalize campaigns to increase engagement for better chances of conversion. Reach out to our data experts today! Access 1.2 million physician contact database with 150+ specialities including chiropractors, cardiologists, psychiatrists, and radiologists among others. Get ready to integrate healthcare email lists from Infotanks Media to start email marketing campaigns through any CRM and ESP. Contact us right now! Ensure guaranteed lead generation with segmented email marketing strategies for specialists, departments, and more. Make the best use of target marketing to progress and move closer to your business goals with email listing services for healthcare professionals. Infotanks Media provides 100% verified healthcare email lists with the highest email deliverability guarantee of 95%. Get a custom quote today as per your requirements. Enhance your marketing campaigns with healthcare email lists from 170+ countries to build your global outreach. Request your free sample today! Personalize your business communication and interactions to maximize conversion rates with high quality contact data. Grow your business network in your target markets from anywhere in the world with a guaranteed 95% contact accuracy of the healthcare email lists from Infotanks Media. Contact data experts at Infotanks Media from the healthcare industry to get a quick sample for free. Write to us or call today!
Hyper target within and outside your desired markets with GDPR and CAN-SPAM compliant healthcare email lists that get integrated into your CRM and ESPs. Balance out the sales and marketing efforts by aligning goals using email lists from the healthcare industry. Build strong business relationships with potential clients through personalized campaigns. Call Infotanks Media for a free consultation. Explore new geographies and target markets with a focused approach using healthcare email lists. Align your sales teams and marketing teams through personalized email marketing campaigns to ensure they accomplish business goals together. Add value and grow revenue to take your business to the next level of success. Double up your business and revenue growth with email lists of healthcare professionals. Send segmented campaigns to monitor behaviors and understand the purchasing habits of your potential clients. Send follow up nurturing email marketing campaigns to attract your potential clients to become converted customers. Close deals sooner with detailed information of your prospects using the healthcare email list from Infotanks Media. Reach healthcare professionals on their preferred platform of communication with the email list of healthcare professionals. Identify, capture, explore, and grow in your target markets anywhere in the world with a fully verified, validated, and compliant email database of healthcare professionals. Move beyond the traditional approach and automate sales cycles with buying triggers sent through email marketing campaigns. Use the healthcare email list from Infotanks Media to engage with your targeted potential clients and get them to respond. Increase email marketing campaign response rate to convert better! Reach out to Infotanks Media to customize your healthcare email lists. Call today!"
Data on visits to physician offices, hospital outpatient departments and hospital emergency departments by selected population characteristics. Please refer to the PDF or Excel version of this table in the HUS 2019 Data Finder (https://www.cdc.gov/nchs/hus/contents2019.htm) for critical information about measures, definitions, and changes over time. Note that the data file available here has more recent years of data than what is shown in the PDF or Excel version. Data for 2017 physician office visits are not available. SOURCE: NCHS, National Ambulatory Medical Care Survey and National Hospital Ambulatory Medical Care Survey. For more information on the National Ambulatory Medical Care Survey and the National Hospital Ambulatory Medical Care Survey, see the corresponding Appendix entries at https://www.cdc.gov/nchs/data/hus/hus17_appendix.pdf.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Medical Doctors in Germany increased to 4.98 per 1000 people in 2021 from 4.90 per 1000 people in 2020. This dataset includes a chart with historical data for Germany Medical Doctors.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Update — December 7, 2014. – Evidence-based medicine (EBM) is not working for many reasons, for example: 1. Incorrect in their foundations (paradox): hierarchical levels of evidence are supported by opinions (i.e., lowest strength of evidence according to EBM) instead of real data collected from different types of study designs (i.e., evidence). http://dx.doi.org/10.6084/m9.figshare.1122534 2. The effect of criminal practices by pharmaceutical companies is only possible because of the complicity of others: healthcare systems, professional associations, governmental and academic institutions. Pharmaceutical companies also corrupt at the personal level, politicians and political parties are on their payroll, medical professionals seduced by different types of gifts in exchange of prescriptions (i.e., bribery) which very likely results in patients not receiving the proper treatment for their disease, many times there is no such thing: healthy persons not needing pharmacological treatments of any kind are constantly misdiagnosed and treated with unnecessary drugs. Some medical professionals are converted in K.O.L. which is only a puppet appearing on stage to spread lies to their peers, a person supposedly trained to improve the well-being of others, now deceits on behalf of pharmaceutical companies. Probably the saddest thing is that many honest doctors are being misled by these lies created by the rules of pharmaceutical marketing instead of scientific, medical, and ethical principles. Interpretation of EBM in this context was not anticipated by their creators. “The main reason we take so many drugs is that drug companies don’t sell drugs, they sell lies about drugs.” ―Peter C. Gøtzsche “doctors and their organisations should recognise that it is unethical to receive money that has been earned in part through crimes that have harmed those people whose interests doctors are expected to take care of. 
Many crimes would be impossible to carry out if doctors weren’t willing to participate in them.” —Peter C Gøtzsche, The BMJ, 2012, Big pharma often commits corporate crime, and this must be stopped. Pending (Colombia): Health Promoter Entities (In Spanish: EPS ―Empresas Promotoras de Salud).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This synthetic healthcare dataset serves as a valuable resource for data science, machine learning, and data analysis enthusiasts. It is designed to mimic real-world healthcare data, enabling users to practise, develop, and showcase their data manipulation and analysis skills within the healthcare industry. The inspiration behind this dataset stems from the need for practical and diverse healthcare data for educational and research purposes, addressing the challenge of accessing sensitive real-world healthcare information. Generated using Python's Faker library, it mirrors the structure and attributes commonly found in healthcare records, aiming to foster innovation, learning, and knowledge sharing in healthcare analytics.
This dataset is typically provided as a data file in CSV format. It is structured with columns providing specific information about the patient, their admission, and the healthcare services received. While the exact number of rows or records is not specified, it is designed to be a synthetic dataset suitable for various data analysis and modelling tasks in the healthcare domain.
This dataset is ideal for a wide range of applications, including:
* Developing and testing healthcare predictive models.
* Practising data cleaning, transformation, and analysis techniques.
* Creating data visualisations to gain insights into healthcare trends.
* Learning and teaching data science and machine learning concepts in a healthcare context.
It can specifically be treated as a Multi-Class Classification Problem for predicting 'Test Results', which contains three categories: Normal, Abnormal, and Inconclusive.
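Before training any model on the three-class 'Test Results' target, it can help to know what a trivial predictor achieves. The sketch below computes a majority-class baseline. The five labels are invented for illustration, and the function is a hypothetical helper rather than part of the dataset.

```python
from collections import Counter

# The three categories named in the dataset description.
CLASSES = ("Normal", "Abnormal", "Inconclusive")


def majority_baseline(labels):
    """Return the most frequent class and the accuracy of always predicting it."""
    unknown = set(labels) - set(CLASSES)
    if unknown:
        raise ValueError(f"unexpected classes: {sorted(unknown)}")
    counts = Counter(labels)
    majority, hits = counts.most_common(1)[0]
    return majority, hits / len(labels)


# Invented 'Test Results' values standing in for the real column.
labels = ["Normal", "Abnormal", "Normal", "Inconclusive", "Normal"]
majority, accuracy = majority_baseline(labels)
```

A classifier is only informative if it beats this baseline. With three roughly balanced classes the baseline would sit near one third.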
The dataset has a global geographic region. The time range for admissions and discharges, as indicated by the 'Date of Admission' and 'Discharge Date' columns, spans across several years, with examples observed from 2019 to 2024. Demographic scope is covered by patient 'Name', 'Age', 'Gender', and 'Blood Type' information. As this is a synthetic dataset, it does not contain real patient information and is created to mirror common healthcare record structures.
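A common derived feature from the 'Date of Admission' and 'Discharge Date' columns is length of stay. The sketch below assumes that both columns hold ISO-formatted dates (YYYY-MM-DD), which the description does not confirm, and uses an invented record.

```python
from datetime import date


def length_of_stay(record):
    """Days between admission and discharge for one record (ISO dates assumed)."""
    admitted = date.fromisoformat(record["Date of Admission"])
    discharged = date.fromisoformat(record["Discharge Date"])
    if discharged < admitted:
        raise ValueError("discharge date precedes admission date")
    return (discharged - admitted).days


# Invented record using the column names from the description.
record = {"Date of Admission": "2019-03-01", "Discharge Date": "2019-03-15"}
stay_days = length_of_stay(record)
```

If the file stores dates in another format, datetime.strptime with the matching format string would replace date.fromisoformat.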
CC0
This dataset is intended for data science, machine learning, and data analysis enthusiasts. It is particularly useful for those looking to engage in learning and experimentation within the healthcare analytics domain. The dataset encourages exploration, analysis, and sharing of findings within communities like Kaggle.
Original Data Source: Healthcare Dataset
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Number of Doctors: Registered: Medical Council of India data was reported at 1,169.000 Person in 2014. This records a decrease from the previous number of 5,603.000 Person for 2013. Number of Doctors: Registered: Medical Council of India data is updated yearly, averaging 1,989.000 Person from Dec 2002 (Median) to 2014, with 13 observations. The data reached an all-time high of 5,603.000 Person in 2013 and a record low of 921.000 Person in 2004. Number of Doctors: Registered: Medical Council of India data remains active status in CEIC and is reported by Central Bureau of Health Intelligence. The data is categorized under India Premium Database’s Health Sector – Table IN.HLB001: Health Human Resources: Number of Doctors: Registered.
This dataset consists of 405 transcriptions of audio-recorded physician-patient interactions conducted at Veterans Health Administration (VHA) medical center primary care clinics. The recordings were collected by patients using concealed (except where indicated) audio recorders. The protocol was approved by VHA Institutional Review Boards, and participating physicians and patients consented to participate in the study. The interactions were analyzed using Content Coding for Contextualization of Care ("4C"). An Excel spreadsheet with the coding of the original audio of each transcript is included. All data have been de-identified. "xxx" indicates that PHI was removed. "@@@" indicates that the transcriber did not understand the audio. These transcripts are a resource for medical educators and research scientists seeking transcriptions of primary care encounters, as well as those interested in 4C coding in its early stages. Their acquisition was supported with research funding from the Department of Veterans Affairs, Veterans Health Administration, Office of Research and Development, Health Services Research & Development.
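The two placeholder conventions can be counted per transcript to gauge how much text was redacted or inaudible. This is a minimal sketch: the sample line is invented, and it assumes that "xxx" appears as a standalone lowercase token.

```python
import re

# "xxx" marks removed PHI; "@@@" marks audio the transcriber could not understand.
PHI_MARK = re.compile(r"\bxxx\b")
UNCLEAR_MARK = re.compile(r"@@@")


def placeholder_counts(text):
    """Count de-identification and unclear-audio markers in a transcript."""
    return {
        "phi_removed": len(PHI_MARK.findall(text)),
        "unclear_audio": len(UNCLEAR_MARK.findall(text)),
    }


# Invented line standing in for real transcript text.
sample_line = "DOCTOR: Good morning Mr. xxx, did the @@@ help after your visit to xxx?"
counts = placeholder_counts(sample_line)
```

Transcripts with a high unclear-audio count may be worth excluding from analyses that depend on complete dialogue.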
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
MQuAD
The Medical Question and Answering dataset (MQuAD) is a refined collection that combines the datasets listed below. You can download it from Hugging Face using the datasets library, as follows.
Quick Guide
from datasets import load_dataset
dataset = load_dataset("danielpark/MQuAD-v1")
Medical Q/A datasets gathered from the following websites.
eHealth Forum, iCliniq, Question Doctors, WebMD. Data was gathered on 5 May 2017.
The MQuAD provides embedded question… See the full description on the dataset page: https://huggingface.co/datasets/danielpark/mquad-v1.
Received 17 February 2025: ‘may I have details of your independent doctor so I can check them out.’ Received 25 February 2025: ‘Please could you arrange for me to receive the Freedom of Information Act so that I can check the qualifications of your independent doctors.’ Our response I can confirm that the NHS Business Services Authority (NHSBSA) holds some of the information you have requested. Question 1 I can confirm that we do hold information on the names and General Medical Council numbers for independent medical assessors. Please note that this response does not relate to a specific claim or claimant. The request is being answered more generally given requests under FOIA are requester-blind, that is to say the identity of the requester is not taken into account when considering a request for information under FOIA. We consider the name and GMC number to be personal data under the Data Protection Act 2018. Disclosure of medical assessors’ names or GMC numbers would result in the identification of the medical assessors when entered into the GMC public register. Please be aware that I have decided not to release the names and GMC numbers of the medical assessors as this information falls under the exemption in section 40 subsections 2 and 3(A)(a) of the FOIA. As the requested information would allow a medical assessor to be identified, I consider this information is exempt. This is because it would breach the first data protection principle as: A. it is not fair to disclose medical assessors’ personal details to the world and is likely to cause damage or distress. B. these details are not of sufficient interest to the public to warrant an intrusion into the privacy of the medical assessor. The requested information is exempt if disclosure would contravene any of the data protection principles. 
For disclosure to comply with the lawfulness, fairness, and transparency principle, we either need the consent of the data subject(s) or there must be a legitimate interest in disclosure. In addition, the disclosure must be necessary to meet the legitimate interest and finally, the disclosure must not cause unwarranted harm. This means that the NHSBSA is therefore required to conduct a balancing exercise between the legitimate interest of the applicant in disclosure against the rights and freedoms of the medical assessor. While I acknowledge that you have a legitimate interest in disclosure of the information, the disclosure of the requested information would cause unwarranted harm. Disclosure under FOIA is to the world and therefore the NHSBSA has to consider the overall impact of the disclosure and its duty of care. The expectation of the medical assessors is that they will remain anonymous and will therefore not be subject to contact or pressure from claimants or campaigning groups. Given the certainty that the name and/or GMC number will identify the medical assessor there is a reasonable expectation that this information will not be disclosed under the FOIA. Disclosing this information would be unfair and as such this would breach the UK General Data Protection Regulation first data protection principle. Please see the following link to view the section 40 exemption in full: https://www.legislation.gov.uk/ukpga/2000/36/section/40 Question 2 I have established that the NHSBSA does not hold this information. This is because the medical qualifications and experience of the medical assessors are the responsibility of the third-party medical assessment supplier. I hope, however, that the following information provides reassurance on this point. All claims are assessed by the independent medical assessment supplier with a consistent approach. Each case is considered on its own merits, by an experienced independent medical assessor. 
The contract with our supplier does not require them to tell us details of the qualifications of the medical assessors or their experience. The contract requires that all assessments carried out are undertaken by suitably qualified and experienced registered medical practitioners. This includes being registered on the UK General Medical Council register, with a licence to practise and meet or exceed the following requirements: • they are a registered medical practitioner with at least five years’ post graduate experience; and • they have experience of the performance of medical and/ or disability assessment, addressing questions of causation and impact in the context of legislative or policy requirements to assist the decision maker